Re: [issue43976] Allow Python distributors to add custom site install schemes

2021-05-04 Thread M.-A. Lemburg
On 04.05.2021 22:07, Steve Dower wrote:
> 
> Perhaps what I'm suggesting here is that I don't see any reason for "sudo pip 
> install ..." into a distro-installed Python to ever need to work, and would 
> be quite happy for it to just fail miserably every time (which is already the 
> case for the Windows Store distro of Python).

The "pip install" into a root environment approach is the standard way
to set up Docker (and similar) containers, so I think deliberately
breaking this would do Python a disservice.

The warning about this kind of setup, which apparently was added in one
of the more recent pip versions, already causes a lot of unnecessary
noise when building containers and doesn't make Python look good in
that environment.

-- 
Marc-Andre Lemburg
eGenix.com

___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



Re: [issue43510] PEP 597: Implemente encoding="locale" option and EncodingWarning

2021-03-31 Thread M.-A. Lemburg
On 31.03.2021 11:30, STINNER Victor wrote:
> 
> To me, it sounds really weird to accept an encoding when a file is opened in 
> binary mode. open(filename, "rb", encoding="locale") looks like a bug.

Same here.

If encoding is used as an argument and then not used, this is a bug,
not a feature :-)
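
A minimal sketch of the point above, assuming current CPython behaviour:
the built-in open() refuses an explicit encoding in binary mode with a
ValueError instead of silently ignoring it. The helper name is made up
for illustration.

```python
import os
import tempfile


def binary_open_rejects_encoding(path):
    """Return True if open() refuses an encoding argument in binary mode."""
    try:
        # Deliberate misuse: an encoding makes no sense when reading bytes.
        with open(path, "rb", encoding="utf-8"):
            return False
    except ValueError:
        return True


# Use a throwaway file so the check does not depend on an existing path.
fd, tmp_path = tempfile.mkstemp()
os.close(fd)
try:
    rejected = binary_open_rejects_encoding(tmp_path)
finally:
    os.remove(tmp_path)
print(rejected)
```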

-- 
Marc-Andre Lemburg
eGenix.com




Re: [issue43552] Add locale.get_locale_encoding() and locale.get_current_locale_encoding()

2021-03-19 Thread M.-A. Lemburg
On 19.03.2021 14:57, Inada Naoki wrote:
> 
> Background: PEP 597 adds new `encoding="locale"`option to open() and 
> TextIOWrapper(). It is same to `encoding=None` for now, but it means using 
> "locale encoding" explicitly.
> 
> But this is wrong in UTF-8 mode.

Please address UTF-8 mode explicitly in open() or elsewhere. The locale
module is about the state of the lib C, not what Python enforces via
options in its own I/O layers.

As mentioned, both should ideally be synchronized, though, so
UTF-8 mode in Python should trigger setting a UTF-8 encoding
via setlocale().

-- 
Marc-Andre Lemburg
eGenix.com




Re: [issue43552] Add locale.get_locale_encoding() and locale.get_current_locale_encoding()

2021-03-19 Thread M.-A. Lemburg
On 19.03.2021 14:47, STINNER Victor wrote:
> 
> STINNER Victor added the comment:
> 
>> - If you add "current", people will rightly ask: then what do all the
>> other APIs in the locale module return ? Of course, they all return
>> the current state of settings :-) So this is unnecessary as well.
> 
> The problem is that there are two different "locale encodings", what I call:
> 
> * "current locale encoding": nl_langinfo(CODESET) in short
> * "Python locale encoding": "UTF-8" in some cases, nl_langinfo(CODESET) 
> otherwise

The UTF-8 mode is a Python invention. It doesn't have anything to
do with the lib C locale functions, which this module addresses and
interfaces to.

Please don't mix the two.

In fact, in order to avoid issues, Python should probably set the locale
encoding to UTF-8 as well, when run in UTF-8 mode. It's dangerous to
have Python and the lib C use different assumptions about the encoding,
esp. in embedded applications.
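
To see whether the two views actually differ in a given process, a small
sketch like the following can help. It only uses existing calls; the
function name and the dictionary keys are my own choice, and
nl_langinfo() is assumed to be missing on some platforms (e.g. Windows).

```python
import locale
import sys


def encoding_views():
    """Collect what Python and the C library each report as encoding."""
    views = {
        # True when Python runs in UTF-8 mode (-X utf8 or PYTHONUTF8=1).
        "utf8_mode": bool(sys.flags.utf8_mode),
        # Python's own answer, which UTF-8 mode may override.
        "python_preferred": locale.getpreferredencoding(False),
    }
    # Ask the C library directly where the platform provides the call.
    if hasattr(locale, "nl_langinfo"):
        views["libc_codeset"] = locale.nl_langinfo(locale.CODESET)
    return views


print(encoding_views())
```

If "python_preferred" and "libc_codeset" disagree, Python and the lib C
are working from different assumptions in that process.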

> It is unfortunate that the Python UTF-8 Mode which "ignores the locale" 
> changes the behavior of the locale module, of the 
> locale.getpreferredencoding() function. But the ship has sailed.
> 
> People are used to look into the "locale" module to get the "locale" 
> encoding. So I prefer to put  the function to get the "Python locale 
> encoding" in the locale module.
> 
> I propose to add "current" in the name since this encoding is not the one you 
> are looking for usually.
> 
> An alternative is to have a single function with an optional parameter. 
> Example:
> 
> * get_locale_encoding() or get_locale_encoding(True) returns the locale 
> encoding
> * get_locale_encoding(False) returns the current locale encoding

-1, both on the names and the idea to again add parameters which change
their meaning. We should have one function per meaning and really
only need the interface getencoding(), since the UTF-8 mode
doesn't fit into the locale module scope.

-- 
Marc-Andre Lemburg
eGenix.com




Re: [issue43552] Add locale.get_locale_encoding() and locale.get_current_locale_encoding()

2021-03-19 Thread M.-A. Lemburg
On 19.03.2021 12:35, Eryk Sun wrote:
> 
> Eryk Sun added the comment:
> 
>> Read the ANSI code page on Windows,
> 
> I don't see why the Windows implementation is inconsistent with POSIX here. 
> If it were changed to be consistent, the default encoding at startup would 
> remain the same, since setlocale(LC_CTYPE, "") uses the process code page 
> from GetACP().

I'm not sure I understand what you're saying (but then, I have little
experience with locales on Windows).

My assumption is that nl_langinfo(CODESET) does not work on Windows
or gives wrong results. Is that incorrect ?

If it does work, getencoding() could just be a shim over
nl_langinfo(CODESET) on all platforms.
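
A rough sketch of such a shim, under stated assumptions: the function
name is hypothetical, the Windows branch assumes the ANSI code page from
GetACP() is the right answer there, and the final fallback is only a
placeholder for platforms that offer neither.

```python
import locale
import sys


def getencoding_sketch():
    """Return the encoding the C library currently uses; no setlocale()."""
    if hasattr(locale, "nl_langinfo"):
        # POSIX: ask the C library directly, without any guessing.
        return locale.nl_langinfo(locale.CODESET)
    if sys.platform == "win32":
        # Assumption: report the ANSI code page on Windows.
        import ctypes
        return "cp%d" % ctypes.windll.kernel32.GetACP()
    # Assumption: last-resort fallback where neither call is available.
    return locale.getpreferredencoding(False)


print(getencoding_sketch())
```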

-- 
Marc-Andre Lemburg
eGenix.com




Re: [issue43552] Add locale.get_locale_encoding() and locale.get_current_locale_encoding()

2021-03-19 Thread M.-A. Lemburg
On 19.03.2021 12:26, STINNER Victor wrote:
> 
> STINNER Victor added the comment:
> 
> Recently, I spent some days to document properly encodings used by Python.

Thanks for documenting this.

I would prefer to keep the locale module as just an interface
to the lib C locale logic and not add encoding details which are
specific to Python's view on I/O (sys or io) or the file system (os).

Hopefully, in a few years, we can get rid of all this and standardize
on UTF-8 everywhere.

-- 
Marc-Andre Lemburg
eGenix.com




Re: [issue43552] Add locale.get_locale_encoding() and locale.get_current_locale_encoding()

2021-03-19 Thread M.-A. Lemburg
On 19.03.2021 12:05, STINNER Victor wrote:
> I'm not sure what to do with locale.getdefaultlocale(). Should we deprecate 
> it? I never used this function. How is it used? For which purpose?
>
> I undertand that in 2000, locale.getdefaultlocale() was interesting to avoid 
> calling setlocale(LC_CTYPE, ""). But Python 3 calls setlocale(LC_CTYPE, "") 
> by default at startup since the early versions, and it's now called on all 
> platforms since Python 3.8. Moreover, its internal database seems to be 
> outdated and is painful to maintain (especially if we consider all platforms 
> supported by Python, not only Linux, there are many issues on macOS).

Yes, deprecate it as well. If Python calls setlocale() by default now,
it has served its purpose.

The alias database is needed by the normalization engine. We may be
able to drop the encoding part, but this would have to be checked.
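
For readers who have not used the normalization engine: a short sketch
of what the alias database does today via locale.normalize(). The helper
name is made up, and the exact encodings attached depend on the table
shipped with the Python version in use.

```python
import locale


def normalized(codes):
    """Map each locale code to what the alias table makes of it."""
    # Codes missing from the table come back unchanged.
    return {code: locale.normalize(code) for code in codes}


for code, full in normalized(["de_DE", "en_US", "en_DE"]).items():
    print(code, "->", full)
```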

-- 
Marc-Andre Lemburg
eGenix.com




Re: [issue43552] Add locale.get_locale_encoding() and locale.get_current_locale_encoding()

2021-03-19 Thread M.-A. Lemburg
On 19.03.2021 11:36, STINNER Victor wrote:
> 
> STINNER Victor added the comment:
> 
>> locale.getencoding()
>>
>> which interfaces to nl_langinfo(CODESET) or the Windows code
>> page and does not try to do any magic, ie. does *not* call
>> setlocale(). It needs to return what the lib C currently
>> knows and uses as encoding.
> 
> This is locale.get_current_locale_encoding(). I would like to put "current" 
> in the name, because there is a lot of confusion between 
> get_current_locale_encoding() encoding and locale.getpreferredencoding(False) 
> encoding. In locale.getpreferredencoding(False), Python ignores the locale in 
> some cases which is counter intuitive.

These attempts have resulted in much of the confusion around the locale
module. It's better not to create more of it.

- "locale" in the name is unnecessary, since this is the locale module.

- If you add "current", people will rightly ask: then what do all the
other APIs in the locale module return ? Of course, they all return
the current state of settings :-) So this is unnecessary as well.

locale.getencoding() works in the same way as locale.getlocale().
It interfaces to the lib C and returns the current encoding setting
as known by the lib C. It's just a more intuitive name than
locale.nl_langinfo(CODESET) and works on Windows as well.

And, again, locale.getpreferredencoding() should be deprecated.
The API has been misused in too many ways and is completely broken
by now. It was a good idea at the time, when Martin added it,
even though I never liked the name.

-- 
Marc-Andre Lemburg
eGenix.com




Re: [issue43552] Add locale.get_locale_encoding() and locale.get_current_locale_encoding()

2021-03-19 Thread M.-A. Lemburg
On 19.03.2021 10:17, STINNER Victor wrote:
> 
> New submission from STINNER Victor:
> 
> I propose to add two new functions:
> 
> * locale.get_locale_encoding(): it's exactly the same than 
> locale.getpreferredencoding(False).
> 
> * locale.get_current_locale_encoding(): always get the current locale 
> encoding. Read the ANSI code page on Windows, or nl_langinfo(CODESET) on 
> other platforms. Ignore the UTF-8 Mode. Don't always return "UTF-8" on macOS, 
> Android, VxWorks.

I'm not sure whether this would improve the situation much.

The problem is that the locale module is meant to expose the lib C
locale settings, but many of the recent additions actually do something
completely different: they look into the process and user environment
and try to determine external settings, which are not reflected in
the lib C locale settings.

I had added locale.getdefaultlocale() to give applications a chance
to determine the locale setting defined by the process environment
*without* calling setlocale(LC_ALL, '') and causing problems
in other threads. I used the X11 database for locale encodings,
which was the closest you could get to a standard for
encodings at the time (around 2000).

Part of the return value is the encoding, which would be set.

Martin later added locale.getpreferredencoding(), which tries to
determine the encoding in a different, new way, based on
nl_langinfo(CODESET). As you mentioned, this intention was broken
on several platforms by forcing UTF-8 as output. And in many cases,
the API had to call setlocale() as well, causing the thread problems.

However, the problem with nl_langinfo(CODESET) is the same as
with setlocale(): it returns the current state of the lib C
settings, which may well point to the 'C' locale. Not the ones
the user has configured in the OS environment. So while you get
an encoding defined by lib C for the current locale settings
(without guessing it as with locale.getdefaultlocale()), you
still don't get what the user really wants to use.

Unfortunately, lib C does not provide a way to query the locale
database without changing the locale settings at the same time.
This is the main issue we're facing.

Now, the correct way in all this would be to just call
setlocale(LC_ALL, '') at the start of the application and
not try to apply all the magic to get around this. But this
has to be done by the application and not Python (which may
well be embedded into some other application).

I'd suggest to add a single new API:

locale.getencoding()

which interfaces to nl_langinfo(CODESET) or the Windows code
page and does not try to do any magic, ie. does *not* call
setlocale(). It needs to return what the lib C currently
knows and uses as encoding.

locale.getpreferredencoding() should then be deprecated.

It does not make sense to pretend to query information which is
not really directly available from the lib C locale system.

And the documentation should point out that applications should
call setlocale(LC_ALL, '') when they start up, if they want to
get the lib C locale, and thus Python locale module, set up to
work based on what the user really wants -- instead of just
guessing at this.
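
What the documentation could recommend might look roughly like this. It
is a sketch only: the function name is hypothetical, and falling back to
the current setting when the environment names an unavailable locale is
my assumption about reasonable behaviour.

```python
import locale


def init_locale():
    """Adopt the user's locale settings once, at application start-up."""
    try:
        # The empty string means "use the settings from the environment".
        return locale.setlocale(locale.LC_ALL, "")
    except locale.Error:
        # The environment names a locale this system lacks; keep the
        # current setting instead of failing.
        return locale.setlocale(locale.LC_ALL)


print(init_locale())
```

This belongs in the application's start-up code, before any threads are
created, since setlocale() affects the whole process.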

PS: The locale module normally does not use underscores in
function names, so it's not a good idea to add more.

-- 
Marc-Andre Lemburg
eGenix.com




Re: [issue43115] locale.getlocale fails if locale is set

2021-02-18 Thread M.-A. Lemburg
On 17.02.2021 15:02, Anders Munch wrote:
>> BTW: What is wxWidgets doing with the returned values ?
> 
> wxWidgets doesn't call getlocale, it's a C++ library (wrapped by wxPython) 
> that uses C setlocale.
> 
> What does use getlocale is time.strptime and datetime.datetime.strptime, so 
> when getlocale fails, strptime fails.

Would they work with getlocale() returning None for the encoding ?

>> We could enhance this to return None for the encoding instead
>> of raising an exception, but would this really help ?
> 
> Very much so.
> 
> Frankly, I don't get the impression that the current locale preferred 
> encoding is used for *anything*.  Other than possibly having a role in 
> implementing getpreferredencoding.

The logic for getdefaultlocale() predates getpreferredencoding().
I had added it because calling setlocale() just to figure out the default
encoding and then resetting it to the original setting is dangerous
in a multi-threaded application such as a web server -- setlocale()
changes the locale for the entire process.

>> Alternatively, we could add "en_DE" to the alias table and set
>> a default encoding to use. 
> 
> Where would you get a complete list of all the new aliases that would need be 
> to be added?

We could start with the list of supported locales in Windows:

https://docs.microsoft.com/en-us/cpp/c-runtime-library/locale-names-languages-and-country-region-strings?view=msvc-160

Since everything is moving toward UTF-8 as standard encoding,
I'd suggest aliasing the plain locale codes to .UTF-8.

-- 
Marc-Andre Lemburg
eGenix.com




Re: [issue43115] locale.getlocale fails if locale is set

2021-02-17 Thread M.-A. Lemburg
On 17.02.2021 10:55, Anders Munch wrote:
> >>> import locale
> >>> locale.setlocale(locale.LC_ALL, 'en_DE')
> 'en_DE'
> >>> locale.getlocale()
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "C:\flonidan\env\Python38-64\lib\locale.py", line 591, in getlocale
>     return _parse_localename(localename)
>   File "C:\flonidan\env\Python38-64\lib\locale.py", line 499, in _parse_localename
>     raise ValueError('unknown locale: %s' % localename)
> ValueError: unknown locale: en_DE

The locale module does not know this locale, so it cannot
guess the encoding. Since getlocale() returns the language
code and encoding, this fails.

If you add the encoding, you should be fine:

>>> locale.setlocale(locale.LC_ALL, 'en_DE.UTF-8')
'en_DE.UTF-8'
>>> locale.getlocale()
('en_DE', 'UTF-8')
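
Code that cannot control which locale name gets set could guard the call
along these lines. This is a sketch, not what the locale module does;
the wrapper name and the (None, None) fallback are assumptions.

```python
import locale


def getlocale_or_none(category=locale.LC_CTYPE):
    """Like locale.getlocale(), but never raise for unknown names."""
    try:
        return locale.getlocale(category)
    except ValueError:
        # The alias table does not know the active locale name.
        return (None, None)


print(getlocale_or_none())
```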

-- 
Marc-Andre Lemburg
eGenix.com




Re: [issue42967] Web cache poisoning - `;` as a query args separator

2021-01-20 Thread M.-A. Lemburg
On 20.01.2021 12:07, STINNER Victor wrote:
> Maybe we should even go further in Python 3.10 and only split at "&" by 
> default, but let the caller to opt-in for ";" separator as well.

+1.

Personally, I've never seen URLs encoded with ";" as query parameter
separator in practice on the server side.

The use of ";" was recommended in the HTML4 spec, but only in an
implementation side note:

https://www.w3.org/TR/1999/REC-html401-19991224/appendix/notes.html#h-B.2.2

and not in the main reference:

https://www.w3.org/TR/1999/REC-html401-19991224/interact/forms.html#h-17.13.4.1

Browsers are also pretty relaxed about seeing non-escaped ampersands in
link URLs and do the right thing, so the suggested work-around for
avoiding escaping is not really needed.
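
To illustrate the opt-in idea, a short sketch, assuming a Python version
in which urllib.parse.parse_qs() accepts a separator argument and splits
only at "&" by default:

```python
from urllib.parse import parse_qs

query = "a=1;b=2&c=3"

# Default: only "&" separates parameters, so ";" stays inside the value.
by_ampersand = parse_qs(query)

# Opt-in: the caller explicitly asks for ";" as the separator instead.
by_semicolon = parse_qs(query, separator=";")

print(by_ampersand)
print(by_semicolon)
```

Each call uses exactly one separator, so a cache and the application can
no longer disagree about where a parameter ends.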

-- 
Marc-Andre Lemburg
eGenix.com




Re: [issue28468] Add platform.freedesktop_os_release()

2020-11-25 Thread M.-A. Lemburg
On 25.11.2020 11:13, STINNER Victor wrote:
> Platform was always a thin wrapper to OS functions. For example, there is no 
> unified API to retrieve OS name and version on Windows, macOS or Linux. You 
> need to pick the proper function. For me, freedesktop_os_release() just 
> follows this trend.

Not really. We have functions per OS, but not functions which only work
on a subset of distros of an OS.

The patch also has other issues:

A text file parser could be a private function in the module,
but it doesn't fit the platform module API spirit.

platform module APIs should return meaningful information and
provide defaults where these cannot be determined. Accordingly,
an API would have to return a tuple (distname, version, id), just
like linux_distribution() did.
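
A sketch of the shape I mean, with hypothetical names: a private parser
for os-release style text, wrapped by an API that returns a tuple and
falls back to empty-string defaults. The key names (NAME, VERSION_ID,
ID) are the ones commonly found in such files; everything else here is
an assumption for illustration.

```python
import shlex


def _parse_os_release(text):
    """Parse KEY=value lines into a dict, dropping shell-style quotes."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        parts = shlex.split(raw)
        info[key] = parts[0] if parts else ""
    return info


def distribution(text):
    """Return (distname, version, id), with '' where nothing is known."""
    info = _parse_os_release(text)
    return (info.get("NAME", ""), info.get("VERSION_ID", ""),
            info.get("ID", ""))


sample = 'NAME="Debian GNU/Linux"\nVERSION_ID="11"\nID=debian\n'
print(distribution(sample))
```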

Regardless, I don't see the point of opening up this can of
worms again. We settled on moving Linux distribution version detection
out of the stdlib and that was a good decision.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Experts (#1, Nov 25 2020)
>>> Python Projects, Coaching and Support ...https://www.egenix.com/
>>> Python Product Development ...https://consulting.egenix.com/


::: We implement business ideas - efficiently in both time and costs :::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
   Registered at Amtsgericht Duesseldorf: HRB 46611
   https://www.egenix.com/company/contact/
 https://www.malemburg.com/




Re: [issue41842] Add codecs.unregister() to unregister a codec search function

2020-09-23 Thread M.-A. Lemburg
Just found an internal API which already takes care of
unregistering a search function: _PyCodec_Forget().

All that needs to be done is to expose this as codecs.unregister()
and add the clearing of the lookup cache.
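
From the Python side, the intended behaviour could be exercised like
this. It is a sketch that assumes codecs.unregister() is available (the
hasattr() check covers versions without it); the search function and the
codec name are made up.

```python
import codecs


def sketch_search(name):
    """Hypothetical search function: map one made-up name to UTF-8."""
    if name == "sketch_utf8_alias":
        return codecs.lookup("utf-8")
    return None


def demo():
    codecs.register(sketch_search)
    found = codecs.lookup("sketch_utf8_alias").name
    if not hasattr(codecs, "unregister"):
        return found, None
    codecs.unregister(sketch_search)
    try:
        # Without cache clearing this lookup would still succeed.
        codecs.lookup("sketch_utf8_alias")
        still_cached = True
    except LookupError:
        still_cached = False
    return found, still_cached


found, still_cached = demo()
print(found, still_cached)
```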

-- 
Marc-Andre Lemburg
eGenix.com




Re: [issue41842] Add codecs.unregister() to unregister a codec search function

2020-09-23 Thread M.-A. Lemburg
On 23.09.2020 14:56, STINNER Victor wrote:
> Marc-Andre Lemburg explained:
> 
> "There is no API to unregister a codec search function, since deregistration
> would break the codec cache used by the registry to speedup codec
> lookup."
> 
> One simple solution would be to clear the cache 
> (PyInterpreterState.codec_search_cache) when codecs.unregister() removes a 
> search function. I expect that calling unregister() is an uncommon operation, 
> so the performance is not a blocker issue.

+1

BTW: While you're at it, having a way to access the search function
list from Python would be nice as well, since this would then open
up the possibility to reorder search functions.

-- 
Marc-Andre Lemburg
eGenix.com




Re: [issue32429] Outdated Modules/Setup warning is invisible

2017-12-27 Thread M.-A. Lemburg
On 27.12.2017 00:24, Antoine Pitrou wrote:
> 
> Antoine Pitrou added the comment:
> 
>> +1 - do you have any thoughts on that?
> 
> I think the current scheme may have been useful at a time where DVCS didn't 
> exist.  You would maintain an unversioned copy of Modules/Setup.dist in your 
> work-tree.  Nowadays you can fork a github repo and maintain your own branch 
> with changes to a tracked file.  I don't think Modules/Setup deserves special 
> treatment compared to, say, setup.py or Makefile.pre.in.

The file is mostly meant for people using tarballs rather than
checkouts to give them an easy way back to default settings
after making changes to the Modules/Setup file.

The same could be had by having Makefile.pre.in generate Setup.dist
from Setup while booting into build mode, avoiding the need to
sometimes create Modules/Setup manually.

-- 
Marc-Andre Lemburg
eGenix.com




Re: [issue32110] Make codecs.StreamReader.read() more compatible with read() of other files

2017-11-22 Thread M.-A. Lemburg
On 22.11.2017 08:40, Serhiy Storchaka wrote:
> Usually the read() method of a file-like object takes one optional argument 
> which limits the amount of data (the number of bytes or characters) returned 
> if specified.
> 
> codecs.StreamReader.read() also has such parameter. But this is the second 
> parameter. The first parameter limits the number of bytes read for decoding. 
> read(1) can return 70 characters, that will confuse most callers which expect 
> either a single character or an empty string (at the end of stream).

That's not true. .read(1) will at most read 1 byte from the stream
and decode it. There's no way it will return 70 characters. It will
usually return fewer chars than the number of bytes read.

The reasoning here is the same as for .read() on regular byte
streams in Python 2.x: the first argument size tells the reader how
many bytes to read for decoding, since this is needed to properly
work together with .seek().

The optional second parameter chars was added as convenience,
since the user may not know how many bytes need to be read in
order to decode a certain number of characters.

That said, I see in your patch that you want to bind chars
to size. That will work and also protect the user from the
unlikely case where the codec returns more chars than bytes
read.
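
A small sketch of the chars parameter in use, assuming a UTF-8 stream in
which some characters take more than one byte (the sample text is
arbitrary):

```python
import codecs
import io

data = "h\u00e9llo w\u00f6rld".encode("utf-8")
reader = codecs.getreader("utf-8")(io.BytesIO(data))

# chars limits the number of decoded characters that are returned,
# regardless of how many bytes were needed to decode them.
first = reader.read(chars=3)

# A plain read() then returns whatever is left in the stream.
rest = reader.read()

print(repr(first), repr(rest))
```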

-- 
Marc-Andre Lemburg
eGenix.com




Re: [issue31744] Python 2.7.14 Fails to compile on CentOS/RHEL7

2017-10-10 Thread M.-A. Lemburg
I'm not sure whether this is related, but your quoting for --rpath
doesn't appear to work:

On 10.10.2017 14:17, Brian Sidebotham wrote:
> LDFLAGS='-Wl,-rpath=$\\$$ORIGIN/../lib' \
> ...
> gcc -pthread -Wl,-rpath=RIGIN/../lib -fprofile-generate  -Xlinker 
> -export-dynamic -o python \
>   Modules/python.o \
>   -L. -lpython2.7 -lpthread -ldl  -lutil   -lm  

The CONFIG_ARGS variable should always be set, so I assume that
your _sysconfigdata.py was generated in a previous broken build.
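
One way to check what a given build recorded is to query the variable
through sysconfig. A minimal sketch; the helper name is mine, and on
platforms that are not built via configure the value is expected to be
None.

```python
import sysconfig


def configure_args():
    """Return the recorded ./configure arguments, or None if absent."""
    # CONFIG_ARGS comes from the build-time configuration data.
    return sysconfig.get_config_var("CONFIG_ARGS")


print(configure_args())
```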

-- 
Marc-Andre Lemburg
eGenix.com




Re: [issue31530] [2.7] Python 2.7 readahead feature of file objects is not thread safe

2017-09-20 Thread M.-A. Lemburg
Why not simply document the fact that read ahead in Python 2.7
is not thread-safe and leave it at that ?

.next() and .readline() already don't work well together, so this
would just add one more case.

-- 
Marc-Andre Lemburg
eGenix.com




Re: [issue29585] site.py imports relatively large `sysconfig` module.

2017-02-17 Thread M.-A. Lemburg
On 17.02.2017 13:06, STINNER Victor wrote:
>> Alternatively, sysconfig data could be made available via a C lookup 
>> function; with the complete dictionary only being created on demand. 
>> get_config_var() already is such a lookup API which could be used as 
>> front-end.
> 
> I don't think that it's worth it to reimplement partially sysconfig in
> C. This module is huge, complex, and platform dependant.

Sorry, I was just referring to the data part of sysconfig,
not sysconfig itself.

Having a lookup function much like we have for unicodedata
makes things much more manageable, since you don't need to
generate a dictionary in memory for all the values in the
config data. Creating that dictionary takes a while (in terms
of ms).

-- 
Marc-Andre Lemburg
eGenix.com




Re: [issue29410] Moving to SipHash-1-3

2017-02-01 Thread M.-A. Lemburg
On 01.02.2017 10:14, Christian Heimes wrote:
> 
> PEP 456 defines an API to add more hashing algorithms and make the selection 
> of hash algorithm a compile time option. We can easily add SipHash-1-3 and 
> make it the default algorithm. Vendors then can select between FNV2, 
> SipHash-1-3 and SipHash-2-4.

+1 on adding the 1-3 and making it the default; the faster
the better. Hash speed for strings needs to be excellent in Python
due to the many dict lookups we use in the interpreter.

Reading up a bit on the Rust thread and looking at this benchmark
which is mentioned in the thread:

https://imgur.com/5dKecOW

it seems as if it would make sense to not use a fixed
hash algorithm for all string lengths, but instead a
hybrid one to increase performance for short strings
(which are used a lot in Python).

Is there a good hash algorithm which provides better
performance for short strings than siphash ?
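
For reference, the compile-time choice is visible at run time through
sys.hash_info. A short sketch; as far as I know the cutoff field reports
the threshold of the optional small-string optimization, with 0 meaning
it is disabled.

```python
import sys

info = sys.hash_info

# Name of the algorithm used for str, bytes and memoryview hashing.
print("algorithm:", info.algorithm)
# Size of the algorithm's internal output in bits.
print("hash bits:", info.hash_bits)
# Threshold for the small-string optimization (0 = not enabled).
print("cutoff:", info.cutoff)
```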

> On another note should we add SipHash-2-4 and 1-3 PRF to the hashlib mode?

+1 as well.

-- 
Marc-Andre Lemburg
eGenix.com




Re: [issue15369] pybench and test.pystone poorly documented

2016-09-15 Thread M.-A. Lemburg
On 15.09.2016 11:11, STINNER Victor wrote:
> 
> STINNER Victor added the comment:
> 
> Hum, since the discussion restarted, I reopen the issue ...
> 
> "Well, pybench is not just one benchmark, it's a whole collection of 
> benchmarks for various different aspects of the CPython VM and per concept it 
> tries to calibrate itself per benchmark, since each benchmark has different 
> overhead."
> 
> In the performance module, you now get individual timing for each pybench 
> benchmark, instead of an overall total which was less useful.

pybench had the same intention. It was a design mistake to add an
overall timing to each suite run. The original intention was to
compare each benchmark individually.

Perhaps it would make sense to try to port the individual benchmark
tests in pybench to performance.

> "The number of iterations per benchmark will not change between runs, since 
> this number is fixed in each benchmark."
> 
> Please take a look at the new performance module, it has a different design. 
> Calibration is based on minimum time per sample, no more on hardcoded things. 
> I modified all benchmarks, not only pybench.

I think we are talking about different things here: calibration in
pybench means that you try to determine the overhead of the
outer loop and possible setup code that is needed to run the
test.

pybench runs a calibration method which has the same
code as the main test, but without the actual operations that you
want to test, in order to determine the timing of the overhead.

It then takes the minimum timing from overhead runs and uses
this as base line for the actual test runs (it subtracts the
overhead timing from the test run results).

This may not be ideal in all cases, but it's the closest
I could get to timing of the test operations at the time.
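
The idea, reduced to a sketch (this is not pybench's actual code; the
function names, the round counts and the sample workload are made up):

```python
import time


def _time_loop(body, rounds):
    """Time `rounds` executions of body() inside the measuring loop."""
    start = time.perf_counter()
    for _ in range(rounds):
        body()
    return time.perf_counter() - start


def calibrated_timings(test, calibrate, rounds=10000, runs=5):
    """Return per-run timings of test() with the overhead subtracted."""
    # Overhead: same loop, but the body omits the operations under test.
    # The minimum over several runs serves as the base line.
    overhead = min(_time_loop(calibrate, rounds) for _ in range(runs))
    return [_time_loop(test, rounds) - overhead for _ in range(runs)]


def empty():
    pass


def work():
    sum(range(50))


print(calibrated_timings(work, empty))
```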

I'll have a look at what performance does.

> "BTW: Why would you want to run benchmarks in child processes and in parallel 
> ?"
> 
> Child processes are run sequentially.

Ah, ok.

> Running benchmarks in multiple processes help to get more reliable 
> benchmarks. Read my article if you want to learn more about the design of my 
> perf module:
> http://haypo-notes.readthedocs.io/microbenchmark.html#my-articles

Will do, thanks.

> "Ideally, the pybench process should be the only CPU intense work load on the 
> entire CPU to get reasonable results."
> 
> The perf module automatically uses isolated CPU. It strongly suggests to use 
> this amazing Linux feature to run benchmarks!
> https://haypo.github.io/journey-to-stable-benchmark-system.html
> 
> I started to write advices to get stable benchmarks:
> https://github.com/python/performance#how-to-get-stable-benchmarks
> 
> Note: See also the https://mail.python.org/mailman/listinfo/speed mailing 
> list ;-)

I've read some of your blog posts and articles on the subject
and your journey. Interesting stuff, definitely. Benchmarking
these days appears to have gotten harder, not simpler, compared to
the days of pybench some 19 years ago.

-- 
Marc-Andre Lemburg
eGenix.com




Re: [issue26839] Python 3.5 running on Linux kernel 3.17+ can block at startup or on importing the random module on getrandom()

2016-06-07 Thread M.-A. Lemburg
On 07.06.2016 22:27, Theodore Tso wrote:
> 
> Secondly, when I decided to add this behavior to getrandom(2), it was because 
> people were really worried that people would be using /dev/urandom for 
> security-critical things (e.g., initializing ssh host session keys, when 
> they'd _really_ rather not the NSA have be able to trivally pwn the server) 
> before it had been completely initialized.   (And if it is not completely 
> initialized, it would be trivially and embarassingly easy.  See 
> https://factorable.net/weakkeys12.extended.pdf for an example of where this 
> was rather disastrous.)

Thanks, Theodore, for this paper reference. It provides convincing
arguments that going back to the Python 3.4 behavior is indeed not
a good idea - even though I'm still not convinced that the main
use case for os.urandom() is cryptography. Most people will
simply use it to seed their Mersenne Twisters, like the random
module does too.

Now, raising an exception instead of blocking would likely cause
even more breakage, so I'm with Colm in keeping Victor's patch
and applying the fix to not block in dev_urandom_noraise().

We still need to fix the random module issue, though.

For 3.6, I wish we could have the getrandom() API exposed as
os.getrandom(), with all options available to applications.
That way, the application can decide what is best for them.
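
A minimal sketch of what such an application-level decision could look
like, assuming the os.getrandom() function and os.GRND_NONBLOCK flag as
they later shipped in Python 3.6 on Linux. The helper name try_getrandom
is made up for illustration; it reports "no entropy yet" as None rather
than blocking, so the caller can choose what to do next.

```python
import os


def try_getrandom(nbytes=16):
    """Return up to nbytes from the kernel CSPRNG without blocking, or None.

    Hypothetical helper: the name and the None convention are assumptions
    made for this sketch, not part of any CPython API.
    """
    # os.getrandom() only exists on Linux builds of Python 3.6+.
    if not hasattr(os, "getrandom"):
        return None
    try:
        # GRND_NONBLOCK asks the kernel to fail instead of waiting for
        # the entropy pool to be initialized.
        return os.getrandom(nbytes, os.GRND_NONBLOCK)
    except OSError:
        # Covers BlockingIOError (pool not ready) and old kernels
        # that lack the getrandom() syscall.
        return None


data = try_getrandom(16)
if data is None:
    print("kernel entropy not available without blocking")
else:
    print("got", len(data), "bytes")
```

An application that only needs a seed could fall back to a weaker source
when None comes back, while one generating keys could wait or abort.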

-- 
Marc-Andre Lemburg
eGenix.com




Re: [issue21955] ceval.c: implement fast path for integers with a single digit

2016-02-05 Thread M.-A. Lemburg
On 05.02.2016 16:14, STINNER Victor wrote:
> 
> Please don't. I would like to have time to benchmark all these patches (there 
> are now 9 patches attached to the issue :-)) and I would like to hear 
> Serhiy's feedback on your latest patches.

Regardless of the performance, the fastint5.patch looks like the
least invasive approach to me. It also doesn't incur as much
maintenance overhead as the others do.

I'd only rename the macro MAYBE_DISPATCH_FAST_NUM_OP to
TRY_FAST_NUMOP_DISPATCH :-)

BTW: I do wonder why this approach is as fast as the others. Have
compilers grown smart enough to realize that the number slot
functions will not change and can thus be inlined ?

-- 
Marc-Andre Lemburg
eGenix.com




Re: [issue21955] ceval.c: implement fast path for integers with a single digit

2016-02-03 Thread M.-A. Lemburg
On 03.02.2016 18:05, STINNER Victor wrote:
> 
>> python -m timeit  "sum([x * x * 1 for x in range(100)])"
> 
> If you only want to benchmark x*y, x+y and list-comprehension, you
> should use a tuple for the iterator.

... and precalculate that in the setup:

python -m timeit -s "loops=tuple(range(100))" "sum([x * x * 1 for x in loops])"

# python -m timeit "sum([x * x * 1 for x in range(100)])"
10 loops, best of 3: 5.74 usec per loop
# python -m timeit -s "loops=tuple(range(100))" "sum([x * x * 1 for x in loops])"
10 loops, best of 3: 5.56 usec per loop

(python = Python 2.7)
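
The same comparison can be sketched from within Python using the timeit
module, which may be handy when scripting several variants. The loop
count of 2000 is an arbitrary choice for this sketch; absolute numbers
will differ by machine and Python version.

```python
import timeit

STMT_INLINE = "sum([x * x * 1 for x in range(100)])"
STMT_HOISTED = "sum([x * x * 1 for x in loops])"

# The range object is created inside the timed statement here.
inline = timeit.timeit(STMT_INLINE, number=2000)

# Here the iterable is built once in the setup and is not timed.
hoisted = timeit.timeit(
    STMT_HOISTED,
    setup="loops = tuple(range(100))",
    number=2000,
)

print("inline : %.3f usec per loop" % (inline / 2000 * 1e6))
print("hoisted: %.3f usec per loop" % (hoisted / 2000 * 1e6))
```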

-- 
Marc-Andre Lemburg
eGenix.com




Re: [issue22798] time.mktime doesn't update time.tzname

2015-09-29 Thread M.-A. Lemburg
On 29.09.2015 11:31, Akira Li wrote:
> 
> Akira Li added the comment:
> 
>> Would issue22798.diff patch address your issue?
> 
> No. The issue is that C mktime() may update C tzname on some platforms
> but time.mktime() does not update time.tzname on these platforms while 
> the time module docs suggest that it might be expected e.g.:
> 
>   Most of the functions defined in this module call platform C library
>   functions with the same name. It may sometimes be helpful to consult
>   the platform documentation, because the semantics of these functions 
>   varies among platforms.

tzname is set when the module is being loaded and not updated
afterwards (unless you call tzset()). I can't really see why you
would expect a module global in Python to follow the semantics
of a C global, unless this is explicitly documented.

Note: The fact that tzset() does update the module globals is
not documented.

> ---
> 
> Unrelated: time.strftime('%Z') and
> time.strftime('%Z', time.localtime(time.time())) can differ on some 
> platforms and python versions
>   
> http://stackoverflow.com/questions/32353015/python-time-strftime-z-is-always-zero-instead-of-timezone-offset

The StackOverflow discussion targets %z (lower case z),
not %Z (with capital Z).

I think this is just a documentation bug.

Overall, I think that relying on those module globals is
not a good idea. They are not threadsafe and their values
can only be trusted right after module import. It's much
safer to access the resp. values by using struct_time values
or strftime().
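
As a rough illustration of that advice, the zone name can be derived
from a struct_time at the moment it is needed instead of being read
from the time.tzname module global. The helper name local_tz_name is
invented for this sketch.

```python
import time


def local_tz_name(timestamp=None):
    """Return the local zone abbreviation in effect at the given time.

    Hypothetical helper for illustration. With timestamp=None the
    current time is used.
    """
    local = time.localtime(timestamp)
    # %Z is formatted from the struct_time passed in, so the result
    # does not depend on when the time module was imported.
    return time.strftime("%Z", local)


print(local_tz_name())
```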

-- 
Marc-Andre Lemburg
eGenix.com




Re: [issue24872] Add /NODEFAULTLIB:MSVCRT to _msvccompiler

2015-08-17 Thread M.-A. Lemburg
On 15.08.2015 22:41, Steve Dower wrote:
 
> Marc-Andre: there are a few concerns with including DLLs that aren't new with 
> any of the 3.5 changes.
> 
> * depending on another CRT version is fine *if* it is available (users may 
> have to be told/helped to install the redistributable themselves)
> * CRT state will not be shared between versions. This is most obviously a 
> problem if file descriptors are shared, but may also appear when writing 
> console output, doing floating-point calculations, threading, and memory 
> management.
> * potentially many more issues if C++ is used, but since Python doesn't use 
> C++ this is mainly a concern where you have two DLLs using C++ and different 
> runtimes (the CRT is partially/fully implemented in C++, so issues may 
> theoretically occur with only one DLL using C++, but I'm yet to see any in 
> practice or even identify any specific issues - maybe it's fine? I'm not 
> going to guarantee it myself)

These issues have always existed in the past, but were never a real
problem, AFAIK, since the libraries intended to be used externally
will typically come with e.g. memory management APIs to make sure
they retain ownership of the allocated memory on their heap.

It is quite natural to have to run VCredist as part of an application
installer to make sure that the target system has the right VC runtime
DLLs installed (and the installer will do the checking).

The purpose of having DLLs for the runtime is to reduce overall
size of the components as well as being able to easily address
bugs and security issues in the runtime DLLs *without* having
to recompile and redeploy all components using them.

By forcing or even suggesting statically compiled Python C extensions,
we would break this goal and potentially put our users at risk.

IMO, we should follow the MS recommendations for Deployment in Visual C++
as we did in the past:

https://msdn.microsoft.com/en-us/library/dd293574.aspx


You can statically link a Visual C++ library to an application—that is,
compile it into the application—so that you don't have to deploy the Visual
C++ library files separately. However, we caution against this approach
because statically linked libraries cannot be updated in place. If you use
static linking and you want to update a linked library, you have to recompile
and redeploy your application.


Perhaps I'm missing something, but if the only advantage of statically
compiling in the runtime is to have users not need to run VCredist
at install time, it's not worth all the added trouble this introduces.

If you are trying to make it possible to compile extensions with
compilers following VC2015, then I also don't think this approach
will work: the new compilers will use a new runtime and so
issues you describe above come into play between the extensions
and the interpreter.

In that scenario, they will create real problems, as far as I
understand, since the Python C API expects to be able to e.g.
share FDs, internal state such as which locale to assume, or
use and free memory allocated by either the interpreter or the
extension in the resp. other component (e.g. PyArg_ParseTuple()).

So in the end, you'll still have to use the same compiler for
extensions as the one used for compiling CPython to make sure
you don't run into these issues - which is essentially the same
situation as for Python <=3.4.

-- 
Marc-Andre Lemburg
eGenix.com




Re: [issue24534] disable executing code in .pth files

2015-06-30 Thread M.-A. Lemburg
On 01.07.2015 00:16, Min RK wrote:
 
>> Just because a feature can be misused doesn't make it a bad feature.
> 
> That's fair. I'm just not aware of any uses of this feature that aren't 
> misuses, hence the patch.

I don't remember the details of why this feature was added,
but can imagine that it was supposed to enable installation
of new importers via .pth files.

Without this feature it would not be possible to add entries to
sys.path via .pth files which can only be handled by non-standard
importers.
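
For readers unfamiliar with the feature under discussion, here is a
small sketch of how the site module treats a .pth file: plain lines are
added to sys.path, while lines starting with "import" are executed. The
file name demo.pth, the directory name extra and the marker attribute
sys._pth_demo_ran are all invented for this illustration.

```python
import os
import site
import sys
import tempfile


def demo_pth():
    """Process a throwaway .pth file and report what the site module did."""
    saved_path = list(sys.path)
    try:
        with tempfile.TemporaryDirectory() as sitedir:
            os.mkdir(os.path.join(sitedir, "extra"))
            with open(os.path.join(sitedir, "demo.pth"), "w") as fh:
                # A plain line: an existing directory to add to sys.path.
                fh.write("extra\n")
                # A line starting with "import" is executed as code.
                fh.write("import sys; sys._pth_demo_ran = True\n")
            site.addsitedir(sitedir)
            added = [p for p in sys.path if p not in saved_path]
            return getattr(sys, "_pth_demo_ran", False), added
    finally:
        # Undo the sys.path changes made by the demonstration.
        sys.path[:] = saved_path


ran, added = demo_pth()
print("import line executed:", ran)
print("paths added:", added)
```

A real importer hook would do its registration in that executed line,
which is the use case speculated about above.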

>> Perhaps you could submit a fix for this to the setuptools maintainers 
>> instead.
> 
> Yes, that's definitely the right thing to do, and in fact the first thing I 
> did. It looks like that patch is likely to be merged; it is certainly much 
> less disruptive. That's where I started, then I decided to bring it up to 
> Python itself after reading up on the exploited feature, as it seemed to me 
> like a feature with no use other than misuse.

Thanks, that's certainly a good way forward :-)

-- 
Marc-Andre Lemburg
eGenix.com




Re: [issue23857] Make default HTTPS certificate verification setting configurable via global ini file

2015-04-05 Thread M.-A. Lemburg
On 05.04.2015 22:49, Donald Stufft wrote:
 
> Donald Stufft added the comment:
> 
>> I don't consider monkey patching a proper way to configure a Python
>> installation.
> 
> The point is that TLS validation on/off isn't conceptually a Python level
> configuration option, that's going to be a per application configuration
> option. The monkeypatching is simply an escape hatch to give people time to
> update their applications (or pressure whoever wrote the application) to
> support the configuration option that really belongs at the application
> level. It *should* feel improper because the entire concept of a Python level
> on/off switch isn't proper and making it feel more proper by adding an official
> API or config file for doing it is only giving footguns out to people.

People upgrading to a new patch level Python release will *not*
expect or want to have to change their application to adapt to
it. That's simply not within the scope of a patch level release.

Furthermore, old applications such as Zope will (most likely) not
receive such updates.

Please accept that there's a whole universe out there where people
don't continually update their applications, but still want to
benefit from bug fixes to their underlying libs and tools. The
world is full of legacy systems, regardless of whether we like it
or not. There's no good or bad about this. It's just a fact of
life.

What I'm arguing for is a way for admins of such older systems
to be able to receive bug fixes for Python 2.7.x *without*
having to change the applications.

Using an environment setting and adding that to the application's
user account settings is an easy way to resolve the issue in
situations where other options are not feasible or simply not
deemed needed (Zope has been working without any egg verification
for years).

-- 
Marc-Andre Lemburg
eGenix.com




Re: [issue23857] Make default HTTPS certificate verification setting configurable via global ini file

2015-04-05 Thread M.-A. Lemburg
FWIW: I just ran into a situation where the new approach resulted
in pip, setuptools and zc.buildout not working anymore.

This was on an AIX system which did not come with CA root certificates
at all.

Now, I knew how to fix this, but the solution was not
an obvious one. I had to use truss to figure out where OpenSSL
was looking for certificates and then added the Mozilla cert
bundle from our egenix-pyopenssl package to make things work
again.

This was on a system where Python 2.7.3 had been installed
previously. After the upgrade to Python 2.7.9 nothing worked
anymore.
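
As a side note for anyone debugging a similar setup: on Python versions
that provide ssl.get_default_verify_paths() (2.7.9+ and 3.4+), the
locations OpenSSL consults can be printed directly, which may save a
tracing session. This is only a sketch of that lookup; it does not
install or select any certificate bundle.

```python
import ssl

paths = ssl.get_default_verify_paths()

# cafile/capath are None when the compiled-in location does not exist.
print("CA file in use      :", paths.cafile)
print("CA directory in use :", paths.capath)

# Compiled-in defaults, regardless of whether they exist on disk.
print("OpenSSL default file:", paths.openssl_cafile)
print("OpenSSL default dir :", paths.openssl_capath)

# Environment variables OpenSSL honours to override the defaults.
print("override via env    :", paths.openssl_cafile_env,
      "or", paths.openssl_capath_env)
```

Pointing the printed environment variable at a CA bundle is one way to
supply certificates without touching application code.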

Again: Please let the users decide what level of security they
want to apply. We can point users to solutions, but in the end
have to respect their own decisions. Note that staying with
Python 2.7.8 is a much worse approach than disabling the checks.

-- 
Marc-Andre Lemburg
eGenix.com




Re: [issue13881] Stream encoder for zlib_codec doesn't use the incremental encoder

2015-01-15 Thread M.-A. Lemburg
On 15.01.2015 05:43, Martin Panter wrote:
 
> New patch that also fixes StreamWriter.writelines() in general for the byte 
> codecs

Could you explain this new undocumented class ?

+class _IncrementalBasedWriter(StreamWriter):
+    """Generic StreamWriter implementation.
+
+    The _EncoderClass attribute must be set to an IncrementalEncoder
+    class to use.
+    """
+
+    def __init__(self, stream, errors='strict'):
+        super().__init__(stream, errors)
+        self._encoder = self._Encoder(errors)
+
+    def write(self, object):
+        self.stream.write(self._encoder.encode(object))
+
+    def reset(self):
+        self.stream.write(self._encoder.encode(final=True))
+

Note that the doc-string mentions a non-existing attribute and that
doc-strings are missing for the other methods.

The purpose appears to be a StreamWriter which works with
an IncrementalEncoder. A proper name would thus be
IncrementalStreamWriter which provides an .encode()
method which adapts the signature of the incremental encoder
to the one expected for StreamWriters and Codecs.
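
To make the suggestion more concrete, here is one possible shape for
such a class. This is a sketch under assumptions, not the patch under
review: the attribute name encoder_class and the finish() method are
invented here, and finish() is kept separate from reset() so that
reset() stays at the codec level and does not write to the stream.

```python
import codecs
import io
import zlib


class IncrementalStreamWriter(codecs.StreamWriter):
    """StreamWriter that delegates encoding to an IncrementalEncoder.

    Subclasses set encoder_class (a hypothetical attribute name) to an
    IncrementalEncoder class.
    """

    encoder_class = None

    def __init__(self, stream, errors="strict"):
        super().__init__(stream, errors)
        self._encoder = self.encoder_class(errors)

    def encode(self, input, errors="strict"):
        # Adapt the incremental signature to the (output, consumed)
        # tuple that StreamWriter.write() expects from .encode().
        return self._encoder.encode(input), len(input)

    def reset(self):
        # Codec-level state only; nothing is written to the stream.
        self._encoder.reset()

    def finish(self):
        # Hypothetical extra method: flush pending encoder output.
        self.stream.write(self._encoder.encode(b"", final=True))


class ZlibStreamWriter(IncrementalStreamWriter):
    encoder_class = codecs.getincrementalencoder("zlib_codec")


buf = io.BytesIO()
writer = ZlibStreamWriter(buf)
writer.write(b"hello ")
writer.write(b"world")
writer.finish()
print(zlib.decompress(buf.getvalue()))
```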

-- 
Marc-Andre Lemburg
eGenix.com




Re: [issue14014] codecs.StreamWriter.reset contract not fulfilled

2015-01-14 Thread M.-A. Lemburg
Adding a note to the documentation is fine.

The .reset() method doesn't have anything to do with the underlying
stream. It's only meant to work at the codec level and needed for
codecs that keep internal state.

-- 
Marc-Andre Lemburg
eGenix.com




Re: [issue22980] C extension naming doesn't take bitness into account

2014-12-16 Thread M.-A. Lemburg
On 16.12.2014 21:28, Steve Dower wrote:
 
> Steve Dower added the comment:
> 
>> get_platform() will be difficult to reuse, for bootstrapping reasons
> 
> ISTM that if you can't determine the value at compile time, then it doesn't 
> matter to compilation enough to need to appear in the tag.

Antoine has a point there. Together with the problems I mentioned
with non-mainstream platforms, it would be better to use a compile
time definition of the platform tag as you've chosen for Windows.

PEP 425 unfortunately ignores the mentioned problems of get_platform().

> As far as matching PEP 425 for the sake of matching it goes, I'd rather keep 
> using search path tricks than have .cp35-cp35m-win_amd64.pyd appear on 
> every single one of my extension modules. Removing the _d suffix is very 
> likely more disruptive than it's worth, especially since untagged pyds are 
> still supported but the debug tag is still necessary. 'm' is always the case 
> in Windows and is benign for a correct extension anyway, and AFAICT 'u' is 
> totally unused.

You may have misread my comment. There's no need to add the Python
tag to the extension tag, only the ABI tag (with flags) and the
platform tag to determine bitness:

spam.cp35m-win_amd64.pyd

Could you explain what replacing the _d suffix with a d ABI flag
would break ?

To be clear, this would mean:

spam.cp35dm-win_amd64.pyd

instead of

spam_d.cp35m-win_amd64.pyd
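
For comparison, the suffixes a given interpreter actually accepts for
extension modules can be inspected at runtime. This sketch only prints
what the running Python reports; the exact strings depend on platform
and version and are not meant to match the names proposed above.

```python
import importlib.machinery
import sysconfig

# All file name suffixes the import system will try for C extensions,
# in priority order (tagged names first, untagged last).
for suffix in importlib.machinery.EXTENSION_SUFFIXES:
    print("accepted suffix:", suffix)

# The suffix the build tools use when naming a freshly compiled
# extension; may be None on some older builds.
print("build suffix   :", sysconfig.get_config_var("EXT_SUFFIX"))
```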

-- 
Marc-Andre Lemburg
eGenix.com




Re: [issue21772] platform.uname() not EINTR safe

2014-07-08 Thread M.-A. Lemburg
On 08.07.2014 11:40, Stefano Borini wrote:
 
> You can't use subprocess. platform is used during build. subprocess needs 
> select, but select is a compiled module and at that specific time in the 
> build process is not compiled yet.

Good point :-)

-- 
Marc-Andre Lemburg
eGenix.com




Re: [issue11322] encoding package's normalize_encoding() function is too slow

2014-06-15 Thread M.-A. Lemburg
On 15.06.2014 15:02, Mark Lawrence wrote:
 
> What's the status of this issue, as we've lived with this really slow 
> implementation for well over three years?

I guess it just needs someone to write a patch.

Note that encoding lookups are cached, so the slowness only
becomes an issue if you lookup lots of different encodings.
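
A quick way to observe the caching from Python, as a sketch: repeated
lookups of the same name hand back the cached CodecInfo object, so the
normalization and search functions only run on the first lookup of
each distinct spelling.

```python
import codecs

first = codecs.lookup("utf-8")
second = codecs.lookup("utf-8")

# The second call is served from the codec registry's cache.
print("same cached object:", first is second)
print("canonical name    :", first.name)
```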

-- 
Marc-Andre Lemburg
eGenix.com




Re: [issue21570] String being confused with datetime.datetime object.

2014-05-24 Thread M.-A. Lemburg
On 24.05.2014 15:55, Brandon wrote:
 
> Observe the following code:
> 
> import MySQLdb, MySQLdb.cursors, datetime
>  ... mysqlCursor is a cursor object from a connection to database from the 
> MySQLdb module ... 
> mysqlCursor.execute("SELECT NOW()")
> timeRow = mysqlCursor.fetchall()
> currentDateTime = datetime.datetime.strptime(timeRow[0]["NOW()"], "%Y-%m-%d 
> %H:%M:%S")
> 
> I get the following error:
> 
> TypeError: must be string, not datetime.datetime
> 
> HOWEVER, when I cast timeRow[0]["NOW()"] to a string like: 
> str(timeRow[0]["NOW()"]) , it works fine.
> 
> For whatever reason the Python interpreter seems to interpret the string from 
> the row of the MySQLdb cursor result as a datetime.datetime object. I have no 
> explanation for this, besides it looking like a date time in the format of 
> YYYY-mm-dd HH:MM:SS. 

It's likely that MySQLdb returns the datetime value as a Python
datetime.datetime object, so it is not really surprising that you get
a TypeError.
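
If the calling code has to cope with both cases, a small guard along
these lines would avoid the round trip through str(). The helper name
as_datetime is made up for this sketch, and the value below merely
stands in for what a database driver might return.

```python
import datetime


def as_datetime(value, fmt="%Y-%m-%d %H:%M:%S"):
    """Return value as a datetime, parsing it only if it is a string.

    Hypothetical helper for illustration.
    """
    if isinstance(value, datetime.datetime):
        # Already converted by the driver; nothing to parse.
        return value
    return datetime.datetime.strptime(value, fmt)


already_parsed = datetime.datetime(2014, 5, 24, 15, 55, 0)
print(as_datetime(already_parsed))
print(as_datetime("2014-05-24 15:55:00"))
```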

-- 
Marc-Andre Lemburg
eGenix.com
