[Python-Dev] Re: When to remove BytesWarning?

2020-10-26 Thread Inada Naoki
On Tue, Oct 27, 2020 at 5:23 AM Victor Stinner wrote:
>
> os.get_exec_path() must temporarily modify the warnings filters to ignore
> BytesWarning when it looks for 'PATH' (unicode) or b'PATH' (bytes) in
> the 'env' dictionary, which may contain unicode or bytes strings.
> Modifying the warnings filters impacts all threads, which is bad.
>
> I dislike having to work around this annoying behavior for dict lookup
> when -b or -bb is used.
>

Completely agree with you.

> I'm quite sure that almost nobody uses -b or -bb when running their
> test suite or to develop. I expect that nobody uses it. According to
> replies, it seems like porting Python 2 code to Python 3 is the only
> use case. Python 3.9 and older can be used for that, no?
>

I think so. But I became a bit conservative when writing this proposal.


> > When can we remove it? My idea is:
> >
> > 3.10: Deprecate the -b option.
>
> Do you mean writing a message into stderr? Or just deprecate it in the
> documentation?

I was thinking of documentation only.

>
> > 3.11: Make the -b option no-op. Bytes warning never emits.
> > 3.12: Remove the -b option.
>
> There is no _need_ to raise an error when -b is used. The -t option
> was kept even after the feature was removed (in Python 3.0 ?). -J
> ("used by Jython" says a comment) is a second command line option
> which is silently ignored.
>

I see.

>
> > BytesWarning will be deprecated in the document, but not to be removed.
>
> I don't see what you mean here. I dislike the idea of deprecating a
> feature without scheduling its removal. I don't see the point of
> deprecating it in this case. I only see that as an annoyance.
>

A documentation-only deprecation is useful for readers. It tells them
"I can just ignore this.".


> I'm fine with removing the exception. If you don't plan to remove it,
> just leave it unchanged (not deprecated), no?
>

OK, my new proposal is:

3.10: Stop emitting BytesWarning for bytes == unicode case, because
this is the most annoying part.
3.11: Stop emitting BytesWarning in core and stdlib.
4.0: Remove `-b` option, `sys.flags.bytes_warning`, and `BytesWarning`.

Regards,

-- 
Inada Naoki  
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/OQOCZISWWKRFAFMZJI5GMA3SNEQ2TYIJ/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: os.scandir bug in Windows?

2020-10-26 Thread Gregory P. Smith
On Mon, Oct 26, 2020, 4:06 PM Chris Angelico wrote:

> On Tue, Oct 27, 2020 at 10:00 AM Greg Ewing wrote:
> >
> > On 27/10/20 8:24 am, Victor Stinner wrote:
> > > I would
> > > rather want to kill the whole concept of "access" time in operating
> > > systems (or just configure the OS to not update it anymore). I guess
> > > that it's really hard to make it efficient and accurate at the same
> > > time...
> >
> > Also it's kind of weird that just looking at data on the
> > disk can change something about it. Sometimes it's an
> > advantage to *not* have quantum computing!
> >
>
> And yet, it's of incredible value to be able to ask "now, where was
> that file... the one that I was looking at last week, called something
> about calendars, and it had a cat picture in it". Being able to answer
> that kinda depends on recording accesses one way or another, so the
> weirdnesses are bound to happen.
>

scandir is never going to answer that. Neither is a simple blind "access"
time stored in filesystem metadata.

> ChrisA
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/YW5NMIE2SC3RQWDMJX2DVIS3FRHNPEQM/


[Python-Dev] Re: Pattern matching reborn: PEP 622 is dead, long live PEP 634, 635, 636

2020-10-26 Thread Tin Tvrtković
Hello,

Go channels are indeed very similar to asyncio Queues, with some added
features like channels being closable. (There is also special syntax in the
select statement, `val, ok := <-ch`, that will set the `ok` variable to
false if the channel has been closed.) A larger difference, I think, is
that in Go channels are used practically everywhere, more so than asyncio
Queues. They are an abstraction the vast majority of Go concurrency is
built upon.

Building this for asyncio tasks, instead of just queues, would be much more
useful in Python.

Contemplating this some more, I would agree we don't need an async match. A
function and some types to match on would probably be enough to get us
close to a select statement in a PEP634 Python. I guess the challenge is
designing these matchable types for ease of use now, and I need to study
the pattern matching PEPs in more detail to be able to contribute here.

On one hand, this means this problem can be solved by a third party
library. On the other hand, I feel like this would be very useful so it
might be worth it to have it somewhere in the stdlib asyncio namespace.

Since `asyncio.wait` can yield multiple tasks in the completed set, this
would probably have to be wrapped in an `async for`.



On Mon, Oct 26, 2020 at 12:33 PM Gustavo Carneiro wrote:

> It's true that asyncio.wait provides the tools that you need, but it's a
> bit clunky to use correctly.
>
> Maybe it would be something along the lines of:
>
> --
> queue1 = asyncio.Queue()
> queue2 = asyncio.Queue()
> ...
> get1 = asyncio.create_task(queue1.get())
> get2 = asyncio.create_task(queue2.get())
> await asyncio.wait({get1, get2}, return_when=asyncio.FIRST_COMPLETED)
> match [task.done() for task in (get1, get2)]:
>     case [True, False]:  get2.cancel(); item1 = await get1; ...
>     case [False, True]:  get1.cancel(); item2 = await get2; ...
>     case [True, True]:  item1 = await get1; ...; item2 = await get2; ...
> --
>
> If asyncio.Queue() is the equivalent of Go channels, perhaps it would be
> worth designing a new API for asyncio.Queue, one that is better suited to
> the match statement:
>
> class Queue:
>     async def read_wait(self) -> 'Queue':
>         """
>         Waits until the queue has at least one item ready to read, without
>         actually consuming the item.
>         """
>
> Then we could more easily use match statement with multiple queues, thus:
>
> --
> async def ready_queue(*queues: asyncio.Queue) -> asyncio.Queue:
>     """
>     Takes multiple queue parameters and waits for at least one of them to
>     have items pending to read, returning that queue.
>     """
>     await asyncio.wait({queue.read_wait() for queue in queues},
>                        return_when=asyncio.FIRST_COMPLETED)
>     for queue in queues:
>         if queue.qsize() > 0:
>             return queue
>
> ...
>
> queue1 = asyncio.Queue()
> queue2 = asyncio.Queue()
>
> ...
>
> match await ready_queue(queue1, queue2):
>     case queue1:  item1 = queue1.get_nowait(); ...
>     case queue2:  item2 = queue2.get_nowait(); ...
> --
>
> Which is less clunky, maybe?...
>
> The above is not 100% bug free.  I think those queue.get_nowait() calls
> may still end up raising QueueEmpty exceptions, in case there is another
> concurrent reader for those queues.  This code would need more work, most
> likely.
>
> In any case, perhaps it's not the match statement that needs to change,
> but rather asyncio API that needs to be enhanced.
>
>
> On Sun, 25 Oct 2020 at 01:14, Nick Coghlan wrote:
>
>> On Sat., 24 Oct. 2020, 4:21 am Guido van Rossum wrote:
>>
>>> On Fri, Oct 23, 2020 at 6:19 AM Tin Tvrtković wrote:
>>>
>>>> Hi,
>>>>
>>>> first of all, I'm a big fan of the changes being proposed here since in
>>>> my code I prefer the 'union' style of logic over the OO style.
>>>>
>>>> I was curious, though, if there are any plans for the match operator to
>>>> support async stuff. I'm interested in the problem of waiting on multiple
>>>> asyncio tasks concurrently, and having a branch of code execute depending
>>>> on the task.
>>>>
>>>> Currently this can be done by using asyncio.wait, looping over the
>>>> done set and executing an if-else chain there, but this is quite tiresome.
>>>> Go has a select statement (https://tour.golang.org/concurrency/5) that
>>>> looks like this:
>>>>
>>>> select {
>>>> case <-ch1:
>>>>     fmt.Println("Received from ch1")
>>>> case <-ch2:
>>>>     fmt.Println("Received from ch2")
>>>> }
>>>>
>>>> Speaking personally, this is a Go feature I miss a lot when writing
>>>> asyncio code. The syntax is similar to what's being proposed here. Although
>>>> it could be a separate thing added later, async match, I guess.
>>>>
>>>
>>> Hadn't seen this before. You could propose this as a follow-up for 3.11.
>>> But aren't Go channels more like asyncio Queues? I guess we'd need way more
>>> in terms of a worked-out example (using asyncio code, not Go code).
>>>
>>
>> I think we'd also want to see how far folks get with 

[Python-Dev] Re: os.scandir bug in Windows?

2020-10-26 Thread Chris Angelico
On Tue, Oct 27, 2020 at 10:00 AM Greg Ewing wrote:
>
> On 27/10/20 8:24 am, Victor Stinner wrote:
> > I would
> > rather want to kill the whole concept of "access" time in operating
> > systems (or just configure the OS to not update it anymore). I guess
> > that it's really hard to make it efficient and accurate at the same
> > time...
>
> Also it's kind of weird that just looking at data on the
> disk can change something about it. Sometimes it's an
> advantage to *not* have quantum computing!
>

And yet, it's of incredible value to be able to ask "now, where was
that file... the one that I was looking at last week, called something
about calendars, and it had a cat picture in it". Being able to answer
that kinda depends on recording accesses one way or another, so the
weirdnesses are bound to happen.

ChrisA
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/ZMNVRGZ7ZEC5EAKLUOX64R4WKHOLPF4O/


[Python-Dev] Re: os.scandir bug in Windows?

2020-10-26 Thread Greg Ewing

On 27/10/20 8:24 am, Victor Stinner wrote:

> I would
> rather want to kill the whole concept of "access" time in operating
> systems (or just configure the OS to not update it anymore). I guess
> that it's really hard to make it efficient and accurate at the same
> time...

Also it's kind of weird that just looking at data on the
disk can change something about it. Sometimes it's an
advantage to *not* have quantum computing!

--
Greg
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/GPWWYOB3EQKDLELTYTE4IWGQ726BCPSY/


[Python-Dev] Re: os.scandir bug in Windows?

2020-10-26 Thread Eryk Sun
On 10/26/20, Victor Stinner wrote:
> On Mon, Oct 19, 2020 at 13:50, Steve Dower wrote:
>> Feel free to file a bug, but we'll likely only add a vague note to the
>> docs about how Windows works here rather than changing anything.
>
> I agree that this surprising behavior can be documented. Attempting to
> provide accurate access time in os.scandir() is likely to slow down
> the function, which would defeat its whole purpose.

I don't think the access time (st_atime) is a significant concern. I'm
concerned with the reliability of the file size (st_size) and
last-write time (st_mtime) in stat() results. Developers are used to
various filesystem policies on various platforms that limit when the
access time gets updated, if at all. FAT32 filesystems only have an
access date, and the driver in Windows fixes the access time at
midnight. Updating the access time in NTFS and ReFS can be completely
disabled at the system level; otherwise it's updated with a
granularity of one hour if it's only the access time that would be
updated.

The biggest concern for me is NTFS hardlinks, for which the st_size
and st_mtime in the directory entry are unreliable. When a file with
multiple hardlinks is modified, the filesystem only updates the
duplicated information in the directory entry of the opened link.
Because the entry in the directory doesn't include the link count or
even a boolean value to indicate that a file has multiple hardlinks,
if you don't know whether or not there's a possibility of hardlinks,
then os.stat() is required in order to reliably determine st_size and
st_mtime, to the extent that reliably knowing st_mtime is possible.
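
As a minimal sketch of that advice: when hardlinks cannot be ruled out,
call os.stat() on each path instead of relying on the cached data from
DirEntry.stat(). The function name `fresh_stats` is made up for this
illustration, and the sketch trades the speed of os.scandir() for one
extra system call per file.

```python
import os


def fresh_stats(directory):
    """Yield (name, st_size, st_mtime) using a fresh os.stat() per file.

    DirEntry.stat() may return data cached from the directory entry,
    so os.stat() is called on the full path instead.
    """
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file():
                st = os.stat(entry.path)
                yield entry.name, st.st_size, st.st_mtime
```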

A general problem that affects even os.stat() is that a modified file
may only be noted by setting a flag (FO_FILE_MODIFIED) in the kernel
file object of the particular open. Whether it's immediately noted in
the last-write time of the shared FCB (file control block) is up to
filesystem policy.

Starting with Windows 10 1809 (as noted in [MS-FSA]), NTFS immediately
notes the modification time, so the st_mtime value from os.stat() is
current. In prior versions of NTFS, and with other Microsoft
filesystems such as FAT32, the last-write time is only noted when the
file is flushed to disk via FlushFileBuffers (i.e. os.fsync) or when
the open is closed.

This means that st_size may change without also changing st_mtime. I'm
using Windows 10 2004 currently, so I can't show an NTFS example, but
the following shows the behavior with FAT32:

import os
import time

f = open('spam.txt', 'w')
st1 = os.stat('spam.txt')
time.sleep(10)
f.write('spam')
f.flush()  # hands the data to the OS; this is not FlushFileBuffers
st2 = os.stat('spam.txt')

The above write was noted only by setting the FO_FILE_MODIFIED flag on
the kernel file object. (The file object can be inspected with a local
kernel debugger.) The write time wasn't noted in the FCB, i.e.
st_mtime hasn't changed in st2:

>>> st2.st_size - st1.st_size
4
>>> st2.st_mtime - st1.st_mtime
0.0

The last-write time is noted when FlushFileBuffers (os.fsync) is
called on the open:

>>> os.fsync(f.fileno())
>>> st3 = os.stat('spam.txt')
>>> st3.st_mtime - st1.st_mtime
10.0

Note also that, with NTFS, to the extent that the FCB metadata is
current, calling os.stat() on a link updates the duplicated
information in the directory entry. So calling os.stat() on an NTFS
file may update the entry that's returned by a subsequent os.scandir()
call.
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/LEBCSKGSL7PMAFH6AQR5LFL7UJ4T5774/


[Python-Dev] Re: fail keyword like there is pass keyword

2020-10-26 Thread Evpok Padding
`raise NotImplementedError`
https://docs.python.org/3/library/exceptions.html#NotImplementedError I
think would be the canonical solution.

E

On Mon, 26 Oct 2020 at 20:34, Victor Stinner wrote:

> If you use the unittest module, I suggest you use self.fail() instead:
> it is standard. Moreover, you can specify a message.
> https://docs.python.org/dev/library/unittest.html#unittest.TestCase.fail
>
> Victor
>
> On Fri, Oct 23, 2020 at 21:36, Umair Ashraf wrote:
>
>> Hello
>>
>> Can I suggest a feature to discuss and hopefully develop and send a PR. I
>> think having a *fail* keyword for unit testing would be great. So we
>> write a test as follows which will fail to begin with.
>>
>> class MyTest(unittest.TestCase):
>>     def test_this_and_that(self):
>>         """
>>         Given inputs
>>         When action is done
>>         Then it should pass
>>         """
>>         fail
>>
>> This keyword is to fill an empty function block like *pass* but this
>> will make the function raise an exception that test is failing. I know
>> there is *raise* but I feel this *fail* keyword is needed to write a
>> test first which fails and then write code and then come back to the test
>> and fill its body.
>>
>> Umair
>>
>> --
>>
>
>
> --
> Night gathers, and now my watch begins. It shall not end until my death.
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/I5ZATRCSV7YXPRT3ZAA3QOCJLS4L4Z5Y/


[Python-Dev] Re: Macro for logging

2020-10-26 Thread Victor Stinner
Hi,

There is the __debug__ builtin variable which is equal to True by
default, but is equal to False when Python is run with the -O command
line option.

The compiler removes dead code when -O is used. Example:

$ cat x.py
def func():
    if __debug__: print("debug")

import dis
dis.dis(func)

# without -O, __debug__ is True: print("debug") is compiled in
$ python3 x.py
  2           0 LOAD_GLOBAL              0 (print)
              2 LOAD_CONST               1 ('debug')
              4 CALL_FUNCTION            1
              6 POP_TOP
              8 LOAD_CONST               0 (None)
             10 RETURN_VALUE

# code removed by the compiler
$ python3 -O x.py
  2           0 LOAD_CONST               0 (None)
              2 RETURN_VALUE
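
The same effect can be checked from within one process, as a small
sketch: the built-in compile() accepts an `optimize` argument that
mirrors the -O levels. The helper name `run_func` is only for this
illustration.

```python
import contextlib
import io

SOURCE = '''
def func():
    if __debug__:
        print("debug")
'''


def run_func(optimize):
    """Compile SOURCE at the given -O level, call func(), return its output."""
    code = compile(SOURCE, "x.py", "exec", optimize=optimize)
    namespace = {}
    exec(code, namespace)
    out = io.StringIO()
    with contextlib.redirect_stdout(out):
        namespace["func"]()
    return out.getvalue()


normal = run_func(0)     # comparable to "python3 x.py"
optimized = run_func(1)  # comparable to "python3 -O x.py"
```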

Victor

On Wed, Oct 21, 2020 at 14:21, Marco Sulla wrote:
>
> If not already present, do you think it's useful to add a macro that does 
> something like
>
> # ifdef Py_DEBUG
> fprintf(stderr, "%s\n", message);
> # endif
>
> ?



-- 
Night gathers, and now my watch begins. It shall not end until my death.
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/5V3HNOGDF2I44CKEAYR2XILF6DE7THFL/


[Python-Dev] Re: fail keyword like there is pass keyword

2020-10-26 Thread Victor Stinner
If you use the unittest module, I suggest you use self.fail() instead:
it is standard. Moreover, you can specify a message.
https://docs.python.org/dev/library/unittest.html#unittest.TestCase.fail

Victor

On Fri, Oct 23, 2020 at 21:36, Umair Ashraf wrote:

> Hello
>
> Can I suggest a feature to discuss and hopefully develop and send a PR. I
> think having a *fail* keyword for unit testing would be great. So we
> write a test as follows which will fail to begin with.
>
> class MyTest(unittest.TestCase):
>     def test_this_and_that(self):
>         """
>         Given inputs
>         When action is done
>         Then it should pass
>         """
>         fail
>
> This keyword is to fill an empty function block like *pass* but this will
> make the function raise an exception that test is failing. I know there is
> *raise* but I feel this *fail* keyword is needed to write a test first
> which fails and then write code and then come back to the test and fill its
> body.
>
> Umair
>
> --
>


-- 
Night gathers, and now my watch begins. It shall not end until my death.
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/P5LJB2VTO5XBOAWBSQ5NYFZSFIYEZS3Q/


[Python-Dev] Re: When to remove BytesWarning?

2020-10-26 Thread Victor Stinner
On Sat, Oct 24, 2020 at 15:13, Christian Heimes wrote:
> In my experience it would be useful to keep the bytes warning for
> implicit representation of bytes in string formatting. It's still a
> common source of issues in code.

IMO it's not a big deal to investigate such bugs without the -b / -bb
command line option. It should be easy to identify where bytes are
formatted as string.

Victor
-- 
Night gathers, and now my watch begins. It shall not end until my death.
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/QCZ5F2BTEQAAX6GJKBQGWHXPOCQZKLAJ/


[Python-Dev] Re: When to remove BytesWarning?

2020-10-26 Thread Victor Stinner
Hi,

Which operations are impacted by -b and -bb? str == bytes, bytes ==
str, dict lookup using str or bytes keys? 'unicode' < b'bytes' always
raises a TypeError.
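
For reference, a small check of two of these cases, assuming an
interpreter that still accepts the -bb option. It runs each snippet in a
child process so the flag does not affect the current one. The helper
name `run_with_bb` is made up for this sketch.

```python
import subprocess
import sys


def run_with_bb(snippet):
    """Run snippet in a child interpreter with -bb; return (returncode, stderr)."""
    proc = subprocess.run(
        [sys.executable, "-bb", "-c", snippet],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stderr


# Equality between bytes and str: BytesWarning is raised as an error with -bb.
eq_code, eq_err = run_with_bb("b'PATH' == 'PATH'")

# Ordering between str and bytes: TypeError, with or without -b.
lt_code, lt_err = run_with_bb("'unicode' < b'bytes'")
```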

On Sat, Oct 24, 2020 at 05:20, Inada Naoki wrote:
> To avoid BytesWarning, the compiler needs a hack whenever it has to
> store bytes and str constants in one dict or set.
> BytesWarning has maintenance costs. It is not huge, but significant.

os.get_exec_path() must temporarily modify the warnings filters to ignore
BytesWarning when it looks for 'PATH' (unicode) or b'PATH' (bytes) in
the 'env' dictionary, which may contain unicode or bytes strings.
Modifying the warnings filters impacts all threads, which is bad.

I dislike having to work around this annoying behavior for dict lookup
when -b or -bb is used.
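
A simplified sketch of that kind of workaround, not the actual
os.get_exec_path() source: the function name `lookup_path` and its return
convention are assumptions made for illustration.

```python
import warnings


def lookup_path(env):
    """Return env['PATH'] or env[b'PATH'], whichever exists, else None."""
    with warnings.catch_warnings():
        # catch_warnings() saves and restores process-wide filter state,
        # so this block is not thread-safe: the drawback described above.
        warnings.simplefilter("ignore", BytesWarning)
        for key in ("PATH", b"PATH"):
            try:
                return env[key]
            except (KeyError, TypeError):
                continue
    return None
```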

I'm quite sure that almost nobody uses -b or -bb when running their
test suite or to develop. I expect that nobody uses it. According to
replies, it seems like porting Python 2 code to Python 3 is the only
use case. Python 3.9 and older can be used for that, no?

> When can we remove it? My idea is:
>
> 3.10: Deprecate the -b option.

Do you mean writing a message into stderr? Or just deprecate it in the
documentation?

> 3.11: Make the -b option no-op. Bytes warning never emits.
> 3.12: Remove the -b option.

There is no _need_ to raise an error when -b is used. The -t option
was kept even after the feature was removed (in Python 3.0 ?). -J
("used by Jython" says a comment) is a second command line option
which is silently ignored.


> BytesWarning will be deprecated in the document, but not to be removed.

I don't see what you mean here. I dislike the idea of deprecating a
feature without scheduling its removal. I don't see the point of
deprecating it in this case. I only see that as an annoyance.

I'm fine with removing the exception. If you don't plan to remove it,
just leave it unchanged (not deprecated), no?

Victor
-- 
Night gathers, and now my watch begins. It shall not end until my death.
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/YL4DITFFUYBNXF7EFZKO4IRZRDRMRIVP/


[Python-Dev] Re: PyPy performance stats (was Re: Speeding up CPython)

2020-10-26 Thread Terry Reedy

On 10/26/2020 11:42 AM, Matti Picus wrote:
> On 10/21/20 2:38 PM, Matti Picus wrote:
>> [0] https://speed.pypy.org/comparison/
>
> Just as a follow up: the front page of speed.pypy.org now shows the
> latest pypy 3.6 vs cpython 3.6.7.


I just clicked the link and there is 3.7.6, not 3.6.7.  But why not 
current cpython?


Since all executables other than pypy 3.6 are checked, no comparisons 
are shown, and the chart would be impossible.


The default should be just two executables checked.  I am not going to 
try to uncheck 50.



--
Terry Jan Reedy
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/JJ7DOSJBQ25E7C44JFVKYODYCZ7L3A6P/


[Python-Dev] Re: os.scandir bug in Windows?

2020-10-26 Thread Victor Stinner
On Mon, Oct 19, 2020 at 13:50, Steve Dower wrote:
> Feel free to file a bug, but we'll likely only add a vague note to the
> docs about how Windows works here rather than changing anything.

I agree that this surprising behavior can be documented. Attempting to
provide accurate access time in os.scandir() is likely to slow down
the function, which would defeat its whole purpose.

--

By the way, who relies on the access time? I don't understand why the
creation and modification times are not enough for all usages. I would
rather want to kill the whole concept of "access" time in operating
systems (or just configure the OS to not update it anymore). I guess
that it's really hard to make it efficient and accurate at the same
time...

Linux has a "relatime" mount option (Fedora enables it by default):
"With this option enabled, atime data is written to the disk only if
the file has been modified since the atime data was last updated
(mtime), or if the file was last accessed more than a certain amount
of time ago (by default, one day)." Minor enhancement over always
updating atime.
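
The quoted rule can be written down as a small predicate. This is a
simplified model of the description above, not kernel code. Timestamps
are plain seconds, and the function name is made up for the sketch.

```python
ONE_DAY = 24 * 60 * 60  # seconds


def relatime_needs_update(atime, mtime, now, max_age=ONE_DAY):
    """Decide whether an access should rewrite atime under relatime."""
    modified_since_last_access = mtime >= atime
    stale = now - atime >= max_age
    return modified_since_last_access or stale
```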

Victor
-- 
Night gathers, and now my watch begins. It shall not end until my death.
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/VKL5VXI6R4BNN36RX2FJ5G4YEHS372UV/


[Python-Dev] Re: _PyBytesWriter/_PyUnicodeWriter could be faster

2020-10-26 Thread Victor Stinner
Hi,

On Sun, Oct 25, 2020 at 15:36, Ma Lin wrote:
> Some code needs to maintain an output buffer that has an unpredictable size. 
> Such as bz2/lzma/zlib modules, _PyBytesWriter/_PyUnicodeWriter.
>
> In current code, when the output buffer grows, resizing will cause 
> unnecessary memcpy().
>
> issue41486 uses memory blocks to represent output buffer in bz2/lzma/zlib 
> modules, it could eliminate the overhead of resizing.

Some context.

_PyBytesWriter is an internal C API designed for C functions which
return a bytes or a bytearray object and use a loop writing into "ptr"
(pointer into a bytes buffer). Such functions expect a single
contiguous memory block. It is based on realloc() and overallocation
(which can be disabled in the API). It uses a bytes object which is
resized on demand. It also uses a short buffer of 512 bytes allocated
on the stack memory for short strings. _PyBytesWriter_Finish() calls
_PyBytes_Resize() if needed.

In 2016, I wrote an article on this API:
https://vstinner.github.io/pybyteswriter.html

realloc() does not always imply copying memory. Growing a memory block
can sometimes be done in-place (no data copy). The same holds when you
shrink a memory block in _PyBytesWriter_Finish(). Also, overallocation
reduces the number of realloc() calls. The _PyBytesWriter design is
optimized for short strings up to 100 bytes.

--

_PyUnicodeWriter API is designed for the PEP 393 compact string
structure (ASCII, Py_UCS1 latin1, Py_UCS2 and Py_UCS4 formats). It
tries to reduce conversions between the 3 formats (Py_UCS1, Py_UCS2
and Py_UCS4) and also uses overallocation to reduce memory copies.

--

By the way, _PyBytesWriter and _PyUnicodeWriter overallocation is
different on Windows:

#ifdef MS_WINDOWS
   /* On Windows, overallocate by 50% is the best factor */
#  define OVERALLOCATE_FACTOR 2
#else
   /* On Linux, overallocate by 25% is the best factor */
#  define OVERALLOCATE_FACTOR 4
#endif
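
As a worked example of these factors, assuming the writer grows a
requested size by size / OVERALLOCATE_FACTOR (which is what the comments
in the snippet suggest), the arithmetic is:

```python
def overallocate(size, factor):
    """Return size grown by size // factor, modelling OVERALLOCATE_FACTOR."""
    return size + size // factor


linux = overallocate(1000, 4)    # factor 4: 25% extra
windows = overallocate(1000, 2)  # factor 2: 50% extra
```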

--

The internal C API _PyAccu is a variant of _PyUnicodeWriter which uses
a list of short strings and sometimes concatenates these strings into
a single large string.


> _PyBytesWriter/_PyUnicodeWriter could use the same way.
>
> If write a "general blocks output buffer", it could be used in 
> _PyBytesWriter/bz2/lzma/zlib. (issue41486 is not very general, it uses a 
> bytes object to represent a memory block.)

I understand that the main idea is to not use a single buffer, but use
a list of buffers, and concatenate them in
_BlocksOutputBuffer_Finish(). It is a similar idea to the _PyAccu API.
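
A toy Python model of that idea, only to make the shape concrete: the
real proposal is C code, and the class name, method names and block size
here are assumptions for illustration.

```python
class BlocksOutputBuffer:
    """Toy model: fill fixed-size blocks, join them once at the end."""

    def __init__(self, block_size=32 * 1024):
        self.block_size = block_size
        self.blocks = [bytearray()]

    def write(self, data):
        view = memoryview(data)
        while len(view):
            last = self.blocks[-1]
            room = self.block_size - len(last)
            if room == 0:
                # Current block is full: start a new one instead of
                # resizing (and possibly copying) the existing data.
                last = bytearray()
                self.blocks.append(last)
                room = self.block_size
            last += view[:room]
            view = view[room:]

    def finish(self):
        # The single copy happens here, when the result is assembled.
        return b"".join(self.blocks)


buf = BlocksOutputBuffer(block_size=4)
buf.write(b"hello ")
buf.write(b"world")
output = buf.finish()
```

Whether this beats realloc() with overallocation for short outputs is
the open question raised in this message; the sketch says nothing about
performance.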

Maybe some functions using _PyBytesWriter can be adapted to use a list
of buffers rather than a single buffer. But I'm not convinced that it
would make them faster. The question is which kind of functions you
want to optimize, for which string length, etc. You should dig into
the old issues where I optimized str%args and str.format():

* http://bugs.python.org/issue14687 : str % args
* http://bugs.python.org/issue14744 : str.format()
* https://bugs.python.org/issue2534 : bytes % args

I used benchmarks like:

https://github.com/vstinner/pymicrobench/blob/master/bench_bytes_format_int.py
https://github.com/vstinner/pymicrobench/blob/master/bench_str_format.py
https://github.com/vstinner/pymicrobench/blob/master/bench_str_format_keywords.py


> If write a new _PyUnicodeWriter like this, it has a chance to eliminate the 
> overhead of switching PyUnicode_Kind (record the switching position):
>
> 'a' * 100_000_000 + '\uABCD'

For a+b, Python first computes "a", then "b", and finally "a+b". I
don't see how your API could optimize such code.

For operations on strings like "%s%s" % (a, b) or "{}{}".format(a, b),
Python internally uses _PyUnicodeWriter. To format "a",
_PyUnicodeWriter just stores a reference to it as
_PyUnicodeWriter.buffer and marks the buffer as read-only
(optimization when the result is made of a single string: no copy is
made at all!). To format "b", _PyUnicodeWriter_WriteStr() converts the
buffer to Py_UCS2 and then writes the new string.

The "a" string is only written "once", not twice. I don't see how your
API would avoid copies in such cases.

Moreover, str % args and str.format() are optimized to avoid
over-allocation when "b" is written: the final
_PyUnicodeWriter_Finish() call is free, it does nothing.


> If anyone has time and is willing to try, it's very welcome.
> Or I might do this at sometime in the future.

I may be completely wrong: please try it and show benchmarks proving that
your approach is faster on specific use cases, without hurting
performance on short strings ;-)

Victor
-- 
Night gathers, and now my watch begins. It shall not end until my death.
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/O3T6B3HDO24M3W5NZE2RCR7FCZTMAWV3/

[Python-Dev] Re: PyPy performance stats (was Re: Speeding up CPython)

2020-10-26 Thread Chris Angelico
On Tue, Oct 27, 2020 at 2:42 AM Matti Picus  wrote:
>
>
> On 10/21/20 2:38 PM, Matti Picus wrote:
> > On 10/21/20 20:42:02 +1100 Chris Angelico wrote:
> >
> >> When I go looking for PyPy performance stats, everything seems to be
> >> Python 2.7. Is there anywhere that compares PyPy3 to CPython 3.6 (or
> >> whichever specific version)? Or maybe it's right there on
> >> https://speed.pypy.org/  and I just can't see it - that's definitely
> >> possible:)
> >>
> >> ChrisA
> >
> >
> > They are not on the front page. You can find them, but it requires
> > digging around in the Comparison page[0].
> >
> > I guess we could switch to emphasizing python3 on the front page, help
> > in updating and reconfiguring Codespeed [1] would be awesome.
> >
> > Matti
> >
> >
> > [0] https://speed.pypy.org/comparison/
> >
> > [1] https://github.com/python/codespeed/tree/speed.pypy.org
> >
>
> Just as a follow up: the front page of speed.pypy.org now shows the
> latest pypy 3.6 vs cpython 3.6.7.
>

Thank you! Good to see!

ChrisA
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/E357UBOFRIIGWI64ZWOIADN65UQJAQ5K/


[Python-Dev] Re: PyPy performance stats (was Re: Speeding up CPython)

2020-10-26 Thread Matti Picus


On 10/21/20 2:38 PM, Matti Picus wrote:
> On 10/21/20 20:42:02 +1100 Chris Angelico wrote:
>
>> When I go looking for PyPy performance stats, everything seems to be
>> Python 2.7. Is there anywhere that compares PyPy3 to CPython 3.6 (or
>> whichever specific version)? Or maybe it's right there on
>> https://speed.pypy.org/  and I just can't see it - that's definitely
>> possible:)
>>
>> ChrisA
>
> They are not on the front page. You can find them, but it requires
> digging around in the Comparison page[0].
>
> I guess we could switch to emphasizing python3 on the front page, help
> in updating and reconfiguring Codespeed [1] would be awesome.
>
> Matti
>
> [0] https://speed.pypy.org/comparison/
>
> [1] https://github.com/python/codespeed/tree/speed.pypy.org



Just as a follow up: the front page of speed.pypy.org now shows the 
latest pypy 3.6 vs cpython 3.6.7.


Matti
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/2BXQUZO24SZOP2AEQTB3RQNHQWC5APJ6/


[Python-Dev] Re: Pattern matching reborn: PEP 622 is dead, long live PEP 634, 635, 636

2020-10-26 Thread Gustavo Carneiro
It's true that asyncio.wait provides the tools that you need, but it's a
bit clunky to use correctly.

Maybe it would be something along the lines of:

--
queue1 = asyncio.Queue()
queue2 = asyncio.Queue()
...
get1 = asyncio.create_task(queue1.get())
get2 = asyncio.create_task(queue2.get())
await asyncio.wait({get1, get2}, return_when=asyncio.FIRST_COMPLETED)
match [task.done() for task in (get1, get2)]:
    case [True, False]:
        get2.cancel()
        item1 = await get1
    case [False, True]:
        get1.cancel()
        item2 = await get2
    case [True, True]:
        item1 = await get1
        item2 = await get2
--

If asyncio.Queue() is the equivalent of Go channels, perhaps it would be
worth designing a new API for asyncio.Queue, one that is better suited to
the match statement:

class Queue:
    async def read_wait(self) -> 'Queue':
        """
        Waits until the queue has at least one item ready to read, without
        actually consuming the item.
        """

Then we could more easily use the match statement with multiple queues, thus:

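--
async def ready_queue(*queues: asyncio.Queue) -> asyncio.Queue:
    """
    Take multiple queue parameters and wait for at least one of them to
    have items pending to read, returning that queue.
    """
    # read_wait() is the method proposed above. asyncio.wait() expects
    # tasks or futures, so wrap each coroutine first.
    waiters = {asyncio.ensure_future(queue.read_wait()) for queue in queues}
    await asyncio.wait(waiters, return_when=asyncio.FIRST_COMPLETED)
    for queue in queues:
        if queue.qsize() > 0:
            return queue

...

queue1 = asyncio.Queue()
queue2 = asyncio.Queue()

...

match await ready_queue(queue1, queue2):
    # A bare name in a case is a capture pattern, so the comparison
    # has to be done by identity in a guard.
    case q if q is queue1:
        item1 = queue1.get_nowait()
    case q if q is queue2:
        item2 = queue2.get_nowait()
--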

Which is less clunky, maybe?...

The above is not 100% bug free.  I think those queue.get_nowait() calls may
still end up raising QueueEmpty exceptions, in case there is another
concurrent reader for those queues.  This code would need more work, most
likely.

In any case, perhaps it's not the match statement that needs to change, but
rather the asyncio API that needs to be enhanced.


On Sun, 25 Oct 2020 at 01:14, Nick Coghlan  wrote:

> On Sat., 24 Oct. 2020, 4:21 am Guido van Rossum,  wrote:
>
>> On Fri, Oct 23, 2020 at 6:19 AM Tin Tvrtković 
>> wrote:
>>
>>> Hi,
>>>
>>> first of all, I'm a big fan of the changes being proposed here since in
>>> my code I prefer the 'union' style of logic over the OO style.
>>>
>>> I was curious, though, if there are any plans for the match operator to
>>> support async stuff. I'm interested in the problem of waiting on multiple
>>> asyncio tasks concurrently, and having a branch of code execute depending
>>> on the task.
>>>
>>> Currently this can be done by using asyncio.wait, looping over the done
>>> set and executing an if-else chain there, but this is quite tiresome. Go
>>> has a select statement (https://tour.golang.org/concurrency/5) that
>>> looks like this:
>>>
>>> select {
>>> case <-ch1:
>>> fmt.Println("Received from ch1")
>>> case <-ch2:
>>> fmt.Println("Received from ch2")
>>> }
>>>
>>> Speaking personally, this is a Go feature I miss a lot when writing
>>> asyncio code. The syntax is similar to what's being proposed here. Although
>>> it could be a separate thing added later, async match, I guess.
>>>
>>
>> Hadn't seen this before. You could propose this as a follow-up for 3.11.
>> But aren't Go channels more like asyncio Queues? I guess we'd need way more
>> in terms of a worked-out example (using asyncio code, not Go code).
>>
>
> I think we'd also want to see how far folks get with using guard clauses
> for this kind of "where did the data come from?" check - the only
> specifically asynchronous bit would be the "await multiple tasks"
> operation, and you can already tell asyncio.wait() to return on the first
> completed task rather than waiting for all the results.
>
> Cheers,
> Nick.


-- 
Gustavo J. A. M. Carneiro
Gambit Research
"The universe is always one step beyond logic." -- Frank Herbert
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/BHHNI54W6PVET3RD7XVHNOHFUAEDSVS5/


[Python-Dev] Re: When to remove BytesWarning?

2020-10-26 Thread Serhiy Storchaka
24.10.20 06:19, Inada Naoki wrote:
> To avoid BytesWarning, the compiler needs to do some hack when they
> need to store bytes and str constants in one dict or set.
> BytesWarning has maintenance costs. It is not huge, but significant.
> 
> When can we remove it? My idea is:
> 
> 3.10: Deprecate the -b option.
> 3.11: Make the -b option no-op. Bytes warning never emits.
> 3.12: Remove the -b option.
> 
> BytesWarning will be deprecated in the document, but not to be removed.
> Users who want to use the -b option during 2->3 conversion need to use
> Python ~3.10 for a while.

I agree that it should be removed, and that BytesWarning should be kept
(maybe we will reuse it for other purposes in future).

But I do not see how deprecating it before removing it could help. Using it
with -We will no longer work, and without -We it will just add noise.

We can just make -b a no-op at any moment and remove it a few versions
later. Or maybe first make it a no-op, then deprecate, then remove. But that
looks like too much.

-b is still usable in 3.9, so it can be removed not earlier than EOL of
3.9. Users that use it should be able to use it with all maintained
Python versions if it makes sense with at least one of them.

3.x: Make the -b option a no-op. BytesWarning is never emitted.
3.x+4: Remove the -b option.
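
For readers unfamiliar with the option, the sketch below shows what -b and -bb do in the Python versions current at the time of this thread. It runs a one-line snippet in child interpreters. The helper names run_with and last_line are made up for this example, and the behaviour would differ in any future version where -b has become a no-op.

```python
import subprocess
import sys

SNIPPET = "print(b'abc' == 'abc')"


def run_with(*flags: str) -> subprocess.CompletedProcess:
    """Run SNIPPET in a child interpreter with the given command-line flags."""
    return subprocess.run(
        [sys.executable, *flags, "-c", SNIPPET],
        capture_output=True,
        text=True,
    )


def last_line(text: str) -> str:
    lines = text.strip().splitlines()
    return lines[-1] if lines else "(nothing on stderr)"


plain = run_with()        # no flag: the comparison is silently False
warned = run_with("-b")   # -b: BytesWarning is reported as a warning
strict = run_with("-bb")  # -bb: BytesWarning is raised as an error

print("plain:", plain.returncode, last_line(plain.stderr))
print("-b:   ", warned.returncode, last_line(warned.stderr))
print("-bb:  ", strict.returncode, last_line(strict.stderr))
```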
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/6LBAYEQSWUEFKNR6LMJ35OAF2YZXAVWE/


[Python-Dev] Re: When to remove BytesWarning?

2020-10-26 Thread Serhiy Storchaka
26.10.20 10:10, Inada Naoki wrote:
> Of course, there are some runtime costs too.
> 
> https://github.com/python/cpython/blob/fb5db7ec58624cab0797b4050735be865d380823/Modules/_functoolsmodule.c#L802
> https://github.com/python/cpython/blob/fb5db7ec58624cab0797b4050735be865d380823/Objects/codeobject.c#L724
> (maybe more, but I'm not sure)

It will not help much in these cases because we still need to
distinguish 1 from True and 1.0, and -0.0 from 0.0.

But where keys can only be str or bytes, we pay an additional cost. An
example is the re cache.
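
The distinction can be illustrated with a small sketch. The const_key function below is a made-up helper for this example, not the key function used in codeobject.c, functools or re. It shows why a cache of constants needs type-aware keys whether or not BytesWarning exists.

```python
import math


def const_key(value):
    """Build a cache key that keeps equal-but-distinct constants apart."""
    if isinstance(value, float):
        # 0.0 == -0.0, so record the sign explicitly.
        return (float, value, math.copysign(1.0, value))
    # 1 == True == 1.0 and they hash alike, so include the type.
    return (type(value), value)


plain = {}
typed = {}
for constant in (1, True, 1.0, 0.0, -0.0, "a", b"a"):
    plain[constant] = constant  # equal keys overwrite each other
    typed[const_key(constant)] = constant

print(len(plain), "plain keys;", len(typed), "typed keys")
```

Because the type comes first in the key tuple, a str key and a bytes key are told apart before their values are ever compared, which is the extra cost referred to above when keys can only be str or bytes.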
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/S4B7KIIEB5XRPE4WKDXMSN424DJGNGBC/


[Python-Dev] Re: When to remove BytesWarning?

2020-10-26 Thread Inada Naoki
On Mon, Oct 26, 2020 at 4:35 PM Senthil Kumaran  wrote:
>
> On Sat, Oct 24, 2020 at 6:18 AM Christian Heimes  wrote:
>>
>> In my experience it would be useful to keep the bytes warning for
>> implicit representation of bytes in string formatting. It's still a
>> common source of issues in code.
>
> I am with Christian here.

Do you mean you are OK with removing BytesWarning from b"abc" == u"def"
and b"abc" == 42?

> Still notice a possibility of people running into this because all the 
> Python2 code is not dead yet.
> Perhaps this warning might stay for a long time.
>

I never proposed removing it "now", but in 3.11.
3.10 will enter security-only mode in 2022-04 and reach EOL in 2026-10.
But you can still use Python 3.10 after EOL for porting Python 2 code,
because security fixes are not required while porting.

> > BytesWarning has maintenance costs. It is not huge, but significant.
>
> Should we know by how much so that the proposal of `-b` switch can be 
> weighted against?
>

It is difficult to say "how much". Every time we write a patch or review
a pull request, we need to keep in mind that `a == b` is not safe even
for builtin types. In particular, when u"foo" and b"bar" are used as keys
of the same dict, BytesWarning is emitted only on a (randomized) hash
collision. Such bugs are very hard to find.
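
The kind of workaround this forces on library code, as mentioned earlier in this thread for os.get_exec_path(), looks roughly like the sketch below. The function find_path is a simplified illustration written for this example, not the actual standard library code.

```python
import warnings


def find_path(env: dict):
    """Return env['PATH'] or env[b'PATH'], whichever is present (or None)."""
    # Looking up a str key in a dict that also holds bytes keys may compare
    # str with bytes on a hash collision; under -b that emits BytesWarning.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", BytesWarning)
        # The warning filters are process-wide, so other threads are
        # affected while this block runs.
        path = env.get("PATH")
        if path is None:
            path = env.get(b"PATH")
    return path


print(find_path({"PATH": "/usr/bin"}))
print(find_path({b"PATH": b"/usr/bin"}))
```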

Of course, there are some runtime costs too.

https://github.com/python/cpython/blob/fb5db7ec58624cab0797b4050735be865d380823/Modules/_functoolsmodule.c#L802
https://github.com/python/cpython/blob/fb5db7ec58624cab0797b4050735be865d380823/Objects/codeobject.c#L724
(maybe more, but I'm not sure)

Regards,
-- 
Inada Naoki  
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/52WLUUIL2LS27R5UFYFICJH5OX3ETSTA/


[Python-Dev] Re: When to remove BytesWarning?

2020-10-26 Thread Senthil Kumaran
On Sat, Oct 24, 2020 at 6:18 AM Christian Heimes 
wrote:

>
>
> In my experience it would be useful to keep the bytes warning for
> implicit representation of bytes in string formatting. It's still a
> common source of issues in code.
>

I am with Christian here. I still see a possibility of people running into
this because not all Python 2 code is dead yet.
Perhaps this warning might stay for a long time.
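
The string formatting case Christian refers to can be reproduced with a short sketch. It runs a made-up two-statement snippet in a child interpreter with -b, and assumes a Python version in which that option is still active.

```python
import subprocess
import sys

# A classic porting bug: bytes slip into text formatting.
SNIPPET = "name = b'world'; print('hello %s' % name)"

result = subprocess.run(
    [sys.executable, "-b", "-c", SNIPPET],
    capture_output=True,
    text=True,
)
print("stdout:", result.stdout.strip())
print("stderr:", result.stderr.strip())
```

Without -b the program prints the bytes repr inside the text and gives no hint that anything is wrong; with -b the warning on stderr points at the offending line.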

> BytesWarning has maintenance costs. It is not huge, but significant.

Should we know by how much, so that the `-b` switch proposal can be
weighed against it?

Thank you,
Senthil
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/Z62L5ABNQK5AYPEE6I3KTZMKEY3BC65R/