[Python-Dev] Re: When to remove BytesWarning?
On Tue, Oct 27, 2020 at 5:23 AM Victor Stinner wrote:
>
> os.get_exec_path() must modify temporarily warnings filters to ignore
> BytesWarning when it looks for 'PATH' (unicode) or b'PATH' (bytes) in
> the 'env' dictionary which may contain unicode or bytes strings.
> Modifying warnings filters impact all threads which is bad.
>
> I dislike having to workaround this annoying behavior for dict lookup
> when -b or -bb is used.
>

Completely agree with you.

> I'm quite sure that almost nobody uses -b or -bb when running their
> test suite or to develop. I expect that nobody uses it. According to
> replies, it seems like porting Python 2 code to Python 3 is the only
> use case. Python 3.9 and older can be used for that, no?
>

I think so. But I became a bit conservative when writing this proposal.

> > When can we remove it? My idea is:
> >
> > 3.10: Deprecate the -b option.
>
> Do you mean writing a message into stderr? Or just deprecate it in the
> documentation?

I thought document only.

> > 3.11: Make the -b option no-op. Bytes warning never emits.
> > 3.12: Remove the -b option.
>
> There is no _need_ to raise an error when -b is used. The -t option
> was kept even after the feature was removed (in Python 3.0 ?). -J
> ("used by Jython" says a comment) is a second command line option
> which is silently ignored.
>

I see.

> > BytesWarning will be deprecated in the document, but not to be removed.
>
> I don't see what you mean here. I dislike the idea of deprecating a
> feature without scheduling its removal. I don't see the point of
> deprecating it in this case. I only see that as an annoyance.
>

Document only deprecation is useful for readers. Readers can know "I can just ignore this.".

> I'm fine with removing the exception. If you don't plan to remove it,
> just leave it unchanged (not deprecated), no?
>

OK, my new proposal is:

3.10: Stop emitting BytesWarning for bytes == unicode case, because this is the most annoying part.
3.11: Stop emitting BytesWarning in core and stdlib.
4.0: Remove `-b` option, `sys.flags.bytes_warning`, and `BytesWarning`.

Regards,
--
Inada Naoki

___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/OQOCZISWWKRFAFMZJI5GMA3SNEQ2TYIJ/
Code of Conduct: http://python.org/psf/codeofconduct/
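For readers who have never run with these flags, here is a minimal sketch of the "bytes == unicode" case that the 3.10 step talks about. It runs the same comparison in child interpreters with no flag, with -b and with -bb. The helper name `run_with_flags` and the snippet are only illustrations, and the sketch assumes a CPython that still accepts -b.

```python
import subprocess
import sys

# Compare a str with a bytes object inside a child interpreter.
SNIPPET = "a = 'PATH'; b = b'PATH'; print(a == b)"


def run_with_flags(*flags):
    """Run SNIPPET in a fresh interpreter; return (returncode, stdout, stderr)."""
    proc = subprocess.run(
        [sys.executable, *flags, "-c", SNIPPET],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stdout.strip(), proc.stderr.strip()


if __name__ == "__main__":
    # No flag: silent False. -b: False plus a warning. -bb: an exception.
    for flags in ((), ("-b",), ("-bb",)):
        print(flags, run_with_flags(*flags))
```

The comparison itself always evaluates to False; the flags only decide whether it is silent, warned about, or turned into an error.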
[Python-Dev] Re: os.scandir bug in Windows?
On Mon, Oct 26, 2020, 4:06 PM Chris Angelico wrote:

> On Tue, Oct 27, 2020 at 10:00 AM Greg Ewing wrote:
> >
> > On 27/10/20 8:24 am, Victor Stinner wrote:
> > > I would
> > > rather want to kill the whole concept of "access" time in operating
> > > systems (or just configure the OS to not update it anymore). I guess
> > > that it's really hard to make it efficient and accurate at the same
> > > time...
> >
> > Also it's kind of weird that just looking at data on the
> > disk can change something about it. Sometimes it's an
> > advantage to *not* have quantum computing!
> >
>
> And yet, it's of incredible value to be able to ask "now, where was
> that file... the one that I was looking at last week, called something
> about calendars, and it had a cat picture in it". Being able to answer
> that kinda depends on recording accesses one way or another, so the
> weirdnesses are bound to happen.
>

scandir is never going to answer that. Neither is a simple blind "access" time stored in filesystem metadata.

> ChrisA

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/YW5NMIE2SC3RQWDMJX2DVIS3FRHNPEQM/
[Python-Dev] Re: Pattern matching reborn: PEP 622 is dead, long live PEP 634, 635, 636
Hello,

Go channels are indeed very similar to asyncio Queues, with some added features like channels being closable. (There is also special syntax in the select statement, `val, ok := <-chan`, that will set the `ok` variable to false if the channel has been closed.) A larger difference, I think, is that in Go channels are used practically everywhere, more so than asyncio Queues. They are an abstraction the vast majority of Go concurrency is built upon.

Building this for asyncio tasks, instead of just queues, would be much more useful in Python.

Contemplating this some more, I would agree we don't need an async match. A function and some types to match on would probably be enough to get us close to a select statement in a PEP634 Python. I guess the challenge is designing these matchable types for ease of use now, and I need to study the pattern matching PEPs in more detail to be able to contribute here.

On one hand, this means this problem can be solved by a third party library. On the other hand, I feel like this would be very useful so it might be worth it to have it somewhere in the stdlib asyncio namespace.

Since `asyncio.wait` can yield multiple tasks in the completed set, this would probably have to be wrapped in an `async for`.

On Mon, Oct 26, 2020 at 12:33 PM Gustavo Carneiro wrote:

> It's true that asyncio.wait provides the tools that you need, but it's a
> bit clunky to use correctly.
>
> Maybe would be something along the lines of:
>
> --
> queue1 = asyncio.Queue()
> queue2 = asyncio.Queue()
> ...
> get1 = asyncio.create_task(queue1.get())
> get2 = asyncio.create_task(queue2.get())
> await asyncio.wait({get1, get2}, return_when=asyncio.FIRST_COMPLETED)
> match [task.done() for task in (get1, get2)]:
>     case [True, False]:
>         get2.cancel()
>         item1 = await get1
>     case [False, True]:
>         get1.cancel()
>         item2 = await get2
>     case [True, True]:
>         item1 = await get1
>         item2 = await get2
> --
>
> If asyncio.Queue() is the equivalent of Go channels, perhaps it would be
> worth designing a new API for asyncio.Queue, one that is better suited to
> the match statement:
>
> class Queue:
>     async def read_wait(self) -> 'Queue':
>         """
>         Waits until the queue has at least one item ready to read,
>         without actually consuming the item.
>         """
>
> Then we could more easily use match statement with multiple queues, thus:
>
> --
> async def ready_queue(*queues: asyncio.Queue) -> asyncio.Queue:
>     """
>     Take multiple queue parameters and waits for at least one of them
>     to have items pending to read, returning that queue.
>     """
>     # asyncio.wait() needs tasks, not bare coroutines, on recent Pythons
>     await asyncio.wait({asyncio.create_task(queue.read_wait()) for queue in queues},
>                        return_when=asyncio.FIRST_COMPLETED)
>     for queue in queues:
>         if queue.qsize() > 0:
>             return queue
>
> ...
>
> queue1 = asyncio.Queue()
> queue2 = asyncio.Queue()
>
> ...
>
> match await ready_queue(queue1, queue2):
>     # a bare `case queue1:` would be a capture pattern, so use a guard
>     case q if q is queue1:
>         item1 = queue1.get_nowait()
>     case q if q is queue2:
>         item2 = queue2.get_nowait()
> --
>
> Which is less clunky, maybe?...
>
> The above is not 100% bug free. I think those queue.get_nowait() calls
> may still end up raising QueueEmpty exceptions, in case there is another
> concurrent reader for those queues. This code would need more work, most
> likely.
>
> In any case, perhaps it's not the match statement that needs to change,
> but rather asyncio API that needs to be enhanced.
>
>
> On Sun, 25 Oct 2020 at 01:14, Nick Coghlan wrote:
>
>> On Sat., 24 Oct. 2020, 4:21 am Guido van Rossum, wrote:
>>
>>> On Fri, Oct 23, 2020 at 6:19 AM Tin Tvrtković wrote:
>>>
>>>> Hi,
>>>>
>>>> first of all, I'm a big fan of the changes being proposed here since
>>>> in my code I prefer the 'union' style of logic over the OO style.
>>>>
>>>> I was curious, though, if there are any plans for the match operator
>>>> to support async stuff. I'm interested in the problem of waiting on
>>>> multiple asyncio tasks concurrently, and having a branch of code
>>>> execute depending on the task.
>>>>
>>>> Currently this can be done by using asyncio.wait, looping over the
>>>> done set and executing an if-else chain there, but this is quite
>>>> tiresome. Go has a select statement
>>>> (https://tour.golang.org/concurrency/5) that looks like this:
>>>>
>>>> select {
>>>> case <-ch1:
>>>>     fmt.Println("Received from ch1")
>>>> case <-ch2:
>>>>     fmt.Println("Received from ch2")
>>>> }
>>>>
>>>> Speaking personally, this is a Go feature I miss a lot when writing
>>>> asyncio code. The syntax is similar to what's being proposed here.
>>>> Although it could be a separate thing added later, async match, I
>>>> guess.
>>>
>>> Hadn't seen this before. You could propose this as a follow-up for 3.11.
>>> But aren't Go channels more like asyncio Queues? I guess we'd need way more
>>> in terms of a worked-out example (using asyncio code, not Go code).
>>>
>>
>> I think we'd also want to see how far folks get with
[Python-Dev] Re: os.scandir bug in Windows?
On Tue, Oct 27, 2020 at 10:00 AM Greg Ewing wrote:
>
> On 27/10/20 8:24 am, Victor Stinner wrote:
> > I would
> > rather want to kill the whole concept of "access" time in operating
> > systems (or just configure the OS to not update it anymore). I guess
> > that it's really hard to make it efficient and accurate at the same
> > time...
>
> Also it's kind of weird that just looking at data on the
> disk can change something about it. Sometimes it's an
> advantage to *not* have quantum computing!
>

And yet, it's of incredible value to be able to ask "now, where was that file... the one that I was looking at last week, called something about calendars, and it had a cat picture in it". Being able to answer that kinda depends on recording accesses one way or another, so the weirdnesses are bound to happen.

ChrisA

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/ZMNVRGZ7ZEC5EAKLUOX64R4WKHOLPF4O/
[Python-Dev] Re: os.scandir bug in Windows?
On 27/10/20 8:24 am, Victor Stinner wrote:
> I would
> rather want to kill the whole concept of "access" time in operating
> systems (or just configure the OS to not update it anymore). I guess
> that it's really hard to make it efficient and accurate at the same
> time...

Also it's kind of weird that just looking at data on the
disk can change something about it. Sometimes it's an
advantage to *not* have quantum computing!

--
Greg

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/GPWWYOB3EQKDLELTYTE4IWGQ726BCPSY/
[Python-Dev] Re: os.scandir bug in Windows?
On 10/26/20, Victor Stinner wrote:
> On Mon, Oct 19, 2020 at 13:50, Steve Dower wrote:
>> Feel free to file a bug, but we'll likely only add a vague note to the
>> docs about how Windows works here rather than changing anything.
>
> I agree that this surprising behavior can be documented. Attempting to
> provide accurate access time in os.scandir() is likely to slow-down
> the function which would defeat its whole purpose.

I don't think the access time (st_atime) is a significant concern. I'm concerned with the reliability of the file size (st_size) and last-write time (st_mtime) in stat() results.

Developers are used to various filesystem policies on various platforms that limit when the access time gets updated, if at all. FAT32 filesystems only have an access date, and the driver in Windows fixes the access time at midnight. Updating the access time in NTFS and ReFS can be completely disabled at the system level; otherwise it's updated with a granularity of one hour if it's only the access time that would be updated.

The biggest concern for me is NTFS hardlinks, for which the st_size and st_mtime in the directory entry are unreliable. When a file with multiple hardlinks is modified, the filesystem only updates the duplicated information in the directory entry of the opened link. Because the entry in the directory doesn't include the link count or even a boolean value to indicate that a file has multiple hardlinks, if you don't know whether or not there's a possibility of hardlinks, then os.stat() is required in order to reliably determine st_size and st_mtime, to the extent that reliably knowing st_mtime is possible.

A general problem that affects even os.stat() is that a modified file may only be noted by setting a flag (FO_FILE_MODIFIED) in the kernel file object of the particular open. Whether it's immediately noted in the last-write time of the shared FCB (file control block) is up to filesystem policy.
Starting with Windows 10 1809 (as noted in [MS-FSA]), NTFS immediately notes the modification time, so the st_mtime value from os.stat() is current. In prior versions of NTFS, and with other Microsoft filesystems such as FAT32, the last-write time is only noted when the file is flushed to disk via FlushFileBuffers (i.e. os.fsync) or when the open is closed. This means that st_size may change without also changing st_mtime.

I'm using Windows 10 2004 currently, so I can't show an NTFS example, but the following shows the behavior with FAT32:

    import os
    import time

    f = open('spam.txt', 'w')
    st1 = os.stat('spam.txt')
    time.sleep(10)
    f.write('spam')
    f.flush()
    st2 = os.stat('spam.txt')

The above write was noted only by setting the FO_FILE_MODIFIED flag on the kernel file object. (The file object can be inspected with a local kernel debugger.) The write time wasn't noted in the FCB, i.e. st_mtime hasn't changed in st2:

    >>> st2.st_size - st1.st_size
    4
    >>> st2.st_mtime - st1.st_mtime
    0.0

The last-write time is noted when FlushFileBuffers (os.fsync) is called on the open:

    >>> os.fsync(f.fileno())
    >>> st3 = os.stat('spam.txt')
    >>> st3.st_mtime - st1.st_mtime
    10.0

Note also that, with NTFS, to the extent that the FCB metadata is current, calling os.stat() on a link updates the duplicated information in the directory entry. So calling os.stat() on a NTFS file may update the entry that's returned by a subsequent os.scandir() call.

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/LEBCSKGSL7PMAFH6AQR5LFL7UJ4T5774/
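As a possible way to act on the above, here is a small sketch that pairs the stat result cached by os.scandir() with a fresh os.stat() call for each regular file, so that stale sizes or write times can be spotted. The helper names `scan_with_fresh_stat`, `stale_names` and `demo` are made up for this illustration. On platforms where DirEntry.stat() always performs a real stat call, the two results will simply agree.

```python
import os
import tempfile


def scan_with_fresh_stat(directory):
    """Map each regular file name to a (cached, fresh) pair of stat results."""
    results = {}
    with os.scandir(directory) as entries:
        for entry in entries:
            if not entry.is_file(follow_symlinks=False):
                continue
            # On Windows this may come from the directory entry itself.
            cached = entry.stat(follow_symlinks=False)
            # This always asks the filesystem about the file.
            fresh = os.stat(entry.path, follow_symlinks=False)
            results[entry.name] = (cached, fresh)
    return results


def stale_names(directory):
    """Names whose cached size or write time differs from a fresh stat."""
    return sorted(
        name
        for name, (cached, fresh) in scan_with_fresh_stat(directory).items()
        if (cached.st_size, cached.st_mtime) != (fresh.st_size, fresh.st_mtime)
    )


def demo():
    """Create one 4-byte file in a temporary directory and scan it."""
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "spam.txt"), "w") as f:
            f.write("spam")
        return scan_with_fresh_stat(tmp), stale_names(tmp)


if __name__ == "__main__":
    print(demo())
```

The extra os.stat() per file gives up most of the speed advantage of scandir, so this only makes sense when hardlinks or files held open by writers are a real possibility.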
[Python-Dev] Re: fail keyword like there is pass keyword
`raise NotImplementedError`
https://docs.python.org/3/library/exceptions.html#NotImplementedError

I think this would be the canonical solution.

E

On Mon, 26 Oct 2020 at 20:34, Victor Stinner wrote:

> If you use the unittest module, I suggest you to use self.fail() instead:
> it is standard. Moreover, you can specify a message.
> https://docs.python.org/dev/library/unittest.html#unittest.TestCase.fail
>
> Victor
>
> On Fri, Oct 23, 2020 at 21:36, Umair Ashraf wrote:
>
>> Hello
>>
>> Can I suggest a feature to discuss and hopefully develop and send a PR. I
>> think having a *fail* keyword for unit testing would be great. So we
>> write a test as follows which will fail to begin with.
>>
>> class MyTest(unittest.TestCase):
>>     def test_this_and_that(self):
>>         """
>>         Given inputs
>>         When action is done
>>         Then it should pass
>>         """
>>         fail
>>
>> This keyword is to fill an empty function block like *pass* but this
>> will make the function raise an exception that test is failing. I know
>> there is *raise* but I feel this *fail* keyword is needed to write a
>> test first which fails and then write code and then come back to the test
>> and fill its body.
>>
>> Umair
>
> --
> Night gathers, and now my watch begins. It shall not end until my death.

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/I5ZATRCSV7YXPRT3ZAA3QOCJLS4L4Z5Y/
[Python-Dev] Re: Macro for logging
Hi,

There is the __debug__ builtin variable which is equal to True by default, but is equal to False when Python is run with the -O command line option. The compiler removes dead code when -O is used.

Example:

    $ cat x.py
    def func():
        if __debug__: print("debug")

    import dis
    dis.dis(func)

    # without -O: the print("debug") call is compiled in
    $ python3 x.py
      2           0 LOAD_GLOBAL              0 (print)
                  2 LOAD_CONST               1 ('debug')
                  4 CALL_FUNCTION            1
                  6 POP_TOP
                  8 LOAD_CONST               0 (None)
                 10 RETURN_VALUE

    # code removed by the compiler
    $ python3 -O x.py
      2           0 LOAD_CONST               0 (None)
                  2 RETURN_VALUE

Victor

On Wed, Oct 21, 2020 at 14:21, Marco Sulla wrote:
>
> If not already present, do you think it's useful to add a macro that does
> something like
>
> # ifdef Py_DEBUG
> fprintf(stderr, "%s\n", message);
> # endif
>
> ?

--
Night gathers, and now my watch begins. It shall not end until my death.

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/5V3HNOGDF2I44CKEAYR2XILF6DE7THFL/
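The same experiment can be repeated inside one process, without restarting Python with -O, by passing the `optimize` argument to the built-in compile(). The sketch below is only an illustration: `SOURCE`, `build` and `captured_output` are names made up here, and `optimize=1` is used as the in-process equivalent of -O.

```python
import contextlib
import dis
import io

SOURCE = '''
def func():
    if __debug__:
        print("debug")
'''


def build(optimize):
    """Compile SOURCE at the given optimization level and return func."""
    namespace = {}
    code = compile(SOURCE, "<demo>", "exec", optimize=optimize)
    exec(code, namespace)
    return namespace["func"]


def captured_output(function):
    """Call function() and return whatever it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        function()
    return buffer.getvalue()


normal = build(0)      # like running without -O
optimized = build(1)   # like running with -O

if __name__ == "__main__":
    dis.dis(normal)
    dis.dis(optimized)
    print(repr(captured_output(normal)), repr(captured_output(optimized)))
```

The exact bytecode listing depends on the Python version, but the behavior is the point: the first function prints, the second one has had the block compiled away.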
[Python-Dev] Re: fail keyword like there is pass keyword
If you use the unittest module, I suggest you to use self.fail() instead: it is standard. Moreover, you can specify a message.
https://docs.python.org/dev/library/unittest.html#unittest.TestCase.fail

Victor

On Fri, Oct 23, 2020 at 21:36, Umair Ashraf wrote:

> Hello
>
> Can I suggest a feature to discuss and hopefully develop and send a PR. I
> think having a *fail* keyword for unit testing would be great. So we
> write a test as follows which will fail to begin with.
>
> class MyTest(unittest.TestCase):
>     def test_this_and_that(self):
>         """
>         Given inputs
>         When action is done
>         Then it should pass
>         """
>         fail
>
> This keyword is to fill an empty function block like *pass* but this will
> make the function raise an exception that test is failing. I know there is
> *raise* but I feel this *fail* keyword is needed to write a test first
> which fails and then write code and then come back to the test and fill its
> body.
>
> Umair

--
Night gathers, and now my watch begins. It shall not end until my death.

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/P5LJB2VTO5XBOAWBSQ5NYFZSFIYEZS3Q/
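A small sketch comparing the two placeholders mentioned in this thread, self.fail() and `raise NotImplementedError`. The class and function names are invented for the example, and the test case is built inside a function only so that test discovery tools do not pick it up by accident. The runner writes its report to an in-memory stream to keep the demonstration quiet.

```python
import io
import unittest


def make_placeholder_case():
    """Build a TestCase with two not-yet-written tests."""

    class PlaceholderTests(unittest.TestCase):
        def test_with_fail(self):
            # Reported as a failure, with a message.
            self.fail("not implemented yet")

        def test_with_not_implemented(self):
            # Reported as an error rather than a failure.
            raise NotImplementedError("not implemented yet")

    return PlaceholderTests


def run_placeholders():
    """Run both placeholder tests and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(make_placeholder_case())
    runner = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0)
    return runner.run(suite)


if __name__ == "__main__":
    result = run_placeholders()
    print(len(result.failures), "failure(s),", len(result.errors), "error(s)")
```

Both keep the test red until it is written. The difference is only in how the runner classifies it, which may matter if failures and errors are tracked separately.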
[Python-Dev] Re: When to remove BytesWarning?
On Sat, Oct 24, 2020 at 15:13, Christian Heimes wrote:
> In my experience it would be useful to keep the bytes warning for
> implicit representation of bytes in string formatting. It's still a
> common source of issues in code.

IMO it's not a big deal to investigate such bugs without the -b / -bb command line option. It should be easy to identify where bytes are formatted as string.

Victor
--
Night gathers, and now my watch begins. It shall not end until my death.

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/QCZ5F2BTEQAAX6GJKBQGWHXPOCQZKLAJ/
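For context, this is the kind of bug being discussed, shown as a tiny sketch with made-up variable names. Formatting a bytes object into a str silently inserts its repr, and the usual remedy is an explicit decode (the "ascii" codec here is just an assumption for the example).

```python
payload = b"abc"

# Implicit formatting: the b'...' repr leaks into the text.
implicit = "name=%s" % payload

# Explicit decoding: the text contains what was probably intended.
explicit = "name=%s" % payload.decode("ascii")

if __name__ == "__main__":
    print(implicit)
    print(explicit)
```

Searching output or logs for a literal `b'` is often enough to track such cases down without -b, which seems to be the point made above.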
[Python-Dev] Re: When to remove BytesWarning?
Hi,

Which operations are impacted by -b and -bb? str == bytes, bytes == str, dict lookup using str or bytes keys? 'unicode' < b'bytes' always raises a TypeError.

On Sat, Oct 24, 2020 at 05:20, Inada Naoki wrote:
> To avoid BytesWarning, the compiler needs to do some hack when they
> need to store bytes and str constants in one dict or set.
> BytesWarning has maintenance costs. It is not huge, but significant.

os.get_exec_path() must temporarily modify the warnings filters to ignore BytesWarning when it looks for 'PATH' (unicode) or b'PATH' (bytes) in the 'env' dictionary, which may contain unicode or bytes strings. Modifying the warnings filters impacts all threads, which is bad.

I dislike having to work around this annoying behavior for dict lookup when -b or -bb is used.

I'm quite sure that almost nobody uses -b or -bb when running their test suite or to develop. I expect that nobody uses it. According to replies, it seems like porting Python 2 code to Python 3 is the only use case. Python 3.9 and older can be used for that, no?

> When can we remove it? My idea is:
>
> 3.10: Deprecate the -b option.

Do you mean writing a message into stderr? Or just deprecate it in the documentation?

> 3.11: Make the -b option no-op. Bytes warning never emits.
> 3.12: Remove the -b option.

There is no _need_ to raise an error when -b is used. The -t option was kept even after the feature was removed (in Python 3.0 ?). -J ("used by Jython" says a comment) is a second command line option which is silently ignored.

> BytesWarning will be deprecated in the document, but not to be removed.

I don't see what you mean here. I dislike the idea of deprecating a feature without scheduling its removal. I don't see the point of deprecating it in this case. I only see that as an annoyance.

I'm fine with removing the exception. If you don't plan to remove it, just leave it unchanged (not deprecated), no?

Victor
--
Night gathers, and now my watch begins. It shall not end until my death.
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/YL4DITFFUYBNXF7EFZKO4IRZRDRMRIVP/
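To illustrate the workaround being complained about, here is a simplified sketch of a lookup that accepts either a str or a bytes key while silencing BytesWarning. `lookup_path` is a hypothetical helper written for this note, not the actual code of os.get_exec_path(). warnings.catch_warnings() swaps process-wide filter state, which is why the thread calls this bad for other threads.

```python
import warnings


def lookup_path(env):
    """Return env['PATH'] or env[b'PATH'], whichever exists, else None."""
    # Under -b/-bb a dict lookup may compare a str key with a bytes key,
    # so BytesWarning is ignored for the duration of the lookup.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", BytesWarning)
        path = env.get("PATH")
        if path is None:
            path = env.get(b"PATH")
    return path


if __name__ == "__main__":
    print(lookup_path({"PATH": "/usr/bin"}))
    print(lookup_path({b"PATH": b"/bin"}))
```

Without -b the context manager has nothing to suppress, so the cost is paid by everyone for the benefit of a flag that, per the message above, hardly anyone uses.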
[Python-Dev] Re: PyPy performance stats (was Re: Speeding up CPython)
On 10/26/2020 11:42 AM, Matti Picus wrote:
> On 10/21/20 2:38 PM, Matti Picus wrote:
>> [0] https://speed.pypy.org/comparison/
>
> Just as a follow up: the front page of speed.pypy.org now shows the
> latest pypy 3.6 vs cpython 3.6.7.

I just clicked the link and there is 3.7.6, not 3.6.7. But why not current cpython?

Since all executables other than pypy 3.6 are checked, no comparisons are shown, and the chart would be impossible. The default should be just two executables checked. I am not going to try to uncheck 50.

--
Terry Jan Reedy

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/JJ7DOSJBQ25E7C44JFVKYODYCZ7L3A6P/
[Python-Dev] Re: os.scandir bug in Windows?
On Mon, Oct 19, 2020 at 13:50, Steve Dower wrote:
> Feel free to file a bug, but we'll likely only add a vague note to the
> docs about how Windows works here rather than changing anything.

I agree that this surprising behavior can be documented. Attempting to provide accurate access time in os.scandir() is likely to slow down the function, which would defeat its whole purpose.

--

By the way, who relies on the access time? I don't understand why the creation and modification times are not enough for all usages. I would rather want to kill the whole concept of "access" time in operating systems (or just configure the OS to not update it anymore). I guess that it's really hard to make it efficient and accurate at the same time...

Linux has a "relatime" mount option (Fedora enables it by default): "With this option enabled, atime data is written to the disk only if the file has been modified since the atime data was last updated (mtime), or if the file was last accessed more than a certain amount of time ago (by default, one day)." Minor enhancement over always updating atime.

Victor
--
Night gathers, and now my watch begins. It shall not end until my death.

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/VKL5VXI6R4BNN36RX2FJ5G4YEHS372UV/
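The quoted relatime rule can be written down as a small predicate. This is only a sketch of the rule as quoted, not of the kernel's actual code: the function name, the use of plain numbers of seconds for timestamps, and the treatment of the boundary cases are all assumptions made for the illustration.

```python
DAY_SECONDS = 24 * 60 * 60


def relatime_would_update(atime, mtime, now, max_age=DAY_SECONDS):
    """Return True if a read at `now` would write a new atime.

    atime, mtime and now are timestamps in seconds.
    """
    # The file was modified after (or at) the last recorded access.
    modified_since_last_access = mtime >= atime
    # The recorded access time is older than the allowed age.
    stale = (now - atime) > max_age
    return modified_since_last_access or stale


if __name__ == "__main__":
    print(relatime_would_update(atime=100, mtime=200, now=300))
    print(relatime_would_update(atime=200, mtime=100, now=300))
```

Under this rule a file that is read many times within a day without being modified gets at most one atime write, which is where the saving over always updating atime comes from.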
[Python-Dev] Re: _PyBytesWriter/_PyUnicodeWriter could be faster
Hi,

On Sun, Oct 25, 2020 at 15:36, Ma Lin wrote:
> Some code needs to maintain an output buffer that has an unpredictable size.
> Such as bz2/lzma/zlib modules, _PyBytesWriter/_PyUnicodeWriter.
>
> In current code, when the output buffer grows, resizing will cause
> unnecessary memcpy().
>
> issue41486 uses memory blocks to represent output buffer in bz2/lzma/zlib
> modules, it could eliminate the overhead of resizing.

Some context.

_PyBytesWriter is an internal C API designed for C functions which return a bytes or a bytearray object and use a loop writing into "ptr" (pointer into a bytes buffer). Such functions expect a single contiguous memory block. It is based on realloc() and overallocation (which can be disabled in the API). It uses a bytes object which is resized on demand. It also uses a short buffer of 512 bytes allocated on the stack memory for short strings. _PyBytesWriter_Finish() calls _PyBytes_Resize() if needed.

In 2016, I wrote an article on this API: https://vstinner.github.io/pybyteswriter.html

realloc() does not always imply copying memory. Growing a memory block can sometimes be done in-place (no data copy). Same when you shrink a memory block in _PyBytesWriter_Finish(). Also, overallocation reduces the number of realloc() calls.

_PyBytesWriter design is optimized for short strings up to 100 bytes.

--

_PyUnicodeWriter API is designed for the PEP 393 compact string structure (ASCII, Py_UCS1 latin1, Py_UCS2 and Py_UCS4 formats). It tries to reduce conversions between the 3 formats (Py_UCS1, Py_UCS2 and Py_UCS4) and also uses overallocation to reduce memory copies.
--

By the way, _PyBytesWriter and _PyUnicodeWriter overallocation is different on Windows:

    #ifdef MS_WINDOWS
       /* On Windows, overallocate by 50% is the best factor */
    #  define OVERALLOCATE_FACTOR 2
    #else
       /* On Linux, overallocate by 25% is the best factor */
    #  define OVERALLOCATE_FACTOR 4
    #endif

--

The internal C API _PyAccu is a variant of _PyUnicodeWriter which uses a list of short strings and sometimes concatenates these strings into a single large string.

> _PyBytesWriter/_PyUnicodeWriter could use the same way.
>
> If write a "general blocks output buffer", it could be used in
> _PyBytesWriter/bz2/lzma/zlib. (issue41486 is not very general, it uses a
> bytes object to represent a memory block.)

I understand that the main idea is to not use a single buffer, but use a list of buffers, and concatenate them in _BlocksOutputBuffer_Finish(). Similar idea to PyAccu API.

Maybe some functions using _PyBytesWriter can be adapted to use a list of buffers rather than a single buffer. But I'm not convinced that it would make them faster. The question is which kind of functions you want to optimize, for which string length, etc.

You should dig into the old issues where I optimized str%args and str.format():

* http://bugs.python.org/issue14687 : str % args
* http://bugs.python.org/issue14744 : str.format()
* https://bugs.python.org/issue2534 : bytes % args

I used benchmarks like:

https://github.com/vstinner/pymicrobench/blob/master/bench_bytes_format_int.py
https://github.com/vstinner/pymicrobench/blob/master/bench_str_format.py
https://github.com/vstinner/pymicrobench/blob/master/bench_str_format_keywords.py

> If write a new _PyUnicodeWriter like this, it has a chance to eliminate the
> overhead of switching PyUnicode_Kind (record the switching position):
>
> 'a' * 100_000_000 + '\uABCD'

For a+b, Python first computes "a", then "b", and finally "a+b". I don't see how your API could optimize such code.
For operations on strings like "%s%s" % (a, b) or "{}{}".format(a, b), Python internally uses _PyUnicodeWriter. To format "a", _PyUnicodeWriter just stores a reference to it as _PyUnicodeWriter.buffer and marks the buffer as read-only (optimization when the result is made of a single string: no copy is made at all!). To format "b", _PyUnicodeWriter_WriteStr() converts the buffer to Py_UCS2 and then writes the new string. The "a" string is only written "once", not twice. I don't see how your API would avoid copies in such cases.

Moreover, str % args and str.format() are optimized to avoid over-allocation when "b" is written: the final _PyUnicodeWriter_Finish() call is free, it does nothing.

> If anyone has time and is willing to try, it's very welcome.
> Or I might do this at sometime in the future.

I can be completely wrong, please try and show benchmarks proving that your approach is faster on specific use cases, without hurting performances on short strings ;-)

Victor
--
Night gathers, and now my watch begins. It shall not end until my death.

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/O3T6B3HDO24M3W5NZE2RCR7FCZTMAWV3/
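To get a feel for what the two OVERALLOCATE_FACTOR values mean in practice, here is a toy model in Python. It is a simplification invented for this note, not the C implementation: it assumes a buffer that, whenever it is too small, is resized to the needed size plus needed // factor, and it simply counts resizes while appending fixed-size chunks.

```python
def count_resizes(total_bytes, chunk, factor=None):
    """Count buffer resizes while appending `chunk` bytes at a time.

    factor=None models no overallocation (grow to exactly what is needed).
    Returns (number_of_resizes, final_capacity).
    """
    capacity = 0
    size = 0
    resizes = 0
    while size < total_bytes:
        size += chunk
        if size > capacity:
            extra = size // factor if factor else 0
            capacity = size + extra
            resizes += 1
    return resizes, capacity


if __name__ == "__main__":
    for factor in (None, 4, 2):
        print(factor, count_resizes(1_000_000, 100, factor))
```

In this model a factor of 2 (50% extra) needs fewer resizes than a factor of 4 (25% extra), at the price of more unused capacity at the end. Whether a resize actually copies memory is a separate question, as noted above for realloc().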
[Python-Dev] Re: PyPy performance stats (was Re: Speeding up CPython)
On Tue, Oct 27, 2020 at 2:42 AM Matti Picus wrote:
>
> On 10/21/20 2:38 PM, Matti Picus wrote:
> > On 10/21/20 20:42:02 +1100 Chris Angelico wrote:
> >
> >> When I go looking for PyPy performance stats, everything seems to be
> >> Python 2.7. Is there anywhere that compares PyPy3 to CPython 3.6 (or
> >> whichever specific version)? Or maybe it's right there on
> >> https://speed.pypy.org/ and I just can't see it - that's definitely
> >> possible:)
> >>
> >> ChrisA
> >
> > They are not on the front page. You can find them, but it requires
> > digging around in the Comparison page[0].
> >
> > I guess we could switch to emphasizing python3 on the front page, help
> > in updating and reconfiguring Codespeed [1] would be awesome.
> >
> > Matti
> >
> > [0] https://speed.pypy.org/comparison/
> >
> > [1] https://github.com/python/codespeed/tree/speed.pypy.org
>
> Just as a follow up: the front page of speed.pypy.org now shows the
> latest pypy 3.6 vs cpython 3.6.7.
>

Thank you! Good to see!

ChrisA

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/E357UBOFRIIGWI64ZWOIADN65UQJAQ5K/
[Python-Dev] Re: PyPy performance stats (was Re: Speeding up CPython)
On 10/21/20 2:38 PM, Matti Picus wrote:
> On 10/21/20 20:42:02 +1100 Chris Angelico wrote:
>
>> When I go looking for PyPy performance stats, everything seems to be
>> Python 2.7. Is there anywhere that compares PyPy3 to CPython 3.6 (or
>> whichever specific version)? Or maybe it's right there on
>> https://speed.pypy.org/ and I just can't see it - that's definitely
>> possible:)
>>
>> ChrisA
>
> They are not on the front page. You can find them, but it requires
> digging around in the Comparison page[0].
>
> I guess we could switch to emphasizing python3 on the front page, help
> in updating and reconfiguring Codespeed [1] would be awesome.
>
> Matti
>
> [0] https://speed.pypy.org/comparison/
>
> [1] https://github.com/python/codespeed/tree/speed.pypy.org

Just as a follow up: the front page of speed.pypy.org now shows the latest pypy 3.6 vs cpython 3.6.7.

Matti
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/2BXQUZO24SZOP2AEQTB3RQNHQWC5APJ6/
[Python-Dev] Re: Pattern matching reborn: PEP 622 is dead, long live PEP 634, 635, 636
It's true that asyncio.wait provides the tools that you need, but it's a bit clunky to use correctly.

Maybe it would be something along the lines of:

--
queue1 = asyncio.Queue()
queue2 = asyncio.Queue()
...
get1 = asyncio.create_task(queue1.get())
get2 = asyncio.create_task(queue2.get())
await asyncio.wait({get1, get2}, return_when=asyncio.FIRST_COMPLETED)
match [task.done() for task in (get1, get2)]:
    case [True, False]:
        get2.cancel()
        item1 = await get1
    case [False, True]:
        get1.cancel()
        item2 = await get2
    case [True, True]:
        item1 = await get1
        item2 = await get2
--

If asyncio.Queue() is the equivalent of Go channels, perhaps it would be worth designing a new API for asyncio.Queue, one that is better suited to the match statement:

class Queue:
    async def read_wait(self) -> 'Queue':
        """
        Waits until the queue has at least one item ready to read,
        without actually consuming the item.
        """

Then we could more easily use the match statement with multiple queues, thus:

--
async def ready_queue(*queues: asyncio.Queue) -> asyncio.Queue:
    """
    Takes multiple queue parameters and waits for at least one of them
    to have items pending to read, returning that queue.
    """
    # asyncio.wait() expects tasks or futures, so wrap each coroutine.
    await asyncio.wait({asyncio.create_task(queue.read_wait()) for queue in queues},
                       return_when=asyncio.FIRST_COMPLETED)
    for queue in queues:
        if queue.qsize() > 0:
            return queue

...

queue1 = asyncio.Queue()
queue2 = asyncio.Queue()
...
match await ready_queue(queue1, queue2):
    # A bare name in a case is a capture pattern, so compare with a guard.
    case q if q is queue1:
        item1 = queue1.get_nowait()
    case q if q is queue2:
        item2 = queue2.get_nowait()
--

Which is less clunky, maybe?...

The above is not 100% bug free. I think those queue.get_nowait() calls may still end up raising QueueEmpty exceptions, in case there is another concurrent reader for those queues. This code would need more work, most likely.

In any case, perhaps it's not the match statement that needs to change, but rather the asyncio API that needs to be enhanced.

On Sun, 25 Oct 2020 at 01:14, Nick Coghlan wrote:

> On Sat., 24 Oct.
2020, 4:21 am Guido van Rossum, wrote:
>
>> On Fri, Oct 23, 2020 at 6:19 AM Tin Tvrtković
>> wrote:
>>
>>> Hi,
>>>
>>> first of all, I'm a big fan of the changes being proposed here since in
>>> my code I prefer the 'union' style of logic over the OO style.
>>>
>>> I was curious, though, if there are any plans for the match operator to
>>> support async stuff. I'm interested in the problem of waiting on multiple
>>> asyncio tasks concurrently, and having a branch of code execute depending
>>> on the task.
>>>
>>> Currently this can be done by using asyncio.wait, looping over the done
>>> set and executing an if-else chain there, but this is quite tiresome. Go
>>> has a select statement (https://tour.golang.org/concurrency/5) that
>>> looks like this:
>>>
>>> select {
>>> case <-ch1:
>>>     fmt.Println("Received from ch1")
>>> case <-ch2:
>>>     fmt.Println("Received from ch2")
>>> }
>>>
>>> Speaking personally, this is a Go feature I miss a lot when writing
>>> asyncio code. The syntax is similar to what's being proposed here. Although
>>> it could be a separate thing added later, async match, I guess.
>>>
>>
>> Hadn't seen this before. You could propose this as a follow-up for 3.11.
>> But aren't Go channels more like asyncio Queues? I guess we'd need way more
>> in terms of a worked-out example (using asyncio code, not Go code).
>>
>
> I think we'd also want to see how far folks get with using guard clauses
> for this kind of "where did the data come from?" check - the only
> specifically asynchronous bit would be the "await multiple tasks"
> operation, and you can already tell asyncio.wait() to return on the first
> completed task rather than waiting for all the results.
>
> Cheers,
> Nick.
--
Gustavo J. A. M. Carneiro
Gambit Research
"The universe is always one step beyond logic." -- Frank Herbert
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/BHHNI54W6PVET3RD7XVHNOHFUAEDSVS5/
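The "return on the first completed task" approach discussed in this message can be tried with the existing asyncio API alone. The following is a minimal sketch under stated assumptions: wait_any is a hypothetical helper name, not part of asyncio, and it does not implement the proposed read_wait() method.

```python
import asyncio
from typing import Any, Dict, Mapping


async def wait_any(queues: Mapping[str, asyncio.Queue]) -> Dict[str, Any]:
    """Wait until at least one queue yields an item; return {name: item}."""
    # One getter task per queue, remembered by the name of its source queue.
    getters = {asyncio.create_task(q.get()): name for name, q in queues.items()}
    done, pending = await asyncio.wait(
        set(getters), return_when=asyncio.FIRST_COMPLETED
    )
    # Cancel the getters that lost the race so they cannot consume items later.
    for task in pending:
        task.cancel()
    # Let the cancellations settle before returning.
    await asyncio.gather(*pending, return_exceptions=True)
    return {getters[task]: task.result() for task in done}


async def main() -> Dict[str, Any]:
    queue1: asyncio.Queue = asyncio.Queue()
    queue2: asyncio.Queue = asyncio.Queue()
    queue2.put_nowait("from queue2")
    return await wait_any({"queue1": queue1, "queue2": queue2})


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Returning a mapping keyed by name answers the "where did the data come from?" question without a match statement; the caller can still dispatch on the keys with match or plain if/elif. More than one queue may appear in the result if several getters finish in the same loop iteration.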
[Python-Dev] Re: When to remove BytesWarning?
24.10.20 06:19, Inada Naoki wrote:
> To avoid BytesWarning, the compiler needs to do some hack when they
> need to store bytes and str constants in one dict or set.
> BytesWarning has maintenance costs. It is not huge, but significant.
>
> When can we remove it? My idea is:
>
> 3.10: Deprecate the -b option.
> 3.11: Make the -b option no-op. Bytes warning never emits.
> 3.12: Remove the -b option.
>
> BytesWarning will be deprecated in the document, but not to be removed.
> Users who want to use the -b option during 2->3 conversion need to use
> Python ~3.10 for a while.

I agree that the -b option should be removed, and that BytesWarning should be kept (maybe we will reuse it for other purposes in future).

But I do not see how deprecating it before removing it could help. Using it with -We will no longer work, and without -We it will just add noise. We can just make -b a no-op at any moment and remove it a few versions later. Or maybe first make it a no-op, then deprecate, then remove. But that looks like too much.

-b is still usable in 3.9, so it can be removed no earlier than the EOL of 3.9. Users that use it should be able to use it with all maintained Python versions if it makes sense with at least one of them.

3.x: Make the -b option no-op. Bytes warning never emits.
3.x+4: Remove the -b option.
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/6LBAYEQSWUEFKNR6LMJ35OAF2YZXAVWE/
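For readers who have never used the option under discussion, its effect can be observed by running a one-line comparison in a child interpreter. This is a sketch only: SNIPPET and run_with are illustrative names, and the behaviour described in the note below applies to interpreters in which -b is still active.

```python
import subprocess
import sys

# Comparing bytes with str is always False; -b and -bb change only the diagnostics.
SNIPPET = "print(b'abc' == 'abc')"


def run_with(*flags):
    """Run SNIPPET in a child interpreter started with the given flags."""
    return subprocess.run(
        [sys.executable, *flags, "-c", SNIPPET],
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    for flags in ((), ("-b",), ("-bb",)):
        result = run_with(*flags)
        print("flags:", flags or "(none)")
        print("  exit status:", result.returncode)
        print("  stdout:", result.stdout.strip())
        print("  stderr:", result.stderr.strip())
```

Without flags the comparison silently yields False. With -b the result is the same but a BytesWarning is reported on stderr, and with -bb the warning becomes an exception, so the child exits with a non-zero status.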
[Python-Dev] Re: When to remove BytesWarning?
26.10.20 10:10, Inada Naoki wrote:
> Of course, there are some runtime costs too.
>
> https://github.com/python/cpython/blob/fb5db7ec58624cab0797b4050735be865d380823/Modules/_functoolsmodule.c#L802
> https://github.com/python/cpython/blob/fb5db7ec58624cab0797b4050735be865d380823/Objects/codeobject.c#L724
> (maybe more, but I'm not sure)

It will not help much in these cases because we still need to distinguish 1 from True and 1.0, and -0.0 from 0.0. But if keys can only be str or bytes, we pay an additional cost. An example is the re cache.
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/S4B7KIIEB5XRPE4WKDXMSN424DJGNGBC/
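The need to distinguish 1 from True and 1.0, and -0.0 from 0.0, comes from the fact that these values compare equal and hash equal, so a plain dict merges them. The sketch below is a simplified illustration of type-aware keys; constant_key is a hypothetical helper written for this example, not the C code linked above, which handles more cases.

```python
import math


def constant_key(value):
    """Return a dict key that keeps equal-but-distinct constants apart."""
    if isinstance(value, float):
        # 0.0 == -0.0, so the sign has to be recorded explicitly.
        return (float, value, math.copysign(1.0, value))
    # 1 == True == 1.0, so the type has to be part of the key.
    return (type(value), value)


constants = (1, True, 1.0, 0.0, -0.0)

naive = {}  # keyed by the value itself
typed = {}  # keyed by constant_key(value)
for const in constants:
    naive[const] = repr(const)
    typed[constant_key(const)] = repr(const)

if __name__ == "__main__":
    print(len(naive), "entries when keyed by value")
    print(len(typed), "entries when keyed by constant_key")
```

Keyed by value, the five constants collapse into two entries; with the type-aware key all five stay separate. This is independent of BytesWarning, which is why removing the warning would not simplify such code paths.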
[Python-Dev] Re: When to remove BytesWarning?
On Mon, Oct 26, 2020 at 4:35 PM Senthil Kumaran wrote:
>
> On Sat, Oct 24, 2020 at 6:18 AM Christian Heimes wrote:
>>
>> In my experience it would be useful to keep the bytes warning for
>> implicit representation of bytes in string formatting. It's still a
>> common source of issues in code.
>
> I am with Christian here.

Do you mean you are OK to remove BytesWarning from b"abc" == u"def" and b"abc" == 42?

> Still notice a possibility of people running into this because all the
> Python2 code is not dead yet.
> Perhaps this warning might stay for a long time.
>

I never proposed to remove it "now", but in 3.11. 3.10 will enter security-only mode in 2023-04 and reach EOL in 2026-10. But you can use Python 3.10 after EOL for porting Python 2 code, because security fixes are not required while porting.

> > BytesWarning has maintenance costs. It is not huge, but significant.
>
> Should we know by how much so that the proposal of `-b` switch can be
> weighed against?
>

It is difficult to say "how much". We need to keep in mind that `a == b` is not safe even for builtin types every time we write a patch or review a pull request.

Especially, when u"foo" and b"bar" are used as keys of the same dict, BytesWarning is emitted only when a (randomized) hash collision occurs. It is very hard to find this bug.

Of course, there are some runtime costs too.

https://github.com/python/cpython/blob/fb5db7ec58624cab0797b4050735be865d380823/Modules/_functoolsmodule.c#L802
https://github.com/python/cpython/blob/fb5db7ec58624cab0797b4050735be865d380823/Objects/codeobject.c#L724
(maybe more, but I'm not sure)

Regards,
--
Inada Naoki
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/52WLUUIL2LS27R5UFYFICJH5OX3ETSTA/
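The reason the warning shows up only on a hash collision is that a dict lookup calls == on a stored key only when the hashes match. A minimal sketch of that behaviour in CPython, using a hypothetical Probe class with a controllable hash in place of str and bytes (whose hashes are randomized per process):

```python
class Probe:
    """Key with a controllable hash that counts every __eq__ call."""

    comparisons = 0

    def __init__(self, name, hash_value):
        self.name = name
        self.hash_value = hash_value

    def __hash__(self):
        return self.hash_value

    def __eq__(self, other):
        Probe.comparisons += 1
        return isinstance(other, Probe) and self.name == other.name


def count_comparisons(stored, lookup):
    """Count __eq__ calls made while looking up `lookup` in a dict holding `stored`."""
    table = {stored: None}
    Probe.comparisons = 0
    _ = lookup in table
    return Probe.comparisons


if __name__ == "__main__":
    print("different hashes:", count_comparisons(Probe("a", 1), Probe("b", 2)))
    print("same hash:       ", count_comparisons(Probe("a", 1), Probe("b", 1)))
```

With different hashes the keys are never compared; with equal hashes __eq__ runs once. Under -b, a bytes versus str comparison is where BytesWarning is raised, so mixed-key dicts trigger it only on the rare runs in which two such keys happen to collide.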
[Python-Dev] Re: When to remove BytesWarning?
On Sat, Oct 24, 2020 at 6:18 AM Christian Heimes wrote:
>
> In my experience it would be useful to keep the bytes warning for
> implicit representation of bytes in string formatting. It's still a
> common source of issues in code.
>

I am with Christian here. I still see a possibility of people running into this, because not all Python 2 code is dead yet. Perhaps this warning might stay for a long time.

> BytesWarning has maintenance costs. It is not huge, but significant.

Should we know by how much, so that the proposal for the `-b` switch can be weighed against it?

Thank you,
Senthil
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/Z62L5ABNQK5AYPEE6I3KTZMKEY3BC65R/
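The "implicit representation of bytes in string formatting" mentioned in the quoted text is easy to reproduce. A small sketch, with payload as an arbitrary example value:

```python
payload = b"abc"

# Formatting a bytes object falls back to str(payload), i.e. its repr.
implicit_format = "value: {}".format(payload)
implicit_percent = "value: %s" % payload

# Decoding first gives the text that was most likely intended.
explicit = "value: {}".format(payload.decode("ascii"))

if __name__ == "__main__":
    print(implicit_format)
    print(implicit_percent)
    print(explicit)
```

The first two results contain the b'...' repr, which is rarely what ported Python 2 code meant to produce. When the interpreter is started with -b, these implicit conversions are the ones reported as "str() on a bytes instance".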