[issue47149] DatagramHandler doing DNS lookup on every log message

2022-03-30 Thread Bruce Merry
Bruce Merry added the comment:
> But it's going to be non-trivial, I fear.
Yeah. Maybe some documentation is achievable in the short term though, so that users who care more about latency than changing DNS are aware that they should do the lookup themsel
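A minimal sketch of doing the lookup yourself, as suggested above: resolve the host name once with `socket.getaddrinfo` and give `DatagramHandler` the numeric IPv4 address, so `sendto` has nothing left to resolve. The helper name `make_udp_log_handler` is hypothetical. The sketch assumes an IPv4 UDP target and deliberately ignores any later DNS changes.

```python
import logging.handlers
import socket


def make_udp_log_handler(host, port):
    """Hypothetical helper: resolve `host` once and return a DatagramHandler
    aimed at the numeric address.

    The lookup happens here, not on every record, so DNS changes made after
    this call are ignored for the handler's lifetime.
    """
    # Restrict to IPv4/UDP: DatagramHandler.makeSocket() creates an AF_INET
    # socket when a port is given. Each result is
    # (family, type, proto, canonname, sockaddr); sockaddr is (ip, port).
    infos = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_DGRAM)
    ip = infos[0][4][0]
    return logging.handlers.DatagramHandler(ip, port)
```

Usage, with a made-up host name: `logging.getLogger().addHandler(make_udp_log_handler("logs.example.internal", 9021))`.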

[issue47149] DatagramHandler doing DNS lookup on every log message

2022-03-29 Thread Bruce Merry
Bruce Merry added the comment:
> Hmm. I'm not sure we should try to work around a bad resolver issue. What's your platform, and how did you install Python?
Fair point. It's Ubuntu 20.04, running inside Docker, with the default Python (3.8). I've also reproduced it outside Docker

[issue47149] DatagramHandler doing DNS lookup on every log message

2022-03-29 Thread Bruce Merry
Bruce Merry added the comment:
> Yes, that's what I mean. Isn't the resolver library smart enough to cache lookups and handle the TTL timeout by itself?
Apparently not in this case - with tcpdump I can see the DNS requests being fired off several times a second. I'll need to chec

[issue47149] DatagramHandler doing DNS lookup on every log message

2022-03-29 Thread Bruce Merry
Bruce Merry added the comment:
> If you don’t look it up every time, how do you deal with DNS timeouts?
Do you mean expiring the IP address when the TTL is reached? I suppose that could be an issue for a long-running service, and I don't have a good answer to that. Possibly these d
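One possible middle ground, sketched under stated assumptions: a hypothetical `DatagramHandler` subclass that caches the resolved address and re-resolves it after a fixed interval. `socket.getaddrinfo` does not report the record's TTL, so `refresh_interval` is only a stand-in for it; the class name and parameter are illustrative. The subclass overrides `send`, whose stock version passes `self.address` (which may hold a host name) straight to `sendto`.

```python
import logging
import logging.handlers
import socket
import time


class RefreshingDatagramHandler(logging.handlers.DatagramHandler):
    """Hypothetical DatagramHandler that caches the resolved address.

    The host name is looked up again at most once every `refresh_interval`
    seconds. This is a fixed interval, not the DNS record's TTL. Assumes a
    UDP (host, port) target, not a Unix-domain socket path.
    """

    def __init__(self, host, port, refresh_interval=300.0):
        super().__init__(host, port)
        self.refresh_interval = refresh_interval
        self._resolved = None    # cached (ip, port) tuple
        self._resolved_at = 0.0  # time.monotonic() of the last lookup

    def _target(self):
        now = time.monotonic()
        if self._resolved is None or now - self._resolved_at >= self.refresh_interval:
            # If this lookup fails, the exception propagates to emit(),
            # which hands it to handleError().
            infos = socket.getaddrinfo(self.host, self.port,
                                       socket.AF_INET, socket.SOCK_DGRAM)
            self._resolved = infos[0][4][:2]
            self._resolved_at = now
        return self._resolved

    def send(self, s):
        # Mirrors DatagramHandler.send, but uses the cached numeric address
        # instead of self.address, so sendto() does not trigger a lookup.
        if self.sock is None:
            self.createSocket()
        self.sock.sendto(s, self._target())
```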

[issue47149] DatagramHandler doing DNS lookup on every log message

2022-03-29 Thread Bruce Merry
Change by Bruce Merry :
type: -> performance
Python tracker <https://bugs.python.org/issue47149>
Python-bugs-list mailing list

[issue47149] DatagramHandler doing DNS lookup on every log message

2022-03-29 Thread Bruce Merry
New submission from Bruce Merry : logging.DatagramHandler uses socket.sendto to send the messages. If the given address is a hostname rather than an IP address, it will do a DNS lookup every time. I suspect that fixing issue 14855 will also fix this, since fixing that issue requires

[issue21644] Optimize bytearray(int) constructor to use calloc()

2021-11-14 Thread Bruce Merry
Bruce Merry added the comment:
> I abandonned the issue because I didn't have time to work on it. If you want, you can open a new issue for that.
If I make a pull request and run some microbenchmarks, will you (or some other core dev) have time to review it? I've had a bad expe

[issue36050] Why does http.client.HTTPResponse._safe_read use MAXAMOUNT

2021-07-29 Thread Bruce Merry
Bruce Merry added the comment:
> Will you accept patches to fix this for 3.9? I'm not clear whether the "bug fixes only" status of 3.9 allows for fixing performance regressions.
Never mind, I see you already answered this on bpo-42853 (as a no). Thanks for taking the t

[issue36050] Why does http.client.HTTPResponse._safe_read use MAXAMOUNT

2021-07-29 Thread Bruce Merry
Bruce Merry added the comment:
> There is nothing to do here.
Will you accept patches to fix this for 3.9? I'm not clear whether the "bug fixes only" status of 3.9 allows for fixing performance regressions.

[issue42853] `OverflowError: signed integer is greater than maximum` in ssl.py for files larger than 2GB

2021-07-29 Thread Bruce Merry
Bruce Merry added the comment:
> A patch would not land in Python 3.9 since this would be a new feature and out-of-scope for a released version.
I see it as a fix for this bug. While there is already a fix, it regresses another bug (bpo-36050), so this would be a better fix.

[issue42853] `OverflowError: signed integer is greater than maximum` in ssl.py for files larger than 2GB

2021-07-28 Thread Bruce Merry
Bruce Merry added the comment:
> It seems like we could have support for OpenSSL 1.1.1 at that level with a compile time fallback for previous OpenSSL versions that break up the work. Would hope this solution also yields something we can backport more easily
I'd have to look a

[issue42853] `OverflowError: signed integer is greater than maximum` in ssl.py for files larger than 2GB

2021-07-28 Thread Bruce Merry
Bruce Merry added the comment: This fix is going to cause a regression of bpo-36050. Would it not be possible to fix this in _ssl.c (by breaking a large read into multiple smaller calls to SSL_read)? It seems like fixing this at the SSL layer is more appropriate than trying to work around
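The idea of breaking one large read into several smaller calls can be sketched at the Python level. `SSL_read()` takes its length as a C `int`, hence the `2**31 - 1` cap below. The helper `readinto_capped` and its signature are hypothetical; the comment proposes doing this inside `_ssl.c`, not in Python.

```python
import io

# SSL_read() takes its length as a C int, so a single call cannot
# request more than this many bytes.
INT_MAX = 2**31 - 1


def readinto_capped(raw, buf, max_chunk=INT_MAX):
    """Fill `buf` via raw.readinto(), requesting at most `max_chunk` bytes per call.

    Returns the number of bytes read; this is less than len(buf) only if
    readinto() reports EOF (0) or no data available (None).
    """
    total = 0
    with memoryview(buf) as view:
        while total < len(view):
            n = raw.readinto(view[total:total + max_chunk])
            if not n:
                break
            total += n
    return total


if __name__ == "__main__":
    demo = bytearray(10)
    count = readinto_capped(io.BytesIO(b"abcdefghij"), demo, max_chunk=3)
    print(count, bytes(demo))  # 10 b'abcdefghij'
```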

[issue36050] Why does http.client.HTTPResponse._safe_read use MAXAMOUNT

2021-07-28 Thread Bruce Merry
Bruce Merry added the comment: Re-opening because the patch to fix this has just been reverted due to bpo-42853.
status: closed -> open

[issue21644] Optimize bytearray(int) constructor to use calloc()

2020-09-15 Thread Bruce Merry
Bruce Merry added the comment: Was this abandoned just because nobody had the time, or was there a problem with the approach? I independently wanted this optimisation, and have ended up implementing something very similar to what was reverted in https://hg.python.org/lookup/dff6b4b61cac
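A rough microbenchmark for this optimisation. It assumes, based on my understanding of CPython rather than anything stated in this thread, that `bytes(n)` already obtains zeroed memory via calloc while `bytearray(n)` does not, so comparing the two hints at what calloc could buy for large sizes. The sizes and the function name are illustrative.

```python
import timeit

# Illustrative sizes only; these are not the sizes used in the original issue.
SIZES = (1 << 10, 1 << 20, 16 << 20)


def bench_zeroed_alloc(number=20):
    """Return {size: (bytearray_seconds, bytes_seconds)} per allocation."""
    results = {}
    for n in SIZES:
        # Both constructors return n zero bytes; the question in the issue is
        # whether bytearray(n) could get them from calloc() instead of
        # malloc() followed by memset().
        t_bytearray = timeit.timeit(lambda n=n: bytearray(n), number=number) / number
        t_bytes = timeit.timeit(lambda n=n: bytes(n), number=number) / number
        results[n] = (t_bytearray, t_bytes)
    return results


if __name__ == "__main__":
    for n, (ba, b) in bench_zeroed_alloc().items():
        print(f"{n:>10} bytes: bytearray {ba * 1e6:8.1f} us, bytes {b * 1e6:8.1f} us")
```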

[issue32528] Change base class for futures.CancelledError

2020-07-06 Thread Bruce Merry
Bruce Merry added the comment: FYI this has just bitten me after updating my OS to one that ships Python 3.8. It is code that was written with asyncio cancellation in mind and which expected CancelledError to be caught with "except Exception" (the exception block unwound
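A minimal reproduction of the behaviour change: since Python 3.8, `asyncio.CancelledError` subclasses `BaseException` rather than `Exception`, so an `except Exception` block no longer intercepts cancellation. The coroutine names are made up for the example.

```python
import asyncio


async def worker():
    try:
        await asyncio.sleep(3600)
    except Exception:
        # Before 3.8, CancelledError was an Exception subclass and landed here.
        # Since 3.8 it derives from BaseException and skips this block.
        return "swallowed"
    return "finished"


async def main():
    task = asyncio.create_task(worker())
    await asyncio.sleep(0)  # let worker() start and block in sleep()
    task.cancel()
    try:
        return await task
    except asyncio.CancelledError:
        return "cancelled"


if __name__ == "__main__":
    print(asyncio.run(main()))  # "cancelled" on Python 3.8 and later
```

Code that must clean up on cancellation can catch `asyncio.CancelledError` explicitly (and re-raise it) or use `try`/`finally`, which works the same way before and after the change.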

[issue41002] HTTPResponse.read with amt is slow

2020-06-18 Thread Bruce Merry
Bruce Merry added the comment:
> (perhaps 'MB/s's are wrong).
Why, are you getting significantly different results? Just in case it's confusing, the results are reported as A ± B MB/s, where A is the mean and B is the standard deviation of the mean. So it's about 3GB/s when no len
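The "A ± B" convention described here, sketched with the standard library. It assumes B is the usual standard error of the mean (sample standard deviation divided by the square root of the number of runs); the function name is illustrative and the sample values in the demo are synthetic.

```python
import math
import statistics


def mean_and_sem(samples):
    """Return (mean, standard error of the mean) for repeated measurements."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = statistics.mean(samples)
    # Standard deviation of the mean = sample standard deviation / sqrt(n).
    sem = statistics.stdev(samples) / math.sqrt(n)
    return mean, sem


if __name__ == "__main__":
    m, s = mean_and_sem([1.0, 2.0, 3.0, 4.0, 5.0])  # synthetic values
    print(f"{m:.2f} ± {s:.2f}")  # 3.00 ± 0.71
```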

[issue41002] HTTPResponse.read with amt is slow

2020-06-17 Thread Bruce Merry
Change by Bruce Merry :
keywords: +patch
pull_requests: +20124
stage: -> patch review
pull_request: https://github.com/python/cpython/pull/20943

[issue41002] HTTPResponse.read with amt is slow

2020-06-17 Thread Bruce Merry
Change by Bruce Merry :
type: -> performance

[issue41002] HTTPResponse.read with amt is slow

2020-06-17 Thread Bruce Merry
New submission from Bruce Merry : I've run into this on 3.8, but the code on Git master doesn't look significantly different so I assume it still applies. I'm happy to work on a PR for this. When http.client.HTTPResponse.read is called with a specific amount to read, it goes down this code

[issue39974] A race condition with GIL releasing exists in stringlib_bytes_join

2020-03-19 Thread Bruce Merry
Bruce Merry added the comment: +tzickel I'd suggest reading the discussion in issue 36051, and maybe raising a new issue about it if you still have concerns. In short, dropping the GIL in more bytes.join cases wouldn't necessarily be wrong, but it might break code that made the assumption

[issue39974] A race condition with GIL releasing exists in stringlib_bytes_join

2020-03-16 Thread Bruce Merry
Bruce Merry added the comment:
> static_buffers is not a static variable. It is auto local variable. So I think other thread don't hijack it.
Oh yes, quite right. I should have looked closer at the code first before commenting. I think this can be closed as not-a-bug, unless +tzick

[issue39974] A race condition with GIL releasing exists in stringlib_bytes_join

2020-03-16 Thread Bruce Merry
Bruce Merry added the comment: Good catch! I'll take a look this week to see what makes sense for the use case for which I originally proposed this optimisation.

[issue36051] Drop the GIL during large bytes.join operations?

2020-01-16 Thread Bruce Merry
Bruce Merry added the comment: I think I've addressed the concerns that were raised in this bug, but let me know if I've missed any.

[issue36051] Drop the GIL during large bytes.join operations?

2020-01-05 Thread Bruce Merry
Bruce Merry added the comment: I ran the test on a Xeon machine (Skylake-XP) and it also looks like performance is only improved from 1MB up (somewhat to my surprise, given how poor single-threaded memcpy performance is on that machine). So I've updated the pull request with that threshold

[issue36051] Drop the GIL during large bytes.join operations?

2020-01-05 Thread Bruce Merry
Bruce Merry added the comment: I've written a variant of the benchmark in which one thread does joins and the other does unrelated CPU-bound work that doesn't touch memory much. It also didn't show much benefit to thresholds below 512KB. I still want to test things on a server-class CPU

[issue36051] Drop the GIL during large bytes.join operations?

2020-01-02 Thread Bruce Merry
Bruce Merry added the comment: I'm realising that the benchmark makes it difficult to see what's going on because it doesn't separate overhead costs (slowdowns because releasing/acquiring the GIL is not free, particularly when contended) from cache effects (slowdowns due to parallel threads

[issue36051] Drop the GIL during large bytes.join operations?

2019-12-31 Thread Bruce Merry
Bruce Merry added the comment:
> Do you think it would be sufficient to change the stress test from joining 1000 items to joining 10 items?
Actually that won't work, because the existing stress test is using a non-empty separator. I'll add another version of that stress test tha

[issue36051] Drop the GIL during large bytes.join operations?

2019-12-31 Thread Bruce Merry
Bruce Merry added the comment:
> I'll take a look at extra unit tests soon. Do you know off the top of your head where to look for existing `join` tests to add to?
Never mind, I found it: https://github.com/python/cpython/blob/92709a263e9cec0bc646ccc1ea051fc528800d8d/Li

[issue36051] Drop the GIL during large bytes.join operations?

2019-12-31 Thread Bruce Merry
Bruce Merry added the comment: I've attached a benchmark script and CSV results for master (whichever version that was at the point I forked) and with unconditional dropping of the GIL. It shows up to 3x performance improvement when using 4 threads. That's on my home desktop, which is quite
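The attached `benchjoin.py` is not reproduced in this archive, so this is an independent, hypothetical sketch of that kind of measurement: several threads repeatedly join large lists of chunks, and the aggregate throughput is reported in MB/s (10^6 bytes). If `bytes.join` holds the GIL for the whole copy, adding threads would not be expected to raise throughput much; if it releases the GIL for large joins, it may. Sizes and names are illustrative.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def bench_join_mbps(total_bytes=64 << 20, chunk_bytes=1 << 20, threads=4, repeats=5):
    """Aggregate bytes.join throughput in MB/s (10**6 bytes) across `threads` threads.

    Each thread performs `repeats` joins of total_bytes // chunk_bytes chunks.
    """
    chunks = [bytes(chunk_bytes)] * (total_bytes // chunk_bytes)
    joined_size = len(chunks) * chunk_bytes

    def work(_worker_id):
        for _ in range(repeats):
            b"".join(chunks)

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        list(pool.map(work, range(threads)))  # list() surfaces worker exceptions
    elapsed = time.perf_counter() - start
    return threads * repeats * joined_size / elapsed / 1e6


if __name__ == "__main__":
    for n in (1, 2, 4):
        print(f"{n} thread(s): {bench_join_mbps(threads=n):.0f} MB/s")
```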

[issue36051] Drop the GIL during large bytes.join operations?

2019-12-31 Thread Bruce Merry
Change by Bruce Merry : Added file: https://bugs.python.org/file48813/benchjoin.py

[issue36051] Drop the GIL during large bytes.join operations?

2019-12-31 Thread Bruce Merry
Change by Bruce Merry : Added file: https://bugs.python.org/file48812/new.csv

[issue36051] Drop the GIL during large bytes.join operations?

2019-12-31 Thread Bruce Merry
Change by Bruce Merry : Added file: https://bugs.python.org/file48811/old.csv

[issue36051] Drop the GIL during large bytes.join operations?

2019-12-30 Thread Bruce Merry
Bruce Merry added the comment: If we want to be conservative, we could only drop the GIL if all the buffers pass the PyBytes_CheckExact test. Presumably that won't encounter any of these problems because bytes objects are immutable?
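The conservative condition can be expressed in Python: `type(c) is bytes` is the Python-level counterpart of `PyBytes_CheckExact`. The sketch also shows why such a check matters: `bytes.join` accepts any bytes-like object, including mutable ones such as `bytearray`. The function and class names are hypothetical.

```python
def all_exact_bytes(chunks):
    """Python analogue of the conservative check suggested in the comment.

    True only if every item is exactly bytes (what PyBytes_CheckExact checks
    in C): bytes subclasses and mutable buffers such as bytearray are rejected.
    """
    return all(type(c) is bytes for c in chunks)


class TaggedBytes(bytes):
    """A bytes subclass; PyBytes_CheckExact would reject it."""


if __name__ == "__main__":
    # bytes.join accepts any bytes-like object, including mutable ones, which
    # another thread could modify if the GIL were released mid-join.
    print(b"".join([b"a", bytearray(b"b"), memoryview(b"c")]))  # b'abc'
    print(all_exact_bytes([b"a", b"b"]))                          # True
    print(all_exact_bytes([b"a", bytearray(b"b")]))               # False
    print(all_exact_bytes([TaggedBytes(b"x")]))                   # False
```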

[issue36051] Drop the GIL during large bytes.join operations?

2019-12-30 Thread Bruce Merry
Change by Bruce Merry :
keywords: +patch
pull_requests: +17193
stage: -> patch review
pull_request: https://github.com/python/cpython/pull/17757

[issue36051] Drop the GIL during large bytes.join operations?

2019-12-22 Thread Bruce Merry
Bruce Merry added the comment:
> It seems we can release GIL during iterating the buffer array.
That's what I had in mind. Naturally it would require a bit of benchmarking to pick a threshold such that the small case doesn't lose performance due to locking overheads. If no one e

[issue38242] Revert the new asyncio Streams API

2019-09-30 Thread Bruce Merry
Change by Bruce Merry :
nosy: +bmerry

[issue37141] Allow multiple separators in Stream.readuntil

2019-09-26 Thread Bruce Merry
Bruce Merry added the comment: I've submitted a PR: https://github.com/python/cpython/pull/16429

[issue37141] Allow multiple separators in Stream.readuntil

2019-09-26 Thread Bruce Merry
Change by Bruce Merry :
keywords: +patch
pull_requests: +16008
stage: test needed -> patch review
pull_request: https://github.com/python/cpython/pull/16429

[issue37141] Allow multiple separators in Stream.readuntil

2019-09-12 Thread Bruce Merry
Bruce Merry added the comment: I finally have permission from my employer to sign the contributors agreement, so I'll take a stab at this when I have some free time (unless nobody else gets to it first).

[issue37141] Allow multiple separators in Stream.readuntil

2019-06-03 Thread Bruce Merry
Bruce Merry added the comment: Ok, I've changed the issue title to refer to Stream. Since this would be a new feature, I assume it's off the table for 3.8, but I'll see if I get time to implement a PR in time for 3.9 (and get someone at work to sign off on the contributor agreement, which

[issue37141] Allow multiple separators in StreamReader.readuntil

2019-06-03 Thread Bruce Merry
Bruce Merry added the comment: Ok. Does the new Stream still have a similar interface for readuntil i.e. is this still a relevant request against the new API? I'm happy to let deprecated APIs stay as-is.

[issue37141] Allow multiple separators in StreamReader.readuntil

2019-06-03 Thread Bruce Merry
Bruce Merry added the comment: I wasn't aware of that deprecation - it doesn't seem to be mentioned at https://docs.python.org/3.8/library/asyncio-stream.html. What is the replacement?

[issue37141] Allow multiple separators in StreamReader.readuntil

2019-06-03 Thread Bruce Merry
New submission from Bruce Merry : Text-based protocols sometimes allow a choice of newline separator - I work with one that allows either \r or \n. Unfortunately that doesn't work with StreamReader.readuntil, which only accepts a single separator, so I've had to do some hacky things
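Recent Python versions (3.13 and later, per the asyncio documentation) accept a tuple of separators in `StreamReader.readuntil`. For older versions, one workaround is to wrap the reader in a class that keeps its own buffer. `MultiSepReader` below is hypothetical, searches the whole buffer on every pass, and is aimed at single-byte separators such as `\r` and `\n`; if one separator is a prefix of another (say `\r` and `\r\n`), the shorter one may match before the rest arrives.

```python
import asyncio
import re


class MultiSepReader:
    """Hypothetical wrapper: read up to the first of several separators.

    It keeps its own buffer on top of an asyncio.StreamReader, so once a
    stream is wrapped, every read must go through the wrapper.
    """

    def __init__(self, reader, separators=(b"\r", b"\n"), chunk_size=4096):
        self._reader = reader
        # Alternation of the escaped separators; the match that starts
        # earliest wins, with ties going to the separator listed first.
        self._pattern = re.compile(b"|".join(re.escape(s) for s in separators))
        self._buf = bytearray()
        self._chunk_size = chunk_size

    async def readuntil(self):
        """Return data up to and including the earliest separator.

        At EOF, return whatever is left without a separator (b"" if nothing).
        """
        while True:
            m = self._pattern.search(self._buf)
            if m:
                line = bytes(self._buf[:m.end()])
                del self._buf[:m.end()]
                return line
            chunk = await self._reader.read(self._chunk_size)
            if not chunk:  # EOF
                line = bytes(self._buf)
                self._buf.clear()
                return line
            self._buf += chunk


async def demo():
    reader = asyncio.StreamReader()
    reader.feed_data(b"one\rtwo\nthree")
    reader.feed_eof()
    wrapped = MultiSepReader(reader, chunk_size=2)
    return [await wrapped.readuntil() for _ in range(4)]


if __name__ == "__main__":
    print(asyncio.run(demo()))  # [b'one\r', b'two\n', b'three', b'']
```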

[issue32052] Provide access to buffer of asyncio.StreamReader

2019-06-03 Thread Bruce Merry
Bruce Merry added the comment: Ok, I'll open a separate issue to allow a tuple of possible separators.
nosy: +bmerry

[issue36051] (Performance) Drop the GIL during large bytes.join operations?

2019-02-20 Thread Bruce Merry
New submission from Bruce Merry : A common pattern in libraries doing I/O is to receive data in chunks, put them in a list, then join them all together using b"".join(chunks). For example, see http.client.HTTPResponse._safe_read. When the output is large, the memory copies

[issue36050] Why does http.client.HTTPResponse._safe_read use MAXAMOUNT

2019-02-20 Thread Bruce Merry
New submission from Bruce Merry : While investigating poor HTTP read performance I discovered that reading all the data from a response with a content-length goes via _safe_read, which in turn reads in chunks of at most MAXAMOUNT (1MB) before stitching them together with b"".join
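For comparison, a sketch of an approach that avoids the chunk list: preallocate the whole result and fill it with `readinto`, so the extra copy made by a final `b"".join` is not needed. `read_exact` is a hypothetical helper, and it assumes a blocking binary stream whose `readinto` returns 0 at EOF.

```python
import io


def read_exact(raw, n):
    """Read exactly n bytes from `raw` into one preallocated buffer.

    There is no list of chunks and no final b"".join. `raw` is any blocking
    binary file-like object with readinto(). Raises EOFError if the stream
    ends early.
    """
    buf = bytearray(n)
    pos = 0
    with memoryview(buf) as view:
        while pos < n:
            got = raw.readinto(view[pos:])
            if not got:
                raise EOFError(f"expected {n} bytes, got {pos}")
            pos += got
    return buf


if __name__ == "__main__":
    print(read_exact(io.BytesIO(b"hello world"), 5))  # bytearray(b'hello')
```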

[issue32052] Provide access to buffer of asyncio.StreamReader

2018-10-13 Thread Bruce Merry
Bruce Merry added the comment: A sequence of possible terminators would cover my immediate use case and certainly be an improvement. To facilitate more general use cases without exposing implementation details, would it be practical and maintainable to have a "putback" method tha

[issue32395] asyncio.StreamReader.readuntil is not general enough

2017-12-20 Thread Bruce Merry
New submission from Bruce Merry <bme...@gmail.com>: I'd proposed one specific solution in Issue 32052 which asvetlov didn't like, so as requested I'm filing a bug about the problem rather than the solution. The specific case I have is reading a protocol in which either \r or \n can be

[issue32052] Provide access to buffer of asyncio.StreamReader

2017-11-16 Thread Bruce Merry
New submission from Bruce Merry <bme...@gmail.com>: While asyncio.StreamReader.readuntil is an improvement on only having readline, it is still quite limited e.g. you cannot have multiple possible terminators. The real problem is that it's not possible to roll your own without acc