[Python-Dev] O(1) deletes from the front of bytearray (was: Re: Adding bytes.frombuffer() constructor to PEP 467 (was: [Python-ideas] Adding bytes.frombuffer() constructor))
On Tue, Oct 11, 2016 at 10:53 PM, Serhiy Storchaka wrote:
> On 12.10.16 08:03, Nathaniel Smith wrote:
>> On Tue, Oct 11, 2016 at 9:08 PM, INADA Naoki wrote:
>>> From Python 3.4, bytearray is a good solution for I/O buffers, thanks
>>> to #19087 [1]. Actually, asyncio uses bytearray as I/O buffer often.
>>
>> Whoa what?! This is awesome, I had no idea that bytearray had O(1)
>> deletes at the front. I literally reimplemented this myself on top of
>> bytearray for some 3.5-only code recently because I assumed bytearray
>> had the same asymptotics as list, and AFAICT this is totally
>> undocumented. Shouldn't we write this down somewhere? Maybe here? ->
>> https://docs.python.org/3/library/functions.html#bytearray
>
> I'm afraid this is a CPython implementation detail (like the string
> concatenation optimization). Other implementations can have O(N) deletes
> at the front of bytearray.

Well, it shouldn't be :-). The problem with the string concatenation
optimization is that, to work, it requires both the use of refcounting GC
and that you get lucky with how the underlying malloc implementation
happened to lay things out in memory. Obviously it shouldn't be part of
the language spec. But amortized O(1) deletes from the front of bytearray
are totally different, and more like amortized O(1) appends to list: there
are important use cases [1] that simply cannot be implemented without some
feature like this, and putting the implementation inside bytearray is
straightforward, deterministic, and more efficient than hacking together
something on top. Python should just guarantee it, IMO.

-n

[1] My use case is parsing HTTP out of a receive buffer. If deleting the
first k bytes of an N-byte buffer is O(N), then not only does parsing
become O(N^2) in the worst case, but it's the sort of O(N^2) that random
untrusted network clients can trigger at will to DoS your server.

-- Nathaniel J.
Smith -- https://vorpus.org ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
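Nathaniel's receive-buffer use case can be sketched in a few lines. This is an illustration of the pattern under discussion, not code from any of the projects mentioned: parse lines off the front of a bytearray and `del` the consumed prefix in place, relying on the amortized-cheap front delete.

```python
# A minimal sketch of the receive-buffer pattern: parse CRLF-terminated
# lines out of a bytearray and delete the consumed prefix in place.
# If the front delete is amortized O(1) per byte, the whole parse is
# ~O(N); if it's O(N) per delete, a hostile client can force O(N^2).

def read_line(buf: bytearray) -> bytes:
    """Pop one CRLF-terminated line off the front of *buf*, or b'' if none."""
    n = buf.find(b"\r\n")
    if n < 0:
        return b""
    line = bytes(buf[:n])   # copy out only the line itself
    del buf[:n + 2]         # drop line + CRLF from the front of the buffer
    return line

buf = bytearray(b"GET / HTTP/1.1\r\nHost: example.com\r\n")
lines = []
while True:
    line = read_line(buf)
    if not line:
        break
    lines.append(line)
print(lines)  # [b'GET / HTTP/1.1', b'Host: example.com']
```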
Re: [Python-Dev] Adding bytes.frombuffer() constructor to PEP 467 (was: [Python-ideas] Adding bytes.frombuffer() constructor)
On 12.10.16 08:03, Nathaniel Smith wrote:
> On Tue, Oct 11, 2016 at 9:08 PM, INADA Naoki wrote:
>> From Python 3.4, bytearray is a good solution for I/O buffers, thanks
>> to #19087 [1]. Actually, asyncio uses bytearray as I/O buffer often.
>
> Whoa what?! This is awesome, I had no idea that bytearray had O(1)
> deletes at the front. I literally reimplemented this myself on top of
> bytearray for some 3.5-only code recently because I assumed bytearray
> had the same asymptotics as list, and AFAICT this is totally
> undocumented. Shouldn't we write this down somewhere? Maybe here? ->
> https://docs.python.org/3/library/functions.html#bytearray

I'm afraid this is a CPython implementation detail (like the string
concatenation optimization). Other implementations can have O(N) deletes
at the front of bytearray.
Re: [Python-Dev] Adding bytes.frombuffer() constructor to PEP 467 (was: [Python-ideas] Adding bytes.frombuffer() constructor)
On 12.10.16 07:08, INADA Naoki wrote:
> Sample code:
>
>     def read_line(buf: bytearray) -> bytes:
>         try:
>             n = buf.index(b'\r\n')
>         except ValueError:
>             return b''
>         line = bytes(buf)[:n]   # bytearray -> bytes -> bytes

Wouldn't it be more correct to write this as bytes(buf[:n])?

> Adding one more constructor to bytes:
>
>     # when length=-1 (default), use until end of *byteslike*.
>     bytes.frombuffer(byteslike, length=-1, offset=0)

This interface looks unusual. Wouldn't it be better to support the
interface of buffer in Python 2: buffer(object[, offset[, size]])?
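Serhiy's point can be seen directly in code. The sketch below (mine, not from the thread) shows that both spellings yield the same result, while `bytes(buf)[:n]` materialises a copy of the entire buffer before slicing and `bytes(buf[:n])` copies only the prefix:

```python
# bytes(buf)[:n] copies the whole bytearray, then slices the copy;
# bytes(buf[:n]) slices the bytearray first, copying only n bytes.
# The results are identical; only the amount of copying differs.

buf = bytearray(b'foo\r\nbar\r\nbaz\r\n')
n = buf.index(b'\r\n')          # 3

a = bytes(buf)[:n]   # whole-buffer copy, then a second slice copy
b = bytes(buf[:n])   # single copy of just the first n bytes
print(a, b, a == b)  # b'foo' b'foo' True
```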
Re: [Python-Dev] Adding bytes.frombuffer() constructor to PEP 467 (was: [Python-ideas] Adding bytes.frombuffer() constructor)
I don't think it makes sense to add any more ideas to PEP 467. That needed
to be a PEP because it proposed breaking backwards compatibility in a
couple of areas, and because of the complex history of Python 3's
"bytes-as-tuple-of-ints" and Python 2's "bytes-as-str" semantics. Other
enhancements to the binary data handling APIs in Python 3 can be
considered on their own merits.

On 12 October 2016 at 14:08, INADA Naoki wrote:
> Memoryview problem
> ==================
>
> To avoid the redundant copy in `line = bytes(buf)[:n]`, the current
> solution is using memoryview.
>
> The first code I wrote was: `line = bytes(memoryview(buf)[:n])`.
>
> On CPython, it works fine. But `del buf[:n+2]` on the next line may fail
> on other Python implementations. Changing the bytearray size is
> prohibited while a memoryview is alive.
>
> So the right code is:
>
>     with memoryview(buf) as m:
>         line = bytes(m[:n])
>
> The problems with the memoryview approach are:
>
> * Overhead: creating a temporary memoryview, __enter__, and __exit__
>   (see below).
>
> * It isn't "one obvious way": developers, including me, may forget to
>   use the context manager. And since it works on CPython, it's hard to
>   point out.

To add to the confusion, there's also
https://docs.python.org/3/library/stdtypes.html#memoryview.tobytes giving:

    line = memoryview(buf)[:n].tobytes()

However, folks *do* need to learn that many mutable data types will lock
themselves against modification while you have a live memory view on them,
so it's important to release views promptly and reliably when we don't
need them any more.
> Quick benchmark:
>
> (temporary bytes)
> $ python3 -m perf timeit -s 'buf = bytearray(b"foo\r\nbar\r\nbaz\r\n")' -- 'bytes(buf)[:3]'
> Median +- std dev: 652 ns +- 19 ns
>
> (temporary memoryview without "with")
> $ python3 -m perf timeit -s 'buf = bytearray(b"foo\r\nbar\r\nbaz\r\n")' -- 'bytes(memoryview(buf)[:3])'
> Median +- std dev: 886 ns +- 26 ns
>
> (temporary memoryview with "with")
> $ python3 -m perf timeit -s 'buf = bytearray(b"foo\r\nbar\r\nbaz\r\n")' -- '
> with memoryview(buf) as m:
>     bytes(m[:3])
> '
> Median +- std dev: 1.11 us +- 0.03 us

This is normal though, as memory views trade lower O(N) costs (reduced
data copying) for higher O(1) setup costs (creating and managing the view,
indirection for data access).

> Proposed solution
> =================
>
> Adding one more constructor to bytes:
>
>     # when length=-1 (default), use until end of *byteslike*.
>     bytes.frombuffer(byteslike, length=-1, offset=0)
>
> With this API
>
>     with memoryview(buf) as m:
>         line = bytes(m[:n])
>
> becomes
>
>     line = bytes.frombuffer(buf, n)

Does that need to be a method on the builtin rather than a separate helper
function, though? Once you define:

    def snapshot(buf, length=None, offset=0):
        with memoryview(buf) as m:
            return m[offset:length].tobytes()

then that can be replaced by a more optimised C implementation without
users needing to care about the internal details.

That is, getting back to a variant on one of Serhiy's suggestions in the
last PEP 467 discussion, it may make sense for us to offer a "buffertools"
library that's specifically aimed at supporting efficient buffer
manipulation operations that minimise data copying. The pure Python
implementations would work entirely through memoryview, but we could also
have selected C accelerated operations if that showed a noticeable
improvement on asyncio's benchmarks.

Regards, Nick.

P.S.
The length/offset API design is also problematic due to the way it differs
from range() & slice(), but I don't think it makes sense to get into that
kind of detail before discussing the larger question of adding a new
helper module for working efficiently with memory buffers vs further
widening the method API for the builtin bytes type.

-- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
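A runnable version of the `snapshot` helper Nick sketches makes his P.S. concrete. Note that his in-mail version slices `m[offset:length]`, treating *length* as an end index; the variant below converts length to an end index explicitly (that conversion is my assumption, matching the offset/length naming — exactly the range()/slice() mismatch the P.S. alludes to). It also demonstrates that releasing the view promptly leaves the bytearray resizable:

```python
# A sketch of Nick's "snapshot" helper. Treating *length* as a byte
# count (not a slice end) is my reading of the proposed signature.

def snapshot(buf, length=None, offset=0):
    """Copy *length* bytes of *buf* starting at *offset* into new bytes."""
    with memoryview(buf) as m:   # view released on exit, so buf stays resizable
        end = None if length is None else offset + length
        return m[offset:end].tobytes()

buf = bytearray(b'foo\r\nbar\r\n')
print(snapshot(buf, 3))        # b'foo'
print(snapshot(buf, 3, 5))     # b'bar'
del buf[:5]                    # no live view remains, so resizing succeeds
print(buf)                     # bytearray(b'bar\r\n')
```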
Re: [Python-Dev] Adding bytes.frombuffer() constructor to PEP 467 (was: [Python-ideas] Adding bytes.frombuffer() constructor)
On Tue, Oct 11, 2016 at 9:08 PM, INADA Naoki wrote:
> From Python 3.4, bytearray is a good solution for I/O buffers, thanks
> to #19087 [1]. Actually, asyncio uses bytearray as I/O buffer often.

Whoa what?! This is awesome, I had no idea that bytearray had O(1) deletes
at the front. I literally reimplemented this myself on top of bytearray
for some 3.5-only code recently because I assumed bytearray had the same
asymptotics as list, and AFAICT this is totally undocumented. Shouldn't we
write this down somewhere? Maybe here? ->
https://docs.python.org/3/library/functions.html#bytearray

> # when length=-1 (default), use until end of *byteslike*.
> bytes.frombuffer(byteslike, length=-1, offset=0)

This seems reasonable to me. Mostly I've dealt with the problem by writing
functions like your read_line so that they return bytearray objects, since
that's the only thing you can get out of a bytearray without
double-copying. But returning bytes would feel a bit cleaner, and this
would allow that.

-n

-- Nathaniel J. Smith -- https://vorpus.org
[Python-Dev] Adding bytes.frombuffer() constructor to PEP 467 (was: [Python-ideas] Adding bytes.frombuffer() constructor)
Hi.

While there was no reply to my previous post on the Python-ideas ML, I'm
now sure that bytes.frombuffer() is worth it. Let me describe why I think
it's important.

Background
==========

From Python 3.4, bytearray is a good solution for I/O buffers, thanks to
#19087 [1]. Actually, asyncio uses bytearray as I/O buffer often.

When bytearray is used for a read buffer, we can parse received data on
the bytearray directly, and consume it. For example, reading until '\r\n'
is easier than with io.BytesIO().

Sample code:

    def read_line(buf: bytearray) -> bytes:
        try:
            n = buf.index(b'\r\n')
        except ValueError:
            return b''
        line = bytes(buf)[:n]   # bytearray -> bytes -> bytes
        del buf[:n+2]
        return line

    buf = bytearray(b'foo\r\nbar\r\nbaz\r\n')
    while True:
        line = read_line(buf)
        if not line:
            break
        print(line)

As you can see, a redundant temporary bytes object is used. This is not
ideal for performance and memory efficiency. Since code like this is
typically in lower-level code (e.g. asyncio), performance and efficiency
are important.

[1] https://bugs.python.org/issue19087

(Off topic: bytearray is nice for a write buffer too:
written = s.send(buf); del buf[:written];)

Memoryview problem
==================

To avoid the redundant copy in `line = bytes(buf)[:n]`, the current
solution is using memoryview.

The first code I wrote was: `line = bytes(memoryview(buf)[:n])`.

On CPython, it works fine. But `del buf[:n+2]` on the next line may fail
on other Python implementations. Changing the bytearray size is prohibited
while a memoryview is alive.

So the right code is:

    with memoryview(buf) as m:
        line = bytes(m[:n])

The problems with the memoryview approach are:

* Overhead: creating a temporary memoryview, __enter__, and __exit__ (see
  below).

* It isn't "one obvious way": developers, including me, may forget to use
  the context manager. And since it works on CPython, it's hard to point
  out.
Quick benchmark:

(temporary bytes)
$ python3 -m perf timeit -s 'buf = bytearray(b"foo\r\nbar\r\nbaz\r\n")' -- 'bytes(buf)[:3]'
Median +- std dev: 652 ns +- 19 ns

(temporary memoryview without "with")
$ python3 -m perf timeit -s 'buf = bytearray(b"foo\r\nbar\r\nbaz\r\n")' -- 'bytes(memoryview(buf)[:3])'
Median +- std dev: 886 ns +- 26 ns

(temporary memoryview with "with")
$ python3 -m perf timeit -s 'buf = bytearray(b"foo\r\nbar\r\nbaz\r\n")' -- '
with memoryview(buf) as m:
    bytes(m[:3])
'
Median +- std dev: 1.11 us +- 0.03 us

Proposed solution
=================

Adding one more constructor to bytes:

    # when length=-1 (default), use until end of *byteslike*.
    bytes.frombuffer(byteslike, length=-1, offset=0)

With this API,

    with memoryview(buf) as m:
        line = bytes(m[:n])

becomes

    line = bytes.frombuffer(buf, n)

Thanks,
-- INADA Naoki
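Putting INADA's pieces together, here is his read_line with the "right code" from the Memoryview problem section folded in. This is a sketch assembled from the snippets in the mail, not code from asyncio: the slice is copied out through a memoryview that is released before the bytearray is resized.

```python
# read_line using the context-managed memoryview: the view is released
# before the del, so the bytearray resize is legal on all implementations
# (a live memoryview would otherwise prohibit changing the buffer's size).

def read_line(buf: bytearray) -> bytes:
    try:
        n = buf.index(b'\r\n')
    except ValueError:
        return b''
    with memoryview(buf) as m:   # view released before the del below
        line = bytes(m[:n])      # copy only the line, no whole-buffer copy
    del buf[:n + 2]              # legal again: no live exports of buf
    return line

buf = bytearray(b'foo\r\nbar\r\nbaz\r\n')
out = []
while True:
    line = read_line(buf)
    if not line:
        break
    out.append(line)
print(out)  # [b'foo', b'bar', b'baz']
```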
Re: [Python-Dev] Optimizing list.sort() by checking type in advance
[Terry Reedy]
> This seems like a generic issue with timing mutation methods
> ...
> But I am sure Tim worked this out in his test code, which should be
> reused, perhaps updated with Victor's perf module to get the most
> stable timings possible.

sortperf.py is older than me ;-) It's not at all intended to give
fine-grained timings: it's a sledgehammer that runs each case _only once_,
to confirm/refute _big_ changes. So it's useful qualitatively when a "big
claimed change" is at issue, but useless for quantifying beyond "wow!" or
"ouch!" or "meh".

Since this is a "big claimed change", a "wow!" outcome is good-enough
confirmation. And a "meh" outcome would be almost good enough for cases
where the new specializations don't apply - although I haven't seen any
numbers from sortperf.py for those cases yet. I expect a "meh" outcome for
those, based on reasoning - but may be surprised by reality.
Re: [Python-Dev] Optimizing list.sort() by checking type in advance
On 10/11/2016 10:26 AM, Chris Angelico wrote:
> After the first call, the list will be sorted. Any subsequent attempts
> will use the sorted list.

This seems like a generic issue with timing mutation methods: is the
mutated output suitable input for another mutation? With list.reverse, the
output is suitable. Ditto for other permutations that are independent of
the data, including random.shuffle. With list.pop, the number of mutations
has to be limited to the length of the list. With list.sort, the list
needs to be 'restored' -- either copied or shuffled each time.

So it seems that one is stuck with timing 'restore', restore + sort_a,
restore + sort_b, subtracting the timing for restore, and comparing. But I
am sure Tim worked this out in his test code, which should be reused,
perhaps updated with Victor's perf module to get the most stable timings
possible.

-- Terry Jan Reedy
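The subtraction scheme Terry describes can be sketched with the stdlib's timeit. The helper names below are my own; the point is that each timed sort sees a freshly "restored" (unsorted) list, and the restore cost is measured separately and subtracted:

```python
# Time "restore" alone, then "restore + sort", and subtract, so every
# sort call operates on an unsorted list. Helper names are illustrative.

import random
import timeit

data = list(range(10_000))
random.shuffle(data)            # data stays shuffled for the whole run

def restore():
    return data.copy()          # fresh unsorted copy each trial

def restore_and_sort():
    data.copy().sort()          # same copy cost, plus the sort under test

n = 200
t_restore = timeit.timeit(restore, number=n)
t_both = timeit.timeit(restore_and_sort, number=n)
t_sort = t_both - t_restore     # approximate cost of the sorts alone
print(f"sort-only time over {n} runs: {t_sort:.4f}s")
```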
Re: [Python-Dev] Optimizing list.sort() by checking type in advance
On 11 October 2016 at 15:32, Elliot Gorokhovsky wrote:
> But the sort mutates F... does the setup get executed each time? I
> thought it's just at the beginning. So then F gets mutated (sorted) and
> subsequent sorts time wrong.

Did I not say earlier - sorry. I'm suggesting that you put each timing run
into a separate Python process.

    # Optimised code
    python -m perf timeit -s "from mymod import FastList; L=;F=FastList(L)" "F.fastsort()"

    # Core sort routine
    python -m perf timeit -s "L=" "L.sort()"

    # Or maybe, if FastList exposes the existing sort function as well
    # Non-optimised code
    python -m perf timeit -s "from mymod import FastList; L=;F=FastList(L)" "F.sort()"

Paul.
Re: [Python-Dev] Optimizing list.sort() by checking type in advance
But the sort mutates F... does the setup get executed each time? I thought
it's just at the beginning. So then F gets mutated (sorted) and subsequent
sorts time wrong.

On Tue, Oct 11, 2016, 7:51 AM Paul Moore wrote:
> On 11 October 2016 at 14:04, Elliot Gorokhovsky wrote:
>> Right, that sounds good, but there's just one thing I don't understand
>> that's keeping me from using it. Namely, I would define a benchmark
>> list L in my setup, and then I would have
>> code="F=FastList(L);F.fastsort()". The problem here is I'm measuring
>> the constructor time along with the sort time, right, so wouldn't that
>> mess up the benchmark? Or does timeit separate the times?
>
> That would mess up your times. Put F=FastList(L) in your setup.
>
> Paul
Re: [Python-Dev] Optimizing list.sort() by checking type in advance
Right, that sounds good, but there's just one thing I don't understand
that's keeping me from using it. Namely, I would define a benchmark list L
in my setup, and then I would have code="F=FastList(L);F.fastsort()". The
problem here is I'm measuring the constructor time along with the sort
time, right, so wouldn't that mess up the benchmark? Or does timeit
separate the times?

On Tue, Oct 11, 2016, 2:22 AM Paul Moore wrote:
> On 11 October 2016 at 03:15, Elliot Gorokhovsky wrote:
>> There's an option to provide setup code, of course, but I need to set
>> up before each trial, not just before the loop.
>
> Typically, I would just run the benchmark separately for each case, and
> then you'd do
>
>     # Case 1
>     python -m perf timeit -s 'setup; code; here' 'code; to; be; timed; here'
>     [Results 1]
>     # Case 2
>     python -m perf timeit -s 'setup; code; here' 'code; to; be; timed; here'
>     [Results 2]
>
> The other advantage of doing it this way is that you can post your
> benchmark command lines, which will allow people to see what you're
> timing, and if there *are* any problems (such as a method lookup that
> skews the results) people can point them out.
>
> Paul
Re: [Python-Dev] Optimizing list.sort() by checking type in advance
On Wed, Oct 12, 2016 at 1:24 AM, Paul Moore wrote:
> On 11 October 2016 at 15:00, Chris Angelico wrote:
>> On Wed, Oct 12, 2016 at 12:51 AM, Paul Moore wrote:
>>> On 11 October 2016 at 14:04, Elliot Gorokhovsky wrote:
>>>> Right, that sounds good, but there's just one thing I don't
>>>> understand that's keeping me from using it. Namely, I would define a
>>>> benchmark list L in my setup, and then I would have
>>>> code="F=FastList(L);F.fastsort()". The problem here is I'm measuring
>>>> the constructor time along with the sort time, right, so wouldn't
>>>> that mess up the benchmark? Or does timeit separate the times?
>>>
>>> That would mess up your times. Put F=FastList(L) in your setup.
>>
>> But then you're resorting an already-sorted list, which may well have
>> different timings (it certainly does in timsort).
>
> Why would it be already sorted? I assume FastList(L) is simply a wrapper
> round a normal list that has a modified sort method with the
> optimisation included.
>
> Of course, that's essentially the point here - without seeing the code,
> we're (to an extent) guessing.

After the first call, the list will be sorted. Any subsequent attempts
will use the sorted list.

ChrisA
Re: [Python-Dev] Optimizing list.sort() by checking type in advance
On 11 October 2016 at 15:00, Chris Angelico wrote:
> On Wed, Oct 12, 2016 at 12:51 AM, Paul Moore wrote:
>> On 11 October 2016 at 14:04, Elliot Gorokhovsky wrote:
>>> Right, that sounds good, but there's just one thing I don't understand
>>> that's keeping me from using it. Namely, I would define a benchmark
>>> list L in my setup, and then I would have
>>> code="F=FastList(L);F.fastsort()". The problem here is I'm measuring
>>> the constructor time along with the sort time, right, so wouldn't that
>>> mess up the benchmark? Or does timeit separate the times?
>>
>> That would mess up your times. Put F=FastList(L) in your setup.
>
> But then you're resorting an already-sorted list, which may well have
> different timings (it certainly does in timsort).

Why would it be already sorted? I assume FastList(L) is simply a wrapper
round a normal list that has a modified sort method with the optimisation
included.

Of course, that's essentially the point here - without seeing the code,
we're (to an extent) guessing.

Paul
Re: [Python-Dev] Optimizing list.sort() by checking type in advance
On Wed, Oct 12, 2016 at 12:51 AM, Paul Moore wrote:
> On 11 October 2016 at 14:04, Elliot Gorokhovsky wrote:
>> Right, that sounds good, but there's just one thing I don't understand
>> that's keeping me from using it. Namely, I would define a benchmark
>> list L in my setup, and then I would have
>> code="F=FastList(L);F.fastsort()". The problem here is I'm measuring
>> the constructor time along with the sort time, right, so wouldn't that
>> mess up the benchmark? Or does timeit separate the times?
>
> That would mess up your times. Put F=FastList(L) in your setup.

But then you're resorting an already-sorted list, which may well have
different timings (it certainly does in timsort).

ChrisA
Re: [Python-Dev] Optimizing list.sort() by checking type in advance
On 11 October 2016 at 14:04, Elliot Gorokhovsky wrote:
> Right, that sounds good, but there's just one thing I don't understand
> that's keeping me from using it. Namely, I would define a benchmark list
> L in my setup, and then I would have code="F=FastList(L);F.fastsort()".
> The problem here is I'm measuring the constructor time along with the
> sort time, right, so wouldn't that mess up the benchmark? Or does timeit
> separate the times?

That would mess up your times. Put F=FastList(L) in your setup.

Paul
Re: [Python-Dev] PyWeakref_GetObject() borrows its reference from... whom?
By the way, just to finish playing this new fun game "Who'd I Borrow This
Reference From?":

On 10/10/2016 11:45 AM, Chris Angelico wrote:
> On Mon, Oct 10, 2016 at 8:35 PM, Larry Hastings wrote:
>> I bet for every other spot in the API I can tell you from whom you're
>> borrowing the reference.
>
> Okay. Here's a test:
>
>     PyObject* PyList_GetItem(PyObject *list, Py_ssize_t index)
>     Return value: Borrowed reference.
>
> Presumably you own a reference to the list itself before you call this,
> and the list has a reference to its items. But suppose another thread
> clear()s the list immediately after you call this; whose reference are
> you borrowing now? The list's is gone.

As you say: you've borrowed the reference from the list. With the GIL this
is safe. After the Gilectomy it's not always safe. It *can* be: for
example, if you allocated the list locally in the current thread, and it
hasn't escaped the thread yet, you can assert that in this case the
reference will never go away on you. But in general this is an API that
has to change for the Gilectomy.

//arry/
Re: [Python-Dev] PyWeakref_GetObject() borrows its reference from... whom?
> Huh? In all other circumstances, a "borrowed" reference is exactly that:
> X has a reference, and you are relying on X's reference to keep the
> object alive. Borrowing from a borrowed reference is simply a chain of
> these; Z borrows from Y, Y borrows from X, and X is the original person
> who did the incref. But you're borrowing from something specific,
> somebody who the API guarantees has a legitimate reference on the object
> and won't drop it while you're using it. I bet for every other spot in
> the API I can tell you from whom you're borrowing the reference.
>
> In contrast, the "borrowed" reference returned by PyWeakref_GetObject()
> seems to be "borrowed" from some unspecified entity. The fact that the
> object is live in Python directly implies that, yes, *somebody* must
> have a reference, somewhere. But ISTM (and apparently you) that this is
> relying on the GIL preventing that unknown other actor from dropping
> their reference while you've borrowed it. A guarantee that the
> post-Gilectomy Python interpreter can no longer make!

Let me try again. The only rule for borrowing: x (borrowed reference) is
only guaranteed to be alive for as long as y (source) is guaranteed to be
alive. At least if you phrase it carefully enough, weakrefs still fit the
bill -- your borrowed reference is alive for as long as the weakref is
alive. Of course, the lifetime of a weakref is completely undefined in
GILectomized Python, and barely controllable even in regular CPython. So
this is not a great API. At the very least we can contort definitions into
making it plausible that we borrowed from the weakref. If we can only
analyze code through the lens of borrowing, we have no choice except to
look at it this way.

> In any case, I see nothing in the documentation that suggests "borrowed
> only means unowned" as you suggest.
> In contrast, the documentation seems to suggest that the metaphor is how
> I understood it; that when you "borrow" a reference, there is another
> object who has a reference and you're relying on their reference to keep
> the object alive.

This is its own big tangent. Sorry. Your link is all I have too. It
doesn't spell it out. AFAIK there are exactly two kinds of references
discussed anywhere in the docs: owned references, where you are obligated
to call Py_DECREF, and everything else. Python exclusively uses the term
"borrowed references" for that "everything else". I don't know why. It's a
good way to encourage reasonable practices, I guess.

As the docs themselves note, this metaphor is useful but flawed: e.g. you
can "borrow" the same thing and the original is still usable. But I'd go
in another direction -- the metaphor is sufficient to show safety of some
code, but some safe code exists where we can't as easily use "borrowing"
to talk about why it's safe, and might need to introduce new concepts or
switch tactics.

    PyObject* x = PyList_New(0);  // x is a new owned reference.
    PyObject* y = x;              // y is a borrowed reference.
    PyObject* z = x;              // z is also a borrowed reference.
    Py_INCREF(y);                 // y has been upgraded to an owned reference.
    Py_CLEAR(x);                  // the original reference z borrowed from disappeared.
    // This is provably safe, but you can't use the "borrowing" metaphor
    // for that proof without introducing new concepts.
    do_stuff(z);

    // You might wonder, "why would anyone ever do that?!?"
    // One reason is because some functions "steal" references, so you
    // need to borrow before handing off ownership:

    // y is an owned reference.
    my_struct->foo = y;  // borrowed a reference to y.
    PyTuple_SetItem(my_struct->some_tuple, 0, y);  // stole y.
    // Now, in a super-strict interpretation, y is no longer "valid", and
    // the original borrowing relationship has been broken.
    // We should ideally reason "as if" it borrowed from
    // my_struct->some_tuple, even though it didn't.
    // (Obviously, this example is still a little overcomplicated, but
    // you get how this might happen IRL, yeah?
    // e.g. maybe z already existed and the API stealing a reference
    // doesn't have a GET_ITEM macro.)
    // [Note that this has a similar problem to weakref: another thread
    // could mutate the tuple and delete the object. Yikes!]

I hope those examples made sense. weakref is playing with fire in a
similar way: PyWeakref_GetObject is safe because someone still exists, and
there is a lock that guarantees they exist as long as you don't release
that lock and don't run any code that might delete a reference. (In
particular, it's guaranteed safe to immediately upgrade the borrowed
reference -- or is without gilectomy.) Unlike the stealing tuple example,
it isn't clear where the owned reference is.

So what I mean by saying it's a red herring is that the correctness of
code doesn't hinge on how easy it is to apply the concept of borrowing,
but exclusively on lifetimes and whether your borrowed reference can be
proven to lie within the object lifetime. If it could not at all be
explained sensibly as a "borrow" from anyone, it can still be right --
that would only make it
Re: [Python-Dev] Optimizing list.sort() by checking type in advance
On 11 October 2016 at 03:15, Elliot Gorokhovsky wrote:
> There's an option to provide setup code, of course, but I need to set up
> before each trial, not just before the loop.

Typically, I would just run the benchmark separately for each case, and
then you'd do:

    # Case 1
    python -m perf timeit -s 'setup; code; here' 'code; to; be; timed; here'
    [Results 1]
    # Case 2
    python -m perf timeit -s 'setup; code; here' 'code; to; be; timed; here'
    [Results 2]

The other advantage of doing it this way is that you can post your
benchmark command lines, which will allow people to see what you're
timing, and if there *are* any problems (such as a method lookup that
skews the results) people can point them out.

Paul