[Python-Dev] O(1) deletes from the front of bytearray (was: Re: Adding bytes.frombuffer() constructor to PEP 467 (was: [Python-ideas] Adding bytes.frombuffer() constructor)

2016-10-11 Thread Nathaniel Smith
On Tue, Oct 11, 2016 at 10:53 PM, Serhiy Storchaka  wrote:
> On 12.10.16 08:03, Nathaniel Smith wrote:
>>
>> On Tue, Oct 11, 2016 at 9:08 PM, INADA Naoki 
>> wrote:
>>>
>>> From Python 3.4, bytearray is a good solution for I/O buffers, thanks to
>>> #19087 [1].
>>> Actually, asyncio often uses bytearray as an I/O buffer.
>>
>>
>> Whoa what?! This is awesome, I had no idea that bytearray had O(1)
>> deletes at the front. I literally reimplemented this myself on top of
>> bytearray for some 3.5-only code recently because I assumed bytearray
>> had the same asymptotics as list, and AFAICT this is totally
>> undocumented. Shouldn't we write this down somewhere? Maybe here? ->
>> https://docs.python.org/3/library/functions.html#bytearray
>
>
> I'm afraid this is a CPython implementation detail (like the string concatenation
> optimization). Other implementations can have O(N) deletes at the front of
> bytearray.

Well, it shouldn't be :-).

The problem with the string concatenation optimization is that to
work, it requires both the use of refcounting GC and that you get
lucky with how the underlying malloc implementation happened to lay
things out in memory. Obviously it shouldn't be part of the language
spec.

But amortized O(1) deletes from the front of bytearray are totally
different, and more like amortized O(1) appends to list: there are
important use cases[1] that simply cannot be implemented without some
feature like this, and putting the implementation inside bytearray is
straightforward, deterministic, and more efficient than hacking
together something on top. Python should just guarantee it, IMO.

-n

[1] My use case is parsing HTTP out of a receive buffer. If deleting
the first k bytes of an N byte buffer is O(N), then not only does
parsing become O(N^2) in the worst case, but it's the sort of O(N^2)
that random untrusted network clients can trigger at will to DoS your
server.
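A minimal sketch of that receive-buffer pattern (the helper name is hypothetical); the O(1) front-delete is the CPython behavior from #19087:

```python
def pop_line(buf: bytearray) -> bytes:
    """Remove and return one CRLF-terminated line, or b'' if incomplete."""
    n = buf.find(b'\r\n')
    if n == -1:
        return b''
    line = bytes(buf[:n])   # copy the line out of the buffer
    del buf[:n + 2]         # amortized O(1) on CPython >= 3.4 (#19087)
    return line

buf = bytearray(b'GET / HTTP/1.1\r\nHost: example.com\r\n')
print(pop_line(buf))  # b'GET / HTTP/1.1'
print(pop_line(buf))  # b'Host: example.com'
print(pop_line(buf))  # b''
```

On implementations without the optimization, each `del buf[:k]` may be O(N), which is exactly the quadratic blowup described above.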

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Adding bytes.frombuffer() constructor to PEP 467 (was: [Python-ideas] Adding bytes.frombuffer() constructor

2016-10-11 Thread Serhiy Storchaka

On 12.10.16 08:03, Nathaniel Smith wrote:

On Tue, Oct 11, 2016 at 9:08 PM, INADA Naoki  wrote:

From Python 3.4, bytearray is a good solution for I/O buffers, thanks to
#19087 [1].
Actually, asyncio often uses bytearray as an I/O buffer.


Whoa what?! This is awesome, I had no idea that bytearray had O(1)
deletes at the front. I literally reimplemented this myself on top of
bytearray for some 3.5-only code recently because I assumed bytearray
had the same asymptotics as list, and AFAICT this is totally
undocumented. Shouldn't we write this down somewhere? Maybe here? ->
https://docs.python.org/3/library/functions.html#bytearray


I'm afraid this is a CPython implementation detail (like the string 
concatenation optimization). Other implementations can have O(N) deletes 
at the front of bytearray.





Re: [Python-Dev] Adding bytes.frombuffer() constructor to PEP 467 (was: [Python-ideas] Adding bytes.frombuffer() constructor

2016-10-11 Thread Serhiy Storchaka

On 12.10.16 07:08, INADA Naoki wrote:

Sample code:

def read_line(buf: bytearray) -> bytes:
    try:
        n = buf.index(b'\r\n')
    except ValueError:
        return b''

    line = bytes(buf)[:n]  # bytearray -> bytes -> bytes


Wouldn't it be more correct to write this as bytes(buf[:n])?


Adding one more constructor to bytes:

# when length=-1 (default), use until end of *byteslike*.
bytes.frombuffer(byteslike, length=-1, offset=0)


This interface looks unusual. Wouldn't it be better to support the 
interface of buffer in Python 2: buffer(object[, offset[, size]])?





Re: [Python-Dev] Adding bytes.frombuffer() constructor to PEP 467 (was: [Python-ideas] Adding bytes.frombuffer() constructor

2016-10-11 Thread Nick Coghlan
I don't think it makes sense to add any more ideas to PEP 467. That
needed to be a PEP because it proposed breaking backwards
compatibility in a couple of areas, and because of the complex history
of Python 3's "bytes-as-tuple-of-ints" and Python 2's "bytes-as-str"
semantics.

Other enhancements to the binary data handling APIs in Python 3 can be
considered on their own merits.

On 12 October 2016 at 14:08, INADA Naoki  wrote:
> Memoryview problem
> ==================
>
> To avoid redundant copy of `line = bytes(buf)[:n]`, current solution
> is using memoryview.
>
> First code I wrote is: `line = bytes(memoryview(buf)[:n])`.
>
> On CPython, it works fine.  But `del buf[:n+2]` on the next line may fail
> on other Python
> implementations.  Resizing a bytearray is prohibited while a
> memoryview of it is alive.
>
> So right code is:
>
> with memoryview(buf) as m:
>     line = bytes(m[:n])
>
> The problem of memoryview approach is:
>
> * Overhead: creating temporary memoryview, __enter__, and __exit__. (see 
> below)
>
> * It isn't "one obvious way": Developers including me may forget to
> use context manager.
>   And since it works on CPython, it's hard to point it out.

To add to the confusion, there's also
https://docs.python.org/3/library/stdtypes.html#memoryview.tobytes
giving:

line = memoryview(buf)[:n].tobytes()

However, folks *do* need to learn that many mutable data types will
lock themselves against modification while you have a live memory view
on them, so it's important to release views promptly and reliably when
we don't need them any more.

> Quick benchmark:
>
> (temporary bytes)
> $ python3 -m perf timeit -s 'buf =
> bytearray(b"foo\r\nbar\r\nbaz\r\n")' -- 'bytes(buf)[:3]'
> 
> Median +- std dev: 652 ns +- 19 ns
>
> (temporary memoryview without "with")
> $ python3 -m perf timeit -s 'buf =
> bytearray(b"foo\r\nbar\r\nbaz\r\n")' -- 'bytes(memoryview(buf)[:3])'
> 
> Median +- std dev: 886 ns +- 26 ns
>
> (temporary memoryview with "with")
> $ python3 -m perf timeit -s 'buf = bytearray(b"foo\r\nbar\r\nbaz\r\n")' -- '
> with memoryview(buf) as m:
>     bytes(m[:3])
> '
> 
> Median +- std dev: 1.11 us +- 0.03 us

This is normal though, as memory views trade lower O(N) costs (reduced
data copying) for higher O(1) setup costs (creating and managing the
view, indirection for data access).
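The data-copying side of that tradeoff can be sketched at the Python level (sizes illustrative):

```python
# Copy-size comparison for slicing a large buffer.
big = bytearray(b'x' * 10_000_000)

b1 = bytes(big)[:5_000_000]       # copies all 10 MB, then 5 MB more
with memoryview(big) as m:
    b2 = bytes(m[:5_000_000])     # copies only the 5 MB slice

assert b1 == b2   # same result, roughly a third of the copying
```

For the tiny buffers in the benchmark above, the fixed setup cost of the view dominates instead, which is why the memoryview variants time slower there.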

> Proposed solution
> =================
>
> Adding one more constructor to bytes:
>
> # when length=-1 (default), use until end of *byteslike*.
> bytes.frombuffer(byteslike, length=-1, offset=0)
>
> With this API,
>
> with memoryview(buf) as m:
>     line = bytes(m[:n])
>
> becomes
>
> line = bytes.frombuffer(buf, n)

Does that need to be a method on the builtin rather than a separate
helper function, though? Once you define:

def snapshot(buf, length=None, offset=0):
    with memoryview(buf) as m:
        end = None if length is None else offset + length
        return m[offset:end].tobytes()

then that can be replaced by a more optimised C implementation without
users needing to care about the internal details.

That is, getting back to a variant on one of Serhiy's suggestions in
the last PEP 467 discussion, it may make sense for us to offer a
"buffertools" library that's specifically aimed at supporting
efficient buffer manipulation operations that minimise data copying.
The pure Python implementations would work entirely through
memoryview, but we could also have selected C accelerated operations
if that showed a noticeable improvement on asyncio's benchmarks.

Regards,
Nick.

P.S. The length/offset API design is also problematic due to the way
it differs from range() & slice(), but I don't think it makes sense to
get into that kind of detail before discussing the larger question of
adding a new helper module for working efficiently with memory buffers
vs further widening the method API for the builtin bytes type

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Adding bytes.frombuffer() constructor to PEP 467 (was: [Python-ideas] Adding bytes.frombuffer() constructor

2016-10-11 Thread Nathaniel Smith
On Tue, Oct 11, 2016 at 9:08 PM, INADA Naoki  wrote:
> From Python 3.4, bytearray is a good solution for I/O buffers, thanks to
> #19087 [1].
> Actually, asyncio often uses bytearray as an I/O buffer.

Whoa what?! This is awesome, I had no idea that bytearray had O(1)
deletes at the front. I literally reimplemented this myself on top of
bytearray for some 3.5-only code recently because I assumed bytearray
had the same asymptotics as list, and AFAICT this is totally
undocumented. Shouldn't we write this down somewhere? Maybe here? ->
https://docs.python.org/3/library/functions.html#bytearray

> # when length=-1 (default), use until end of *byteslike*.
> bytes.frombuffer(byteslike, length=-1, offset=0)

This seems reasonable to me. Mostly I've dealt with the problem by
writing functions like your read_line so that they return bytearray
objects, since that's the only thing you can get out of a bytearray
without double-copying. But returning bytes would feel a bit cleaner,
and this would allow that.
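The double-copy he refers to can be made concrete (a sketch):

```python
buf = bytearray(b'foo\r\nbar\r\n')
n = buf.index(b'\r\n')

# One copy, but the result is a bytearray, not bytes:
line_ba = buf[:n]            # bytearray -> bytearray (single copy)

# Getting immutable bytes costs two copies either way today:
line_1 = bytes(buf)[:n]      # copy the whole buffer, then copy the slice
line_2 = bytes(buf[:n])      # copy the slice, then copy it again as bytes

assert line_ba == line_1 == line_2 == b'foo'
```

The memoryview route gets it down to one copy, but with the context-manager caveat discussed elsewhere in the thread.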

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


[Python-Dev] Adding bytes.frombuffer() constructor to PEP 467 (was: [Python-ideas] Adding bytes.frombuffer() constructor

2016-10-11 Thread INADA Naoki
Hi.

While there was no reply to my previous post on the Python-ideas ML,
I'm now sure that bytes.frombuffer() is worthwhile.

Let me describe why I think it's important.


Background
==========

From Python 3.4, bytearray is a good solution for I/O buffers, thanks to
#19087 [1].
Actually, asyncio often uses bytearray as an I/O buffer.

When a bytearray is used as a read buffer, we can parse received data on it
directly and consume it.  For example, reading until '\r\n' is easier
than with io.BytesIO().

Sample code:

def read_line(buf: bytearray) -> bytes:
    try:
        n = buf.index(b'\r\n')
    except ValueError:
        return b''

    line = bytes(buf)[:n]  # bytearray -> bytes -> bytes
    del buf[:n+2]
    return line


buf = bytearray(b'foo\r\nbar\r\nbaz\r\n')

while True:
    line = read_line(buf)
    if not line:
        break
    print(line)

As you can see, a redundant temporary bytes object is created.
This is not ideal for performance or memory efficiency.

Since code like this typically lives in lower-level code (e.g. asyncio),
performance and efficiency are important.

[1] https://bugs.python.org/issue19087

(Off topic: bytearray is nice as a write buffer too: written =
s.send(buf); del buf[:written])


Memoryview problem
==================

To avoid the redundant copy in `line = bytes(buf)[:n]`, the current
solution is to use memoryview.

The first code I wrote was: `line = bytes(memoryview(buf)[:n])`.

On CPython, it works fine.  But `del buf[:n+2]` on the next line may fail
on other Python implementations: resizing a bytearray is prohibited
while a memoryview of it is alive.

So the right code is:

with memoryview(buf) as m:
    line = bytes(m[:n])

The problems with the memoryview approach are:

* Overhead: creating a temporary memoryview, plus the __enter__ and
  __exit__ calls (see benchmark below).

* It isn't the "one obvious way": developers (including me) may forget to
  use the context manager.  And since the bare version works on CPython,
  the mistake is hard to catch.


Quick benchmark:

(temporary bytes)
$ python3 -m perf timeit -s 'buf =
bytearray(b"foo\r\nbar\r\nbaz\r\n")' -- 'bytes(buf)[:3]'

Median +- std dev: 652 ns +- 19 ns

(temporary memoryview without "with")
$ python3 -m perf timeit -s 'buf =
bytearray(b"foo\r\nbar\r\nbaz\r\n")' -- 'bytes(memoryview(buf)[:3])'

Median +- std dev: 886 ns +- 26 ns

(temporary memoryview with "with")
$ python3 -m perf timeit -s 'buf = bytearray(b"foo\r\nbar\r\nbaz\r\n")' -- '
with memoryview(buf) as m:
    bytes(m[:3])
'

Median +- std dev: 1.11 us +- 0.03 us


Proposed solution
=================

Adding one more constructor to bytes:

# when length=-1 (default), use until end of *byteslike*.
bytes.frombuffer(byteslike, length=-1, offset=0)

With this API,

with memoryview(buf) as m:
    line = bytes(m[:n])

becomes

line = bytes.frombuffer(buf, n)
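bytes.frombuffer() doesn't exist; a pure-Python sketch of the proposed semantics (the helper name is hypothetical):

```python
def bytes_frombuffer(byteslike, length=-1, offset=0):
    """Sketch of the proposed constructor; a real bytes.frombuffer()
    would do this in C without the temporary memoryview."""
    with memoryview(byteslike) as m:
        end = None if length == -1 else offset + length
        return bytes(m[offset:end])

buf = bytearray(b'foo\r\nbar\r\n')
assert bytes_frombuffer(buf, 3) == b'foo'
assert bytes_frombuffer(buf) == b'foo\r\nbar\r\n'
assert bytes_frombuffer(buf, 3, offset=5) == b'bar'
del buf[:5]  # the memoryview was released, so resizing is safe everywhere
```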


Thanks,

-- 
INADA Naoki  


Re: [Python-Dev] Optimizing list.sort() by checking type in advance

2016-10-11 Thread Tim Peters
[Terry Reedy]
>
> This seems like a generic issue with timing mutation methods
> ...
> But I am sure Tim worked this out in his test code, which should be
> reused, perhaps updated with Victor's perf module to get the most
> stable timings possible.

sortperf.py is older than me ;-)  It's not at all intended to give
fine-grained timings:  it's a sledgehammer that runs each case _only
once_, to confirm/refute _big_ changes.

So it's useful qualitatively when a "big claimed change" is at issue,
but useless for quantifying beyond "wow!" or "ouch!" or "meh".

Since this is a "big claimed change", a "wow!" outcome is good-enough
confirmation.  And a "meh" outcome would be almost good enough for
cases where the new specializations don't apply - although I haven't
seen any numbers from sortperf.py for those cases yet.  I expect a
"meh" outcome for those, based on reasoning - but may be surprised by
reality.


Re: [Python-Dev] Optimizing list.sort() by checking type in advance

2016-10-11 Thread Terry Reedy

On 10/11/2016 10:26 AM, Chris Angelico wrote:


After the first call, the list will be sorted. Any subsequent attempts
will use the sorted list.


This seems like a generic issue with timing mutation methods: is the 
mutated output suitable input for another mutation?  With list.reverse, 
the output is suitable.  Ditto for other permutations that are 
independent of the data, including random.shuffle.  With list.pop, the 
number of mutations has to be limited to the length of the list.  With 
list.sort, the list needs to be 'restored' -- either copied or shuffled 
each time.  So it seems that one is stuck with timing 'restore', restore 
+ sort_a, restore + sort_b, subtracting the timing for restore, and 
comparing.  But I am sure Tim worked this out in his test code, which 
should be reused, perhaps updated with Victor's perf module to get the 
most stable timings possible.
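The restore-and-subtract approach described above can be sketched with the stdlib timeit module (sizes and iteration counts are illustrative):

```python
import random
import timeit

L = list(range(1000))
random.shuffle(L)

# Time the "restore" step alone (copying the shuffled list)...
t_restore = timeit.timeit('M = L[:]', globals={'L': L}, number=2000)
# ...then restore + sort, and subtract to estimate the sort cost alone.
t_both = timeit.timeit('M = L[:]; M.sort()', globals={'L': L}, number=2000)
sort_cost = (t_both - t_restore) / 2000
print(f'estimated time per sort: {sort_cost:.2e} s')
```

Because a fresh copy is sorted on every iteration, the input to sort is shuffled each time, avoiding the already-sorted-input pitfall discussed in this thread.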


--
Terry Jan Reedy



Re: [Python-Dev] Optimizing list.sort() by checking type in advance

2016-10-11 Thread Paul Moore
On 11 October 2016 at 15:32, Elliot Gorokhovsky
 wrote:
> But the sort mutates F...does the setup get executed each time? I thought
> it's just at the beginning. So then F gets mutated (sorted) and subsequent
> sorts time wrong.

Did I not say earlier - sorry. I'm suggesting that you put each timing
run into a separate Python process.

# Optimised code
python -m perf timeit -s "from mymod import FastList; L=;F=FastList(L)" "F.fastsort()"
# Core sort routine
python -m perf timeit -s "L=" "L.sort()"
# Or maybe, if FastList exposes the existing sort function as well
# Non-optimised code
python -m perf timeit -s "from mymod import FastList; L=;F=FastList(L)" "F.sort()"

Paul.


Re: [Python-Dev] Optimizing list.sort() by checking type in advance

2016-10-11 Thread Elliot Gorokhovsky
But the sort mutates F...does the setup get executed each time? I thought
it's just at the beginning. So then F gets mutated (sorted) and subsequent
sorts time wrong.

On Tue, Oct 11, 2016, 7:51 AM Paul Moore  wrote:

> On 11 October 2016 at 14:04, Elliot Gorokhovsky
>  wrote:
> > Right, that sounds good, but there's just one thing I don't understand
> > that's keeping me from using it. Namely, I would define a benchmark list
> L
> > in my setup, and then I would have code="F=FastList(L);F.fastsort()". The
> > problem here is I'm measuring the constructor time along with the sort
> time,
> > right, so wouldn't that mess up the benchmark? Or does timeit separate
> the
> > times?
>
> That would mess up your times. Put F=FastList(L) in your setup.
>
> Paul
>


Re: [Python-Dev] Optimizing list.sort() by checking type in advance

2016-10-11 Thread Elliot Gorokhovsky
Right, that sounds good, but there's just one thing I don't understand
that's keeping me from using it. Namely, I would define a benchmark list L
in my setup, and then I would have code="F=FastList(L);F.fastsort()". The
problem here is I'm measuring the constructor time along with the sort
time, right, so wouldn't that mess up the benchmark? Or does timeit
separate the times?

On Tue, Oct 11, 2016, 2:22 AM Paul Moore  wrote:

> On 11 October 2016 at 03:15, Elliot Gorokhovsky
>  wrote:
> > There's an option to provide setup code, of course, but I need to set up
> > before each trial, not just before the loop.
>
> Typically, I would just run the benchmark separately for each case,
> and then you'd do
>
> # Case 1
> python -m perf timeit -s 'setup; code; here' 'code; to; be; timed; here'
> [Results 1]
> # Case 2
> python -m perf timeit -s 'setup; code; here' 'code; to; be; timed; here'
> [Results 2]
>
> The other advantage of doing it this way is that you can post your
> benchmark command lines, which will allow people to see what you're
> timing, and if there *are* any problems (such as a method lookup that
> skews the results) people can point them out.
>
> Paul
>


Re: [Python-Dev] Optimizing list.sort() by checking type in advance

2016-10-11 Thread Chris Angelico
On Wed, Oct 12, 2016 at 1:24 AM, Paul Moore  wrote:
> On 11 October 2016 at 15:00, Chris Angelico  wrote:
>> On Wed, Oct 12, 2016 at 12:51 AM, Paul Moore  wrote:
>>> On 11 October 2016 at 14:04, Elliot Gorokhovsky
>>>  wrote:
 Right, that sounds good, but there's just one thing I don't understand
 that's keeping me from using it. Namely, I would define a benchmark list L
 in my setup, and then I would have code="F=FastList(L);F.fastsort()". The
 problem here is I'm measuring the constructor time along with the sort 
 time,
 right, so wouldn't that mess up the benchmark? Or does timeit separate the
 times?
>>>
>>> That would mess up your times. Put F=FastList(L) in your setup.
>>
>> But then you're resorting an already-sorted list, which may well have
>> different timings (it certainly does in timsort).
>
> Why would it be already sorted? I assume FastList(L) is simply a
> wrapper round a normal list that has a modified sort method with the
> optimisation included.
>
> Of course, that's essentially the point here - without seeing the
> code, we're (to an extent) guessing.

After the first call, the list will be sorted. Any subsequent attempts
will use the sorted list.
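Timsort's sensitivity to already-sorted input, the reason re-sorting the same list skews benchmarks, is easy to demonstrate (sizes illustrative):

```python
import random
import timeit

L = [random.random() for _ in range(100_000)]
S = sorted(L)

# Sort a fresh copy of shuffled data vs. already-sorted data:
t_shuffled = timeit.timeit('M = L[:]; M.sort()', globals={'L': L}, number=20)
t_sorted = timeit.timeit('M = S[:]; M.sort()', globals={'S': S}, number=20)

# Timsort detects the single sorted run and finishes in one O(N) pass,
# so the sorted case is dramatically faster than the shuffled one.
assert t_sorted < t_shuffled
```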

ChrisA


Re: [Python-Dev] Optimizing list.sort() by checking type in advance

2016-10-11 Thread Paul Moore
On 11 October 2016 at 15:00, Chris Angelico  wrote:
> On Wed, Oct 12, 2016 at 12:51 AM, Paul Moore  wrote:
>> On 11 October 2016 at 14:04, Elliot Gorokhovsky
>>  wrote:
>>> Right, that sounds good, but there's just one thing I don't understand
>>> that's keeping me from using it. Namely, I would define a benchmark list L
>>> in my setup, and then I would have code="F=FastList(L);F.fastsort()". The
>>> problem here is I'm measuring the constructor time along with the sort time,
>>> right, so wouldn't that mess up the benchmark? Or does timeit separate the
>>> times?
>>
>> That would mess up your times. Put F=FastList(L) in your setup.
>
> But then you're resorting an already-sorted list, which may well have
> different timings (it certainly does in timsort).

Why would it be already sorted? I assume FastList(L) is simply a
wrapper round a normal list that has a modified sort method with the
optimisation included.

Of course, that's essentially the point here - without seeing the
code, we're (to an extent) guessing.
Paul


Re: [Python-Dev] Optimizing list.sort() by checking type in advance

2016-10-11 Thread Chris Angelico
On Wed, Oct 12, 2016 at 12:51 AM, Paul Moore  wrote:
> On 11 October 2016 at 14:04, Elliot Gorokhovsky
>  wrote:
>> Right, that sounds good, but there's just one thing I don't understand
>> that's keeping me from using it. Namely, I would define a benchmark list L
>> in my setup, and then I would have code="F=FastList(L);F.fastsort()". The
>> problem here is I'm measuring the constructor time along with the sort time,
>> right, so wouldn't that mess up the benchmark? Or does timeit separate the
>> times?
>
> That would mess up your times. Put F=FastList(L) in your setup.

But then you're resorting an already-sorted list, which may well have
different timings (it certainly does in timsort).

ChrisA


Re: [Python-Dev] Optimizing list.sort() by checking type in advance

2016-10-11 Thread Paul Moore
On 11 October 2016 at 14:04, Elliot Gorokhovsky
 wrote:
> Right, that sounds good, but there's just one thing I don't understand
> that's keeping me from using it. Namely, I would define a benchmark list L
> in my setup, and then I would have code="F=FastList(L);F.fastsort()". The
> problem here is I'm measuring the constructor time along with the sort time,
> right, so wouldn't that mess up the benchmark? Or does timeit separate the
> times?

That would mess up your times. Put F=FastList(L) in your setup.

Paul


Re: [Python-Dev] PyWeakref_GetObject() borrows its reference from... whom?

2016-10-11 Thread Larry Hastings


By the way, just to finish playing this new fun game "Who'd I Borrow 
This Reference From?":



On 10/10/2016 11:45 AM, Chris Angelico wrote:

On Mon, Oct 10, 2016 at 8:35 PM, Larry Hastings  wrote:

I bet for every other spot in the API I can tell you from
whom you're borrowing the reference.

Okay. Here's a test:

PyObject* PyList_GetItem(PyObject *list, Py_ssize_t index)
Return value: Borrowed reference.

Presumably you own a reference to the list itself before you call
this, and the list has a reference to its items. But suppose another
thread clear()s the list immediately after you call this; whose
reference are you borrowing now? The list's is gone.


As you say: you've borrowed the reference from the list.  With the GIL 
this is safe.  After the Gilectomy it's not always safe.  It *can* be: 
for example, if you allocated the list locally in the current thread, 
and it hasn't escaped the thread yet, you can assert that in this case 
reference will never go away on you.  But in general this is an API that 
has to change for the Gilectomy.



//arry/


Re: [Python-Dev] PyWeakref_GetObject() borrows its reference from... whom?

2016-10-11 Thread Devin Jeanpierre
> Huh?  In all other circumstances, a "borrowed" reference is exactly that:
> X has a reference, and you are relying on X's reference to keep the object
> alive.  Borrowing from a borrowed reference is simply a chain of these; Z
> borrows from Y, Y borrows from X, and X is the original person who did the
> incref.  But you're borrowing from something specific, somebody who the API
> guarantees has a legitimate reference on the object and won't drop it while
> you're using it.  I bet for every other spot in the API I can tell you from
> whom you're borrowing the reference.
>
> In contrast, the "borrowed" reference returned by PyWeakref_GetObject()
> seems to be "borrowed" from some unspecified entity.  The fact that the
> object is live in Python directly implies that, yes, *somebody* must have a
> reference, somewhere.  But ISTM (and apparently you) that this is relying
> on the GIL preventing that unknown other actor from dropping their
> reference while you've borrowed it.  A guarantee that the post-Gilectomy
> Python interpreter can no longer make!

Let me try again. The only rule for borrowing: x (borrowed reference) is
only guaranteed to be alive for as long as y (source) is guaranteed to be
alive. At least if you phrase it carefully enough, weakrefs still fit the
bill -- your borrowed reference is alive for as long as the weakref is
alive. Of course, the lifetime for a weakref is completely undefined in
GILectomized Python, and barely controllable even in regular CPython. So
this is not a great API.

At the very least we can contort definitions into making it plausible that
we borrowed from the weakref.  If we can only analyze code through the lens
of borrowing, we have no choice except to look at it this way.

> In any case, I see nothing in the documentation that suggests "borrowed
> only means unowned" as you suggest.  In contrast, the documentation seems
> to suggest that the metaphor is how I understood it; that when you "borrow"
> a reference, there is another object who has a reference and you're relying
> on their reference to keep the object alive.

This is its own big tangent. Sorry,

Your link is all I have too. It doesn't spell it out. AFAIK there are
exactly two kinds of references discussed anywhere in the docs: owned
references, where you are obligated to call Py_DECREF, and everything else.
Python exclusively uses the term "borrowed references" for that "everything
else". I don't know why. It's a good way to encourage reasonable practices,
I guess.

As the docs themselves note, this metaphor is useful but flawed: e.g. you
can "borrow" the same thing and the original is still usable. But I'd go in
another direction -- the metaphor is sufficient to show safety of some
code, but some safe code exists where we can't as easily use "borrowing" to
talk about why it's safe, and might need to introduce new concepts or
switch tactics.

PyObject* x = PyList_New(0); // x is a new owned reference.
PyObject* y = x;  // y is a borrowed reference.
PyObject* z = x;  // z is also a borrowed reference.
Py_INCREF(y);  // y has been upgraded to an owned reference.
Py_CLEAR(x);  // the original reference z borrowed from disappeared.
// This is provably safe, but you can't use the "borrowing" metaphor for
// that proof without introducing new concepts.

do_stuff(z);


// You might wonder, "why would anyone ever do that?!?"
// One reason is because some functions "steal" references, so you need to
// borrow before handing off ownership:

// y is an owned reference.
my_struct->foo = y; // borrowed a reference to y.
PyTuple_SetItem(my_struct->some_tuple, 0, y); // stole y.
// Now, in a super-strict interpretation, y is no longer "valid", and the
// original borrowing relationship has been broken.
// We should ideally reason "as if" it borrowed from my_struct->some_tuple,
// even though it didn't.
// (Obviously, this example is still a little overcomplicated, but you get
//  how this might happen IRL, yeah?
//  e.g. maybe z already existed and the API stealing a reference doesn't
//  have a GET_ITEM macro.)
// [Note that this has a similar problem to weakref: another thread could
//  mutate the tuple and delete the object. Yikes!]


I hope those examples made sense.

weakref is playing with fire in a similar way: PyWeakref_GetObject is safe
because someone still exists, and there is a lock that guarantees they
exist as long as you don't release that lock and don't run any code that
might delete a reference.  (In particular, it's guaranteed safe to
immediately upgrade the borrowed reference -- or it is without the gilectomy.)
 Unlike the stealing tuple example, it isn't clear where the owned
reference is.
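The lifetime issue is visible from pure Python with the weakref module; a sketch (CPython's refcounting shown, other implementations may collect later):

```python
import weakref

class Thing:
    pass

obj = Thing()
r = weakref.ref(obj)

# While a strong reference exists, the weakref resolves to the object;
# this is the window in which PyWeakref_GetObject's borrowed referent is valid.
assert r() is obj

# Drop the last strong reference: on CPython the referent dies immediately,
# and a C-level borrowed pointer to it would now be dangling.
del obj
assert r() is None
```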

So what I mean by saying it's a red herring, is that the correctness of
code doesn't hinge on how easy it is to apply the concept of borrowing, but
exclusively on lifetimes and whether your borrowed reference can be proven
to lie within the object lifetime. If it could not at all be explained
sensibly as a "borrow" from anyone, it can still be right -- that would
only make it 

Re: [Python-Dev] Optimizing list.sort() by checking type in advance

2016-10-11 Thread Paul Moore
On 11 October 2016 at 03:15, Elliot Gorokhovsky
 wrote:
> There's an option to provide setup code, of course, but I need to set up
> before each trial, not just before the loop.

Typically, I would just run the benchmark separately for each case,
and then you'd do

# Case 1
python -m perf timeit -s 'setup; code; here' 'code; to; be; timed; here'
[Results 1]
# Case 2
python -m perf timeit -s 'setup; code; here' 'code; to; be; timed; here'
[Results 2]

The other advantage of doing it this way is that you can post your
benchmark command lines, which will allow people to see what you're
timing, and if there *are* any problems (such as a method lookup that
skews the results) people can point them out.

Paul