[Python-ideas] Re: Default behavior for random.sample when no k

2020-08-01 Thread Steven D'Aprano
On Sat, Aug 01, 2020 at 08:54:16PM +0300, Ram Rachum wrote:

> When writing some code now, I needed to produce a shuffled version of
> `range(10, 10 ** 5)`.
> 
> This is one way to do it:
> 
> shuffled_numbers = list(range(10, 10 ** 5))
> random.shuffle(shuffled_numbers)
> 
> 
> I don't like it because (1) it's too imperative and (2) I'm calling the
> list "shuffled" even before it's shuffled.

This is easily solved with a three-line helper:

    import random

    def shuffled(iterable):
        L = list(iterable)
        random.shuffle(L)
        return L

I have implemented this probably half a dozen times, and I expect 
others have too. I agree with Alex that this would make a nice addition 
to the random module.
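One possible extension of the helper, sketched on the assumption that reproducibility matters: the optional `rng` parameter is hypothetical (not part of any proposal in this thread). It accepts a `random.Random` instance so a test can fix a seed without touching the module-level generator.

```python
import random


def shuffled(iterable, rng=None):
    """Return a new list with the items of `iterable` in random order.

    `rng` is a hypothetical parameter for this sketch: pass a
    random.Random instance for reproducible output; by default the
    module-level generator is used. The input is never mutated.
    """
    items = list(iterable)  # copy first, as sorted() does
    shuffle = rng.shuffle if rng is not None else random.shuffle
    shuffle(items)
    return items


shuffled_numbers = shuffled(range(10, 10 ** 5), rng=random.Random(2020))
```

The module-level `random.shuffle` is a bound method of a hidden `random.Random` instance, so both paths use the same algorithm. Only the source of the generator state differs.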


-- 
Steven
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/47JMNMYPEETQFKPDK4OVLGM2IXCQ4GIA/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Access (ordered) dict by index; insert slice

2020-08-01 Thread Steven D'Aprano
On Sat, Aug 01, 2020 at 07:25:42PM +0100, Stestagg wrote:

> Irrespective of where in the api this logic should exist, the
> implementation won't be algorithmically different, (I think, even with a
> `.ordered` view, as the view would have to cope with changes to the
> underlying dictionary over its lifetime, and external tracking of changes
> to dicts is not, afaik, feasible. Unlike for-loop constructs which are
> inherently scoped, I feel like you wouldn't get away with forbidding
> modifying a dict() if there's a view on keys/values/items still alive, as
> these things are first-class objects that can be stored/passed around)

Forbidding mutation of the dict while a view exists is missing the point 
of having a view in the first place: updating the owning object should 
update the view as well.
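The live-view behaviour described above can be seen with plain built-in dicts. This is a small sketch; the error text quoted in the comment is CPython's. A keys view reflects later insertions. Changing the dict's size while iterating over one of its views is not forbidden up front; it raises RuntimeError on the next step instead.

```python
d = {"a": 1, "b": 2}
keys = d.keys()             # a live view, not a snapshot
d["c"] = 3                  # update the owning dict...
seen = list(keys)           # ...and the view already includes "c"

error = None
try:
    for k in keys:
        d["new-" + k] = 0   # changes the dict's size mid-iteration
except RuntimeError as exc:
    error = str(exc)        # CPython: "dictionary changed size during iteration"

print(seen, error)
```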



-- 
Steven
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/B4TVRQVWEKMA7F2LSPJEOMNMR5XFVTTV/


[Python-ideas] Re: Access (ordered) dict by index; insert slice

2020-08-01 Thread Wes Turner
Is there any reason that these features couldn't be added to OrderedDict
(which is a linked list)?
https://github.com/python/cpython/blob/master/Objects/odictobject.c

On Sat, Aug 1, 2020, 9:13 PM Inada Naoki  wrote:

> On Sun, Aug 2, 2020 at 2:34 AM Christopher Barker 
> wrote:
> >
> > On Sat, Aug 1, 2020 at 2:28 AM Marco Sulla 
> wrote:
> >>
> >> On Sat, 1 Aug 2020 at 03:00, Inada Naoki 
> wrote:
> >>>
> >>> Please teach me if you know any algorithm which has no hole, O(1)
> >>> deletion, preserving insertion order, and efficient and fast as array.
> >
> >
> > I would think the goal here would be to re-order once in a while to
> remove the holes. But that would take time, of course, so you wouldn't want
> to do it on every deletion. But when?
> >
> > One option: maybe too specialized, but it could re-pack the array when
> an indexing operation is made -- since that operation is O(N) anyway. And
> that would then address the issue of performance for multiple indexing
> operations -- if you made a bunch of indexing operation in a row without
> deleting (which would be the case, if this is an alternative to making a
> copy in a Sequence first), then the first one would repack the internal
> array (presumably faster than making a copy) and the rest would have O(1)
> access.
> >
>
> Repacking is mutation, and mutating dict while iterating it breaks the
> iterator.
> But `d.items()[42]` doesn't look like a mutation.
>
> > Given that this use case doesn't appear to be very important, I doubt
> it's worth it, but it seems it would be possible.
> >
> > Another thought -- could the re-packing happen whenever the entire dict
> is iterated through? Though maybe there's no way to know when that's going
> to happen -- all you get are the individual calls for the next one, yes?
> >
>
> You are right; it couldn't.
>
>
> --
> Inada Naoki  
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/W2KBWMS5A2UJV7OYUNABVMHPC6A6JFDF/


[Python-ideas] Re: Access (ordered) dict by index; insert slice

2020-08-01 Thread Inada Naoki
On Sun, Aug 2, 2020 at 2:34 AM Christopher Barker  wrote:
>
> On Sat, Aug 1, 2020 at 2:28 AM Marco Sulla  
> wrote:
>>
>> On Sat, 1 Aug 2020 at 03:00, Inada Naoki  wrote:
>>>
>>> Please teach me if you know any algorithm which has no hole, O(1)
>>> deletion, preserving insertion order, and efficient and fast as array.
>
>
> I would think the goal here would be to re-order once in a while to remove 
> the holes. But that would take time, of course, so you wouldn't want to do it 
> on every deletion. But when?
>
> One option: maybe too specialized, but it could re-pack the array when an 
> indexing operation is made -- since that operation is O(N) anyway. And that 
> would then address the issue of performance for multiple indexing operations 
> -- if you made a bunch of indexing operation in a row without deleting (which 
> would be the case, if this is an alternative to making a copy in a Sequence 
> first), then the first one would repack the internal array (presumably faster 
> than making a copy) and the rest would have O(1) access.
>

Repacking is a mutation, and mutating a dict while iterating over it breaks the iterator.
But `d.items()[42]` doesn't look like a mutation.

> Given that this use case doesn't appear to be very important, I doubt it's 
> worth it, but it seems it would be possible.
>
> Another thought -- could the re-packing happen whenever the entire dict is 
> iterated through? Though maybe there's no way to know when that's going to 
> happen -- all you get are the individual calls for the next one, yes?
>

You are right; it couldn't.
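Here is a toy model of the point about repacking, under stated assumptions. `ToyCompactDict` and `item_at` are hypothetical names. Its layout is an entries list with `None` holes plus a key-to-position map, which only loosely mirrors CPython's compact dict. When positional access repacks, a call that looks like a read moves entries, and that invalidates an iterator that is already running.

```python
class ToyCompactDict:
    """Hypothetical insertion-ordered map stored as an entries list.

    Deletion leaves a hole (None) so it stays O(1); positional access
    repacks the list first, which is a mutation in disguise.
    """

    def __init__(self):
        self._entries = []   # [key, value] lists, or None for a hole
        self._pos = {}       # key -> slot in self._entries
        self._version = 0    # bumped whenever entries move

    def __setitem__(self, key, value):
        if key in self._pos:
            self._entries[self._pos[key]][1] = value
        else:
            self._pos[key] = len(self._entries)
            self._entries.append([key, value])

    def __delitem__(self, key):
        self._entries[self._pos.pop(key)] = None  # O(1): leave a hole

    def _repack(self):
        # Squeeze out the holes; every later entry moves to a new slot.
        self._entries = [e for e in self._entries if e is not None]
        self._pos = {e[0]: i for i, e in enumerate(self._entries)}
        self._version += 1

    def item_at(self, index):
        """Positional access: O(n) repack if there are holes, then O(1)."""
        if len(self._entries) != len(self._pos):
            self._repack()
        key, value = self._entries[index]
        return key, value

    def __iter__(self):
        version = self._version
        slot = 0
        while slot < len(self._entries):
            if self._version != version:
                raise RuntimeError("entries moved during iteration")
            entry = self._entries[slot]
            slot += 1
            if entry is not None:
                yield entry[0]


d = ToyCompactDict()
for k in "abcde":
    d[k] = ord(k)
del d["b"]                 # leaves a hole at slot 1

it = iter(d)
print(next(it))            # a
print(d.item_at(2))        # ('d', 100): looks like a read, but repacks
try:
    next(it)
except RuntimeError as exc:
    print("RuntimeError:", exc)
```

In this model `d.item_at(2)` plays the role of the proposed `d.items()[2]`, yet the iterator created before it fails on its next step. A repack-on-index design would presumably need a similar guard, or would have to give up repacking.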


-- 
Inada Naoki  
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/ED2GRWD4RARR2LGP45PK4M6R3MLTAF75/


[Python-ideas] Re: How to prevent shared memory from being corrupted ?

2020-08-01 Thread Wes Turner
https://docs.dask.org/en/latest/shared.html#known-limitations :

> Known Limitations
> The shared memory scheduler has some notable limitations:
>
> - It works on a single machine
> - The threaded scheduler is limited by the GIL on Python code, so if your
>   operations are pure python functions, you should not expect a multi-core
>   speedup
> - The multiprocessing scheduler must serialize functions between workers,
>   which can fail
> - The multiprocessing scheduler must serialize data between workers and
>   the central process, which can be expensive
> - The multiprocessing scheduler cannot transfer data directly between
>   worker processes; all data routes through the master process.

...
https://distributed.dask.org/en/latest/memory.html#difference-with-dask-compute

(... https://github.com/dask/dask-labextension )

On Sat, Aug 1, 2020 at 7:34 PM Wes Turner  wrote:

> PyArrow Plasma object IDs; "sealing" makes an object immutable; pyrsistent
>
> https://arrow.apache.org/docs/python/plasma.html#object-ids
> https://arrow.apache.org/docs/python/plasma.html#creating-an-object-buffer
>
> > Objects are created in Plasma in two stages. First, they are created,
> > which allocates a buffer for the object. At this point, the client can
> > write to the buffer and construct the object within the allocated buffer.
> >
> > To create an object for Plasma, you need to create an object ID, as well
> > as give the object’s maximum size in bytes.
> > ```python
> > # Create an object buffer.
> > object_id = plasma.ObjectID(20 * b"a")
> > object_size = 1000
> > buffer = memoryview(client.create(object_id, object_size))
> >
> > # Write to the buffer.
> > for i in range(1000):
> >   buffer[i] = i % 128
> > ```
> >
> > When the client is done, the client seals the buffer, making the object
> > immutable, and making it available to other Plasma clients.
> >
> > ```python
> > # Seal the object. This makes the object immutable and available to
> > # other clients.
> > client.seal(object_id)
> > ```
>
> https://pypi.org/project/pyrsistent/ also supports immutable structures
>
> On Sat, Aug 1, 2020 at 4:44 PM Eric V. Smith  wrote:
>
>> On 8/1/2020 1:25 PM, Marco Sulla wrote:
>> > You don't need locks with immutable objects. Since they're immutable,
>> > any operation that would usually mutate the object generates another
>> > immutable object instead. The most common example is str: the sum of two
>> > strings in Python (and in many other languages) produces a new string.
>>
>> While they're immutable at the Python level, strings (and all other
>> objects) are mutated at the C level, due to reference count updates. You
>> need to consider this if you're sharing objects without locking or other
>> synchronization.
>>
>> Eric
>>
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/IRDFSJP7CIQRPQQEP54T42HN33BUOOOV/


[Python-ideas] Re: How to prevent shared memory from being corrupted ?

2020-08-01 Thread Wes Turner
PyArrow Plasma object IDs; "sealing" makes an object immutable; pyrsistent

https://arrow.apache.org/docs/python/plasma.html#object-ids
https://arrow.apache.org/docs/python/plasma.html#creating-an-object-buffer

> Objects are created in Plasma in two stages. First, they are created,
> which allocates a buffer for the object. At this point, the client can
> write to the buffer and construct the object within the allocated buffer.
>
> To create an object for Plasma, you need to create an object ID, as well
> as give the object’s maximum size in bytes.
> ```python
> # Create an object buffer.
> object_id = plasma.ObjectID(20 * b"a")
> object_size = 1000
> buffer = memoryview(client.create(object_id, object_size))
>
> # Write to the buffer.
> for i in range(1000):
>   buffer[i] = i % 128
> ```
>
> When the client is done, the client seals the buffer, making the object
> immutable, and making it available to other Plasma clients.
>
> ```python
> # Seal the object. This makes the object immutable and available to other
> # clients.
> client.seal(object_id)
> ```

https://pypi.org/project/pyrsistent/ also supports immutable structures

On Sat, Aug 1, 2020 at 4:44 PM Eric V. Smith  wrote:

> On 8/1/2020 1:25 PM, Marco Sulla wrote:
> > You don't need locks with immutable objects. Since they're immutable,
> > any operation that would usually mutate the object generates another
> > immutable object instead. The most common example is str: the sum of two
> > strings in Python (and in many other languages) produces a new string.
>
> While they're immutable at the Python level, strings (and all other
> objects) are mutated at the C level, due to reference count updates. You
> need to consider this if you're sharing objects without locking or other
> synchronization.
>
> Eric
>
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/DNBGUJHDH4UTPSETMFFWMJHNXQXIWX4I/


[Python-ideas] Re: Access (ordered) dict by index; insert slice

2020-08-01 Thread Wes Turner
first()
https://more-itertools.readthedocs.io/en/stable/api.html#more_itertools.first
https://toolz.readthedocs.io/en/latest/api.html#toolz.itertoolz.first

last()
https://more-itertools.readthedocs.io/en/stable/api.html#more_itertools.last
https://toolz.readthedocs.io/en/latest/api.html#toolz.itertoolz.last

take()
https://more-itertools.readthedocs.io/en/stable/api.html#more_itertools.take
https://toolz.readthedocs.io/en/latest/api.html#toolz.itertoolz.take

tail()
https://more-itertools.readthedocs.io/en/stable/api.html#more_itertools.tail
https://toolz.readthedocs.io/en/latest/api.html#toolz.itertoolz.tail

... pluck(ind, seqs, default='__no__default__')
> plucks an element or several elements from each item in a sequence.
https://toolz.readthedocs.io/en/latest/api.html#toolz.itertoolz.pluck
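For comparison, here are minimal pure-stdlib sketches of the four helpers linked above. These are illustrative reimplementations, not the more-itertools or toolz code. Edge-case behaviour (errors on empty input, return types) may differ from either library.

```python
from collections import deque
from itertools import islice

_MISSING = object()  # sentinel: distinguishes "no default" from None


def first(iterable, default=_MISSING):
    """Return the first item, or `default` if the iterable is empty."""
    for item in iterable:
        return item
    if default is _MISSING:
        raise ValueError("first() of an empty iterable with no default")
    return default


def last(iterable, default=_MISSING):
    """Return the last item; consumes the whole iterable (O(n))."""
    final = deque(iterable, maxlen=1)  # retains only the final item
    if final:
        return final[0]
    if default is _MISSING:
        raise ValueError("last() of an empty iterable with no default")
    return default


def take(n, iterable):
    """Return the first n items as a list."""
    return list(islice(iterable, n))


def tail(n, iterable):
    """Return an iterator over the last n items."""
    return iter(deque(iterable, maxlen=n))


d = {"x": 1, "y": 2, "z": 3}
print(first(d), last(d.values()), take(2, d.items()), list(tail(2, d)))
# x 3 [('x', 1), ('y', 2)] ['y', 'z']
```

For dicts specifically, `next(reversed(d))` returns the last key without scanning, since dicts support `reversed()` from Python 3.8 on.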

On Sat, Aug 1, 2020, 4:59 PM Guido van Rossum  wrote:

> Yeah, it is totally doable to refactor the collection ABCs to have
> something in between `Collection` and `Sequence` that just supports
> `__getitem__`.
>
> But I would take Marco's research (and Inada's musings) seriously -- we
> don't actually want to support `__getitem__`, because of the unpredictable
> performance characteristics.
>
> I'm no longer in favor of adding .ordered() -- I think it's better to add
> something to itertools, for example first() to get the first item (see Tim
> Peters' post), and something related to get the first N items.
>
> On Sat, Aug 1, 2020 at 12:28 PM Christopher Barker 
> wrote:
>
>> On Fri, Jul 31, 2020 at 7:34 AM Guido van Rossum 
>> wrote:
>>
>>> So maybe we need to add dict.ordered() which returns a view on the items
>>> that is a Sequence rather than a set? Or ordereditems(), orderedkeys() and
>>> orderedvalues()?
>>>
>>
>> I'm still confused as to when "ordered" became synonymous with "Sequence"
>> -- so wouldn't we want to call these dict.as_sequence() or something like
>> that?
>>
>> And is there a reason that the regular dict views couldn't be both a Set
>> and a Sequence? Looking at the ABCs, I don't see a conflict -- __getitem__,
>> index() and count() would need to be added, and Sets don't have any of
>> those. (and count could be optimized to always return 0 or 1 for
>> dict.keys() ;-) )
>>
>> But anyway, naming aside, I'm still wondering whether we necessarily want
>> the entire Sequence protocol. For the use cases at hand, isn't indexing and
>> slicing enough?
>>
>> Which brings us to the philosophy of duck typing. I wrote an earlier post
>> about that -- so here's some follow up thoughts. I suggested that I like
>> the "if I only need it to quack, I don't care if it's a duck" approach -- I
>> try to use the quack() method, I'm happy if it works, and I raise an
>> exception (or let whatever exception is raised bubble up) if it
>> doesn't.
>>
>> Guido pointed out that having a quack() method isn't enough -- it also
>> needs to actually behave as you expect -- which is the nice thing about
>> ABCs -- if you know something is a Sequence, you don't just know that you
>> can index it, you know that indexing it will do what you expect.
>>
>> Which brings us back to the random.choice() function. It's really simple,
>> and uses exactly the approach I outlined above.
>>
>>     def choice(self, seq):
>>         """Choose a random element from a non-empty sequence."""
>>         try:
>>             i = self._randbelow(len(seq))
>>         except ValueError:
>>             raise IndexError('Cannot choose from an empty sequence') from None
>>         return seq[i]
>>
>> It checks the length of the object, picks a random index within that
>> length, and then tries to use that index to get a random item. So anything
>> with a __len__ and a __getitem__ that accepts integers will work.
>>
>> And this has worked "fine" for decades. Should it be checking that seq is
>> actually a sequence? I don't think so -- I like that I can pass in any
>> object that's indexable by an integer.
>>
>> But there is a potential problem here -- all it does is try to pass an
>> integer to __getitem__. So all Sequences should work. But Mappings also
>> have a __getitem__, but with slightly different semantics -- a Sequence
>> should accept an integer (or object with an __index__) in the range of its
>> size, but a Mapping can accept any valid key. So for the most part, passing
>> a Mapping to random.choice() fails as it should, with a KeyError. But if
>> you happen to have a key that is an integer, it might succeed, but it would
>> not be doing "the right thing" (unless the Mapping happened to be
>> constructed exactly the right way -- but then it should probably just be a
>> Sequence).
>>
>> So: do we need a solution to this? I don't think so, it's simply the
>> nature of dynamic typing as far as I'm concerned, but if we wanted it to
>> be more robust, we could require (maybe only with a static type
>> declaration) that the object passed in is a Sequence.
>>
>> But I think that would be a shame -- this function doesn't need a 

[Python-ideas] Re: Default behavior for random.sample when no k

2020-08-01 Thread Neil Girdhar
Can you not just use 
https://more-itertools.readthedocs.io/en/stable/api.html#more_itertools.random_permutation
 ?
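For reference, a stdlib-only equivalent is possible. `random.sample(population, k=len(population))` already returns a new list holding every element in random order, and Ram's `random.sample(range(10, 10 ** 5), k=len(range(10, 10 ** 5)))` is valid as written because `range` is a sequence. With `r` omitted, `random_permutation` does essentially this, returning a tuple. The helper name `permutation` below is hypothetical.

```python
import random


def permutation(iterable, r=None):
    """Hypothetical helper: r items of `iterable` in random order.

    With r=None every item is returned, i.e. a shuffled copy. The input
    is materialised as a tuple because random.sample needs a sequence;
    it raises TypeError for non-sequences such as generators.
    """
    pool = tuple(iterable)
    k = len(pool) if r is None else r
    return tuple(random.sample(pool, k))


shuffled_numbers = permutation(range(10, 10 ** 5))
```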

On Saturday, August 1, 2020 at 2:26:23 PM UTC-4 Ram Rachum wrote:

> I would also prefer a `random.shuffled` function. The reason I didn't 
> propose it is because there's usually more resistance for adding new 
> functions. But in my view that'll be the best solution.
>
> On Sat, Aug 1, 2020 at 9:17 PM Alex Hall  wrote:
>
>> I agree that calling random.shuffle imperatively is annoying. But I don't 
>> think your proposed solution is readable. You're not taking a sample. A 
>> sample generally implies a strict subset, usually quite a small one.
>>
>> I've often thought there should just be a `random.shuffled()` function 
>> which returns a shuffled copy, similar to `.sort()` and `sorted()` or 
>> `.reverse()` and `reversed()`.
>>
>> On Sat, Aug 1, 2020 at 7:59 PM Ram Rachum  wrote:
>>
>>> When writing some code now, I needed to produce a shuffled version of 
>>> `range(10, 10 ** 5)`.
>>>
>>> This is one way to do it: 
>>>
>>> shuffled_numbers = list(range(10, 10 ** 5))
>>> random.shuffle(shuffled_numbers)
>>>
>>>
>>> I don't like it because (1) it's too imperative and (2) I'm calling the 
>>> list "shuffled" even before it's shuffled.
>>>
>>> Another solution is this: 
>>>
>>> shuffled_numbers = random.sample(range(10, 10 ** 5), k=len(range(10, 10 ** 5)))
>>>
>>> This is better because it solves the 2 points above. However, it is 
>>> quite cumbersome.
>>>
>>> I notice that the `random.sample` function doesn't have a default 
>>> behavior set when you don't specify `k`. This is fortunate, because we 
>>> could make that behavior just automatically take the length of the first 
>>> argument. So we could do this: 
>>>
>>> shuffled_numbers = random.sample(range(10, 10 ** 5))
>>>
>>> What do you think? 
>>>
>>>
>>> Thanks,
>>> Ram.
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/RG346EKJMPYZAI6PHCCZRKOIJUIML3HB/


[Python-ideas] Re: Access (ordered) dict by index; insert slice

2020-08-01 Thread Guido van Rossum
Yeah, it is totally doable to refactor the collection ABCs to have
something in between `Collection` and `Sequence` that just supports
`__getitem__`.

But I would take Marco's research (and Inada's musings) seriously -- we
don't actually want to support `__getitem__`, because of the unpredictable
performance characteristics.

I'm no longer in favor of adding .ordered() -- I think it's better to add
something to itertools, for example first() to get the first item (see Tim
Peters' post), and something related to get the first N items.
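As a sketch of what positional access on a dict costs today, consider the following. The helper names are hypothetical. `itertools.islice` walks the dict from the start, so fetching item `i` costs O(i) on every call, and a negative index walks almost the whole dict. That is the unpredictable performance mentioned above. The same pattern with a count instead of an offset gives "the first N items".

```python
from itertools import islice


def nth_item(mapping, index):
    """Hypothetical helper: the index-th (key, value) pair, in O(index) time."""
    n = len(mapping)
    if index < 0:
        index += n  # negative indices count from the end, as for lists
    if not 0 <= index < n:
        raise IndexError("mapping index out of range")
    return next(islice(mapping.items(), index, None))


def first_n(mapping, n):
    """Hypothetical helper: the first n (key, value) pairs as a list."""
    return list(islice(mapping.items(), n))


d = {k: i for i, k in enumerate("abcd")}
print(nth_item(d, 0), nth_item(d, -1), first_n(d, 2))
# ('a', 0) ('d', 3) [('a', 0), ('b', 1)]
```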

On Sat, Aug 1, 2020 at 12:28 PM Christopher Barker 
wrote:

> On Fri, Jul 31, 2020 at 7:34 AM Guido van Rossum  wrote:
>
>> So maybe we need to add dict.ordered() which returns a view on the items
>> that is a Sequence rather than a set? Or ordereditems(), orderedkeys() and
>> orderedvalues()?
>>
>
> I'm still confused as to when "ordered" became synonymous with "Sequence"
> -- so wouldn't we want to call these dict.as_sequence() or something like
> that?
>
> And is there a reason that the regular dict views couldn't be both a Set
> and a Sequence? Looking at the ABCs, I don't see a conflict -- __getitem__,
> index() and count() would need to be added, and Sets don't have any of
> those. (and count could be optimized to always return 0 or 1 for
> dict.keys() ;-) )
>
> But anyway, naming aside, I'm still wondering whether we necessarily want
> the entire Sequence protocol. For the use cases at hand, isn't indexing and
> slicing enough?
>
> Which brings us to the philosophy of duck typing. I wrote an earlier post
> about that -- so here's some follow up thoughts. I suggested that I like
> the "if I only need it to quack, I don't care if it's a duck" approach -- I
> try to use the quack() method, I'm happy if it works, and I raise an
> exception (or let whatever exception is raised bubble up) if it
> doesn't.
>
> Guido pointed out that having a quack() method isn't enough -- it also
> needs to actually behave as you expect -- which is the nice thing about
> ABCs -- if you know something is a Sequence, you don't just know that you
> can index it, you know that indexing it will do what you expect.
>
> Which brings us back to the random.choice() function. It's really simple,
> and uses exactly the approach I outlined above.
>
>     def choice(self, seq):
>         """Choose a random element from a non-empty sequence."""
>         try:
>             i = self._randbelow(len(seq))
>         except ValueError:
>             raise IndexError('Cannot choose from an empty sequence') from None
>         return seq[i]
>
> It checks the length of the object, picks a random index within that
> length, and then tries to use that index to get a random item. So anything
> with a __len__ and a __getitem__ that accepts integers will work.
>
> And this has worked "fine" for decades. Should it be checking that seq is
> actually a sequence? I don't think so -- I like that I can pass in any
> object that's indexable by an integer.
>
> But there is a potential problem here -- all it does is try to pass an
> integer to __getitem__. So all Sequences should work. But Mappings also
> have a __getitem__, but with slightly different semantics -- a Sequence
> should accept an integer (or object with an __index__) in the range of its
> size, but a Mapping can accept any valid key. So for the most part, passing
> a Mapping to random.choice() fails as it should, with a KeyError. But if
> you happen to have a key that is an integer, it might succeed, but it would
> not be doing "the right thing" (unless the Mapping happened to be
> constructed exactly the right way -- but then it should probably just be a
> Sequence).
>
> So: do we need a solution to this? I don't think so, it's simply the
> nature of a dynamic typing as far as I'm concerned, but if we wanted it to
> be more robust, we could require (maybe only with a static type
> declaration) that the object passed in is a Sequence.
>
> But I think that would be a shame -- this function doesn't need a full
> Sequence, it only needs a Sized and __getitem__.
>
> In fact, the ABCs are designed to accommodate much of this -- for example,
> the Sized ABC only requires one feature: __len__. And Container only requires
> __contains__. As far as I know there are no built-ins (or commonly used
> third-party) objects that are ONLY Sized, or ONLY a Container. In fact, at
> least in collections.abc, every ABC apart from Container itself that has
> __contains__ also has __len__. And I can't think of anything that could
> support "in" that didn't have a size -- which could be a failure of
> imagination on my part. But you could type check for Container if all you
> wanted to do was know that you could use it with "in".
>
> So there are ABCs there simply to support a single method. Which means
> that we could solve the "problem" of random.choice with a "Getitemable"
> ABC.
>
> Ahh -- but here's the rub -- while the ABCs only require certain methods
> -- in fact, it's implied that they have 

[Python-ideas] Re: How to prevent shared memory from being corrupted ?

2020-08-01 Thread Eric V. Smith

On 8/1/2020 1:25 PM, Marco Sulla wrote:
> You don't need locks with immutable objects. Since they're immutable,
> any operation that would usually mutate the object generates another
> immutable object instead. The most common example is str: the sum of two
> strings in Python (and in many other languages) produces a new string.


While they're immutable at the Python level, strings (and all other 
objects) are mutated at the C level, due to reference count updates. You 
need to consider this if you're sharing objects without locking or other 
synchronization.
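Here is a CPython-specific sketch of this point. Reference counts are an implementation detail, and other interpreters behave differently. Binding a second name to a string never changes its value, yet it does write to the object's reference count. The string is built at runtime so that it is a fresh, non-interned object. On CPython 3.12+ some objects (for example certain interned strings) may be immortal, with a fixed count.

```python
import sys

# Build the string at runtime so it is a fresh, non-interned object.
s = "".join(chr(c) for c in range(ord("a"), ord("k")))

before = sys.getrefcount(s)   # includes one temporary reference for the call
alias = s                     # no Python-level change to the str's value...
after = sys.getrefcount(s)    # ...but its C-level reference count went up

print(s, before, after)
```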


Eric

Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/FEJEHFKBK7TMH6KIYJBPLBYBDU4IA4EB/


[Python-ideas] Re: Access (ordered) dict by index; insert slice

2020-08-01 Thread Christopher Barker
On Fri, Jul 31, 2020 at 7:34 AM Guido van Rossum  wrote:

> So maybe we need to add dict.ordered() which returns a view on the items
> that is a Sequence rather than a set? Or ordereditems(), orderedkeys() and
> orderedvalues()?
>

I'm still confused as to when "ordered" became synonymous with "Sequence"
-- so wouldn't we want to call these dict.as_sequence() or something like
that?

And is there a reason that the regular dict views couldn't be both a Set
and a Sequence? Looking at the ABCs, I don't see a conflict -- __getitem__,
index() and count() would need to be added, and Sets don't have any of
those. (and count could be optimized to always return 0 or 1 for
dict.keys() ;-) )

But anyway, naming aside, I'm still wondering whether we necessarily want
the entire Sequence protocol. For the use cases at hand, isn't indexing and
slicing enough?
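To make "indexing and slicing only" concrete, here is a sketch of a view that offers just `len()`, integer indexing and slicing, without the rest of the Sequence protocol. `IndexedKeys` is a hypothetical name, not a proposed API. Integer lookups walk the dict from the start, so each is O(n). The view stays live, reflecting later changes to the dict.

```python
import operator
from itertools import islice


class IndexedKeys:
    """Hypothetical live view of a dict's keys: len(), indexing, slicing.

    Integer lookups walk the dict from the start (O(n)); slices copy
    the keys into a list first.
    """

    def __init__(self, mapping):
        self._mapping = mapping

    def __len__(self):
        return len(self._mapping)

    def __getitem__(self, index):
        if isinstance(index, slice):
            return list(self._mapping)[index]
        index = operator.index(index)  # accept ints and __index__ objects
        n = len(self._mapping)
        if index < 0:
            index += n
        if not 0 <= index < n:
            raise IndexError("view index out of range")
        return next(islice(self._mapping, index, None))


d = {"a": 1, "b": 2, "c": 3}
keys = IndexedKeys(d)
print(keys[0], keys[-1], keys[1:])   # a c ['b', 'c']
d["z"] = 26
print(len(keys), keys[-1])           # 4 z
```

Because it has `__len__` and an integer `__getitem__`, an object like this would also be accepted by `random.choice`.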

Which brings us to the philosophy of duck typing. I wrote an earlier post
about that -- so here's some follow up thoughts. I suggested that I like
the "if I only need it to quack, I don't care if it's a duck" approach -- I
try to use the quack() method, I'm happy if it works, and I raise an
exception (or let whatever exception is raised bubble up) if it
doesn't.

Guido pointed out that having a quack() method isn't enough -- it also
needs to actually behave as you expect -- which is the nice thing about
ABCs -- if you know something is a Sequence, you don't just know that you
can index it, you know that indexing it will do what you expect.

Which brings us back to the random.choice() function. It's really simple,
and uses exactly the approach I outlined above.

    def choice(self, seq):
        """Choose a random element from a non-empty sequence."""
        try:
            i = self._randbelow(len(seq))
        except ValueError:
            raise IndexError('Cannot choose from an empty sequence') from None
        return seq[i]

It checks the length of the object, picks a random index within that
length, and then tries to use that index to get a random item. So anything
with a __len__ and a __getitem__ that accepts integers will work.

And this has worked "fine" for decades. Should it be checking that seq is
actually a sequence? I don't think so -- I like that I can pass in any
object that's indexable by an integer.

But there is a potential problem here -- all it does is try to pass an
integer to __getitem__. So all Sequences should work. But Mappings also
have a __getitem__, but with slightly different semantics -- a Sequence
should accept an integer (or object with an __index__) in the range of its
size, but a Mapping can accept any valid key. So for the most part, passing
a Mapping to random.choice() fails as it should, with a KeyError. But if
you happen to have a key that is an integer, it might succeed, but it would
not be doing "the right thing" (unless the Mapping happened to be
constructed exactly the right way -- but then it should probably just be a
Sequence).
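The failure mode described above is easy to reproduce with the real `random.choice`, which does no type check. A dict whose keys happen to be 0..n-1 "works" but returns a value rather than a key. A dict with other keys fails with a KeyError rather than a more helpful TypeError.

```python
import random

int_keyed = {0: "zero", 1: "one", 2: "two"}
picked = random.choice(int_keyed)  # "works", but returns a value, not a key

str_keyed = {"a": 1, "b": 2}
error = None
try:
    random.choice(str_keyed)        # looks up str_keyed[0] or str_keyed[1]
except KeyError as exc:
    error = exc                     # a KeyError, not a TypeError

print(picked, repr(error))
```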

So: do we need a solution to this? I don't think so, it's simply the nature
of dynamic typing as far as I'm concerned, but if we wanted it to be more
robust, we could require (maybe only with a static type declaration) that
the object passed in is a Sequence.

But I think that would be a shame -- this function doesn't need a full
Sequence, it only needs a Sized and __getitem__.

In fact, the ABCs are designed to accommodate much of this -- for example,
the Sized ABC only requires one feature: __len__. And Container only requires
__contains__. As far as I know there are no built-ins (or commonly used
third-party) objects that are ONLY Sized, or ONLY a Container. In fact, at
least in collections.abc, every ABC apart from Container itself that has
__contains__ also has __len__. And I can't think of anything that could
support "in" that didn't have a size -- which could be a failure of
imagination on my part. But you could type check for Container if all you
wanted to do was know that you could use it with "in".
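The single-method ABCs are structural. `collections.abc.Container` and `Sized` both define `__subclasshook__`, so any class with the right method passes `isinstance` without registering. The classes below are hypothetical. One of them, a predicate-based "all even integers" collection, shows one kind of object that can support `in` without having any finite size.

```python
from collections.abc import Container, Sized


class OnlyContains:
    """Hypothetical object that supports `in` but has no size."""

    def __contains__(self, item):
        return isinstance(item, int) and item % 2 == 0  # "all even ints"


class OnlySized:
    """Hypothetical object with a length but no membership test."""

    def __len__(self):
        return 3


print(isinstance(OnlyContains(), Container), isinstance(OnlyContains(), Sized))  # True False
print(isinstance(OnlySized(), Sized), isinstance(OnlySized(), Container))        # True False
print(4 in OnlyContains(), 3 in OnlyContains())                                  # True False
```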

So there are ABCs there simply to support a single method. Which means that
we could solve the "problem" of random.choice with a "Getitemable" ABC.

Ahh -- but here's the rub -- while the ABCs only require certain methods --
in fact, it's implied that they have particular behavior as well. And this
is the problem at hand. Both Sequences and Mappings have a __getitem__, but
they have somewhat different meanings, and that meaning is embedded in the
ABC itself, rather than the method: Sequences will take an integer, and
raise an IndexError if it's out of range, and Mappings take any hashable, and
will raise a KeyError if it's not there.

So maybe what is needed is an Indexable ABC that implies the Sequence-like
indexing behavior.

Then if we added indexing to dict views, they would be an Indexable, but
not a Sequence.
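A rough sketch of what such an ABC might look like. `Indexable` and `pick_one` are hypothetical (nothing by those names exists in `collections.abc` or `random`). Making membership opt-in via `register()` rather than detecting `__getitem__` structurally is a design assumption: a structural check on `__getitem__` alone could not tell a Sequence from a Mapping, which is exactly the problem described above.

```python
import random
from abc import abstractmethod
from collections.abc import Sized


class Indexable(Sized):
    """Hypothetical ABC: len() plus integer indexing over 0..len-1,
    raising IndexError when out of range (Sequence-like semantics)."""

    @abstractmethod
    def __getitem__(self, index):
        raise IndexError(index)


# Deliberately no __subclasshook__: having __getitem__ says nothing about
# its meaning, so types must opt in explicitly.
for _cls in (list, tuple, str, range):
    Indexable.register(_cls)


def pick_one(population):
    """Hypothetical random.choice variant that insists on an Indexable."""
    if not isinstance(population, Indexable):
        raise TypeError(f"expected an Indexable, got {type(population).__name__}")
    if not len(population):
        raise IndexError("cannot choose from an empty population")
    return population[random.randrange(len(population))]
```

With this, `pick_one(range(10))` works, while `pick_one({0: "a"})` is rejected up front instead of silently doing a key lookup.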

-CHB











> On Fri, Jul 31, 2020 at 05:29 Ricky Teachey  wrote:
>
>> On Fri, Jul 31, 2020, 2:48 AM Wes Turner  wrote:
>>
>>> # Dicts and DataFrames
>>> - Src:
>>> 

[Python-ideas] Re: Access (ordered) dict by index; insert slice

2020-08-01 Thread Stestagg
I wrote some (better than the previously shared) benchmarks for this change
a while ago. They are run on CPython with a patch applied that implements
dict_views __getitem__() using a method similar to `lookdict` to perform
indexing on keys/values/etc.

Irrespective of where in the API this logic should exist, the
implementation won't be algorithmically different (I think, even with a
`.ordered` view, as the view would have to cope with changes to the
underlying dictionary over its lifetime, and external tracking of changes
to dicts is not, AFAIK, feasible. Unlike for-loop constructs, which are
inherently scoped, I feel like you wouldn't get away with forbidding
modification of a dict() while a view on its keys/values/items is still
alive, as these things are first-class objects that can be stored/passed
around).

Therefore, all index-based lookups would have to use a variant of this
logic, unless someone can come up with a magic O(1) solution ;) or explicit
compaction is used. (If anyone has a patch that adds tracking of
'compactness' over the dict_keys, I can run the tests using it to measure
the impact; however, I'm not personally sure yet whether the overhead of
this more invasive change is justified just for enabling indexing.)
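A toy pure-Python model of why positional lookup becomes linear once deletions leave holes. It is only loosely modelled on the compact-dict layout discussed in this thread (an insertion-ordered entries array in which deleted slots are left in place); the class and method names are invented for illustration and are not the code in the linked patch.

```python
class ToyCompactDict:
    """Hypothetical model: insertion-ordered entries array with holes."""

    def __init__(self):
        self._entries = []  # (key, value) tuples, or None for a deleted slot
        self._index = {}    # key -> position in _entries

    def __setitem__(self, key, value):
        if key in self._index:
            self._entries[self._index[key]] = (key, value)
        else:
            self._index[key] = len(self._entries)
            self._entries.append((key, value))

    def __delitem__(self, key):
        pos = self._index.pop(key)
        self._entries[pos] = None  # leave a hole; later entries do not shift

    def __len__(self):
        return len(self._index)

    def key_at(self, n):
        """Return the n-th live key. Holes must be skipped one by one,
        so this is O(len(_entries)), not O(1)."""
        if not 0 <= n < len(self):
            raise IndexError(n)
        for entry in self._entries:
            if entry is None:
                continue
            if n == 0:
                return entry[0]
            n -= 1
        raise AssertionError("unreachable: live count said n was in range")
```

After inserting `"a"` to `"d"` and deleting `"b"`, `key_at(1)` has to walk past `"a"` and the hole before returning `"c"`.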

The CPython patch can be found here:
https://github.com/stestagg/dict_index/blob/master/changes.patch, and the
benchmark results are linked below.

The tl;dr from my perspective is that these results make the change
challenging to keep proposing without a better implementation than the
obvious one (I was weakly +1 before these results). Personally, I'm happy
that the numbers give good evidence that this change is more complex than
it first seems.

Some notes about the benchmarks: I've adapted an existing, unrelated
test runner for this, so there may be some compromises. I've tried to be
reasonable about capturing OK timing data, but the intent here isn't to
spot single-percent changes in performance; rather, it's to look at
significant changes in runtime performance over vastly varying dict sizes.
The repo including the test runner, patches and makefile is here:
https://github.com/stestagg/dict_index and I'm accepting issues/PRs there
if anyone feels that there's an omission or error that's worth correcting.

The numbers are raw and *do not* have any interpretation layered on them.
Many snippets of code are not best-practice or ideal ways of achieving
things; this is partly because I wanted to see what the impact of these
non-optimal patterns would be on common operations. Please take the time
to understand the exact test (check the full source if you need to)
before making any meaningful decisions based on this data.

Graphs are, by default, plotted on log-log axes, so when looking only at
the shapes of the lines, beware that the real-world difference in run time
is much larger than the visual gap between the lines suggests. The
solution that uses direct indexing is always coloured green in the graphs.

As the code involves a tight loop over a simple structure, which is very
CPU-dependent (and because I can), I've run the benchmarks on a Raspberry
Pi 4 (ARMv7l) and on an AMD PC:

ARM Results:
https://stestagg.github.io/dict_index/pi4.html

PC Results:
https://stestagg.github.io/dict_index/pc.html

Thanks

Steve


On Sat, Aug 1, 2020 at 10:25 AM Marco Sulla 
wrote:

> On Sat, 1 Aug 2020 at 03:00, Inada Naoki  wrote:
>
>> Please teach me if you know any algorithm which has no hole, O(1)
>> deletion, preserving insertion order, and efficient and fast as array.
>>
>
> :)
>
> About the hole, I was thinking that in theory the problem can be
> circumvented using a modified version of lookdict.
> lookdict searches for a key and returns its position in the ma_keys array.
> I suppose it's possible to do the contrary: search for the index and return
> the key.
> What do you think (theoretically speaking)?
>
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/THHMGFINOJAOHQQTRUBHYKWRQZLEJ7OZ/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Default behavior for random.sample when no k

2020-08-01 Thread Ram Rachum
I would also prefer a `random.shuffled` function. The reason I didn't
propose it is that there's usually more resistance to adding new
functions. But in my view that'd be the best solution.

On Sat, Aug 1, 2020 at 9:17 PM Alex Hall  wrote:

> I agree that calling random.shuffle imperatively is annoying. But I don't
> think your proposed solution is readable. You're not taking a sample. A
> sample generally implies a strict subset, usually quite a small one.
>
> I've often thought there should just be a `random.shuffled()` function
> which returns a shuffled copy, similar to `.sort()` and `sorted()` or
> `.reverse()` and `reversed()`.
>
> On Sat, Aug 1, 2020 at 7:59 PM Ram Rachum  wrote:
>
>> When writing some code now, I needed to produce a shuffled version of
>> `range(10, 10 ** 5)`.
>>
>> This is one way to do it:
>>
>> shuffled_numbers = list(range(10, 10 ** 5))
>> random.shuffle(shuffled_numbers)
>>
>>
>> I don't like it because (1) it's too imperative and (2) I'm calling the
>> list "shuffled" even before it's shuffled.
>>
>> Another solution is this:
>>
>> shuffled_numbers = random.sample(range(10, 10 ** 5), k=len(range(10, 10 ** 5)))
>>
>> This is better because it solves the 2 points above. However, it is quite
>> cumbersome.
>>
>> I notice that the `random.sample` function doesn't have a default
>> behavior set when you don't specify `k`. This is fortunate, because we
>> could make that behavior just automatically take the length of the first
>> argument. So we could do this:
>>
>> shuffled_numbers = random.sample(range(10, 10 ** 5))
>>
>> What do you think?
>>
>>
>> Thanks,
>> Ram.
>>
>
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/RFJHHVAWYAMSRRD5ZYGH7VHAOKMNAR67/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Default behavior for random.sample when no k

2020-08-01 Thread Alex Hall
I agree that calling random.shuffle imperatively is annoying. But I don't
think your proposed solution is readable. You're not taking a sample. A
sample generally implies a strict subset, usually quite a small one.

I've often thought there should just be a `random.shuffled()` function
which returns a shuffled copy, similar to `.sort()` and `sorted()` or
`.reverse()` and `reversed()`.
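For reference, a minimal sketch of the suggested `shuffled()`, following the `sorted()` pattern of copying first and then mutating only the copy. It is hypothetical: the `random` module has no such function. The optional `rng` parameter is also an assumption, included so a seeded `random.Random` instance can be passed.

```python
import random


def shuffled(iterable, rng=None):
    """Hypothetical: return a new list of iterable's items in random order.

    Mirrors sorted(): accepts any iterable and never mutates its argument.
    `rng` (an assumption, not part of any proposal) may be a random.Random
    instance for reproducible results.
    """
    items = list(iterable)  # copy first, so the caller's object is untouched
    shuffle = random.shuffle if rng is None else rng.shuffle
    shuffle(items)          # in-place shuffle of the private copy only
    return items


shuffled_numbers = shuffled(range(10, 10 ** 5))
```

Because it takes any iterable, the same call would also work on generators, which `random.sample()` rejects.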

On Sat, Aug 1, 2020 at 7:59 PM Ram Rachum  wrote:

> When writing some code now, I needed to produce a shuffled version of
> `range(10, 10 ** 5)`.
>
> This is one way to do it:
>
> shuffled_numbers = list(range(10, 10 ** 5))
> random.shuffle(shuffled_numbers)
>
>
> I don't like it because (1) it's too imperative and (2) I'm calling the
> list "shuffled" even before it's shuffled.
>
> Another solution is this:
>
> shuffled_numbers = random.sample(range(10, 10 ** 5), k=len(range(10, 10 ** 5)))
>
> This is better because it solves the 2 points above. However, it is quite
> cumbersome.
>
> I notice that the `random.sample` function doesn't have a default behavior
> set when you don't specify `k`. This is fortunate, because we could make
> that behavior just automatically take the length of the first argument. So
> we could do this:
>
> shuffled_numbers = random.sample(range(10, 10 ** 5))
>
> What do you think?
>
>
> Thanks,
> Ram.
>
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/UMPJOTCY4SK5LFBPWFBJYFXLBF76EA2S/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Default behavior for random.sample when no k

2020-08-01 Thread Ram Rachum
When writing some code now, I needed to produce a shuffled version of
`range(10, 10 ** 5)`.

This is one way to do it:

shuffled_numbers = list(range(10, 10 ** 5))
random.shuffle(shuffled_numbers)


I don't like it because (1) it's too imperative and (2) I'm calling the
list "shuffled" even before it's shuffled.

Another solution is this:

shuffled_numbers = random.sample(range(10, 10 ** 5), k=len(range(10, 10 ** 5)))

This is better because it solves the 2 points above. However, it is quite
cumbersome.
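With the standard library as it is, the repetition in the `random.sample()` spelling can at least be reduced by naming the population once. Sampling `k=len(population)` items without replacement returns every element exactly once in random order, i.e. a shuffled copy. Note that `sample()` requires a sequence such as a `range` or `list`; an arbitrary iterable like a generator is rejected with `TypeError`.

```python
import random

population = range(10, 10 ** 5)  # name the population once

# Drawing k=len(population) items without replacement returns every element
# exactly once, in random order: effectively a shuffled list copy.
shuffled_numbers = random.sample(population, k=len(population))
```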

I notice that the `random.sample` function doesn't have a default behavior
set when you don't specify `k`. This is fortunate, because we could make
that behavior just automatically take the length of the first argument. So
we could do this:

shuffled_numbers = random.sample(range(10, 10 ** 5))

What do you think?


Thanks,
Ram.
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/OHLXVKIBMNSQO6BCFK6LEHSYNXDB6OQJ/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Access (ordered) dict by index; insert slice

2020-08-01 Thread Tim Peters
[Steven D'Aprano ]
>> 
>> The other simple solution is `next(iter(mydict.items()))`.

[Guido]
> That one always makes me uncomfortable, because the StopIteration it
> raises when the dict is empty might be misinterpreted. Basically I never
> want to call next() unless there's a try...except StopIteration: around it,
> and that makes this a lot less simple.

Last time this came up, this appeared to reach near-consensus:

"""
exactly what more-itertools has supplied for years already :-)

If the iterable is empty/exhausted, by default ValueError is raised,
but that can be overridden by also passing an optional argument to
return instead (like dict.pop() in this respect).

So, e.g.,

first([42]) returns 42
first([]) raises ValueError
first([], 42) and first([], default=42) return 42

I don't think it belongs in the builtins.  It doesn't perfectly fit
anywhere, but given its prior history in the more-itertools and
itertoolz packages, Python's itertools package seems the least
annoying ;-) home for it.
"""
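A minimal sketch matching the behaviour described in the quote: raise `ValueError` on an empty iterable unless a default is supplied. This is an illustrative reimplementation, not the more-itertools source; the private `_MISSING` sentinel is an assumption, used so that `None` can still be passed as a legitimate default.

```python
_MISSING = object()  # sentinel: distinguishes "no default given" from default=None


def first(iterable, default=_MISSING):
    """Return the first item of iterable.

    If the iterable is empty, return default if one was given,
    otherwise raise ValueError (never a bare StopIteration).
    """
    for item in iterable:
        return item
    if default is _MISSING:
        raise ValueError("first() called on an empty iterable")
    return default
```

Usage: `first({"a": 1}.items())` returns `("a", 1)`, and `first({}.items(), None)` returns `None` instead of raising.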
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/OKUKAX6YE54KCKRV5OIP3XX4J2U6U3WC/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Access (ordered) dict by index; insert slice

2020-08-01 Thread Christopher Barker
On Sat, Aug 1, 2020 at 2:28 AM Marco Sulla 
wrote:

> On Sat, 1 Aug 2020 at 03:00, Inada Naoki  wrote:
>
>> Please teach me if you know any algorithm which has no hole, O(1)
>> deletion, preserving insertion order, and efficient and fast as array.
>>
>
I would think the goal here would be to re-order once in a while to remove
the holes. But that would take time, of course, so you wouldn't want to do
it on every deletion. But when?

One option: maybe too specialized, but it could re-pack the array when an
indexing operation is made -- since that operation is O(N) anyway. And that
would then address the issue of performance for multiple indexing
operations -- if you made a bunch of indexing operations in a row without
deleting (which would be the case if this is an alternative to making a
copy in a Sequence first), then the first one would repack the internal
array (presumably faster than making a copy) and the rest would have O(1)
access.
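A toy illustration of the lazy re-packing idea: deletions just leave holes, and the first positional lookup after a deletion compacts the array in one O(N) pass, after which lookups are plain O(1) list indexing until the next deletion. Everything here (class name, the `_holes` counter, `key_at`) is hypothetical and says nothing about how CPython would actually do it; in this sketch the key-to-position map is simply rebuilt as part of the same O(N) pass.

```python
class RepackingOrderedStore:
    """Hypothetical insertion-ordered store that compacts lazily on indexing."""

    def __init__(self):
        self._entries = []  # (key, value) tuples, or None for a deleted slot
        self._pos = {}      # key -> index into _entries
        self._holes = 0     # number of None slots currently in _entries

    def __setitem__(self, key, value):
        if key in self._pos:
            self._entries[self._pos[key]] = (key, value)
        else:
            self._pos[key] = len(self._entries)
            self._entries.append((key, value))

    def __delitem__(self, key):
        self._entries[self._pos.pop(key)] = None
        self._holes += 1    # O(1): just leave a hole

    def _repack(self):
        # One O(N) pass: drop the holes and rebuild the positions.
        self._entries = [e for e in self._entries if e is not None]
        self._pos = {k: i for i, (k, _) in enumerate(self._entries)}
        self._holes = 0

    def key_at(self, n):
        if self._holes:     # only pay the O(N) cost if something was deleted
            self._repack()
        return self._entries[n][0]  # O(1) once there are no holes
```

A run of `key_at()` calls with no deletions in between pays for the repack only once.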

Given that this use case doesn't appear to be very important, I doubt it's
worth it, but it seems it would be possible.

Another thought -- could the re-packing happen whenever the entire dict is
iterated through? Though maybe there's no way to know when that's going to
happen -- all you get are the individual calls for the next one, yes?

> About the hole, I was thinking that in theory the problem can be
> circumvented using a modified version of lookdict.
>
> lookdict searches for a key and returns its position in the ma_keys array.
> I suppose it's possible to do the contrary: search for the index and return
> the key.
> What do you think (theoretically speaking)?
>

But isn't searching for the index going to require iterating through the
array until you find it, i.e. the O(N) operation we're trying to avoid?

-CHB

-- 
Christopher Barker, PhD

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/KGRYRDCLFIH3PAEOZ7HFFIN4SLV5KHIF/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: How to prevent shared memory from being corrupted ?

2020-08-01 Thread Marco Sulla
You don't need locks with immutable objects. Since they're immutable, any
operation that would usually mutate the object generates another immutable
object instead. The most common example is str: concatenating two strings
in Python (and in many other languages) produces a new string.

This is usually slower than modifying a mutable object (such as atomic
types), but it allows you to remove the bottleneck of a lock.

See also immutables.Map: https://github.com/MagicStack/immutables
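A small illustration of the point about immutables: "modifying" a `str` or `tuple` builds a new object, and any other reference to the original keeps seeing the old, unchanged value. This only demonstrates the semantics; whether it removes the need for a lock also depends on how the shared reference itself is published and swapped, which this sketch does not cover.

```python
# "Changing" an immutable object really builds a new one.
base = "spam"
seen_elsewhere = base       # another reference, e.g. held by another thread

combined = base + "eggs"    # concatenation returns a brand-new str
print(combined)             # spameggs
print(seen_elsewhere)       # spam  (the shared object never changed)

point = (1, 2)
moved = point + (3,)        # the same holds for tuples
print(point, moved)         # (1, 2) (1, 2, 3)
```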
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/HC5MA4SHEYLLQ7X5KL7C7QWMKKJZPAVB/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: How to prevent shared memory from being corrupted ?

2020-08-01 Thread Vinay Sharma via Python-ideas


> On 01-Aug-2020, at 1:31 AM, Marco Sulla  wrote:
> 
> On Thu, 30 Jul 2020 at 12:57, Vinay Sharma via Python-ideas
> <python-ideas@python.org> wrote:
> Python has support for atomic types, I guess:
> Atomic Int:
> https://github.com/python/cpython/blob/master/Include/internal/pycore_atomic.h#L80
>
> Atomic Store:
> https://github.com/python/cpython/blob/master/Include/internal/pycore_atomic.h#L94
>
> You could also use immutables:
> https://nextjournal.com/schmudde/adventures-in-immutable-python
>
Could you please elaborate a bit more on this?
I think your idea is to store data in Plasma store, but what exactly are you
suggesting I store?
As far as I understand, Plasma store is used to store immutable objects, but
neither Python's shared_memory API stores immutable objects, nor would the
locking mechanism discussed store immutable locks.


> ___
> Python-ideas mailing list -- python-ideas@python.org
> To unsubscribe send an email to python-ideas-le...@python.org
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at 
> https://mail.python.org/archives/list/python-ideas@python.org/message/URLHX7IEK6NRCUCN3K647JTDCIRK5ZAT/
> Code of Conduct: http://python.org/psf/codeofconduct/

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/6VHDMDVMPYDCCOJIC5CUJ2PA6IKSV4K6/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Access (ordered) dict by index; insert slice

2020-08-01 Thread Marco Sulla
On Sat, 1 Aug 2020 at 03:00, Inada Naoki  wrote:

> Please teach me if you know any algorithm which has no hole, O(1)
> deletion, preserving insertion order, and efficient and fast as array.
>

:)

About the hole, I was thinking that in theory the problem can be
circumvented using a modified version of lookdict.
lookdict searches for a key and returns its position in the ma_keys array.
I suppose it's possible to do the contrary: search for the index and return
the key.
What do you think (theoretically speaking)?
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/57BDPRYVMMALKERYPRJMQO4AH33FWOV4/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Access (ordered) dict by index; insert slice

2020-08-01 Thread Steven D'Aprano
On Sat, Aug 01, 2020 at 03:57:34PM +1000, Chris Angelico wrote:

> Ahh, okay. So it's safe in the sense that it can't accidentally leak a
> confusing exception. Unfortunately a lot of people are going to assume
> that it means "will always give a useful return value". Definitely
> worth being very very clear about the semantics.

What you name the function is up to you :-)


> And it's really not the ideal semantics here anyway. What you actually
> want is ValueError if the dict is empty.

Your wish is my command:

mynext = exception_guard(catch=StopIteration, throw=ValueError)(next)
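The `exception_guard` helper was only ever a proposal and is not in the standard library. A minimal sketch, assuming the semantics described in this thread (catch one exception type and re-raise it as another, `RuntimeError` by default, usable both as a context manager and as a function wrapper), can be built on `contextlib.ContextDecorator`:

```python
from contextlib import ContextDecorator


class exception_guard(ContextDecorator):
    """Hypothetical: translate exceptions of type `catch` into `throw`.

    Lower-case name because it is used like a function. Works as a context
    manager and, via ContextDecorator, as a decorator/wrapper.
    """

    def __init__(self, catch, throw=RuntimeError):
        self.catch = catch
        self.throw = throw

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is not None and issubclass(exc_type, self.catch):
            # Chain the original so the traceback still shows its origin.
            raise self.throw(f"guarded {exc_type.__name__}") from exc
        return False  # anything else propagates unchanged


mynext = exception_guard(catch=StopIteration, throw=ValueError)(next)
```

`mynext(iter({"a": 1}.items()))` returns `("a", 1)`, while `mynext(iter({}.items()))` raises `ValueError` chained from the original `StopIteration`.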


-- 
Steven
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/B5FIE43QBMMGGFYXDJYMWYB64D4SR6HW/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Access (ordered) dict by index; insert slice

2020-08-01 Thread Christopher Barker
On Sat, Aug 1, 2020 at 10:19 AM Wes Turner  wrote:

> > AFAIU, direct subscripting / addressing was not a use case in the design
> phase of the current dict?
>

Nope -- if it were, it would have presumably been implemented :-)

But order-preserving wasn't really a design goal either, as I understand
it, but rather a side effect of the implementation. As I recall the
conversation, in 3.7, when it was made "official", even then it was less
about how useful it was than that people WILL use it, and will count on it,
even if they are told by the docs that they shouldn't. So we might as well
commit to it. And it is indeed handy now and again.

So the current conversation is more a result of realizing that, once we
have order-preserving dicts, maybe we can do a few other things with them,
than of a use case driving the decision in the first place.

On Fri, Jul 31, 2020 at 6:35 PM Inada Naoki  wrote:
> There are two major points to optimize.

>
> * Iterating over `next(islice(dict.items(), n, n+1))` will produce n
> temporary tuples.
> * (CPython implementation detail) dict can detect if there is no hole.
> index access is O(1) if there is no hole.
>

Any thoughts on how much of a difference these might make? Particularly the
first one. The second, of course, won't help when there are holes, which
would make performance harder to predict.

-CHB

-- 
Christopher Barker, PhD

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/LHAZAHFYHPZTDHP3EJNM45LJWWCWQA54/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Access (ordered) dict by index; insert slice

2020-08-01 Thread Inada Naoki
On Sat, Aug 1, 2020 at 12:40 PM Steven D'Aprano  wrote:
>
> On Fri, Jul 31, 2020 at 08:08:58PM -0700, Guido van Rossum wrote:
>
> > > The other simple solution is `next(iter(mydict.items()))`.
> > >
> >
> > That one always makes me uncomfortable, because the StopIteration it raises
> > when the dict is empty might be misinterpreted. Basically I never want to
> > call next() unless there's a try...except StopIteration: around it, and
> > that makes this a lot less simple.
>
> Acknowledged. But there are ways to solve that which perhaps aren't as
> well known as they should be.
>
> * Use a default: `next(iter(mydict.items()), MISSING)`
>
> * Use a helper to convert StopIteration to something else.
>
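Spelled out, the default-based variant looks like this. `MISSING` is just a sentinel chosen for the example (any unique object works); comparing with `is` avoids confusing it with a real value such as `None`.

```python
MISSING = object()  # unique sentinel; no dict item can be this object

mydict = {"a": 1, "b": 2}
item = next(iter(mydict.items()), MISSING)
if item is MISSING:
    print("empty dict")
else:
    key, value = item
    print(key, value)        # prints: a 1

empty = next(iter({}.items()), MISSING)  # no StopIteration escapes
```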

There is a simpler solution:

* `[first] = mydict.items()`, or `first, = mydict.items()`

Anyway, should we add some tools to itertools, instead of leaving them as
"itertools recipes"?

* `first(iterable, default=None)` -- same as `[first] = iterable`, but
returns the default value instead of raising ValueError when iterable is empty.
* `nth(iterable, n, default=None)`
* `consume(iterator, n=None)`
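For comparison, these are close to the `nth` and `consume` recipes in the `itertools` documentation, written out so they can be tried directly; treat them as sketches rather than a settled API. `nth` also shows why positional access on a dict view is O(n): `islice` has to step past the first `n` items.

```python
import collections
from itertools import islice


def nth(iterable, n, default=None):
    """Return the n-th item (0-based), or default, by skipping n items."""
    return next(islice(iterable, n, None), default)


def consume(iterator, n=None):
    """Advance iterator n steps ahead; if n is None, exhaust it."""
    if n is None:
        # A zero-length deque swallows everything without storing it.
        collections.deque(iterator, maxlen=0)
    else:
        # An empty slice starting at n: next() advances past n items.
        next(islice(iterator, n, n), None)


d = {"a": 1, "b": 2, "c": 3}
print(nth(d.items(), 1))   # ('b', 2)
print(nth(d.items(), 10))  # None

it = iter(d)
consume(it, 2)
print(next(it))            # c
```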

Regards,
-- 
Inada Naoki  
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/75EFRNZQS7FZWVS5BL2RZ73QKA3D4NZR/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Access (ordered) dict by index; insert slice

2020-08-01 Thread Chris Angelico
On Sat, Aug 1, 2020 at 3:50 PM Steven D'Aprano  wrote:
>
> On Sat, Aug 01, 2020 at 02:08:08PM +1000, Chris Angelico wrote:
> > On Sat, Aug 1, 2020 at 1:43 PM Steven D'Aprano  wrote:
>
> > > Some years ago, someone (I think it was Nick Coghlan?) proposed a
> > > standard solution for this issue, a context manager + decorator function
> > > that guarded against a specific exception. Nothing much came of it, but
> > > I did experiment with the idea, and got something which you could use
> > > like this:
> > >
> > > with exception_guard(StopIteration):
> > >     first = next(iter(mydict.items()))
> >
> > My understanding of this is that 'first' is unassigned if StopIteration 
> > happens.
>
> Sorry for not being more explicit about what was going on. I was stuck
> in my own head and didn't consider that others might not recall the
> discussion from all those many years ago, mea culpa.
>
> The exception guard doesn't merely catch and discard exceptions. It
> re-raises with a new exception, RuntimeError by default.
>
>
> > > or like this:
> > >
> > > safenext = exception_guard(StopIteration)(next)
> > > first = safenext(iter(mydict.items()))
> >
> > My understanding of this is that I am confused. What does safenext return?
>
> Nothing; it raises RuntimeError.
>

Ahh, okay. So it's safe in the sense that it can't accidentally leak a
confusing exception. Unfortunately a lot of people are going to assume
that it means "will always give a useful return value". Definitely
worth being very very clear about the semantics.

And it's really not the ideal semantics here anyway. What you actually
want is ValueError if the dict is empty. There's no easy way to spell
that generically, so it'd want a helper function, which means it would
probably do well as a method.

ChrisA
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/G3ZVLH3F3WT3XGEZIPHS3NOYOX633S7R/
Code of Conduct: http://python.org/psf/codeofconduct/