On Thu, Sep 18, 2014 at 10:58 PM, Roy Smith <r...@panix.com> wrote: > I suspect what he meant was "How can I tell if I'm iterating over an > ordered collection?", i.e. iterating over a list vs. iterating over a > set.
Right, which is what I meant by asking if the order mattered. When you iterate over a set, you'll get some kind of order (because iterables have to have an order), but it won't mean anything. > Is there anything which requires an iterator to be deterministic? For > example, let's say I have an iterable, i, and I do: > > list1 = [item for item in i] > list2 = [item for item in i] > > am I guaranteed that list1 == list2? It will be for all the collections > I can think of in the standard library, but if I wrote my own class with > an __iter__() which yielded the items in a non-deterministic order, > would I be violating something other than the principle of least > astonishment? It's not guaranteed. If you do exactly that, with no operations in between, then yes, all the stdlib collections will (AFAIK) give you matching lists, but that's definitely not required by iterator protocol. The one thing you can rely on (and therefore must comply with, when you design an iterable) is that iteration will hit every element exactly once. Implementing that on most collections means returning the values in some internal representational order, something that's consistent across the lifetime of the iterator; having that not be consistent across multiple iterators would be the exception, not the rule. There would be a few special cases where you specifically *want to* have the lists differ, though. Imagine DNS records: perhaps you have four A records for some name, and you want to distribute load between them [1]. You could have your name server do something like this: class _cl_iter: def __init__(self, lst, start): self._data = lst self._start = self._next = start def __iter__(self): return self def __next__(self): if self._next is None: raise StopIteration val = self._data[self._next] self._next = (self._next + 1) % len(self._data) if self._next == self._start: self._next = None return val class CircularList: def __init__(self, it): self._data = list(it) self._next = -1 def __iter__(self): self._next = (self._next + 1) % len(self._data) return _cl_iter(self._data, self._next) Every time you iterate over a given CircularList, you'll get the same results, but starting at a different point: >>> lst = CircularList(("192.0.2.1","192.0.2.2","192.0.2.3","192.0.2.4")) >>> list(lst) ['192.0.2.1', '192.0.2.2', '192.0.2.3', '192.0.2.4'] >>> list(lst) ['192.0.2.2', '192.0.2.3', '192.0.2.4', '192.0.2.1'] >>> list(lst) ['192.0.2.3', '192.0.2.4', '192.0.2.1', '192.0.2.2'] >>> list(lst) ['192.0.2.4', '192.0.2.1', '192.0.2.2', '192.0.2.3'] So if you have a whole bunch of these for your different A records, and you return them in iteration order to each client, you'll end up with different clients getting them in a different order, with minimal extra storage space. (This is Py3 code; to make it Py2 compatible, probably all you need to do is subclass object and rename __next__ to next.) But this is a pretty unusual case. I would expect that most objects will either iterate consistently until mutated, or return only what wasn't previously consumed (like an iterator, which is itself iterable). ChrisA [1] This isn't true load-balancing, of course, but it's a simple way to distribute requests a bit. It's better than nothing. -- https://mail.python.org/mailman/listinfo/python-list