Re: [Python-Dev] itertools addition: getitem()
Giovanni Bajo wrote:
> On 09/07/2007 21.23, Walter Dörwald wrote:
>> [...]
>
> You keep posting examples where you call your getitem() function with 0 as index, or -1.
> [...]
> Are there real-world use cases for getitem(it, n) with n not in (0, -1)?

It's useful for screen scraping HTML. Suppose you have the following HTML table:

<table>
<tr><td>01.01.2007</td><td>12.34</td><td>Foo</td></tr>
<tr><td>13.01.2007</td><td>23.45</td><td>Bar</td></tr>
<tr><td>04.02.2007</td><td>45.56</td><td>Baz</td></tr>
<tr><td>27.02.2007</td><td>56.78</td><td>Spam</td></tr>
<tr><td>17.03.2007</td><td>67.89</td><td>Eggs</td></tr>
<tr><td> </td><td>164.51</td><td>Total</td></tr>
<tr><td> </td><td>(incl. VAT)</td><td></td></tr>
</table>

To extract the total sum, you want the second column from the second-to-last row, i.e. something like:

    row = getitem((r for r in table if r.name == "tr"), -2)
    col = getitem((c for c in row if c.name == "td"), 1)

Servus,
   Walter

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
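A runnable sketch of the same extraction in Python 3, without a getitem() helper. It assumes the table is well-formed enough for xml.etree.ElementTree to parse (real-world HTML usually needs a lenient parser; Walter's own example used XIST with tidy=True). A deque with maxlen=2 keeps only the last two rows, so its first element is the second-to-last row, and islice() skips to the second cell.

```python
import xml.etree.ElementTree as ET
from collections import deque
from itertools import islice

# The table from the message above; the blank first cells are written as a
# single space so that ElementTree can parse the markup as XML.
TABLE = """<table>
<tr><td>01.01.2007</td><td>12.34</td><td>Foo</td></tr>
<tr><td>13.01.2007</td><td>23.45</td><td>Bar</td></tr>
<tr><td>04.02.2007</td><td>45.56</td><td>Baz</td></tr>
<tr><td>27.02.2007</td><td>56.78</td><td>Spam</td></tr>
<tr><td>17.03.2007</td><td>67.89</td><td>Eggs</td></tr>
<tr><td> </td><td>164.51</td><td>Total</td></tr>
<tr><td> </td><td>(incl. VAT)</td><td></td></tr>
</table>"""

table = ET.fromstring(TABLE)

# Equivalent of getitem(rows, -2): keep a sliding window of two rows only.
rows = (r for r in table if r.tag == "tr")
row = deque(rows, maxlen=2)[0]

# Equivalent of getitem(cells, 1): skip one cell, take the next.
cells = (c for c in row if c.tag == "td")
col = next(islice(cells, 1, None))

print(col.text)  # 164.51
```

One caveat of the deque trick: with fewer than two rows it silently returns the last row instead of failing, which is exactly the kind of bookkeeping a getitem() with negative indexes would hide.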
Re: [Python-Dev] itertools addition: getitem()
On 09/07/2007 21.23, Walter Dörwald wrote:
> >>> from ll.xist import parsers, xfind
> >>> from ll.xist.ns import html
> >>> e = parsers.parseURL("http://www.python.org", tidy=True)
> >>> print e.walknode(html.h2 & xfind.hasclass("news"))[-1]
> Google Adds Python Support to Google Calendar Developer's Guide
>
> Get the first comment line from a python file:
>
> >>> getitem((line for line in open("Lib/codecs.py") if line.startswith("#")), 0)
> '### Registry and builtin stateless codec functions\n'
>
> Create a new unused identifier:
>
> >>> def candidates(base):
> ...     yield base
> ...     for suffix in count(2):
> ...         yield "%s%d" % (base, suffix)
> ...
> >>> usedids = set(("foo", "bar"))
> >>> getitem((i for i in candidates("foo") if i not in usedids), 0)
> 'foo2'

You keep posting examples where you call your getitem() function with 0 as index, or -1.

getitem(it, 0) already exists and it's spelled it.next(). getitem(it, -1) might be useful in fact, and it might be spelled last(it) (or it.last()). Then one may want to add first() for symmetry, but that's it:

    first(i for i in candidates("foo") if i not in usedids)
    last(line for line in open("Lib/codecs.py") if line[0] == '#')

Are there real-world use cases for getitem(it, n) with n not in (0, -1)? I share Raymond's feelings on this. And by the way, if you wonder, I have these exact feelings as well for islice... :)
--
Giovanni Bajo
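A minimal sketch of the first()/last() pair Giovanni suggests, written for Python 3. Neither function exists in itertools; the names follow the message, and raising ValueError on an empty iterable when no default is given is an assumption of this sketch. last() uses a deque with maxlen=1, so it consumes the whole iterable but never holds more than one item.

```python
from collections import deque

_missing = object()  # private sentinel meaning "no default given"

def first(iterable, default=_missing):
    """Return the first item of iterable, consuming only that item."""
    for item in iterable:
        return item
    if default is _missing:
        raise ValueError("first() of an empty iterable")
    return default

def last(iterable, default=_missing):
    """Return the last item, consuming the whole iterable.

    A deque with maxlen=1 holds at most one item, so memory use stays
    constant no matter how long the iterable is.
    """
    tail = deque(iterable, maxlen=1)
    if tail:
        return tail[0]
    if default is _missing:
        raise ValueError("last() of an empty iterable")
    return default

lines = ["import sys\n", "# first comment\n", "x = 1\n", "# last comment\n"]
print(repr(first(line for line in lines if line[0] == "#")))  # '# first comment\n'
print(repr(last(line for line in lines if line[0] == "#")))   # '# last comment\n'
```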
Re: [Python-Dev] itertools addition: getitem()
Raymond Hettinger wrote:
> [Walter Dörwald]
>> I'd like to propose the following addition to itertools: A function itertools.getitem() which is basically equivalent to the following python code:
>>
>> _default = object()
>>
>> def getitem(iterable, index, default=_default):
>>     try:
>>         return list(iterable)[index]
>>     except IndexError:
>>         if default is _default:
>>             raise
>>         return default
>>
>> but without materializing the complete list. Negative indexes are supported too (this requires additional temporary storage for abs(index) objects).
>
> Why not use the existing islice() function?
>
>     x = list(islice(iterable, i, i+1)) or default

This doesn't work, because it produces a list:

>>> list(islice(xrange(10), 2, 3)) or 42
[2]

The following would work:

    x = (list(islice(iterable, i, i+1)) or [default])[0]

However islice() doesn't support negative indexes, getitem() does.

> Also, as a practical matter, I think it is a bad idea to introduce __getitem__ style access to itertools because the starting point moves with each consecutive access:
>
>     # access items 0, 2, 5, 9, 14, 20, ...
>     for i in range(10):
>         print getitem(iterable, i)
>
> Worse, this behavior changes depending on whether the iterable is re-iterable (a string would yield consecutive items while a generator would skip around as shown above).

islice() has the same problem:

>>> from itertools import *
>>> iterable = iter(xrange(100))
>>> for i in range(10):
...     print list(islice(iterable, i, i+1))
[0]
[2]
[5]
[9]
[14]
[20]
[27]
[35]
[44]
[54]
>>> iterable = xrange(100)
>>> for i in range(10):
...     print list(islice(iterable, i, i+1))
[0]
[1]
[2]
[3]
[4]
[5]
[6]
[7]
[8]
[9]

> Besides being a bug factory, I think the getitem proposal would tend to steer people down the wrong road, away from more natural solutions to problems involving iterators.

I don't think that

    (list(islice(iterable, i, i+1)) or [default])[0]

is more natural than

    getitem(iterable, i, default)

> A basic step in learning the language is to differentiate between sequences and general iterators -- we should not conflate the two.
Servus,
   Walter
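The two islice() spellings from the exchange above, side by side in Python 3 (range replaces xrange; the function names are made up for this sketch). The `or default` form returns the one-item list itself whenever the index exists, while wrapping the default in a list and indexing [0] yields the item in both cases.

```python
from itertools import islice

def with_or_default(iterable, i, default):
    # Raymond's suggestion: `or` returns the one-item list, not the item.
    return list(islice(iterable, i, i + 1)) or default

def with_wrapped_default(iterable, i, default):
    # Walter's fix: wrap the default in a list too, then take element [0].
    return (list(islice(iterable, i, i + 1)) or [default])[0]

print(with_or_default(range(10), 2, 42))        # [2]
print(with_wrapped_default(range(10), 2, 42))   # 2
print(with_wrapped_default(range(10), 20, 42))  # 42
```

Neither spelling covers Walter's other point: islice() rejects a negative start with ValueError, so a negative index has to be handled some other way.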
Re: [Python-Dev] itertools addition: getitem()
On 7/9/07, Raymond Hettinger [EMAIL PROTECTED] wrote:
> Also, as a practical matter, I think it is a bad idea to introduce __getitem__ style access to itertools because the starting point moves with each consecutive access:
>
>     # access items 0, 2, 5, 9, 14, 20, ...
>     for i in range(10):
>         print getitem(iterable, i)
>
> Worse, this behavior changes depending on whether the iterable is re-iterable (a string would yield consecutive items while a generator would skip around as shown above).
>
> Besides being a bug factory, I think the getitem proposal would tend to steer people down the wrong road, away from more natural solutions to problems involving iterators.
>
> A basic step in learning the language is to differentiate between sequences and general iterators -- we should not conflate the two.

But doesn't the very same argument also apply against islice(), which you just offered as an alternative?

PS. If Walter is also at EuroPython, maybe you two could discuss this in person?

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Re: [Python-Dev] itertools addition: getitem()
Guido van Rossum wrote:
> On 7/9/07, Raymond Hettinger [EMAIL PROTECTED] wrote:
>> Also, as a practical matter, I think it is a bad idea to introduce __getitem__ style access to itertools because the starting point moves with each consecutive access:
>> [...]
>> A basic step in learning the language is to differentiate between sequences and general iterators -- we should not conflate the two.
>
> But doesn't the very same argument also apply against islice(), which you just offered as an alternative?

Exactly.

> PS. If Walter is also at EuroPython, maybe you two could discuss this in person?

Sorry, I won't be at EuroPython.

Servus,
   Walter
Re: [Python-Dev] itertools addition: getitem()
From: Guido van Rossum [EMAIL PROTECTED]
> But doesn't the very same argument also apply against islice(), which you just offered as an alternative?

Not really. The use cases for islice() typically do not involve repeated slices of an iterator unless it is slicing off the front few elements on each pass. In contrast, getitem() is all about grabbing something other than the frontmost element and seems to be intended for repeated calls on the same iterator. And its support for negative indices seems somewhat weird in the context of general purpose iterators: getitem(genprimes(), -1).

I'll study Walter's use case, but my instincts say that adding getitem() will do more harm than good.

Raymond
Re: [Python-Dev] itertools addition: getitem()
Raymond Hettinger wrote:
> From: Guido van Rossum [EMAIL PROTECTED]
>> But doesn't the very same argument also apply against islice(), which you just offered as an alternative?
>
> Not really. The use cases for islice() typically do not involve repeated slices of an iterator unless it is slicing off the front few elements on each pass. In contrast, getitem() is all about grabbing something other than the frontmost element and seems to be intended for repeated calls on the same iterator.

That wouldn't make sense, as getitem() consumes the iterator! ;)

But seriously: perhaps the name getitem() is misleading? What about item() or pickitem()?

> And its support for negative indices seems somewhat weird in the context of general purpose iterators: getitem(genprimes(), -1).

This does indeed make as much sense as sum(itertools.count()).

> I'll study Walter's use case but my instincts say that adding getitem() will do more harm than good.

Here's the function in use (somewhat invisibly, as it's used by the walknode() method). This gets the oldest news from Python's homepage:

>>> from ll.xist import parsers, xfind
>>> from ll.xist.ns import html
>>> e = parsers.parseURL("http://www.python.org", tidy=True)
>>> print e.walknode(html.h2 & xfind.hasclass("news"))[-1]
Google Adds Python Support to Google Calendar Developer's Guide

Get the first comment line from a python file:

>>> getitem((line for line in open("Lib/codecs.py") if line.startswith("#")), 0)
'### Registry and builtin stateless codec functions\n'

Create a new unused identifier:

>>> def candidates(base):
...     yield base
...     for suffix in count(2):
...         yield "%s%d" % (base, suffix)
...
>>> usedids = set(("foo", "bar"))
>>> getitem((i for i in candidates("foo") if i not in usedids), 0)
'foo2'

Servus,
   Walter
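Walter's last two sessions translated to Python 3, using the built-in next() on a generator expression where he uses getitem(..., 0). io.StringIO stands in for open("Lib/codecs.py") so the sketch runs anywhere; the sample text is an assumption that only mimics the first comment line shown above.

```python
import io
from itertools import count

def candidates(base):
    """Yield base, then base + "2", base + "3", ... without end."""
    yield base
    for suffix in count(2):
        yield "%s%d" % (base, suffix)

usedids = {"foo", "bar"}
# next() on the generator expression plays the role of getitem(..., 0);
# it stops the infinite candidates() generator after the first match.
newid = next(i for i in candidates("foo") if i not in usedids)
print(newid)  # foo2

# Stand-in for open("Lib/codecs.py").
source = io.StringIO("import sys\n### Registry and builtin stateless codec functions\n")
comment = next(line for line in source if line.startswith("#"))
print(repr(comment))  # '### Registry and builtin stateless codec functions\n'
```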
[Python-Dev] itertools addition: getitem()
I'd like to propose the following addition to itertools: A function itertools.getitem() which is basically equivalent to the following python code:

    _default = object()

    def getitem(iterable, index, default=_default):
        try:
            return list(iterable)[index]
        except IndexError:
            if default is _default:
                raise
            return default

but without materializing the complete list. Negative indexes are supported too (this requires additional temporary storage for abs(index) objects).

The patch is available at http://bugs.python.org/1749857

Servus,
   Walter
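A minimal pure-Python sketch of the proposed semantics, written for Python 3. It is not the C implementation from the patch, only one way to get list(iterable)[index] behaviour without building the list: non-negative indexes skip ahead with islice(), and negative indexes keep a deque(maxlen=abs(index)) window, which is the abs(index) temporary storage the proposal mentions.

```python
from collections import deque
from itertools import islice

_default = object()

def getitem(iterable, index, default=_default):
    """Sketch: like list(iterable)[index], without building the whole list."""
    it = iter(iterable)
    if index >= 0:
        # Skip `index` items and return the next one, if there is one.
        for item in islice(it, index, None):
            return item
    else:
        # Keep only the last abs(index) items; the oldest one is the answer.
        window = deque(it, maxlen=-index)
        if len(window) == -index:
            return window[0]
    if default is _default:
        raise IndexError("getitem index out of range")
    return default

print(getitem(range(10), 3))           # 3
print(getitem(iter(range(10)), -1))    # 9
print(getitem(range(10), -11, None))   # None
```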
Re: [Python-Dev] itertools addition: getitem()
How important is it to have the default in this API? __getitem__() doesn't have a default; instead, there's a separate API get() that provides a default (and I find defaulting to None more manageable than the _default = object() pattern).

--Guido

On 7/8/07, Walter Dörwald [EMAIL PROTECTED] wrote:
> I'd like to propose the following addition to itertools: A function itertools.getitem() which is basically equivalent to the following python code:
> [...]
> The patch is available at http://bugs.python.org/1749857

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
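The trade-off Guido raises can be sketched in a few lines (Python 3; both function names are hypothetical). A fixed None default, as with dict.get(), is simpler to call, but it cannot distinguish "empty" from "the first item is None"; the private-sentinel pattern lets the caller choose a default that cannot collide with real data, at the cost of one more module-level object.

```python
_missing = object()  # unique sentinel; no caller can pass it by accident

def first_or_none(iterable):
    """get()-style API: the default is fixed to None."""
    for item in iterable:
        return item
    return None

def first(iterable, default=_missing):
    """getattr()-style API: raise unless the caller supplies a default."""
    for item in iterable:
        return item
    if default is _missing:
        raise IndexError("first() of an empty iterable")
    return default

# With a fixed None default, "empty" and "first item is None" look the same:
print(first_or_none([]), first_or_none([None]))  # None None

# With the sentinel, the caller picks a default that cannot be confused:
print(first([], "empty"), first([None], "empty"))  # empty None
```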
Re: [Python-Dev] itertools addition: getitem()
Guido van Rossum wrote:
> How important is it to have the default in this API? __getitem__() doesn't have a default; instead, there's a separate API get() that provides a default (and I find defaulting to None more manageable than the _default = object() pattern).

getattr() has a default too, while __getattr__ hasn't...

Georg

--
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out.
Re: [Python-Dev] itertools addition: getitem()
On 7/8/07, Georg Brandl [EMAIL PROTECTED] wrote:
> Guido van Rossum wrote:
>> How important is it to have the default in this API? __getitem__() doesn't have a default; instead, there's a separate API get() that provides a default (and I find defaulting to None more manageable than the _default = object() pattern).
>
> getattr() has a default too, while __getattr__ hasn't...

Fair enough. But I still want to hear of a practical use case for the default here.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Re: [Python-Dev] itertools addition: getitem()
Guido van Rossum wrote:
> On 7/8/07, Georg Brandl [EMAIL PROTECTED] wrote:
>> Guido van Rossum wrote:
>>> How important is it to have the default in this API? __getitem__() doesn't have a default; instead, there's a separate API get() that provides a default (and I find defaulting to None more manageable than the _default = object() pattern).

Of course it isn't implemented this way in the C version.

>> getattr() has a default too, while __getattr__ hasn't...
>
> Fair enough. But I still want to hear of a practical use case for the default here.

In most cases

    foo = getitem(iterable, 0, None)
    if foo is not None:
        ...

is simpler than:

    try:
        foo = getitem(iterable, 0)
    except IndexError:
        pass
    else:
        ...

Here is a use case from one of my "import XML into the database" scripts:

    compid = getitem(root[ns.Company_company_id], 0, None)
    if compid:
        compid = int(compid)

The expression root[ns.company_id] returns an iterator that produces all children of the root node that are of the element type company_id. If there is a company_id, its content will be turned into an int; if not, None will be used.

Servus,
   Walter
Re: [Python-Dev] itertools addition: getitem()
On 7/8/07, Walter Dörwald [EMAIL PROTECTED] wrote:
> [quoting Guido]
>> But I still want to hear of a practical use case for the default here.
>
> In most cases
>
>     foo = getitem(iterable, 0, None)
>     if foo is not None:
>         ...
>
> is simpler than:
>
>     try:
>         foo = getitem(iterable, 0)
>     except IndexError:
>         pass
>     else:
>         ...
>
> Here is a use case from one of my "import XML into the database" scripts:
>
>     compid = getitem(root[ns.Company_company_id], 0, None)
>     if compid:
>         compid = int(compid)
>
> The expression root[ns.company_id] returns an iterator that produces all children of the root node that are of the element type company_id. If there is a company_id, its content will be turned into an int; if not, None will be used.

Ahem. I hope you have a better use case for getitem() than that (regardless of the default issue). I find it clearer to write that as

    try:
        compid = root[ns.company_id].next()
    except StopIteration:
        compid = None
    else:
        compid = int(compid)

While this is more lines, it doesn't require one to know about getitem() on an iterator. This is the same reason why setdefault() was a mistake -- it's too obscure to invent a compact spelling for it since the compact spelling has to be learned or looked up.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
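For comparison, a Python 3 sketch of Guido's spelling next to the built-in next(iterator, default), which accepts a default since Python 2.6 (released after this thread). xml.etree.ElementTree and its iterfind() stand in for XIST's root[ns.company_id]; that substitution and the sample documents are assumptions of the sketch.

```python
import xml.etree.ElementTree as ET

def company_id(root):
    """Guido's try/except spelling, translated to Python 3 (it.next() -> next(it))."""
    try:
        elem = next(iter(root.iterfind("company_id")))
    except StopIteration:
        return None
    else:
        return int(elem.text)

def company_id_with_default(root):
    """The same lookup using next()'s optional default argument."""
    elem = next(iter(root.iterfind("company_id")), None)
    # Compare with None explicitly: an Element's truth value depends on
    # whether it has children, not on whether it exists.
    return int(elem.text) if elem is not None else None

with_id = ET.fromstring("<company><company_id>42</company_id></company>")
without_id = ET.fromstring("<company><name>ACME</name></company>")
print(company_id(with_id), company_id_with_default(with_id))        # 42 42
print(company_id(without_id), company_id_with_default(without_id))  # None None
```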
Re: [Python-Dev] itertools addition: getitem()
On 7/8/07, Guido van Rossum [EMAIL PROTECTED] wrote:
> Ahem. I hope you have a better use case for getitem() than that (regardless of the default issue). I find it clearer to write that as
> [...]
> While this is more lines, it doesn't require one to know about getitem() on an iterator. This is the same reason why setdefault() was a mistake -- it's too obscure to invent a compact spelling for it since the compact spelling has to be learned or looked up.

Apropos of this discussion, I've occasionally wanted a faster version of the following:

    _nothing = object()

    def nth_next(seq, n, default=_nothing):
        '''
        Return the n'th next element for seq, if it exists.

        If default is specified, it is returned when the sequence is too
        short. Otherwise StopIteration is raised.
        '''
        try:
            for i in xrange(n-1):
                seq.next()
            return seq.next()
        except StopIteration:
            if default is _nothing:
                raise
            return default

The nice thing about this function is that it solves several problems in one: extraction of the n'th next element, testing for a minimum sequence length given a sentinel value, and just skipping n elements. It also leaves the sequence in a useful and predictable state, which is not true of the Python-version getitem code. While cute, I can't say if it is worthy of being an itertool function.

Also vaguely apropos:

    def ilen(seq):
        'Return the length of the hopefully finite sequence'
        n = 0
        for x in seq:
            n += 1
        return n

Why? Because I find myself implementing it in virtually every project. Maybe I'm just an outlier, but many algorithms I implement need to consume iterators (for side-effects, obviously) and it is sometimes nice to know exactly how many elements were consumed.

~Kevin
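One possible faster variant of nth_next(), sketched in Python 3: islice() performs the n-1 skips in C instead of a Python-level loop (this is a likely speed-up, not something benchmarked here). The semantics follow Kevin's docstring: n=1 means the next element, the iterator is left just past the returned element, and StopIteration is raised when no default is given. Unlike his version, this sketch requires n >= 1, since islice() rejects a negative start.

```python
from itertools import islice

_nothing = object()

def nth_next(it, n, default=_nothing):
    """Return the n'th next element of iterator `it` (n=1 means next(it)).

    The iterator is left positioned just after that element. When `it`
    runs out first, return `default`, or raise StopIteration if no
    default was given. Requires n >= 1.
    """
    # islice() skips n-1 items and yields at most one; returning from
    # inside the loop stops before anything further is consumed.
    for item in islice(it, n - 1, n):
        return item
    if default is _nothing:
        raise StopIteration
    return default

it = iter(range(10))
print(nth_next(it, 3))          # 2  (skips 0 and 1)
print(next(it))                 # 3  (the iterator resumes right after)
print(nth_next(it, 100, None))  # None  (fewer than 100 items were left)
```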
Re: [Python-Dev] itertools addition: getitem()
On 7/8/07, Kevin Jacobs [EMAIL PROTECTED] wrote:
> Also vaguely apropos:
>
>     def ilen(seq):
>         'Return the length of the hopefully finite sequence'
>         n = 0
>         for x in seq:
>             n += 1
>         return n

Also known as::

    sum(1 for _ in iterable)

That's always been simple enough that I didn't feel a need for an ilen() function.

STeVe
--
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity.
        --- Bucky Katt, Get Fuzzy
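Steven's one-liner wrapped as a function, in Python 3. The usage lines show the property Kevin relies on: counting consumes a one-shot iterator, so a second count of the same generator sees nothing.

```python
def ilen(iterable):
    """Count the items of a (hopefully finite) iterable, consuming it."""
    return sum(1 for _ in iterable)

squares = (x * x for x in range(5))
print(ilen(squares))  # 5
print(ilen(squares))  # 0, since the first call exhausted the generator
```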
Re: [Python-Dev] itertools addition: getitem()
Guido van Rossum wrote:
> On 7/8/07, Walter Dörwald [EMAIL PROTECTED] wrote:
>> [...]
>> Here is a use case from one of my "import XML into the database" scripts:
>> [...]
>
> Ahem. I hope you have a better use case for getitem() than that (regardless of the default issue). I find it clearer to write that as
> [...]
> While this is more lines, it doesn't require one to know about getitem() on an iterator. This is the same reason why setdefault() was a mistake -- it's too obscure to invent a compact spelling for it since the compact spelling has to be learned or looked up.

Well, I have used (a Python version of) this getitem() function to implement a library that can match a CSS3 expression against an XML tree. For implementing the nth-child(), nth-last-child(), nth-of-type() and nth-last-of-type() pseudo-classes (see http://www.w3.org/TR/css3-selectors/#structural-pseudos), getitem() was very useful.

Servus,
   Walter
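A simplified Python 3 sketch of how the four structural pseudo-classes map onto positive and negative indexing over a lazily filtered child sequence. It is not Walter's library: it handles only the plain :nth-child(b) form (the general an+b form needs more work), uses xml.etree.ElementTree as a stand-in tree, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import deque
from itertools import islice

def nth(children, b):
    """Child at 1-based position b, like :nth-child(b), or None."""
    return next(islice(children, b - 1, None), None)

def nth_last(children, b):
    """Child b-th from the end (1-based), or None; stores at most b items."""
    window = deque(children, maxlen=b)
    return window[0] if len(window) == b else None

def of_type(parent, tag):
    """Lazily yield only the children with the given tag (for *-of-type)."""
    return (child for child in parent if child.tag == tag)

div = ET.fromstring("<div><p>a</p><span>x</span><p>b</p><p>c</p></div>")
print(nth(div, 2).text)                     # x  (:nth-child(2))
print(nth_last(div, 1).text)                # c  (:nth-last-child(1))
print(nth(of_type(div, "p"), 2).text)       # b  (p:nth-of-type(2))
print(nth_last(of_type(div, "p"), 3).text)  # a  (p:nth-last-of-type(3))
```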
Re: [Python-Dev] itertools addition: getitem()
[Walter Dörwald]
> I'd like to propose the following addition to itertools: A function itertools.getitem() which is basically equivalent to the following python code:
> [...]
> but without materializing the complete list. Negative indexes are supported too (this requires additional temporary storage for abs(index) objects).

Why not use the existing islice() function?

    x = list(islice(iterable, i, i+1)) or default

Also, as a practical matter, I think it is a bad idea to introduce __getitem__ style access to itertools because the starting point moves with each consecutive access:

    # access items 0, 2, 5, 9, 14, 20, ...
    for i in range(10):
        print getitem(iterable, i)

Worse, this behavior changes depending on whether the iterable is re-iterable (a string would yield consecutive items while a generator would skip around as shown above).

Besides being a bug factory, I think the getitem proposal would tend to steer people down the wrong road, away from more natural solutions to problems involving iterators.

A basic step in learning the language is to differentiate between sequences and general iterators -- we should not conflate the two.

Raymond
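Raymond's "moving starting point" argument can be seen with a small index helper along the lines of the nth() recipe in the itertools documentation (Python 3; the recipe postdates this thread). The same calls give consecutive items on a re-iterable string but skip around on a one-shot iterator over that string.

```python
from itertools import islice

def nth(iterable, n, default=None):
    """Return the nth item or a default value."""
    return next(islice(iterable, n, None), default)

# Re-iterable input: every call starts over, so the items are consecutive.
s = "abcdef"
print([nth(s, i) for i in range(3)])  # ['a', 'b', 'c']

# One-shot iterator: each call resumes where the previous one stopped.
g = iter(s)
print([nth(g, i) for i in range(3)])  # ['a', 'c', 'f']
```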