Re: [Python-Dev] itertools addition: getitem()

2007-07-11 Thread Walter Dörwald
Giovanni Bajo wrote:

 On 09/07/2007 21.23, Walter Dörwald wrote:
 
   >>> from ll.xist import parsers, xfind
   >>> from ll.xist.ns import html
   >>> e = parsers.parseURL("http://www.python.org", tidy=True)
   >>> print e.walknode(html.h2 & xfind.hasclass("news"))[-1]
 Google Adds Python Support to Google Calendar Developer's Guide


 Get the first comment line from a Python file:

   >>> getitem((line for line in open("Lib/codecs.py") if
   ...          line.startswith("#")), 0)
 '### Registry and builtin stateless codec functions\n'


 Create a new unused identifier:

   >>> def candidates(base):
   ...     yield base
   ...     for suffix in count(2):
   ...         yield "%s%d" % (base, suffix)
   ...
   >>> usedids = set(("foo", "bar"))
   >>> getitem((i for i in candidates("foo") if i not in usedids), 0)
 'foo2'
 
 You keep posting examples where you call your getitem() function with 0 as 
 index, or -1.
 
 getitem(it, 0) already exists and it's spelled it.next(). getitem(it, -1) 
 might be useful in fact, and it might be spelled last(it) (or it.last()). 
 Then one may want to add first() for symmetry, but that's it:
 
 first(i for i in candidates("foo") if i not in usedids)
 last(line for line in open("Lib/codecs.py") if line[0] == '#')
 
 Are there real-world use cases for getitem(it, n) with n not in (0, -1)? I 
 share Raymond's feelings on this. And by the way, if you wonder, I have these 
 exact feelings as well for islice... :)

It is useful for screen scraping HTML. Suppose you have the following HTML 
table:

<table>
<tr><td>01.01.2007</td><td>12.34</td><td>Foo</td></tr>
<tr><td>13.01.2007</td><td>23.45</td><td>Bar</td></tr>
<tr><td>04.02.2007</td><td>45.56</td><td>Baz</td></tr>
<tr><td>27.02.2007</td><td>56.78</td><td>Spam</td></tr>
<tr><td>17.03.2007</td><td>67.89</td><td>Eggs</td></tr>
<tr><td>  </td><td>164.51</td><td>Total</td></tr>
<tr><td>  </td><td>(incl. VAT)</td><td></td></tr>
</table>

To extract the total sum, you want the second column from the 
second-to-last row, i.e. something like:

row = getitem((r for r in table if r.name == "tr"), -2)
col = getitem((c for c in row if c.name == "td"), 1)
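
A self-contained sketch of this extraction, written for Python 3. It assumes the table is parsed with the standard library's xml.etree.ElementTree (so elements expose .tag rather than the .name attribute used above) and uses a small stand-in for the proposed getitem(). Both are illustrative assumptions, not the XIST API or the actual patch:

```python
import xml.etree.ElementTree as ET
from collections import deque
from itertools import islice

TABLE = """\
<table>
<tr><td>01.01.2007</td><td>12.34</td><td>Foo</td></tr>
<tr><td>13.01.2007</td><td>23.45</td><td>Bar</td></tr>
<tr><td>04.02.2007</td><td>45.56</td><td>Baz</td></tr>
<tr><td>27.02.2007</td><td>56.78</td><td>Spam</td></tr>
<tr><td>17.03.2007</td><td>67.89</td><td>Eggs</td></tr>
<tr><td> </td><td>164.51</td><td>Total</td></tr>
<tr><td> </td><td>(incl. VAT)</td><td></td></tr>
</table>
"""


def getitem(iterable, index):
    """Stand-in for the proposed function (no default argument here)."""
    if index >= 0:
        # Skip `index` items and return the next one, if any.
        for item in islice(iterable, index, index + 1):
            return item
    else:
        # Remember only the last abs(index) items.
        tail = deque(iterable, maxlen=-index)
        if len(tail) == -index:
            return tail[0]
    raise IndexError("index out of range")


table = ET.fromstring(TABLE)
row = getitem((r for r in table if r.tag == "tr"), -2)  # second-to-last row
col = getitem((c for c in row if c.tag == "td"), 1)     # second column
print(col.text)  # 164.51
```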

Servus,
Walter
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] itertools addition: getitem()

2007-07-10 Thread Giovanni Bajo
On 09/07/2007 21.23, Walter Dörwald wrote:

   >>> from ll.xist import parsers, xfind
   >>> from ll.xist.ns import html
   >>> e = parsers.parseURL("http://www.python.org", tidy=True)
   >>> print e.walknode(html.h2 & xfind.hasclass("news"))[-1]
 Google Adds Python Support to Google Calendar Developer's Guide
 
 
 Get the first comment line from a Python file:
 
   >>> getitem((line for line in open("Lib/codecs.py") if
   ...          line.startswith("#")), 0)
 '### Registry and builtin stateless codec functions\n'
 
 
 Create a new unused identifier:
 
   >>> def candidates(base):
   ...     yield base
   ...     for suffix in count(2):
   ...         yield "%s%d" % (base, suffix)
   ...
   >>> usedids = set(("foo", "bar"))
   >>> getitem((i for i in candidates("foo") if i not in usedids), 0)
 'foo2'

You keep posting examples where you call your getitem() function with 0 as 
index, or -1.

getitem(it, 0) already exists and it's spelled it.next(). getitem(it, -1) 
might be useful in fact, and it might be spelled last(it) (or it.last()). Then 
one may want to add first() for symmetry, but that's it:

first(i for i in candidates("foo") if i not in usedids)
last(line for line in open("Lib/codecs.py") if line[0] == '#')

Are there real-world use cases for getitem(it, n) with n not in (0, -1)? I 
share Raymond's feelings on this. And by the way, if you wonder, I have these 
exact feelings as well for islice... :)
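
A rough sketch of what first() and last() might look like, in Python 3 syntax (where it.next() is spelled next(it)). Neither function exists in itertools; the names are the ones suggested above, and raising ValueError for an empty iterable is an assumption made here:

```python
from collections import deque


def first(iterable):
    """Return the first item; raise ValueError if there is none."""
    for item in iterable:
        return item
    raise ValueError("first() of an empty iterable")


def last(iterable):
    """Return the last item, holding only one item in memory at a time."""
    tail = deque(iterable, maxlen=1)  # keeps just the most recent item
    if not tail:
        raise ValueError("last() of an empty iterable")
    return tail[0]


print(first(x for x in "abc" if x != "a"))  # b
print(last(n * n for n in range(5)))        # 16
```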
-- 
Giovanni Bajo




Re: [Python-Dev] itertools addition: getitem()

2007-07-09 Thread Walter Dörwald
Raymond Hettinger wrote:
 [Walter Dörwald]
 I'd like to propose the following addition to itertools: A function
 itertools.getitem() which is basically equivalent to the following
 python code:

 _default = object()

 def getitem(iterable, index, default=_default):
     try:
         return list(iterable)[index]
     except IndexError:
         if default is _default:
             raise
         return default

 but without materializing the complete list. Negative indexes are
 supported too (this requires additional temporary storage for abs(index)
 objects).
 
 Why not use the existing islice() function?
 
   x = list(islice(iterable, i, i+1)) or default

This doesn't work, because it produces a list rather than the item itself:

>>> list(islice(xrange(10), 2, 3)) or 42
[2]

The following would work:
   x = (list(islice(iterable, i, i+1)) or [default])[0]

However, islice() doesn't support negative indexes, while getitem() does.
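
For non-negative indexes, the same lookup can be written more directly with the built-in next() and its default argument. That built-in was added in Python 2.6, after this discussion; the sketch below uses Python 3 syntax, and item_or_default is a name made up for illustration:

```python
from itertools import islice


def item_or_default(iterable, i, default=None):
    """Return item i of iterable (i >= 0), or default if it is too short."""
    # islice skips the first i items; next() takes one or falls back.
    return next(islice(iterable, i, None), default)


print(list(islice(range(10), 2, 3)) or 42)  # [2]  (a list, not the item)
print(item_or_default(range(10), 2, 42))    # 2
print(item_or_default(range(10), 20, 42))   # 42
```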

 Also, as a practical matter, I think it is a bad idea to introduce
 __getitem__ style access to itertools because the starting point
 moves with each consecutive access:
 
     # access items 0, 2, 5, 9, 14, 20, ...
     for i in range(10):
         print getitem(iterable, i)
 
 Worse, this behavior changes depending on whether the iterable
 is re-iterable (a string would yield consecutive items while a
 generator would skip around as shown above).

islice() has the same problem:

>>> from itertools import *
>>> iterable = iter(xrange(100))
>>> for i in range(10):
...     print list(islice(iterable, i, i+1))
[0]
[2]
[5]
[9]
[14]
[20]
[27]
[35]
[44]
[54]

>>> iterable = xrange(100)
>>> for i in range(10):
...     print list(islice(iterable, i, i+1))
[0]
[1]
[2]
[3]
[4]
[5]
[6]
[7]
[8]
[9]

 Besides being a bug factory, I think the getitem proposal would
 tend to steer people down the wrong road, away from more
 natural solutions to problems involving iterators.

I don't think that
   (list(islice(iterable, i, i+1)) or [default])[0]
is more natural than
   getitem(iterable, i, default)

 A basic step
 in learning the language is to differentiate between sequences
 and general iterators -- we should not conflate the two.

Servus,
   Walter


Re: [Python-Dev] itertools addition: getitem()

2007-07-09 Thread Guido van Rossum
On 7/9/07, Raymond Hettinger [EMAIL PROTECTED] wrote:
 Also, as a practical matter, I think it is a bad idea to introduce
 __getitem__ style access to itertools because the starting point
 moves with each consecutive access:

 # access items 0, 2, 5, 9, 14, 20, ...
 for i in range(10):
     print getitem(iterable, i)

 Worse, this behavior changes depending on whether the iterable
 is re-iterable (a string would yield consecutive items while a
 generator would skip around as shown above).

 Besides being a bug factory, I think the getitem proposal would
 tend to steer people down the wrong road, away from more
 natural solutions to problems involving iterators.  A basic step
 in learning the language is to differentiate between sequences
 and general iterators -- we should not conflate the two.

But doesn't the very same argument also apply against islice(), which
you just offered as an alternative?

PS. If Walter is also at EuroPython, maybe you two could discuss this in person?

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] itertools addition: getitem()

2007-07-09 Thread Walter Dörwald
Guido van Rossum wrote:
 On 7/9/07, Raymond Hettinger [EMAIL PROTECTED] wrote:
 Also, as a practical matter, I think it is a bad idea to introduce
 __getitem__ style access to itertools because the starting point
 moves with each consecutive access:

 # access items 0, 2, 5, 9, 14, 20, ...
 for i in range(10):
     print getitem(iterable, i)

 Worse, this behavior changes depending on whether the iterable
 is re-iterable (a string would yield consecutive items while a
 generator would skip around as shown above).

 Besides being a bug factory, I think the getitem proposal would
 tend to steer people down the wrong road, away from more
 natural solutions to problems involving iterators.  A basic step
 in learning the language is to differentiate between sequences
 and general iterators -- we should not conflate the two.
 
 But doesn't the very same argument also apply against islice(), which
 you just offered as an alternative?

Exactly.

 PS. If Walter is also at EuroPython, maybe you two could discuss this in
 person?

Sorry, I won't be at EuroPython.

Servus,
   Walter


Re: [Python-Dev] itertools addition: getitem()

2007-07-09 Thread Raymond Hettinger
From: Guido van Rossum [EMAIL PROTECTED]
 But doesn't the very same argument also apply against islice(), which
 you just offered as an alternative?

Not really.  The use cases for islice() typically do not involve
repeated slices of an iterator unless it is slicing off the front
few elements on each pass.  In contrast, getitem() is all about
grabbing something other than the frontmost element and seems
to be intended for repeated calls on the same iterator.  And its
support for negative indices seems somewhat weird in the
context of general purpose iterators:  getitem(genprimes(), -1).

I'll study Walter's use case but my instincts say that adding
getitem() will do more harm than good.


Raymond


Re: [Python-Dev] itertools addition: getitem()

2007-07-09 Thread Walter Dörwald
Raymond Hettinger wrote:

 From: Guido van Rossum [EMAIL PROTECTED]
 But doesn't the very same argument also apply against islice(), which
 you just offered as an alternative?
 
 Not really.  The use cases for islice() typically do not involve
 repeated slices of an iterator unless it is slicing off the front
 few elements on each pass.  In contrast, getitem() is all about
 grabbing something other than the frontmost element and seems
 to be intended for repeated calls on the same iterator. 

That wouldn't make sense as getitem() consumes the iterator! ;)

But seriously: perhaps the name getitem() is misleading? What about 
item() or pickitem()?

 And its
 support for negative indices seems somewhat weird in the
 context of general purpose iterators:  getitem(genprimes(), -1).

This does indeed make as much sense as sum(itertools.count()).

 I'll study Walter's use case but my instincts say that adding
 getitem() will do more harm than good.

Here's the function in use (somewhat invisibly, as it's used by the 
walknode() method). This gets the oldest news from Python's homepage:

>>> from ll.xist import parsers, xfind
>>> from ll.xist.ns import html
>>> e = parsers.parseURL("http://www.python.org", tidy=True)
>>> print e.walknode(html.h2 & xfind.hasclass("news"))[-1]
Google Adds Python Support to Google Calendar Developer's Guide


Get the first comment line from a Python file:

>>> getitem((line for line in open("Lib/codecs.py") if
...          line.startswith("#")), 0)
'### Registry and builtin stateless codec functions\n'


Create a new unused identifier:

>>> def candidates(base):
...     yield base
...     for suffix in count(2):
...         yield "%s%d" % (base, suffix)
...
>>> usedids = set(("foo", "bar"))
>>> getitem((i for i in candidates("foo") if i not in usedids), 0)
'foo2'

Servus,
Walter



[Python-Dev] itertools addition: getitem()

2007-07-08 Thread Walter Dörwald
I'd like to propose the following addition to itertools: A function 
itertools.getitem() which is basically equivalent to the following 
python code:

_default = object()

def getitem(iterable, index, default=_default):
    try:
        return list(iterable)[index]
    except IndexError:
        if default is _default:
            raise
        return default

but without materializing the complete list. Negative indexes are 
supported too (this requires additional temporary storage for abs(index) 
objects).
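
One way such a function could avoid building the whole list, sketched in Python 3. This is an illustration of the idea, not the code from the patch: it assumes islice() for non-negative indexes and a bounded collections.deque for negative ones, which holds at most abs(index) items at a time:

```python
from collections import deque
from itertools import islice

_default = object()  # sentinel: distinguishes "no default given" from None


def getitem(iterable, index, default=_default):
    """Return the item at `index` without materializing the iterable."""
    if index >= 0:
        # Skip `index` items, then return the next one if it exists.
        for item in islice(iterable, index, index + 1):
            return item
    else:
        # Keep only the last abs(index) items seen so far.
        tail = deque(iterable, maxlen=-index)
        if len(tail) == -index:
            return tail[0]
    if default is _default:
        raise IndexError("index out of range")
    return default


print(getitem(iter("abcde"), 1))        # b
print(getitem(iter("abcde"), -2))       # d
print(getitem(iter("abcde"), 9, None))  # None
```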

The patch is available at http://bugs.python.org/1749857

Servus,
Walter


Re: [Python-Dev] itertools addition: getitem()

2007-07-08 Thread Guido van Rossum
How important is it to have the default in this API? __getitem__()
doesn't have a default; instead, there's a separate API get() that
provides a default (and I find defaulting to None more manageable than
the _default = object() pattern).

--Guido

On 7/8/07, Walter Dörwald [EMAIL PROTECTED] wrote:
 I'd like to propose the following addition to itertools: A function
 itertools.getitem() which is basically equivalent to the following
 python code:

 _default = object()

 def getitem(iterable, index, default=_default):
     try:
         return list(iterable)[index]
     except IndexError:
         if default is _default:
             raise
         return default

 but without materializing the complete list. Negative indexes are
 supported too (this requires additional temporary storage for abs(index)
 objects).

 The patch is available at http://bugs.python.org/1749857

 Servus,
 Walter



-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] itertools addition: getitem()

2007-07-08 Thread Georg Brandl
Guido van Rossum schrieb:
 How important is it to have the default in this API? __getitem__()
 doesn't have a default; instead, there's a separate API get() that
 provides a default (and I find defaulting to None more manageable than
 the _default = object() pattern).

getattr() has a default too, while __getattr__ hasn't...

Georg

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.



Re: [Python-Dev] itertools addition: getitem()

2007-07-08 Thread Guido van Rossum
On 7/8/07, Georg Brandl [EMAIL PROTECTED] wrote:
 Guido van Rossum schrieb:
  How important is it to have the default in this API? __getitem__()
  doesn't have a default; instead, there's a separate API get() that
  provides a default (and I find defaulting to None more manageable than
  the _default = object() pattern).

 getattr() has a default too, while __getattr__ hasn't...

Fair enough.

But I still want to hear of a practical use case for the default here.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] itertools addition: getitem()

2007-07-08 Thread Walter Dörwald
Guido van Rossum wrote:
 On 7/8/07, Georg Brandl [EMAIL PROTECTED] wrote:
 Guido van Rossum schrieb:
 How important is it to have the default in this API? __getitem__()
 doesn't have a default; instead, there's a separate API get() that
 provides a default (and I find defaulting to None more manageable than
 the _default = object() pattern).

Of course it isn't implemented this way in the C version.

 getattr() has a default too, while __getattr__ hasn't...
 
 Fair enough.
 
 But I still want to hear of a practical use case for the default here.

In most cases

foo = getitem(iterable, 0, None)
if foo is not None:
   ...

is simpler than:

try:
   foo = getitem(iterable, 0)
except IndexError:
   pass
else:
   ...

Here is a use case from one of my "import XML into the database" scripts:

compid = getitem(root[ns.Company_company_id], 0, None)
if compid:
   compid = int(compid)

The expression root[ns.company_id] returns an iterator that produces all 
children of the root node that are of the element type company_id. If 
there is a company_id, its content will be turned into an int; if not, 
None will be used.

Servus,
Walter



Re: [Python-Dev] itertools addition: getitem()

2007-07-08 Thread Guido van Rossum
On 7/8/07, Walter Dörwald [EMAIL PROTECTED] wrote:
[quoting Guido]
  But I still want to hear of a practical use case for the default here.

 In most cases

 foo = getitem(iterable, 0, None)
 if foo is not None:
    ...

 is simpler than:

 try:
    foo = getitem(iterable, 0)
 except IndexError:
    pass
 else:
    ...

 Here is a use case from one of my "import XML into the database" scripts:

 compid = getitem(root[ns.Company_company_id], 0, None)
 if compid:
    compid = int(compid)

 The expression root[ns.company_id] returns an iterator that produces all
 children of the root node that are of the element type company_id. If
 there is a company_id, its content will be turned into an int; if not,
 None will be used.

Ahem. I hope you have a better use case for getitem() than that
(regardless of the default issue). I find it clearer to write that as

try:
  compid = root[ns.company_id].next()
except StopIteration:
  compid = None
else:
  compid = int(compid)

While this is more lines, it doesn't require one to know about
getitem() on an iterator. This is the same reason why setdefault() was
a mistake -- it's too obscure to invent a compact spelling for it
since the compact spelling has to be learned or looked up.
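
With the built-in next() and its default argument (available from Python 2.6 on), the same logic needs no try/except. A sketch in Python 3 syntax; first_as_int is a hypothetical helper name, and a plain iterator of strings stands in for root[ns.company_id]:

```python
def first_as_int(iterator):
    """Return int() of the first item, or None if the iterator is empty."""
    value = next(iterator, None)
    return None if value is None else int(value)


print(first_as_int(iter(["4711", "4712"])))  # 4711
print(first_as_int(iter([])))                # None
```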

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] itertools addition: getitem()

2007-07-08 Thread Kevin Jacobs [EMAIL PROTECTED]

On 7/8/07, Guido van Rossum [EMAIL PROTECTED] wrote:


Ahem. I hope you have a better use case for getitem() than that
(regardless of the default issue). I find it clearer to write that as

try:
  compid = root[ns.company_id].next()
except StopIteration:
  compid = None
else:
  compid = int(compid)

While this is more lines, it doesn't require one to know about
getitem() on an iterator. This is the same reason why setdefault() was
a mistake -- it's too obscure to invent a compact spelling for it
since the compact spelling has to be learned or looked up.




Apropos of this discussion, I've occasionally wanted a faster version of the
following:

_nothing = object()

def nth_next(seq, n, default=_nothing):
    '''
    Return the n'th next element for seq, if it exists.

    If default is specified, it is returned when the sequence is too short.
    Otherwise StopIteration is raised.
    '''
    try:
        for i in xrange(n-1):
            seq.next()
        return seq.next()
    except StopIteration:
        if default is _nothing:
            raise
        return default

The nice thing about this function is that it solves several problems in
one: extraction of the n'th next element, testing for a minimum sequence
length given a sentinel value, and just skipping n elements.  It also leaves
the sequence in a useful and predictable state, which is not true of the
Python-version getitem code.  While cute, I can't say if it is worthy of
being an itertool function.
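
A possible Python 3 rendering of the same idea, letting islice() do the skipping. It is a sketch, not a benchmarked replacement: it assumes n >= 1, and it raises StopIteration from an ordinary function only to mirror the version above:

```python
from itertools import islice

_nothing = object()  # sentinel meaning "no default supplied"


def nth_next(iterator, n, default=_nothing):
    """Return the n'th next element of iterator (n >= 1), if it exists."""
    # islice(iterator, n - 1, n) consumes n - 1 items and yields the n'th.
    for item in islice(iterator, n - 1, n):
        return item
    if default is _nothing:
        raise StopIteration
    return default


it = iter(range(10))
print(nth_next(it, 3))        # 2
print(next(it))               # 3  (the iterator resumes right after the hit)
print(nth_next(it, 100, -1))  # -1 (too short, so the default is returned)
```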

Also vaguely apropos:

def ilen(seq):
    'Return the length of the hopefully finite sequence'
    n = 0
    for x in seq:
        n += 1
    return n

Why?  Because I find myself implementing it in virtually every project.
Maybe I'm just an outlier, but many algorithms I implement need to consume
iterators (for side-effects, obviously) and it is sometimes nice to know
exactly how many elements were consumed.

~Kevin


Re: [Python-Dev] itertools addition: getitem()

2007-07-08 Thread Steven Bethard
On 7/8/07, Kevin Jacobs [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
 Also vaguely apropos:

 def ilen(seq):
     'Return the length of the hopefully finite sequence'
     n = 0
     for x in seq:
         n += 1
     return n

Also known as::

sum(1 for _ in iterable)

That's always been simple enough that I didn't feel a need for an
ilen() function.

STeVe
-- 
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a
tiny blip on the distant coast of sanity.
--- Bucky Katt, Get Fuzzy


Re: [Python-Dev] itertools addition: getitem()

2007-07-08 Thread Walter Dörwald
Guido van Rossum wrote:

 On 7/8/07, Walter Dörwald [EMAIL PROTECTED] wrote:
 [quoting Guido]
  But I still want to hear of a practical use case for the default here.

 In most cases

 foo = getitem(iterable, 0, None)
 if foo is not None:
    ...

 is simpler than:

 try:
    foo = getitem(iterable, 0)
 except IndexError:
    pass
 else:
    ...

 Here is a use case from one of my "import XML into the database" scripts:

 compid = getitem(root[ns.Company_company_id], 0, None)
 if compid:
    compid = int(compid)

 The expression root[ns.company_id] returns an iterator that produces all
 children of the root node that are of the element type company_id. If
 there is a company_id, its content will be turned into an int; if not,
 None will be used.
 
 Ahem. I hope you have a better use case for getitem() than that
 (regardless of the default issue). I find it clearer to write that as
 
 try:
  compid = root[ns.company_id].next()
 except StopIteration:
  compid = None
 else:
  compid = int(compid)
 
 While this is more lines, it doesn't require one to know about
 getitem() on an iterator. This is the same reason why setdefault() was
 a mistake -- it's too obscure to invent a compact spelling for it
 since the compact spelling has to be learned or looked up.

Well I have used (a Python version of) this getitem() function to 
implement a library that can match a CSS3 expression against an XML 
tree. For implementing the nth-child(), nth-last-child(), nth-of-type() 
and nth-last-of-type() pseudo classes (see 
http://www.w3.org/TR/css3-selectors/#structural-pseudos) getitem() was 
very useful.

Servus,
Walter


Re: [Python-Dev] itertools addition: getitem()

2007-07-08 Thread Raymond Hettinger
[Walter Dörwald]
 I'd like to propose the following addition to itertools: A function
 itertools.getitem() which is basically equivalent to the following
 python code:

 _default = object()

 def getitem(iterable, index, default=_default):
     try:
         return list(iterable)[index]
     except IndexError:
         if default is _default:
             raise
         return default

 but without materializing the complete list. Negative indexes are
 supported too (this requires additional temporary storage for abs(index)
 objects).

Why not use the existing islice() function?

   x = list(islice(iterable, i, i+1)) or default

Also, as a practical matter, I think it is a bad idea to introduce
__getitem__ style access to itertools because the starting point
moves with each consecutive access:

    # access items 0, 2, 5, 9, 14, 20, ...
    for i in range(10):
        print getitem(iterable, i)

Worse, this behavior changes depending on whether the iterable
is re-iterable (a string would yield consecutive items while a
generator would skip around as shown above).
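
The effect is easy to reproduce. A small Python 3 sketch, with a one-line stand-in for getitem() (non-negative indexes only, an assumption made for brevity):

```python
from itertools import islice


def getitem(iterable, index):
    """Stand-in for the proposal: skip `index` items, return the next one."""
    return next(islice(iterable, index, None))


gen = iter(range(100))  # one-shot iterator: each call resumes where the last stopped
print([getitem(gen, i) for i in range(6)])  # [0, 2, 5, 9, 14, 20]

seq = range(100)        # re-iterable: each call starts again from the beginning
print([getitem(seq, i) for i in range(6)])  # [0, 1, 2, 3, 4, 5]
```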

Besides being a bug factory, I think the getitem proposal would
tend to steer people down the wrong road, away from more
natural solutions to problems involving iterators.  A basic step
in learning the language is to differentiate between sequences
and general iterators -- we should not conflate the two.


Raymond