On Thursday 14 May 2009 12:26:33 David Fokkema wrote:
> Hi list,
>
> Why is this different?
>
> [x for x in table]
> or
> [x for x in table.iterrows()]
>
> (which returns the first row over and over)
>
> and
>
> for x in table:
>     x
>
> or even
>
> [x[0] for x in table]
>
> (which returns all different rows)
>
> Probably there's some __iter__ or other magic going on, but this is not
> intuitive for me. Is this a (feature request) bug or am I simply missing
> something?

No, although it might seem so, there is no magic there.  Rather, you should
regard the Row class as a data accessor, not as a data container.  As you
already know, Row is meant to be used inside iterators, providing access to
the data in the current row of the iterator (BTW, the index of the current
row is accessible via the Row.nrow property).  So, if what you want is the
actual data, you need to explicitly call a getter on Row.

Working through an example will clarify things.  Consider the following table:

In [24]: t
Out[24]:
/t (Table(10,)) ''
  description := {
  "f0": Int64Col(shape=(), dflt=0, pos=0),
  "f1": Float64Col(shape=(), dflt=0.0, pos=1)}
  byteorder := 'little'
  chunkshape := (512,)

Using the iterator without a getter in a plain for loop gives:

In [26]: for r in t: r
   ....:
Out[26]: (0, 0.0)
Out[26]: (1, 1.0)
Out[26]: (2, 2.0)
Out[26]: (3, 3.0)
Out[26]: (4, 4.0)
Out[26]: (5, 5.0)
Out[26]: (6, 6.0)
Out[26]: (7, 7.0)
Out[26]: (8, 8.0)
Out[26]: (9, 9.0)

However, using the same iterator in a list comprehension gives:

In [25]: [r for r in t]
Out[25]:
[(9, 9.0),
 (9, 9.0),
 (9, 9.0),
 (9, 9.0),
 (9, 9.0),
 (9, 9.0),
 (9, 9.0),
 (9, 9.0),
 (9, 9.0),
 (9, 9.0)]

Why the difference?  Well, the former yields a series of Row objects that the
IPython shell converts into a string representation *immediately* on each
iteration, while the latter builds a list of references to the *same* Row
object, and these are not converted into their string representation until
the entire list has been built.  By the time the list is displayed, the
iterator has already finished, so every reference to the Row object fetches
the data pointed to by its internal row counter, which is 9 by then.
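
The same effect can be reproduced without PyTables at all.  What follows is
only a minimal sketch of the idea, not how Row is actually implemented: the
FakeRow and FakeTable classes are hypothetical stand-ins that I am assuming
behave like an accessor that is reused by its iterator.

```python
class FakeRow:
    """Hypothetical stand-in for Row: an accessor, not a container."""

    def __init__(self, data):
        self._data = data   # the underlying rows (a list of tuples)
        self.nrow = -1      # index of the row currently pointed to

    def __getitem__(self, key):
        # A "getter": returns actual data taken from the current row.
        return self._data[self.nrow][key]

    def __repr__(self):
        # The representation is computed when asked for, from the
        # row that the accessor points to at that moment.
        return repr(self._data[self.nrow])


class FakeTable:
    """Hypothetical stand-in for Table: reuses one FakeRow per iteration."""

    def __init__(self, data):
        self._data = data

    def __iter__(self):
        row = FakeRow(self._data)
        for i in range(len(self._data)):
            row.nrow = i    # move the accessor; no new object is created
            yield row


t = FakeTable([(i, float(i)) for i in range(3)])

refs = [r for r in t]       # three references to the *same* accessor
copies = [r[:] for r in t]  # three independent tuples of data

print(refs)    # every item shows the last row
print(copies)  # every item shows its own row
```

Here refs ends up holding the same object three times, so printing it shows
the last row repeated, whereas copies holds the data that was current at
each step of the iteration.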

This is easier to see by inspecting the items in the list:

In [35]: l = [r for r in t]

In [36]: type(l[0])
Out[36]: <type 'tables.tableExtension.Row'>

In [37]: l[0].nrow
Out[37]: 9

In [38]: l[1].nrow
Out[38]: 9

In [39]: l[9].nrow
Out[39]: 9

As you can see, all the internal counters point to the same row (the last
one) because all the items in the list are references to the *same* Row
object.  As I said before, you can solve this by treating Row as a data
accessor, that is, by calling a getter on it.  For example:

In [40]: [r[:] for r in t]
Out[40]:
[(0, 0.0),
 (1, 1.0),
 (2, 2.0),
 (3, 3.0),
 (4, 4.0),
 (5, 5.0),
 (6, 6.0),
 (7, 7.0),
 (8, 8.0),
 (9, 9.0)]

or:

In [41]: [r.fetch_all_fields() for r in t]
Out[41]:
[(0, 0.0),
 (1, 1.0),
 (2, 2.0),
 (3, 3.0),
 (4, 4.0),
 (5, 5.0),
 (6, 6.0),
 (7, 7.0),
 (8, 8.0),
 (9, 9.0)]

[see the manual for the difference between the '[:]' and '.fetch_all_fields()' 
idioms]

Hope that helps,

-- 
Francesc Alted

"One would expect people to feel threatened by the 'giant
brains or machines that think'.  In fact, the frightening
computer becomes less frightening if it is used only to
simulate a familiar noncomputer."

-- Edsger W. Dijkstra

