Re: [Pytables-users] Weird behavior of row accessor

Francesc Altet Thu, 14 Dec 2006 08:16:05 -0800

On Thu, 14 Dec 2006 at 07:37 -0800, Curious Jan wrote:
> Well, that piece of information is certainly useful, because I didn't
> find it anywhere else.
> What is the default behavior of pytables 2.0 then going to be ?
> It kind of changes the behavior of my code whether I have to construct
> a string by adding paths and '/', or traverse a hierarchy
> recursively. 
> 
> Actually what confuses me is that the behavior is different for
> for item in table:
>     do_something(item)
> than it is for
> for item in table[:]:
>     do_something(item)
> 
> The first allows me to use 
> item['path/to/child']
> while in the latter case I have to write
> item['path']['to']['child']
> 
> Is that behavior going to be unified ?


Mmm, that's a good question.

First of all, the:

for item in table:
    do_something(item)

approach is an iterator over the table *on-disk*, so it fetches one
record at a time and offers it to the user wrapped in a tables.Row
object for manipulation (in fact, things are a bit more complicated
because the records are read in bunches, for efficiency, but this is
mostly irrelevant for the end user).

However, in:

for item in table[:]:
    do_something(item)

you read the complete table *in-memory* and then iterate over each row.
Of course, if the on-disk table is large, this second approach is
overkill and the first one should normally be preferred.

So, this is the main reason why you are seeing different behaviours when
accessing the records of the table. In the first case, the Row accessor
does implement a __getitem__() which understands that a '/' works as a
separator for nested records. In the second case, the NumPy
__getitem__() does not understand such a notation.
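
If you want the same spelling in both loops, a small helper on the user
side can split the slash-separated path and apply the lookups one after
another. This is only a sketch: get_nested is a hypothetical function,
not part of PyTables or NumPy, and it is shown here on nested dicts. It
relies on nothing but item['name'] indexing, so it should also work on
the records of a nested NumPy array, where each step returns the
sub-record.

```python
from functools import reduce
import operator


def get_nested(item, path, sep="/"):
    """Resolve 'a/b/c' as item['a']['b']['c'].

    Hypothetical helper: works on any object that supports string
    indexing at each level of the nested path.
    """
    return reduce(operator.getitem, path.split(sep), item)


# Nested dicts stand in for a nested record in this illustration.
record = {"path": {"to": {"child": 42}}}
value = get_nested(record, "path/to/child")
```

With such a helper, get_nested(item, 'path/to/child') can be written
the same way whether item comes from the table iterator or from
table[:], at the cost of one extra function call per access.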

This is probably a matter of taste, but I prefer specifying nested
records the PyTables way (i.e. 'field/subfield/subsubfield') over the
NumPy way (i.e. ['field']['subfield']['subsubfield']), mainly for two
reasons:

1. It's more compact and easier to type
2. It's faster to retrieve a nested field

So, I'd say that we should keep using the slash-separated way in 2.0.

Regarding implementing the NumPy way in the Row accessor (in order to
unify the access), we can try to implement it if there is enough
interest from the users, but I'd rather not, mainly because there
should preferably be one and only one way to do things (even though
those ways differ between packages).

My current position is to keep things as they are now, and that in 2.0
the user should be aware of whether they are using a Table iterator or
a NumPy one, and act accordingly. Who knows, perhaps the NumPy
developers can be convinced that the slash-separated way of specifying
nested records is a good one and might agree to implement it.

Cheers,

-- 
Francesc Altet    |  Be careful about using the following code --
Carabos Coop. V.  |  I've only proven that it works, 
www.carabos.com   |  I haven't tested it. -- Donald Knuth


