On Mar 15, 2012, at 1:43 PM, Anthony Scopatz wrote:

> Hello Alvaro,
>
> On Thu, Mar 15, 2012 at 1:20 PM, Alvaro Tejero Cantero <alv...@minin.es> wrote:
>
>> Hi!
>>
>> Thanks for the prompt answer. Actually, I am not clear about switching
>> from an NxM array to N columns (64 in my case). How do I make a
>> rectangular selection with columns? With an NxM array I just have to
>> do arr[10000:20000,1:4] to select columns 1, 2, 3 and time samples
>> 10000 to 20000.
>
> Tables are really a 1D array of C-structs. They are basically equivalent
> in many ways to numpy structured arrays:
> http://docs.scipy.org/doc/numpy/user/basics.rec.html
> So there is no analogue of the 2D slice that you mention above.
>
>> While it is easy to manipulate integer indices, from what I've read,
>> columns would have to have string identifiers, so I would be doing a
>> lot of int<->str conversion?
>
> No, you don't do a lot of str <-> int conversions. The strings represent
> field names and only incidentally indexes.
>
>> My recorded data is so homogeneous (a huge 70 GB matrix of integers)
>> that I am a bit lost among all the mixed typing that seems to be the
>> primary use case behind columns. If I were to stay with the NxM array,
>> would reads still be chunked?
>
> You would need to use the CArray (chunked array) or the EArray
> (extensible array) for the underlying array on disk to be chunked.
> Reading can always be chunked by accessing a slice. This is true for
> all arrays and tables.
>
>> On the other hand, if I want to use arr[:,3] as a mask for another
>> part of the array, is it more reasonable to have that be col3, in
>> terms of PyTables?
>
> Reasonable is probably the wrong word here. It is more that tables do
> it one way and arrays do it another. If you are doing a lot of
> single-column-at-a-time access, then you should think about using
> Tables for this.
>
>> I also get shivers when I see the looping constructs in the tutorial,
>> mainly because I have learned to do only vectorized operations in
>> numpy and never ever to write a Python loop / list comprehension.
>
> Ah, so you have to understand which operations happen on the file and
> when data is already in memory. With numpy you don't want to use Python
> loops because everything is already in memory. However, with PyTables
> most of what you are doing is pulling data from disk into memory, so
> the Python loop overhead is small relative to the communication time
> of RAM <-> disk.
>
> Most of the loops in PyTables are actually evaluated using numexpr
> iterators. Numexpr is a highly optimized way of evaluating numerical
> expressions. In short, you probably don't need to worry too much about
> Python loops (when you are new to the library) when operating on
> PyTables objects. You do need to worry about such loops on the numpy
> arrays that PyTables objects return.
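To make the contrast concrete, here is a minimal sketch of the two layouts Anthony describes: a chunked 2D CArray that keeps the familiar rectangular slice, versus a Table of 64 like-typed columns addressed by field name. The file name, node names, and dtype are hypothetical, and the calls use the camelCase PyTables 2.x API current at the time of this thread:

```python
import tables

# Hypothetical file and node names (PyTables 2.x API).
f = tables.openFile('recordings.h5', mode='w')

# Layout 1: a chunked 2D array of integers (CArray).
# The data lives on disk in chunks; a slice only reads
# the chunks that overlap the selection.
arr = f.createCArray(f.root, 'raw', tables.Int16Atom(),
                     shape=(1000000, 64))

# The rectangular selection from the question:
# columns 1, 2, 3 and time samples 10000 to 20000.
block = arr[10000:20000, 1:4]

# Layout 2: a Table with 64 homogeneous columns.
# Columns are named fields of a C-struct, so they are
# addressed by string, not by integer index.
desc = dict(('ch%02d' % i, tables.Int16Col(pos=i))
            for i in range(64))
tbl = f.createTable(f.root, 'channels', desc)

# Single-column access reads just that field over a row
# range (once rows have been appended).
col3 = tbl.cols.ch03[10000:20000]

f.close()
```

For purely homogeneous data and 2D slicing, the CArray keeps numpy-style indexing; the Table pays off when single-column access dominates, as Anthony notes.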
Anthony is very right here. If you have very large amounts of data, you absolutely need to get used to the iterator concept, as this allows you to run over your entire dataset without needing to load it into memory. Iterators in PyTables are one of its most powerful and effective constructs, so be sure that you master them if you want to get the most out of PyTables.

-- Francesc Alted
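As a sketch of the iterator style Francesc recommends, reusing the hypothetical 'channels' table from the example above: Table.where() compiles its condition string with numexpr and yields matching rows chunk by chunk, so the full dataset never has to be resident in memory.

```python
import tables

f = tables.openFile('recordings.h5', mode='r')
tbl = f.root.channels  # hypothetical table from the sketch above

# where() evaluates the condition with numexpr on one chunk
# at a time and yields only the matching rows.
spikes = [row['ch03'] for row in tbl.where('ch00 > 100')]

# Plain iteration over a Table is also buffered internally,
# so it never loads the whole node at once.
total = 0
for row in tbl:
    total += row['ch00']

f.close()
```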