Anthony Scopatz <scop...@gmail.com> wrote:

> Are you using compression on this EArray?  This method is basically a thin
> wrapper over some HDF5 functions. I think that the data that you are asking
> for (inadvertently, maybe) is just expensive to get.

No, no compression. But I noticed that this is one of the first PyTables
datasets I created, years ago, and its chunk size was poorly chosen. I
have now improved that (better chunk size/shape, transposed axes, and a
CArray instead of an EArray), and things are roughly 50% faster.

But I still don't understand why so much data is apparently being read
when I only want to know which children (i.e. the leaf names) a group
contains. In my program I do this by looping over _v_children.items(),
like so:

d = {}
for label, node in f.root.recordings.AB_5000._v_children.items():
    d[label] = node

I would have expected code like this to yield a dictionary of node
objects without reading or inspecting the data that the nodes contain.
But apparently HDF5 looks at the contents of the nodes under the hood,
which takes a while if they are large, especially over a USB 3
connection. It is not reading the full arrays into RAM, because the
memory footprint of the Python session doesn't increase appreciably
when I run the code above.
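
For comparison, here is a minimal sketch of listing only the child names.
It assumes that _v_children is a lazy mapping whose keys are the child
names and whose values are turned into node objects only when accessed,
so that iterating over the keys alone might avoid opening every leaf. I
have not verified that this removes the delay. The file name, the leaf
names "left" and "right", and the helper child_names() are made up for
illustration.

```python
import os
import tempfile

import numpy as np
import tables


def child_names(group):
    """Return the sorted names of a group's children.

    Only the keys of _v_children are iterated; the values (node objects)
    are never requested here.
    """
    return sorted(group._v_children)


# Build a small throwaway file whose layout mirrors the one in this message.
path = os.path.join(tempfile.mkdtemp(), "example.h5")
with tables.open_file(path, mode="w") as f:
    group = f.create_group("/recordings", "AB_5000", createparents=True)
    for label in ("left", "right"):  # hypothetical leaf names
        f.create_carray(group, label, obj=np.zeros((4, 1000), dtype="float32"))

with tables.open_file(path, mode="r") as f:
    group = f.root.recordings.AB_5000
    names = child_names(group)
    print(names)
    # Fetch a node object only at the point where it is actually needed.
    node = f.get_node(group, names[0])
    shape = node.shape
    print(shape)
```

If the names alone are enough, the node objects can then be fetched one
at a time with get_node() as they are needed, rather than all at once
through items().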

Thanks, all the best, Gabriel

