On 9/27/12 8:10 PM, Anthony Scopatz wrote:
>
> I think I remember seeing there was a performance limit with tables of
> more than 255 columns.  I can't find a reference to that so it's
> possible I made it up.  However, I was wondering if carrays had some
> limitation like that.
>
> Tables are a different data set.  The issue with tables is that column 
> metadata (names, etc.) needs to fit in the attribute space.  The size 
> of this space is statically limited to 64 KB.  In my experience, this 
> limit is reached in the thousands of columns (not hundreds).

For the record, the PerformanceWarning issued by PyTables has nothing to 
do with the attribute space.  It comes from the fact that putting too 
many columns in the same table means that you have to retrieve much more 
data even when you are reading just a single column.  Also, the internal 
I/O buffers have to be much larger, and compressors tend to work 
much less efficiently.
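
To put rough numbers on the first point, here is a back-of-the-envelope
sketch.  It assumes a deliberately simplified model: rows are stored
contiguously, every column is 8 bytes wide, and chunking, compression and
caching are ignored.  The function name and the row/column counts are made
up for illustration and are not part of PyTables.

```python
def bytes_read_for_one_column(n_rows, n_cols, itemsize=8):
    """Rough estimate of the bytes fetched to read a single column.

    Assumes a row-oriented table whose rows are stored contiguously,
    so reading one field still pulls in every full row.
    """
    row_size = n_cols * itemsize        # bytes per table row
    table_bytes = n_rows * row_size     # whole rows have to be read
    column_bytes = n_rows * itemsize    # what a standalone array would need
    return table_bytes, column_bytes


# Hypothetical sizes: one million rows, increasingly wide tables.
for n_cols in (10, 255, 5000):
    table_bytes, column_bytes = bytes_read_for_one_column(1_000_000, n_cols)
    print(f"{n_cols:5d} columns: table read ~{table_bytes / 2**20:9.1f} MiB, "
          f"standalone array ~{column_bytes / 2**20:5.1f} MiB")
```

Under these assumptions the overhead grows linearly with the number of
columns, which is the effect the warning is meant to flag.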

> On the other hand CArrays don't have much of any column metadata. 
>  CArrays should scale to an infinite number of columns without any issue.

Yeah, they should scale better, although saying they can reach infinite 
scalability is a bit audacious :)  Every CArray is a dataset that 
has to be stored internally by HDF5, and HDF5 needs quite a few 
resources to keep track of them all.

-- 
Francesc Alted


