Re: [Pytables-users] Optimizing pytables for reading entire columns at a time

2012-09-24 Thread Ümit Seren
With CArrays you can only have one specific type for the whole array (int, float, etc.), whereas with a table each column can have a different type (string, float, etc.). If you want to replicate this with CArrays, you would need a separate CArray for each type. I think for storing numerical data

Re: [Pytables-users] Faster Performance: A set of nodes vs A new column that ranges within a set?

2012-07-18 Thread Ümit Seren
Just to add to what Anthony said: In the end it also depends on how unrelated your data is and how you want to access it. If the access scenario is that you usually only search or select within a specific dataset, then splitting up the datasets and putting them into separate tables is the way to go. In

Re: [Pytables-users] Faster Performance: A set of nodes vs A new column that ranges within a set?

2012-07-18 Thread Ümit Seren
get are occasional read errors (which isn't much of a problem for me), so I am thinking. Could there be a way to reduce the metadata within an hdf5 and at the same time, use a multi-tabled approach to solve my problem? Thanks, Jacob On Wed, Jul 18, 2012 at 1:22 AM, Ümit Seren uemit.se

Re: [Pytables-users] Faster Performance: A set of nodes vs A new column that ranges within a set?

2012-07-18 Thread Ümit Seren
- CArray - dataset2 . . . - dataset30.000 If you could help me out with these two items, I think I will have enough knowledge under my belt to know what I need to do. Thanks again! ;) On Wed, Jul 18, 2012 at 6:21 AM, Ümit Seren uemit.se

Re: [Pytables-users] Faster Performance: A set of nodes vs A new column that ranges within a set?

2012-07-18 Thread Ümit Seren
datasets directly linked to a similar node (in this case, data)? I seem to have a problem putting that many nodes from one root. -Jacob On Wed, Jul 18, 2012 at 6:54 AM, Ümit Seren uemit.se...@gmail.com wrote: On Wed, Jul 18, 2012 at 1:32 PM, Jacob Bennett jacob.bennet...@gmail.com wrote: Sounds

Re: [Pytables-users] Faster Performance: A set of nodes vs A new column that ranges within a set?

2012-07-18 Thread Ümit Seren
...@pytables.org wrote: On 7/18/12 2:07 PM, Ümit Seren wrote: I actually had 30.000 groups attached to the data group. But I guess it doesn't really matter whether it is a table or a group. They both are nodes. 30.000 datasets attached to the same group? I'm interested in knowing if you detected

Re: [Pytables-users] Faster Performance: A set of nodes vs A new column that ranges within a set?

2012-07-18 Thread Ümit Seren
fal...@pytables.org: On 7/18/12 4:11 PM, Ümit Seren wrote: Actually I had 30.000 groups in a parent group. Each of the 30.000 groups had maybe 3 datasets. So to be honest I never had 30.000 datasets in a single group. I guess you will probably have to disable the LRU cache in that case

Re: [Pytables-users] Main differences between PyTables and Relational

2012-04-26 Thread Ümit Seren
Good points. Just some additional comments: I do think that scientific/hierarchical file formats like HDF5 and RDBMS systems have their specific use cases, and I don't think it makes sense to replace one with the other. I also think that you shouldn't try to apply RDBMS principles to HDF5 like

Re: [Pytables-users] Help on sorting tables

2012-03-22 Thread Ümit Seren
AFAIK there is no sort functionality built into PyTables. I think there are 4 ways to do it: 1.) load all 7.5 million records and sort them in memory (if they fit into memory) 2.) implement your own external sorting algorithm (http://en.wikipedia.org/wiki/External_sorting) using pytables
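As a rough illustration of option 2, here is a minimal external merge sort sketch. It is not taken from the thread and does not use PyTables: the function name external_sort, the chunk_size parameter, and the use of plain text temp files holding integer keys are all assumptions made for the example. In practice the sorted runs would more likely be written to and read back from tables in the HDF5 file.

```python
import heapq
import os
import tempfile


def external_sort(records, chunk_size):
    """Yield integer records in sorted order.

    Hypothetical helper: at most `chunk_size` records are held in memory
    while building the sorted runs; the runs are then merged lazily.
    """
    run_paths = []
    chunk = []

    def write_run():
        # Sort the in-memory chunk and spill it to its own temp file.
        chunk.sort()
        fd, path = tempfile.mkstemp(suffix=".run")
        with os.fdopen(fd, "w") as fh:
            fh.writelines(f"{value}\n" for value in chunk)
        run_paths.append(path)
        chunk.clear()

    for record in records:
        chunk.append(record)
        if len(chunk) >= chunk_size:
            write_run()
    if chunk:
        write_run()

    files = [open(path) for path in run_paths]
    try:
        # Each run is already sorted, so a k-way merge gives the total order.
        runs = [(int(line) for line in fh) for fh in files]
        yield from heapq.merge(*runs)
    finally:
        for fh in files:
            fh.close()
        for path in run_paths:
            os.remove(path)
```

The merge step reads one record per run at a time, so memory use is bounded by chunk_size during the first phase and by the number of runs during the second.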

Re: [Pytables-users] Question about reading a complete table.

2012-02-20 Thread Ümit Seren
I guess using the slice operator on the table should probably also load the entire table into memory: a = f.root.path.to.table[:] This will return a structured array though. On Mon, Feb 20, 2012 at 5:43 PM, Anthony Scopatz scop...@gmail.com wrote: Hello German, The easiest and probably the
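The slice read mentioned above can be sketched as follows. The file name, the table name genes, and the Gene description are made up for the example, and the calls use the PyTables 3.x spellings (open_file, create_table); the releases current when this thread was written spelled them openFile and createTable.

```python
import os
import tempfile

import tables


class Gene(tables.IsDescription):
    # Hypothetical two-column layout, only for this example.
    name = tables.StringCol(16)
    score = tables.Float64Col()


path = os.path.join(tempfile.mkdtemp(), "demo.h5")

# Write a tiny table so there is something to read back.
with tables.open_file(path, "w") as f:
    table = f.create_table("/", "genes", Gene)
    table.append([(b"a", 1.0), (b"b", 2.0)])
    table.flush()

# Slicing with [:] reads every row into memory at once.
with tables.open_file(path, "r") as f:
    data = f.root.genes[:]

# The result is a NumPy structured array: one named field per column.
scores = data["score"]
```

Individual columns are then available by field name, for example data["score"], without going back to the file.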

Re: [Pytables-users] Performance problems in indexed tables.

2012-01-23 Thread Ümit Seren
Because the profile output is probably not formatted properly in the mail, I attached the two line_profiler profile output files. In addition to this I also added the profile for the _table__whereIndexed() function. On Mon, Jan 23, 2012 at 12:13 PM, Ümit Seren uemit.se...@gmail.com wrote: Hi Anthony

[Pytables-users] Performance problems in indexed tables.

2012-01-21 Thread Ümit Seren
I recently used ptrepack to compact my HDF5 file and forgot to activate the option to propagate indexes. Just out of curiosity I decided to compare performance between the two tables (one with index and one without) for some queries. The table structure looks like this: gene_mid_pos:

Re: [Pytables-users] Write performance iterating through nodes

2012-01-20 Thread Ümit Seren
a while to 1 table/sec instead of 10 tables/sec. When I change it to NODE_CACHE_SLOTS=0 I don't have any performance problems. On Thu, Jan 19, 2012 at 7:43 AM, Francesc Alted fal...@pytables.org wrote: 2012/1/18 Ümit Seren uemit.se...@gmail.com Hi Francesc, I will try to get some numbers

Re: [Pytables-users] Write performance iterating through nodes

2012-01-17 Thread Ümit Seren
degrade at all. Memory consumption is also reasonable. cheers Ümit P.S.: Sorry for writing this mail in this way. However, I somehow didn't get your response directly via mail. On Mon, Jan 16, 2012 at 7:43 PM, Ümit Seren uemit.se...@gmail.com wrote: I created an HDF5 file with PyTables which

[Pytables-users] Write performance iterating through nodes

2012-01-16 Thread Ümit Seren
I created an HDF5 file with PyTables which contains around 29 000 tables with around 31k rows each. I am trying to create a caching table in the same HDF5 file which contains a subset of those 29 000 tables. I wrote a script which basically iterates through each of the 29 000 tables, retrieves a