With CArrays you can only have one specific type for the whole array
(int, float, etc.), whereas with a Table each column can have a
different type (string, float, etc.). If you wanted to replicate this
with CArrays, you would need a separate CArray for each type.
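To illustrate the difference, here is a minimal sketch (file name,
column names, and shapes are assumptions, not from the thread):

    import tables

    with tables.open_file('example.h5', mode='w') as h5:
        # A Table can mix column types within one dataset:
        desc = {'name': tables.StringCol(16), 'value': tables.Float64Col()}
        tbl = h5.create_table('/', 'mixed', desc)

        # A CArray is homogeneous; replicating the table above
        # needs one CArray per type:
        names = h5.create_carray('/', 'names', tables.StringAtom(16), (1000,))
        values = h5.create_carray('/', 'values', tables.Float64Atom(), (1000,))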
I think for storing numerical data ...
Just to add to what Anthony said:
In the end it also depends on how unrelated your data is and how you
want to access it. If the access scenario is that you usually only
search or select within a specific dataset, then splitting up the
datasets and putting them into separate tables is the way to go. In ...
... get are occasional read errors (which isn't much of a problem for
me), so I am thinking: could there be a way to reduce the metadata
within an HDF5 file and, at the same time, use a multi-table approach
to solve my problem?
Thanks,
Jacob
On Wed, Jul 18, 2012 at 1:22 AM, Ümit Seren uemit.se...@gmail.com wrote:
- CArray
- dataset2
  ...
- dataset30.000
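For context, a layout like that could be created along these lines (a
sketch; the file name, dataset sizes, and dtype are assumptions):

    import tables

    with tables.open_file('many.h5', mode='w') as h5:
        data = h5.create_group('/', 'data')
        for i in range(1, 30001):
            # one homogeneous CArray per dataset under the data group
            h5.create_carray(data, 'dataset%d' % i,
                             tables.Float64Atom(), (31000,))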
If you could help me out with these two items, I think I will have enough
knowledge under my belt to know what I need to do. Thanks again! ;)
On Wed, Jul 18, 2012 at 6:21 AM, Ümit Seren uemit.se...@gmail.com wrote:
... datasets directly linked to a single node (in this case, data)? I
seem to have a problem attaching that many nodes to one root.
-Jacob
On Wed, Jul 18, 2012 at 6:54 AM, Ümit Seren uemit.se...@gmail.com wrote:
On Wed, Jul 18, 2012 at 1:32 PM, Jacob Bennett
jacob.bennet...@gmail.com wrote:
Sounds ...
On Wed, Jul 18, 2012, Francesc Alted fal...@pytables.org wrote:
On 7/18/12 2:07 PM, Ümit Seren wrote:
I actually had 30.000 groups attached to the data group. But I guess
it doesn't really matter whether it is a table or a group; they are
both nodes.
30.000 datasets attached to the same group? I'm interested in knowing
if you detected ...
2012/7/18 Francesc Alted fal...@pytables.org:
On 7/18/12 4:11 PM, Ümit Seren wrote:
Actually I had 30.000 groups in a parent group.
Each of the 30.000 groups had maybe 3 datasets.
So to be honest I never had 30.000 datasets in a single group.
I guess you will probably have to disable the LRU cache in that case.
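For what it's worth, the node cache is controlled by the
NODE_CACHE_SLOTS parameter, which tables.open_file() accepts as a
keyword argument; a minimal sketch (the file name is an assumption):

    import tables

    # NODE_CACHE_SLOTS=0 turns the LRU node cache off entirely
    h5 = tables.open_file('many.h5', mode='r', NODE_CACHE_SLOTS=0)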
Good points.
Just some additional comments:
I do think that scientific/hierarchical file formats like HDF5 and
RDBMS systems have their specific use cases, and I don't think it
makes sense to replace one with the other.
I do also think that you shouldn't try to apply RDBMS principles to
HDF5 like ...
AFAIK there is no sort functionality built into PyTables.
I think there are 4 ways to do it:
1.) load all 7.5 million records and sort them in memory (if they fit
into memory)
2.) implement your own external sorting algorithm
(http://en.wikipedia.org/wiki/External_sorting) using PyTables
...
I guess using the slice operator on the table should probably also
load the entire table into memory:
a = f.root.path.to.table[:]
This will return a structured array though.
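Since the slice returns a NumPy structured array, option 1 above boils
down to something like this (the column name is an assumption):

    import numpy as np

    a = f.root.path.to.table[:]                  # whole table in memory
    a_sorted = np.sort(a, order='gene_mid_pos')  # sort by a named column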
On Mon, Feb 20, 2012 at 5:43 PM, Anthony Scopatz scop...@gmail.com wrote:
Hello German,
The easiest and probably the ...
Because the profile output is probably not formatted properly in the
mail, I attached the two line_profiler output files. In addition I
also added the profile for the _table__whereIndexed() function.
On Mon, Jan 23, 2012 at 12:13 PM, Ümit Seren uemit.se...@gmail.com wrote:
Hi Anthony
I recently used ptrepack to compact my HDF5 file and forgot to
activate the option to propagate indexes.
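For reference, ptrepack only propagates indexes when asked, via its
--propindexes flag; with hypothetical file names, an invocation that
keeps them looks roughly like this:

    ptrepack --propindexes original.h5:/ compacted.h5:/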
Just out of curiosity I decided to compare performance between the two
tables (one with index and one without) for some queries.
The table structure looks like this:
gene_mid_pos: ...
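For comparisons like that, an index on the queried column is created
and used roughly as follows (a sketch; the file and table names and
the condition are assumptions):

    import tables

    with tables.open_file('genes.h5', mode='a') as h5:
        tbl = h5.root.genes                    # hypothetical table
        tbl.cols.gene_mid_pos.create_index()   # build the column index
        hits = tbl.read_where('gene_mid_pos > 1000000')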
... a while to 1 table/sec instead of 10 tables/sec.
When I change it to NODE_CACHE_SLOTS=0 I don't have any performance
problems.
On Thu, Jan 19, 2012 at 7:43 AM, Francesc Alted fal...@pytables.org wrote:
2012/1/18 Ümit Seren uemit.se...@gmail.com
Hi Francesc,
I will try to get some numbers.
... degrade at all. Memory consumption is also reasonable.
cheers
Ümit
P.S.: Sorry for writing this mail in this way. Somehow I didn't get
your response directly via mail.
On Mon, Jan 16, 2012 at 7:43 PM, Ümit Seren uemit.se...@gmail.com wrote:
I created an HDF5 file with PyTables which contains around 29 000
tables with around 31k rows each.
I am trying to create a caching table in the same HDF5 file which
contains a subset of those 29 000 tables.
I wrote a script which basically iterates through each of the 29 000
tables, retrieves a ...
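A script along those lines might look roughly like this (a sketch;
the file name, group path, and filter condition are assumptions):

    import tables

    with tables.open_file('data.h5', mode='a') as h5:
        cache = None
        for tbl in h5.walk_nodes('/data', classname='Table'):
            if cache is None:
                # reuse the first table's row description for the cache
                cache = h5.create_table('/', 'cache', tbl.description)
            # copy only the rows of interest (hypothetical condition)
            cache.append(tbl.read_where('score > 0.5'))
        cache.flush()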