I really like this way of going about it; however, would it be better to
use the built-in hierarchy to separate the tables, or to write to
separate HDF5 files? While experimenting with concurrent read/write
operations on a shared HDF5 file without hierarchy, I notice that the
only errors I get are occasional read errors (which aren't much of a
problem for me), so I am wondering: could there be a way to reduce the
metadata within an HDF5 file and, at the same time, use a multi-table
approach to solve my problem?
Thanks,
Jacob
On Wed, Jul 18, 2012 at 1:22 AM, Ümit Seren <uemit.se...@gmail.com> wrote:
> Just to add to what Anthony said:
> In the end it also depends on how unrelated your data is and how you want
> to access it. If the usual access scenario is that you only search
> or select within a specific dataset, then splitting up the datasets and
> putting them into separate tables is the way to go. In RDBMS terms
> this is btw called sharding.
> I have such a use case, with around 30,000 datasets (each of
> them with around 5 million rows). I am only interested in one dataset
> at a time, so I created 30,000 tables. It works really well.
> And in case you want to access the data across the datasets (for
> aggregating or calculating averages), you can take a MapReduce approach,
> which should work very well with this layout.
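The MapReduce idea can be sketched in plain Python, assuming each per-dataset table has already been read into a list of floats. The `shards` dictionary and the function names below are hypothetical and are not PyTables API. Each table is summarized on its own into a (sum, count) pair, and the pairs are then merged; carrying the counts matters because averaging the per-table averages would give the wrong answer when tables have different row counts.

```python
from functools import reduce
from typing import Dict, List, Tuple

# (sum, count) summary of one table; a hypothetical stand-in for per-table stats.
Partial = Tuple[float, int]


def map_shard(values: List[float]) -> Partial:
    """Map step: summarize one dataset's table independently of the others."""
    return (sum(values), len(values))


def combine(a: Partial, b: Partial) -> Partial:
    """Reduce step: sums and counts simply add, so merge order does not matter."""
    return (a[0] + b[0], a[1] + b[1])


def global_mean(shards: Dict[int, List[float]]) -> float:
    """Mean over every row of every shard, built only from per-shard partials."""
    partials = [map_shard(values) for values in shards.values()]
    total, count = reduce(combine, partials, (0.0, 0))
    if count == 0:
        raise ValueError("no rows in any shard")
    return total / count


# Hypothetical data: dataset id -> values read from that dataset's table.
shards = {1: [1.0, 2.0, 3.0], 2: [10.0], 3: [4.0, 6.0]}
mean = global_mean(shards)
```

Because the `map_shard` calls do not depend on each other, they could be spread across worker processes (for example with `multiprocessing.Pool.map`), one table per task, with only the small (sum, count) pairs sent back for the reduce step.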
>
>
> On Tue, Jul 17, 2012 at 11:55 PM, Jacob Bennett
> <jacob.bennet...@gmail.com> wrote:
> > Thanks for the input Anthony!
> >
> > -Jake
> >
> >
> > On Tue, Jul 17, 2012 at 4:20 PM, Anthony Scopatz <scop...@gmail.com> wrote:
> >>
> >> On Tue, Jul 17, 2012 at 3:30 PM, Jacob Bennett <jacob.bennet...@gmail.com> wrote:
> >>>
> >>> Hello PyTables Users & Contributors,
> >>>
> >>> Just a quick question: let's say that I have certain identifiers that
> >>> link to a set of data. Would it generally be faster for lookup to have
> >>> each set of data as a separate table with the id as the table's name, or
> >>> to add the id as another column to a universal table of data and then let
> >>> the in-kernel search query only the data with a specific id?
> >>
> >>
> >> I think that in general it is faster to have more tables with ids as
> >> names. For very small data, searching through a single larger table
> >> might be quicker than node access... but even then I doubt it.
> >>
> >>>
> >>> I hope you can understand my question: would 1,000 tables of 100,000
> >>> records each be better for searching than 1 table with 100 million
> >>> records and one extra id column?
> >>
> >>
> >> For these data sizes, more tables is probably faster.
> >>
> >> (It should also be noted that in the many-tables case the data is
> >> actually smaller, because you can eliminate the id column.)
> >>
> >> Be Well
> >> Anthony
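Anthony's two points (looking a table up by name beats filtering one big table, and the per-id layout can drop the id column) can be illustrated with an in-memory analogy in plain Python. This is not PyTables code: a dict of lists stands in for one table per id, a flat list stands in for the universal table, and the query on it is a full unindexed scan. The row shape `(dataset_id, timestamp, value)` and all function names are hypothetical. Real HDF5 timings also depend on node-opening overhead and on whether the id column is indexed, which this sketch ignores.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# In-memory analogy only, not PyTables. Hypothetical schema:
Row = Tuple[int, int, float]   # (dataset_id, timestamp, value) in the universal table
IdLessRow = Tuple[int, float]  # (timestamp, value) once the id lives in the table name


def split_by_id(rows: List[Row]) -> Dict[int, List[IdLessRow]]:
    """One 'table' per id: the id becomes the key, so it is dropped from each row."""
    tables: Dict[int, List[IdLessRow]] = defaultdict(list)
    for dataset_id, ts, value in rows:
        tables[dataset_id].append((ts, value))
    return dict(tables)


def query_universal(table: List[Row], wanted: int) -> Tuple[List[IdLessRow], int]:
    """Filter the single big table on its id column; every row is examined."""
    hits = [(ts, value) for dataset_id, ts, value in table if dataset_id == wanted]
    return hits, len(table)


def query_per_id(
    tables: Dict[int, List[IdLessRow]], wanted: int
) -> Tuple[List[IdLessRow], int]:
    """Look the table up by name; only that table's rows are touched."""
    hits = list(tables.get(wanted, []))
    return hits, len(hits)


# 1,000 ids x 100 rows each (scaled down from 1,000 x 100,000 in the question).
rows = [(ds, t, t * 0.5) for ds in range(1000) for t in range(100)]
tables = split_by_id(rows)

hits_a, examined_a = query_universal(rows, 42)
hits_b, examined_b = query_per_id(tables, 42)

# Stored fields: 3 per row with the id column, 2 per row without it.
fields_universal = sum(len(r) for r in rows)
fields_per_id = sum(len(r) for tbl in tables.values() for r in tbl)
```

Both queries return the same 100 rows, but the scan examines all 100,000 rows while the per-id lookup touches only 100, and the per-id layout stores two fields per row instead of three. An index on the id column (PyTables supports column indexes) would narrow the gap for the single-table layout, so benchmarking both layouts on the real data is still worthwhile.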
> >>
> >>>
> >>>
> >>> Thanks,
> >>> Jacob Bennett
> >>>
> >>> --
> >>> Jacob Bennett
> >>> Massachusetts Institute of Technology
> >>> Department of Electrical Engineering and Computer Science
> >>> Class of 2014| benne...@mit.edu
> >>>
> >>>
> >>>
> >>>
> ------------------------------------------------------------------------------
> >>> Live Security Virtual Conference
> >>> Exclusive live event will cover all the ways today's security and
> >>> threat landscape has changed and how IT managers can respond.
> Discussions
> >>> will include endpoint security, mobile security and the latest in
> malware
> >>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
> >>> _______________________________________________
> >>> Pytables-users mailing list
> >>> Pytables-users@lists.sourceforge.net
> >>> https://lists.sourceforge.net/lists/listinfo/pytables-users
> >>>
> >>
> >>
> >
> >
>
>