On Apr 22, 2011, at 11:57 AM, Abhijit Gadgil wrote:
>
> Thanks for the quick reply, please see below.
>
> On Fri, Apr 22, 2011 at 10:32 PM, Robert Ferrell <ferr...@diablotech.com>
> wrote:
> I add an array of dates to the PyTables file. That array just keeps track of
> which dates are stored. When I append a new date of data I just check
> whether that date is already in the file or not. For me that is simpler than
> involving SQL.
>
> This is something very obvious that I overlooked. Still, there's a minor
> glitch in using this: the data is not very consistent. Sometimes a given
> entry is repeated twice for a date. However, such cases are few, maybe
> 5-10 in about 1000-2000 dates. I'll follow this path.
>
> BTW, what I found handy was to put the downloaded data into a particular data
> format, and then the interface for my PyTables file only has to know about
> that data format
>
> Pardon my ignorance, but I've just spent a couple of days with pytables and
> hence I am not aware of many features of pytables, let alone their optimum
> use. Can you please elaborate?
I built a class which includes an interface for putting data into a PyTables
file. The input interface accepts only instances of a particular class (not
exactly true because of inheritance and duck typing).
My data comes from a few different sources. For each source I accumulate the
data, then generate an instance of that class that I can input into my PyTables
file. That way I keep the data translation methods out of the data storage
code. Also, I sometimes have to clean the data, so I do that (when possible)
before I try to store it.
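
A rough sketch of that arrangement, combined with the array of stored dates
mentioned earlier in the thread. This is an illustration under assumptions, not
the actual code: the names DailyData, PriceStore, from_raw and close_price are
hypothetical, dates are assumed to be YYYYMMDD integers, instrument names are
assumed to be valid node names, the cleaning rule (keep the first entry per
instrument) is only one possible choice, and the calls assume the PyTables 3.x
API (open_file, create_earray, create_table).

```python
from dataclasses import dataclass

try:
    import tables  # PyTables; only the storage class needs it
except ImportError:
    tables = None


@dataclass(frozen=True)
class DailyData:
    """Hypothetical common format: one day's (instrument, close) pairs."""
    date: int    # assumed YYYYMMDD, e.g. 20110422
    rows: tuple  # cleaned (instrument, close_price) pairs

    @classmethod
    def from_raw(cls, date, raw_rows):
        # Clean before storing: keep only the first entry per instrument.
        seen, cleaned = set(), []
        for instrument, close in raw_rows:
            if instrument not in seen:
                seen.add(instrument)
                cleaned.append((instrument, float(close)))
        return cls(date=date, rows=tuple(cleaned))


class PriceStore:
    """Storage side: accepts only DailyData and tracks which dates are stored."""

    def __init__(self, path):
        if tables is None:
            raise ImportError("PyTables is required for PriceStore")
        self.h5 = tables.open_file(path, mode="a")
        if "/dates" in self.h5:
            self.dates = self.h5.root.dates
        else:
            # Extendable 1-D array holding the dates already in the file.
            self.dates = self.h5.create_earray(
                "/", "dates", atom=tables.Int32Atom(), shape=(0,))

    def _table(self, instrument):
        # One table per instrument, created on first use.
        node_path = "/prices/" + instrument
        if node_path in self.h5:
            return self.h5.get_node(node_path)
        description = {"date": tables.Int32Col(pos=0),
                       "close_price": tables.Float64Col(pos=1)}
        return self.h5.create_table(
            "/prices", instrument, description, createparents=True)

    def append(self, day):
        """Store one day; return False if that date is already in the file."""
        if not isinstance(day, DailyData):
            raise TypeError("PriceStore only accepts DailyData instances")
        if day.date in self.dates.read():
            return False
        for instrument, close in day.rows:
            table = self._table(instrument)
            row = table.row
            row["date"] = day.date
            row["close_price"] = close
            row.append()
            table.flush()
        self.dates.append([day.date])
        return True

    def close(self):
        self.h5.close()
```

Each data source would then get its own small translator that ends in a call
like DailyData.from_raw(date, rows), so PriceStore.append never has to know
where the data came from, and pushing the same date twice is simply skipped.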
-robert
>
> Thanks
>
> -abhijit
>
> -r
>
> On Apr 22, 2011, at 10:20 AM, Abhijit Gadgil wrote:
>
> > Dear All,
> >
> > I agree that similar questions have probably been discussed many times on
> > this list. I have some peculiar requirements, which mostly have to do with
> > the way I am importing data into pytables.
> >
> > I have a csv file of about 2-4k entries generated every day, with data
> > for different scrips, and such files go back about 8-10 years. Initially,
> > when I did not know much about pytables, I started with the good old sql
> > way, and the DB size there is currently about 4G. With pytables, I expect
> > it to be around 1-2G. So the size of the data is not much of a concern as
> > of now, but it will grow as time goes on. My questions are as follows.
> >
> > The way I am structuring the data is -
> >
> > I create a group, and in each group there's a separate table for each
> > of the instruments in the group; the entries for all dates for a given
> > instrument are the rows of its table. This looked to me like a better
> > choice than having one big table with all the instruments' data as rows.
> > Is this the right choice?
> >
> > The other question is a bit more important to me. The way I populate the
> > table is that I download data every day and push it to the table. So far
> > so good. My real concern is that it is quite likely that I will download
> > and push the data for a given date twice. In the sql counterpart, primary
> > keys come to the rescue, but there's no such option available with
> > pytables (to the best of my knowledge). Would that have to be handled in
> > the application? One alternative that I am thinking of is populating an
> > SQL table along with pytables when inserting the data, so that the SQL
> > table takes care of the integrity part (not clean, but the best I can
> > think of as of now). Are there any other alternatives?
> >
> > Thanks
> ------------------------------------------------------------------------------
> Fulfilling the Lean Software Promise
> Lean software platforms are now widely adopted and the benefits have been
> demonstrated beyond question. Learn why your peers are replacing JEE
> containers with lightweight application servers - and what you can gain
> from the move.
> http://p.sf.net/sfu/vmware-sfemails
> _______________________________________________
> Pytables-users mailing list
> Pytables-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/pytables-users