Hi Anthony,
Thanks for the response. However, I am having difficulty putting everything
into a single table and would like your help with it.
Please consider the following.
The tables have the following structure:
1) adSuggester table:
     trId        long
     click       int
     queryId     long
2) query table:
     queryid     long
     querytoken  long
In my problem, one query id has many query tokens. A sample of the values
in the query table would be (1,234), (1,235), (1,236), (1,237), so queryid is
not a primary key here. In other words, I need a variable-size array inside
a table.
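To make the layout concrete, here is a minimal sketch in plain Python (not
PyTables) of the grouping I have in mind, using the sample pairs above. The
names rows, group_tokens and tokens_by_query are only for illustration.

```python
from collections import defaultdict

# Sample (queryid, querytoken) pairs from the query table above.
# queryid repeats, so it cannot serve as a primary key.
rows = [(1, 234), (1, 235), (1, 236), (1, 237)]


def group_tokens(pairs):
    """Collect the tokens of each queryid into one variable-length list."""
    grouped = defaultdict(list)
    for query_id, token in pairs:
        grouped[query_id].append(token)
    return dict(grouped)


tokens_by_query = group_tokens(rows)
print(tokens_by_query)
```

Each queryid then maps to a list whose length differs from query to query,
which is what I would like to store per row.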
1) As far as my (limited) knowledge of PyTables goes, a variable-length array
cannot be kept inside a table, and the alternative would be a fixed-size
array. Is a fixed-size array an efficient approach? If I have to use a
variable-length array, how should I reference it from each row in the table?
I am worried about the query speed.
2) I also have a problem accessing the elements of a fixed-size array column
in a table. The table (PyTables) declaration contains
ADCcount = UInt16Col(shape=(4)), with the table name as particle. If I then
set an element with particle['ADCcount'][1] = 6, the value 6 is not stored
in the table at all, although the script runs without any errors. Please
help me with this as well.
3) I am also new to NumPy, and I have a related question. After this I am
planning to do all my mathematical and statistical analysis using NumPy.
Will a conversion be necessary in that case, or how should I structure the
table (in PyTables) so that I do not run into such problems?
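Regarding point 2, my guess (an assumption I have not verified against the
PyTables source) is that reading particle['ADCcount'] hands back a copy of
the array, so the element assignment changes only that temporary copy. The
toy class below is NOT PyTables code; ToyRow is a made-up stand-in that just
reproduces the effect I think I am seeing, together with the workaround I
would try.

```python
import copy


class ToyRow:
    """Toy stand-in for a table row (hypothetical, not the PyTables Row).

    Reading a field returns a copy of the stored list; assigning to a
    field replaces the stored value.
    """

    def __init__(self):
        self._fields = {"ADCcount": [0, 0, 0, 0]}

    def __getitem__(self, name):
        return copy.copy(self._fields[name])

    def __setitem__(self, name, value):
        self._fields[name] = list(value)


particle = ToyRow()

# What I am doing now: the write lands on a temporary copy and is lost.
particle["ADCcount"][1] = 6
lost = particle["ADCcount"]

# Possible workaround: fetch, modify, then assign the whole array back.
adc = particle["ADCcount"]
adc[1] = 6
particle["ADCcount"] = adc
kept = particle["ADCcount"]
```

If that guess is right, assigning the whole array back to the column should
be enough, but please correct me if the real cause is something else.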
Sorry for the long list of questions. Kindly help me with them.
Thank you,
Sree aurovindh V
On Sat, Mar 17, 2012 at 1:49 AM, Anthony Scopatz <scop...@gmail.com> wrote:
> Hello Sree,
>
> Sorry for the slow response.
>
> On Thu, Mar 15, 2012 at 10:56 PM, sreeaurovindh viswanathan <
> sreeaurovi...@gmail.com> wrote:
>
>> Hi,
>>
>> I have created five tables in an HDF5 file, and I created the indexes when
>> the file was created. I have about 140 million records in my PostgreSQL
>> database, which I am trying to divide into 20 HDF5 chunks. The problem is
>> that I have one master table which has relationships with other tables.
>>
>
> As a rule, joining is always expensive. (It is expensive in SQL as well.)
> A more HDF-ish way of doing things would be to throw all of the data in a
> single large table and not have the master table if you don't need it.
>
>
>> After I insert a record into the master table, I have to check whether a
>> record with the master table's key already exists in the child table. If
>> it exists, I have to skip it; otherwise I have to insert it. I have
>> written the code for this, which is given below. I believe the bottleneck
>> is the PyTables query that I have written: it scans the entire set of
>> records to find out whether the id exists. I would like to terminate the
>> query after the first occurrence of the id, and I do not know how to do
>> that. Kindly help me with this.
>>
>
> You can use the slice syntax on where(),
> http://pytables.github.com/usersguide/libref.html?highlight=index#tables.Table.where,
> i.e. the start, stop, and step keywords, to make a sliding search. Such a
> search will query in smaller chunks and would quit after the first chunk
> with a hit. For example, for chunk sizes of 10000:
>
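> i = 0
> csize = 10000
> query = []
> # stop is exclusive, so consecutive windows do not overlap; the nrows
> # check ends the loop if no row matches anywhere in the table
> while not query and i * csize < table.nrows:
>     # row[:] copies the values; the Row object is reused by the iterator
>     query = [row[:] for row in table.where("a == b", start=i*csize,
>                                            stop=(i+1)*csize)]
>     i += 1
> query = query[0] if query else None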
>
> This might need some tuning in terms of how large csize should be based on
> how large your table is. But this should be faster on average. You could
> also use more sophisticated search mechanisms if the location of a query is
> related to that of queries before it in any way.
>
>
>> queryNecess is the list that I populate after querying the entire
>> table. Please suggest how I can optimize the performance. Also, could you
>> please clarify whether automatic indexing happens each time a record
>> is inserted?
>>
>
> Yes, it should, if autoIndex on the table itself is True:
> http://pytables.github.com/usersguide/libref.html?highlight=index#tables.Table.autoIndex
>
> Be Well
> Anthony
>
>
>>
>> A note on current performance:
>> We have a computer with a Core i7 processor and 8 GB of RAM. All 8
>> threads run at full capacity, using about 7.15 GB of RAM. It has written
>> about 1736340 (approx.), including all tables, after 28 hrs. I have started
>> all 20 Python scripts running in parallel to fill the tables.
>>
>> Thanks
>> sree aurovindh V
>>
>>
>> The table structure is given below:
>>
>> class adSuggester(IsDescription):
>>     trId = UInt64Col(pos=0)
>>     click = UInt16Col(pos=1)
>>     queryId = UInt32Col(pos=8)
>>
>> class queryToken(IsDescription):
>>     qId = UInt32Col()
>>     qTok = UInt32Col()
>>
>> table.cols.queryId.createIndex()
>>
>> squrery = "qId==" + str(trainVals[8])
>> queryNecess = [row['qId'] for row in queryTable.where(squrery)]
>> if not queryNecess:
>>     selectQueryTr = "select query_token from kdd.query_tokens where query_id="
>>     selectQueryTr += str(trainVals[8])
>>     cur.execute(selectQueryTr)
>>     # query Postgres and fetch all the values
>>     allQueryTokens = cur.fetchall()
>>     for queryT in allQueryTokens:  # insert into PyTables
>>         queryToken['qId'] = trainVals[8]
>>         queryToken['qTok'] = queryT[0]
>>         queryToken.append()
>>
>>
>> ------------------------------------------------------------------------------
>> This SF email is sponsosred by:
>> Try Windows Azure free for 90 days Click Here
>> http://p.sf.net/sfu/sfd2d-msazure
>> _______________________________________________
>> Pytables-users mailing list
>> Pytables-users@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/pytables-users
>>
>>
>
>