Re: [Pytables-users] Pytable data retrieval performance

Francesc Altet Wed, 11 Oct 2006 00:37:12 -0700

On Tue, 10 Oct 2006 at 16:31 -0500, Jun Li wrote:
> I figured out another way to do the job. Note: my table has 2.6
> million rows with 7 columns. 
> 
> src_rows = tbl_T.read(start=0, stop=tbl_T.nrows) 
> 
> The above statement will read the data of the whole table into
> memory, am I right? src_rows will be a record array or a nested
> record array, right?


That's right, but you don't need to load the entire table into
memory to do what you want. See below.

> Then I figured out the correct indices of the piece of data I need,
> sliced it out, and then ran the query on that chunk:
> index1 = sp + int(sdate.absdays - integertoDate(tbl_attr_sdate).absdays)
> index2 = sp + int(edate.absdays - integertoDate(tbl_attr_sdate).absdays)
> txd = src_rows.field('tmax')[index1:index2] 

In such a case, you can avoid reading the whole table by using:

tbl_T.read(start=index1, stop=index2, field="tmax")

or better yet:

tbl_T.cols.tmax[index1:index2]

which is shorthand for the former. With this, PyTables will only read
the part of the table that you are interested in.
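For reference, the quoted index arithmetic amounts to "first row of the series plus the number of days elapsed since the table's start date", assuming exactly one row per day (a difference of `absdays` values is a difference in days). A minimal stdlib sketch, assuming hypothetically that the table's start date is stored as an integer such as 19500101 and that `sp` is the row where the series begins, as in the quoted code; `integer_to_date` and `row_window` are illustrative names standing in for the poster's own `integertoDate` helper, not PyTables functions:

```python
from datetime import date


def integer_to_date(yyyymmdd: int) -> date:
    """Hypothetical helper: convert an integer such as 19500101 to a date."""
    return date(yyyymmdd // 10000, (yyyymmdd // 100) % 100, yyyymmdd % 100)


def row_window(sp: int, table_start: int, sdate: date, edate: date) -> tuple[int, int]:
    """Return (start, stop) row indices covering the dates [sdate, edate).

    Assumes one row per day, starting at row `sp` on the date `table_start`.
    The stop index is exclusive, matching Python slicing.
    """
    first = integer_to_date(table_start)
    index1 = sp + (sdate - first).days
    index2 = sp + (edate - first).days
    return index1, index2


# The bounds can then be used to read only that slice from disk, e.g.
#   txd = tbl_T.cols.tmax[index1:index2]
index1, index2 = row_window(1000, 19500101, date(1950, 1, 11), date(1950, 2, 1))
print(index1, index2)
```

Only the two integers travel to PyTables; the date handling stays in plain Python, so nothing beyond the requested rows has to be read.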

> In this way, the performance improved considerably (from 140 s to
> 60 s). However, the timings are not consistent; I tested many times,
> and the run time switched back and forth between roughly 60 s and
> 110 s, seemingly at random! Is there any buffer or memory issue with
> read()? What's going on here?

I don't know for sure, but reading the entire table into memory can
create memory contention issues (and may make your system swap
heavily). Please try out my suggestion and see if it cures this.
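To put rough numbers on this: the table in the quoted message has 2.6 million rows and 7 columns. Assuming, hypothetically, that every column is an 8-byte float (the actual column types aren't given), a full read() has to hold about 146 MB at once, while one year of a single column is under 3 KB. The `nbytes` helper below is illustrative and ignores any per-array overhead:

```python
def nbytes(nrows: int, ncols: int, itemsize: int = 8) -> int:
    """Bytes needed to hold nrows x ncols values of `itemsize` bytes each."""
    return nrows * ncols * itemsize


full_table = nbytes(2_600_000, 7)  # whole table, as with read(start=0, stop=nrows)
one_column_year = nbytes(365, 1)   # 365 rows of 'tmax', as with cols.tmax[a:b]

print(f"full table: {full_table / 1e6:.1f} MB")        # full table: 145.6 MB
print(f"one column, 365 rows: {one_column_year} bytes")  # one column, 365 rows: 2920 bytes
```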

> Another question: is there any function or quick way to INSERT a new
> row (or rows) in the middle of an existing table, or anywhere in the
> table? I know append() only appends row(s) at the end of the table and
> modifyRow() only modifies existing row(s). Does PyTables have such a
> function? If not, this kind of basic function would be really handy
> (maybe consider adding it to a new version?)

No. Because it is oriented towards mostly read-only files, PyTables
doesn't have a quick way to insert a row (the only way is to insert it
in the middle of a copy of the entire table). If you need this
capability, a relational database may be a better fit.
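To illustrate why inserting into a contiguous, fixed-layout table means copying it, here is a sketch that uses an in-memory NumPy structured array as a stand-in for a table. This is not PyTables code: `insert_rows` is a hypothetical helper, and the two-field dtype is purely illustrative.

```python
import numpy as np


def insert_rows(table: np.ndarray, pos: int, new_rows: np.ndarray) -> np.ndarray:
    """Return a new structured array with `new_rows` inserted before index `pos`.

    The result is a freshly allocated array: every existing row is copied,
    so the cost grows with the size of the whole table, not the insert.
    """
    if new_rows.dtype != table.dtype:
        raise ValueError("new_rows must have the same dtype as table")
    return np.concatenate([table[:pos], new_rows, table[pos:]])


dtype = np.dtype([("day", "i4"), ("tmax", "f8")])
table = np.array([(0, 10.0), (1, 11.5), (3, 9.0)], dtype=dtype)
missing = np.array([(2, 12.25)], dtype=dtype)

updated = insert_rows(table, 2, missing)
print(updated["day"].tolist())   # [0, 1, 2, 3]
print(updated["tmax"].tolist())  # [10.0, 11.5, 12.25, 9.0]
```

Note that `table` itself is left unchanged; the inserted row only exists in the new copy.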

Cheers,

-- 
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"



_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users
