Hi, this was sent to the pytables list some days ago. Please note that 
you should subscribe to the list to avoid having your messages rejected.

Cheers,

------------------------ Original message -----------------------------------

From: Jun Li <[EMAIL PROTECTED]>
To: pytables-users@lists.sourceforge.net
Date: Friday 21:27:03
   
Hello, All:

I am using Python 2.4, PyTables 1.3, numarray 1.5.1 and HDF5 1.6.5 on Linux 
2.4, running on a fairly powerful Dell server.

I have a PyTables table which has 7 columns and holds roughly 2.6 million 
rows of data. Here is my table structure:

from tables import IsDescription, StringCol, IntCol, Float32Col

class Ttable(IsDescription):
    n_id = StringCol(length=16, pos=1)
    date = IntCol(pos=2)
    tmax = Float32Col(pos=3)
    tmax_flag = IntCol(pos=4)
    tmin = Float32Col(pos=5)
    tmin_flag = IntCol(pos=6)
    mc = IntCol(pos=7)
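To put "2.6 million rows" in perspective, a rough estimate of the raw data volume a full pass over this table has to read. It assumes IntCol is a 32-bit integer (Float32Col is 32-bit by definition) and ignores any HDF5 storage overhead, so treat the result as an approximation only.

```python
import struct

# Assumed layout of one Ttable row: a 16-byte string followed by six
# 4-byte fields (IntCol assumed 32-bit; Float32Col is 32-bit).
# '<' uses standard sizes with no alignment padding.
ROW_FORMAT = "<16sififii"

row_bytes = struct.calcsize(ROW_FORMAT)
n_rows = 2600000  # "roughly 2.6 million rows"
total_bytes = row_bytes * n_rows

print("bytes per row:", row_bytes)
print("approx. table size: %.1f MiB" % (total_bytes / 2.0**20))
```

Under these assumptions each row is 40 bytes and the whole table is on the order of 100 MiB.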

I have a small program that retrieves data according to some conditions and 
then does some calculations or processing on the retrieved data.

Code sample:

tbl_T = h5file.root.T_table
num_of_days = int(integertoDate(tbl_T.attrs.endDate).absdays -
                  integertoDate(tbl_T.attrs.startDate).absdays)

# Each station (n_id) occupies num_of_days consecutive rows, so i counts
# rows to detect where the next station begins.
for i, x in enumerate(tbl_T):
    if (i % num_of_days) == 0:
        # first row of a new station: reset the per-station accumulators
        n_id = x['n_id']
        numofrows = 0
        ct, mc = 0, 0
        t, tx, tn = 0.0, 0.0, 0.0
        tnct, txct = 0, 0
        hdd, cdd = 0.0, 0.0
        gd4, gd5 = 0.0, 0.0
    if x['date'] >= startDate:
        if n_id == x['n_id'] and x['date'] < endDate:
            # both readings flagged valid and within [minVal, maxVal)
            if (x['tmax_flag'] and minVal <= x['tmax'] < maxVal and
                    x['tmin_flag'] and minVal <= x['tmin'] < maxVal):
                pass  # do something
            else:
                mc = mc + 1  # count missing/rejected days
            numofrows = numofrows + 1
    if numofrows == nDays:
        pass  # do other thing


The performance is not very good, far worse than I expected (roughly 140 
seconds per run). I found the performance tips on indexed searches in the 
PyTables User's Guide, so I indexed all the columns that appear in the 
selection conditions:

class Ttable(IsDescription):
    n_id = StringCol(length=16, pos=1, indexed=1)
    date = IntCol(pos=2, indexed=1)
    tmax = Float32Col(pos=3, indexed=1)
    tmax_flag = IntCol(pos=4, indexed=1)
    tmin = Float32Col(pos=5, indexed=1)
    tmin_flag = IntCol(pos=6, indexed=1)
    mc = IntCol(pos=7)
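A quick back-of-the-envelope check of the reported timing, using only the figures above plus an assumed row size of 40 bytes (a 16-byte string plus six 4-byte fields). A throughput well under 1 MB/s is far below what a sequential read of roughly 100 MB would normally need, which hints (without proving it) that the per-row work in the Python loop, rather than disk I/O, dominates the run time.

```python
# Rough throughput implied by the numbers in the message.
n_rows = 2600000      # "roughly 2.6 million rows"
run_seconds = 140.0   # "roughly 140 seconds for a run"
row_bytes = 40        # assumed: 16-byte string + six 4-byte fields

rows_per_second = n_rows / run_seconds
micros_per_row = run_seconds / n_rows * 1e6
mb_per_second = n_rows * row_bytes / run_seconds / 1e6

print("rows/s: %.0f" % rows_per_second)   # 18571
print("us/row: %.1f" % micros_per_row)    # 53.8
print("MB/s:   %.2f" % mb_per_second)     # 0.74
```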

I rebuilt the table and reran the retrieval program: the run time was almost 
the same, with no improvement whatsoever. I also tried indexing only 'n_id', 
only 'date', or other combinations short of all the columns, rerunning the 
program each time, and the same thing happened. Why does indexing have no 
effect in my case?
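As a general illustration of when an index can help: a loop such as `for x in tbl_T` visits every row regardless of which columns are indexed, so an index cannot shorten it. An index only pays off when the selection condition itself is handed to code that can consult it (in PyTables, its query methods such as `Table.where()`). The sketch below uses a toy sorted index built with `bisect`; it is a plain-Python stand-in, not how PyTables implements its indexes, and `build_index`, `range_query` and `full_scan` are hypothetical names.

```python
import bisect


def build_index(values):
    """Toy sorted index on one column: (sorted values, matching row numbers)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    return [values[i] for i in order], order


def range_query(index, lo, hi):
    """Row numbers with lo <= value < hi, located by binary search."""
    keys, rows = index
    start = bisect.bisect_left(keys, lo)
    stop = bisect.bisect_left(keys, hi)
    return sorted(rows[start:stop])


def full_scan(values, lo, hi):
    """Same answer without an index: every row is visited."""
    return [i for i, v in enumerate(values) if lo <= v < hi]


dates = [19990101, 20000101, 20010101, 20000615, 19981231]
idx = build_index(dates)
print(range_query(idx, 20000101, 20010101))  # [1, 3]
print(full_scan(dates, 20000101, 20010101))  # [1, 3]
```

Both functions return the same rows; only `range_query` avoids touching every value, and only because the condition is passed to it rather than tested row by row.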

I read some postings in the mailing-list archives saying that index searches 
on string columns are slower than on integer columns. My 'n_id' column has to 
be a string. If I instead generate integer ids (e.g. using the hash() 
function), store them in a column and index that integer column, would that 
improve performance?
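If integer ids are tried, `hash()` has two drawbacks as a persistent key: different strings can hash to the same value, and on Python 3.3 and later string hashes are randomized per process unless PYTHONHASHSEED is fixed. A minimal sketch of a collision-free alternative, assuming the mapping is stored alongside the table so codes can be translated back (`make_id_codes` and the ids are hypothetical). Whether an integer index is actually faster here is not established by this sketch.

```python
def make_id_codes(station_ids):
    """Map each distinct string id to a small integer, in first-seen order.

    Unlike hash(), codes never collide and are reproducible for the same
    input order. The returned dict must be saved to map codes back.
    """
    codes = {}
    for sid in station_ids:
        if sid not in codes:
            codes[sid] = len(codes)
    return codes


codes = make_id_codes(["ST001", "ST002", "ST001"])
names = {code: sid for sid, code in codes.items()}  # reverse lookup
print(codes)  # {'ST001': 0, 'ST002': 1}
```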

In my case (as the code samples above show), are there any other ways to 
improve performance?

Any help, suggestions or comments are appreciated.

Thanks.

Dave

-- 
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"

_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users
