My advice regarding parallelization is: do not worry about this *at all* unless you have already spent a long time profiling your problem and you are sure that parallelizing would help. 99% of the time it is much more productive to focus on improving serial speed.
Please try to follow Anthony's suggestion and split your queries into blocks, passing each block to PyTables. That should be a huge win. For example, fetch the first block with `SELECT * FROM your_table LIMIT 10000 OFFSET 0` and send the results to `Table.append`. Then fetch the second block with `SELECT * FROM your_table LIMIT 10000 OFFSET 10000` and pass it to `Table.append` as well, and so on until you have exhausted all the data in your tables.
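A minimal sketch of that loop, assuming a psycopg2 connection and an already-created PyTables Table whose description matches the columns of x.train (connection string, file name and node path below are illustrative, not taken from your setup):

    # Block-wise copy from PostgreSQL into an existing PyTables Table.
    # Illustrative sketch: adjust the connection string, file name and node path.
    import psycopg2
    import tables as tb

    BLOCK = 10000    # rows fetched and appended per iteration

    conn = psycopg2.connect("dbname=x user=postgres")
    cur = conn.cursor()

    h5 = tb.open_file("train.h5", mode="a")
    table = h5.root.train      # existing Table matching x.train's column types

    offset = 0
    while True:
        # ORDER BY a unique key so consecutive LIMIT/OFFSET windows
        # neither overlap nor skip rows.
        cur.execute(
            "SELECT * FROM x.train ORDER BY tr_id LIMIT %s OFFSET %s",
            (BLOCK, offset),
        )
        rows = cur.fetchall()
        if not rows:
            break                  # source table exhausted
        table.append(rows)         # one append per block, not per row
        offset += len(rows)

    table.flush()
    h5.close()
    cur.close()
    conn.close()

Note that OFFSET still makes PostgreSQL walk over all the skipped rows, so it gets slower as the offset grows; since tr_id is an indexed primary key, ranged conditions like `WHERE tr_id > %s AND tr_id <= %s` would be an even cheaper way to step through the table in blocks.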
Hope this helps,

Francesc

On Mar 19, 2012, at 11:36 AM, sreeaurovindh viswanathan wrote:

> Hi,
>
> Thanks for your reply. In that case, what will my querying efficiency be? Will I be able to query in parallel, i.e. run multiple queries against a single file? Also, if I do it in 6 chunks, will I be able to parallelize it?
>
> Thanks
> Sree aurovindh Viswanathan
>
> On Mon, Mar 19, 2012 at 10:01 PM, Anthony Scopatz <scop...@gmail.com> wrote:
> Is there any way that you can query and write in much larger chunks than 6? I don't know much about postgresql specifically, but in general HDF5 does much better if you can take larger chunks. Perhaps you could at least do the postgresql part in parallel.
>
> Be Well
> Anthony
>
> On Mon, Mar 19, 2012 at 11:23 AM, sreeaurovindh viswanathan <sreeaurovi...@gmail.com> wrote:
> The problem is the writing speed of my computer and the postgresql query performance. I will explain the scenario in detail.
>
> I have about 80 GB of data (with appropriate database indexes in place). I am trying to read it from a Postgresql database and write it into HDF5 using PyTables. I have 1 table and 5 variable arrays in one HDF5 file. The implementation of HDF5 is not multithreaded or enabled for symmetric multiprocessing.
>
> As far as the postgresql data is concerned, the overall record count is 140 million, and there are 5 related tables referenced through primary/foreign keys. I am not using joins, as they do not scale.
>
> So for a single record I do 6 lookups without joins and write the results into HDF5 format. For each lookup I do 6 inserts, into the table and its corresponding arrays.
>
> The queries are really simple:
>
>   select * from x.train where tr_id=1   (primary key & indexed)
>   select q_t from x.qt where q_id=2     (non-primary key, but indexed)
>   (and similarly four more queries)
>
> Each computer writes two HDF5 files, so the total count comes to around 20 files.
>
> Some calculations and statistics:
>
>   Total number of records:             143,700,000
>   Total number of records per file:    143,700,000 / 20 = 7,185,000
>   Total number of records in each file: 7,185,000 * 5 = 35,925,000
>
> Current postgresql database config:
>
> My current machine: 8 GB RAM with a 2nd-generation i7 processor.
>
> I made the following changes to the postgresql configuration file:
>   shared_buffers: 2 GB
>   effective_cache_size: 4 GB
>
> Note on current performance:
>
> I have run it for about ten hours and the performance is as follows: the total number of records written for a single file is only about 2,500,000 * 5 = 12,500,000, and it has written 2 such files. Considering the sizes, it would take at least 20 hrs for 2 files; with about 20 files in total that comes to roughly 200 hrs, i.e. about 9 days. I have to start my experiments as early as possible, and 10 days is too much. Can you please help me enhance the performance?
>
> Questions:
> 1. Should I use symmetric multiprocessing on my computer? In that case, what is suggested or preferable?
> 2. Should I use multithreading? In that case, any links or pointers would be of great help.
>
> Thanks
> Sree aurovindh V

--
Francesc Alted

_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users