What Francesc said ;)
On Mon, Mar 19, 2012 at 11:43 AM, Francesc Alted <fal...@gmail.com> wrote:
> My advice regarding parallelization is: do not worry about this *at all*
> unless you have already spent a long time profiling your problem and you
> are sure that parallelizing would help. 99% of the time it is much more
> productive to focus on improving serial speed.
>
> Please, try to follow Anthony's suggestion and split your queries into
> blocks, and pass these blocks to PyTables. That would represent a huge
> win. For example, use:
>
> SELECT * FROM `your_table` LIMIT 10000 OFFSET 0
>
> for the first block, and send the results to `Table.append`. Then go for
> the second block as:
>
> SELECT * FROM `your_table` LIMIT 10000 OFFSET 10000
>
> and pass this to `Table.append`. And so on and so forth until you exhaust
> all the data in your tables.
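>
> In Python this pattern might look roughly like the following (a minimal
> sketch, assuming psycopg2 on the PostgreSQL side; the DSN, file name,
> table names, and the pre-created PyTables Table are placeholders):
>
> import psycopg2
> import tables as tb
>
> BLOCK = 10000  # rows fetched and appended per iteration
>
> conn = psycopg2.connect("dbname=yourdb user=youruser")  # placeholder DSN
> cur = conn.cursor()
>
> h5 = tb.open_file("out.h5", mode="a")
> table = h5.root.train  # assumes this Table was already created
>
> offset = 0
> while True:
>     # ORDER BY makes LIMIT/OFFSET pagination deterministic
>     cur.execute("SELECT * FROM your_table ORDER BY tr_id "
>                 "LIMIT %s OFFSET %s", (BLOCK, offset))
>     rows = cur.fetchall()
>     if not rows:
>         break                # all data exhausted
>     table.append(rows)       # one append per block, not per row
>     offset += BLOCK
>
> table.flush()
> h5.close()
> conn.close()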
>
> Hope this helps,
>
> Francesc
>
> On Mar 19, 2012, at 11:36 AM, sreeaurovindh viswanathan wrote:
>
> > Hi,
> >
> > Thanks for your reply. In that case, what will my querying efficiency
> > be? Will I be able to query in parallel, i.e. run multiple queries
> > against a single file? Also, if I do it in 6 chunks, will I be able to
> > parallelize that?
> >
> >
> > Thanks
> > Sree aurovindh Viswanathan
> > On Mon, Mar 19, 2012 at 10:01 PM, Anthony Scopatz <scop...@gmail.com> wrote:
> > Is there any way that you can query and write in much larger chunks
> > than 6? I don't know much about PostgreSQL specifically, but in general
> > HDF5 does much better if you can use larger chunks. Perhaps you could
> > at least do the PostgreSQL queries in parallel.
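> >
> > A hedged sketch of that split (assuming psycopg2 and the standard
> > library's multiprocessing; HDF5 appends stay in a single process since
> > PyTables is not safe for concurrent writers; names and the DSN are
> > placeholders):
> >
> > import multiprocessing as mp
> >
> > import psycopg2
> > import tables as tb
> >
> > BLOCK = 10000       # rows per query
> > N_WORKERS = 4       # parallel PostgreSQL readers
> >
> > def reader(offsets, out_q):
> >     # Each worker owns its own connection and fetches its share of blocks.
> >     conn = psycopg2.connect("dbname=yourdb user=youruser")  # placeholder
> >     cur = conn.cursor()
> >     for off in offsets:
> >         cur.execute("SELECT * FROM your_table ORDER BY tr_id "
> >                     "LIMIT %s OFFSET %s", (BLOCK, off))
> >         rows = cur.fetchall()
> >         if rows:
> >             out_q.put(rows)
> >     conn.close()
> >     out_q.put(None)  # sentinel: this worker is finished
> >
> > if __name__ == "__main__":
> >     offs = list(range(0, 143700000, BLOCK))
> >     shares = [offs[i::N_WORKERS] for i in range(N_WORKERS)]
> >
> >     q = mp.Queue(maxsize=16)  # bounded, so readers cannot outrun the writer
> >     procs = [mp.Process(target=reader, args=(s, q)) for s in shares]
> >     for p in procs:
> >         p.start()
> >
> >     # Single writer: all HDF5 appends happen here.
> >     h5 = tb.open_file("out.h5", mode="a")
> >     table = h5.root.train     # assumes this Table already exists
> >     done = 0
> >     while done < N_WORKERS:
> >         item = q.get()
> >         if item is None:
> >             done += 1
> >         else:
> >             table.append(item)
> >     table.flush()
> >     h5.close()
> >     for p in procs:
> >         p.join()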
> >
> > Be Well
> > Anthony
> >
> > On Mon, Mar 19, 2012 at 11:23 AM, sreeaurovindh viswanathan <sreeaurovi...@gmail.com> wrote:
> > The problem is with respect to the writing speed of my computer and the
> > PostgreSQL query performance. I will explain the scenario in detail.
> >
> > I have about 80 GB of data (with appropriate database indexes in
> > place). I am trying to read it from a PostgreSQL database and write it
> > into HDF5 using PyTables. I have 1 table and 5 variable-length arrays
> > in each HDF5 file. The HDF5 implementation is not multithreaded or
> > enabled for symmetric multiprocessing.
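> >
> > (For concreteness, one such file layout might be created as in this
> > sketch; the row schema, dtypes, and array names are placeholders, not
> > my actual ones:)
> >
> > import tables as tb
> >
> > class Train(tb.IsDescription):          # placeholder row schema
> >     tr_id = tb.Int64Col()
> >     value = tb.Float64Col()
> >
> > h5 = tb.open_file("chunk01.h5", mode="w")
> > table = h5.create_table("/", "train", Train, "main table")
> > # Five variable-length arrays alongside the table (names are placeholders).
> > arrays = [h5.create_vlarray("/", "arr%d" % i, tb.Float64Atom())
> >           for i in range(5)]
> > h5.close()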
> >
> > As far as the PostgreSQL tables are concerned, the overall record
> > count is 140 million, and I have 5 tables related by primary/foreign
> > keys. I am not using joins, as they do not scale.
> >
> > So for a single record I do 6 lookups without joins and write the
> > results into HDF5 format. For each lookup I do 6 inserts, into the
> > table and its corresponding arrays.
> >
> > The queries are really simple:
> >
> >     select * from x.train where tr_id=1   (primary key, indexed)
> >     select q_t from x.qt where q_id=2     (non-primary key, but indexed)
> >
> > (and similarly four more queries)
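> >
> > (If these were batched per the suggestions above, each round trip could
> > fetch a whole block of ids at once; a sketch, assuming psycopg2, whose
> > list adaptation makes `= ANY(%s)` work directly; the DSN and id range
> > are placeholders:)
> >
> > import psycopg2
> >
> > conn = psycopg2.connect("dbname=yourdb user=youruser")  # placeholder DSN
> > cur = conn.cursor()
> >
> > ids = list(range(1, 10001))  # one block of keys (placeholder)
> > # psycopg2 adapts the Python list to a PostgreSQL array for ANY().
> > cur.execute("SELECT * FROM x.train WHERE tr_id = ANY(%s)", (ids,))
> > train_rows = cur.fetchall()
> > cur.execute("SELECT q_t FROM x.qt WHERE q_id = ANY(%s)", (ids,))
> > qt_rows = cur.fetchall()
> > conn.close()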
> >
> > Each computer writes two HDF5 files, and hence the total count comes
> > to around 20 files.
> >
> > Some calculations and statistics:
> >
> >     Total number of records:    143,700,000
> >     Records per file:           143,700,000 / 20 = 7,185,000
> >     Total entries per file:     7,185,000 * 5 = 35,925,000
> >
> > Current PostgreSQL database config:
> >
> > My current machine: 8 GB RAM with a 2nd-generation Intel i7 processor.
> >
> > I changed the following settings in the postgresql.conf file:
> >
> >     shared_buffers = 2GB
> >     effective_cache_size = 4GB
> >
> > Note on current performance:
> >
> > I have run it for about ten hours, and the performance is as follows:
> > the total number of records written for a single file is only about
> > 2,500,000 * 5 = 12,500,000. It has written 2 such files so far;
> > considering the size, it would take me at least 20 hours for these 2
> > files. I have about 10 files, so the total would be around 200 hours,
> > i.e. roughly 9 days. I have to start my experiments as early as
> > possible, and 10 days is too long. Can you please help me improve the
> > performance?
> >
> > Questions:
> >
> > 1. Should I use symmetric multiprocessing on my computer? If so, what
> > is suggested or preferable?
> > 2. Should I use multithreading? If so, any links or pointers would be
> > of great help.
> >
> > Thanks
> >
> > Sree aurovindh V
>
> -- Francesc Alted
_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users