Is there any way that you can query and write in much larger chunks than 6?
I don't know much about PostgreSQL specifically, but in general HDF5 does
much better if you can take larger chunks.  Perhaps you could at least do
the PostgreSQL queries in parallel.
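
For example, here is a rough sketch (I am guessing at your connection string,
file layout, and column names) that pulls rows out of postgres with a
server-side cursor and appends them to the PyTables table thousands at a time
instead of one row per insert:

import psycopg2
import tables

BATCH = 10000  # rows fetched and appended per chunk; tune to taste

conn = psycopg2.connect("dbname=mydb user=me")    # assumed connection settings
cur = conn.cursor(name="train_cur")               # named => server-side cursor
cur.execute("SELECT tr_id, q_id, label FROM x.train ORDER BY tr_id")

h5 = tables.open_file("train_part01.h5", mode="a")  # assumed existing file
train = h5.root.train                               # assumed existing Table node

while True:
    rows = cur.fetchmany(BATCH)    # one big round trip to postgres
    if not rows:
        break
    train.append(rows)             # one bulk append instead of BATCH tiny ones
    h5.flush()

h5.close()
cur.close()
conn.close()

(The tuples you append do have to be in the same column order as your Table
description.)

Since HDF5 is not going to help you with threads, the easiest way to get the
parallelism is separate processes, each owning its own connection, its own
slice of the ids, and its own output file. Again just a sketch, with made-up
names and an assumed dense tr_id range:

from multiprocessing import Pool

import psycopg2
import tables

BATCH = 10000

def export_slice(args):
    """Export one slice of tr_ids into its own (pre-created) HDF5 file."""
    lo, hi, path = args
    conn = psycopg2.connect("dbname=mydb user=me")  # each worker connects itself
    cur = conn.cursor(name="cur_%d" % lo)
    cur.execute("SELECT tr_id, q_id, label FROM x.train "
                "WHERE tr_id >= %s AND tr_id < %s ORDER BY tr_id", (lo, hi))
    h5 = tables.open_file(path, mode="a")
    train = h5.root.train
    while True:
        rows = cur.fetchmany(BATCH)
        if not rows:
            break
        train.append(rows)
        h5.flush()
    h5.close()
    cur.close()
    conn.close()

if __name__ == "__main__":
    step = 143700000 // 20  # ~7.18M ids per file, if tr_id is roughly dense
    jobs = [(i * step, (i + 1) * step, "train_part%02d.h5" % i)
            for i in range(20)]
    Pool(processes=4).map(export_slice, jobs)  # a few workers per machine

The same idea applies to the other five lookups: fetch a whole batch of q_ids
in one query (e.g. WHERE q_id = ANY(...)) rather than one query per row.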

Be Well
Anthony

On Mon, Mar 19, 2012 at 11:23 AM, sreeaurovindh viswanathan <
sreeaurovi...@gmail.com> wrote:

> The problem is with respect to the writing speed of my computer and the
> PostgreSQL query performance. I will explain the scenario in detail.
>
> I have about 80 GB of data (along with appropriate database indexes in
> place). I am trying to read it from a PostgreSQL database and write it into
> HDF5 using PyTables. I have 1 table and 5 variable arrays in one HDF5 file.
> The implementation of HDF5 is *not* multithreaded or enabled for symmetric
> multiprocessing.
>
> As far as the PostgreSQL table is concerned, the overall record count is
> 140 million, and I have 5 tables related by primary/foreign keys. I am not
> using joins as they do not scale.
>
> So for a single record I do 6 lookups without joins and write the results
> into HDF5 format. For each record that means 6 inserts: one into the table
> and one into each of its corresponding arrays.
>
> The queries are really simple:
>
> select * from x.train where tr_id=1   (primary key & indexed)
>
> select q_t from x.qt where q_id=2   (non-primary key, but indexed)
>
> (and four more queries of the same form)
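>
> In simplified form, each record is handled roughly like this (not my real
> code, just the shape of it, with connection details and the real table and
> array names elided):
>
> import psycopg2
> import tables
>
> conn = psycopg2.connect("dbname=mydb")        # placeholder connection
> cur = conn.cursor()
> h5 = tables.open_file("part01.h5", mode="a")  # one of the 20 files
>
> for tr_id in range(0, 7185000):               # say, this file's share of ids
>     cur.execute("select * from x.train where tr_id = %s", (tr_id,))
>     train_row = cur.fetchone()
>     # ... the q_t lookup and four similar single-row selects ...
>     h5.root.train.append([train_row])         # 6 tiny appends per record
>     # ... five more appends, one into each array ...
>
> h5.close()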
>
> Each computer writes two HDF5 files, and hence the total count comes to
> around 20 files.
>
> Some calculations and statistics:
>
> Total number of records: 143,700,000
>
> Total number of records per file: 143,700,000 / 20 = 7,185,000
>
> Total number of rows in each file: 7,185,000 * 5 = 35,925,000
>
>
> Current PostgreSQL database config:
>
> My current machine: 8 GB RAM with a 2nd-generation i7 processor.
>
> I made the following changes to the postgresql configuration file:
> shared_buffers = 2GB
> effective_cache_size = 4GB
>
> Note on current performance:
>
> I have run it for about *ten hours* and the performance is as follows:
> the total number of records written for a single file so far is only about
> 2,500,000 * 5 = 12,500,000. It has written 2 such files. Considering the
> size, it would take me at least 20 hrs for those 2 files. I have about 10
> files, and hence the total would be around 200 hrs, i.e. about 9 days. I
> have to start my experiments as early as possible, and 10 days is too much.
> Can you please help me to enhance the performance?
>
> Questions:
>
> 1. Should I use symmetric multiprocessing on my computer? If so, what is
> suggested or preferable?
> 2. Should I use multithreading? If so, any links or pointers would be of
> great help.
>
>
> Thanks
>
> Sree aurovindh V