Re: [PERFORM] insert/update tps slow with indices on table > 1M rows

2008-06-04 Thread Heikki Linnakangas

andrew klassen wrote:

> I am using the c-library interface and for these particular transactions
> I preload PREPARE statements. Then as I get requests, I issue a BEGIN,
> followed by at most 300 EXECUTES and then a COMMIT. That is the
> general scenario. What value beyond 300 should I try?


Make sure you use the asynchronous PQsendQuery, instead of plain PQexec.
Otherwise you'll be doing a round trip for each EXECUTE anyway,
regardless of the batch size. Of course, if the bottleneck is somewhere
else, it won't make a difference.
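
For illustration, a minimal sketch of that approach (statement name,
table, and batch size are hypothetical, and error handling is trimmed):
the whole batch of EXECUTEs is concatenated into one query string, so it
crosses the network in a single round trip, and one PGresult comes back
per statement.

    #include <stdio.h>
    #include <libpq-fe.h>

    int main(void)
    {
        PGconn *conn = PQconnectdb("dbname=test");
        if (PQstatus(conn) != CONNECTION_OK)
            return 1;

        PQclear(PQexec(conn, "PREPARE ins (int, text) AS "
                             "INSERT INTO mytable VALUES ($1, $2)"));

        /* Concatenate BEGIN, the EXECUTEs, and COMMIT into one string. */
        char batch[65536];
        int off = snprintf(batch, sizeof(batch), "BEGIN;");
        for (int i = 0; i < 300; i++)
            off += snprintf(batch + off, sizeof(batch) - off,
                            "EXECUTE ins(%d, 'row %d');", i, i);
        snprintf(batch + off, sizeof(batch) - off, "COMMIT;");

        /* One network round trip for the whole batch. */
        PQsendQuery(conn, batch);

        /* Drain one PGresult per statement. */
        PGresult *res;
        while ((res = PQgetResult(conn)) != NULL)
        {
            if (PQresultStatus(res) != PGRES_COMMAND_OK)
                fprintf(stderr, "%s", PQerrorMessage(conn));
            PQclear(res);
        }

        PQfinish(conn);
        return 0;
    }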


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [PERFORM] insert/update tps slow with indices on table > 1M rows

2008-06-04 Thread Tom Lane
andrew klassen <[EMAIL PROTECTED]> writes:
> I am using the c-library interface and for these particular transactions
> I preload PREPARE statements. Then as I get requests, I issue a BEGIN, 
> followed by at most 300 EXECUTES and then a COMMIT. That is the
> general scenario. What value beyond 300 should I try? 

Well, you could try numbers in the low thousands, but you'll probably
get only incremental improvement.

> Also, how might COPY (which involves file I/O) improve the 
> above scenario? 

COPY needn't involve file I/O.  If you are using libpq you can push
anything you want into PQputCopyData.  This would involve formatting
the data according to COPY's escaping rules, which are rather different
from straight SQL, but I doubt it'd be a huge amount of code.  Seems
worth trying.
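
For illustration, a minimal sketch of that approach, assuming a
hypothetical table mytable(id int, note text); the rows are pushed
straight from memory with PQputCopyData, no file in sight:

    #include <stdio.h>
    #include <libpq-fe.h>

    static int copy_rows(PGconn *conn)
    {
        /* Enter COPY IN mode. */
        PGresult *res = PQexec(conn, "COPY mytable (id, note) FROM STDIN");
        if (PQresultStatus(res) != PGRES_COPY_IN)
        {
            PQclear(res);
            return -1;
        }
        PQclear(res);

        for (int i = 0; i < 300; i++)
        {
            /* COPY text format: tab-separated columns, one row per line;
             * tabs, newlines and backslashes in the data must be escaped
             * as \t, \n, \\ rather than SQL-quoted. */
            char row[256];
            int len = snprintf(row, sizeof(row), "%d\trow %d\n", i, i);
            if (PQputCopyData(conn, row, len) != 1)
                return -1;
        }

        if (PQputCopyEnd(conn, NULL) != 1)
            return -1;

        res = PQgetResult(conn);            /* final status of the COPY */
        int ok = (PQresultStatus(res) == PGRES_COMMAND_OK) ? 0 : -1;
        PQclear(res);
        return ok;
    }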

regards, tom lane



Re: [PERFORM] insert/update tps slow with indices on table > 1M rows

2008-06-04 Thread PFC



> I am using the c-library interface and for these particular transactions
> I preload PREPARE statements. Then as I get requests, I issue a BEGIN,
> followed by at most 300 EXECUTES and then a COMMIT. That is the
> general scenario. What value beyond 300 should I try?
> Thanks.


	Do you have PREPARE statements whose performance might change as the
table grows?

	I mean some SELECTs, etc.; in that case, if you start with an empty
table, then after inserting say 100K rows you might want to just disconnect,
reconnect, and ANALYZE to trigger replanning of those statements.
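
	(As a hedged aside: on 8.2 a prepared statement keeps the plan it got
at PREPARE time, so deallocating and re-preparing after an ANALYZE forces
a fresh plan without dropping the connection. A minimal sketch, with
hypothetical statement and table names:)

    #include <libpq-fe.h>

    /* Re-plan a prepared statement after refreshing the statistics. */
    void replan(PGconn *conn)
    {
        PQclear(PQexec(conn, "ANALYZE mytable"));
        PQclear(PQexec(conn, "DEALLOCATE ins"));
        PQclear(PQexec(conn, "PREPARE ins (int, text) AS "
                             "INSERT INTO mytable VALUES ($1, $2)"));
    }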



> Also, how might COPY (which involves file I/O) improve the
> above scenario?

	It won't, but if you see that COPY is very much faster than your
INSERT-based process, it will give you a useful piece of information.


I understand your problem is:

- Create table with indexes
- Insert batches of rows
- After a while it gets slow

Try:

- Create table with indexes
- COPY huge batch of rows
- Compare time with above

	Since COPY also updates the indexes just like your inserts do, it will
tell you whether it's the indexes that slow you down, or something else.


	Also, for insert-heavy loads it's a good idea to put the xlog on a
separate disk (to double your disk bandwidth), unless you have a monster
disk setup.


	During your INSERTs, do you also make some SELECTs? Do you have
triggers on the table? Foreign keys? Anything?
	How much RAM do you have? And can you measure the size of the
table+indexes when it gets slow?


Re: [PERFORM] insert/update tps slow with indices on table > 1M rows

2008-06-04 Thread andrew klassen
I am using the c-library interface and for these particular transactions
I preload PREPARE statements. Then as I get requests, I issue a BEGIN, 
followed by at most 300 EXECUTES and then a COMMIT. That is the
general scenario. What value beyond 300 should I try? 
Also, how might COPY (which involves file I/O) improve the 
above scenario? 
Thanks.


- Original Message 
From: James Mansion <[EMAIL PROTECTED]>
To: andrew klassen <[EMAIL PROTECTED]>
Cc: pgsql-performance@postgresql.org
Sent: Wednesday, June 4, 2008 3:20:26 PM
Subject: Re: [PERFORM] insert/update tps slow with indices on table > 1M rows

andrew klassen wrote:
> I'll try adding more threads to update the table as you suggest.
You could try materially increasing the update batch size too. As an
exercise you could see what the performance of COPY is by backing out the
data and reloading it from a suitable file.


  

Re: [PERFORM] insert/update tps slow with indices on table > 1M rows

2008-06-04 Thread James Mansion

andrew klassen wrote:
> I'll try adding more threads to update the table as you suggest.

You could try materially increasing the update batch size too. As an
exercise you could see what the performance of COPY is by backing out the
data and reloading it from a suitable file.




Re: [PERFORM] insert/update tps slow with indices on table > 1M rows

2008-06-04 Thread James Mansion

Matthew Wakeling wrote:
> If you're running a "work queue" architecture, that probably means you
> only have one thread doing all the updates/inserts? It might be worth
> going multi-threaded, and issuing inserts and updates through more
> than one connection. Postgres is designed pretty well to scale
> performance by the number of simultaneous connections.

That would explain a disappointing upper limit on insert rate, but not
any sort of cliff for the rate. Nor, really, any material slowdown, if
the single thread implies that we're stuck on round-trip latency as a
material limiting factor.


James




Re: [PERFORM] insert/update tps slow with indices on table > 1M rows

2008-06-04 Thread andrew klassen

I agree that the discs are probably very busy. I do have 2 disks but they are
for redundancy. Would it help to put the data, indexes and xlog on separate 
disk partitions? 
I'll try adding more threads to update the table as you suggest.

- Original Message 
From: Matthew Wakeling <[EMAIL PROTECTED]>
To: pgsql-performance@postgresql.org
Sent: Wednesday, June 4, 2008 10:10:38 AM
Subject: Re: [PERFORM] insert/update tps slow with indices on table > 1M rows

On Wed, 4 Jun 2008, andrew klassen wrote:
> I am using multiple threads, but only one worker thread for
> inserts/updates to this table.
> I don't mind trying to add multiple threads for this table, but my
> guess is it would not help because basically the overall tps rate is
> decreasing so dramatically. Since the cpu time consumed by the
> corresponding postgres server process for my thread is small, it does
> not seem to be the bottleneck. There has to be a bottleneck somewhere
> else.
> Do you agree or is there some flaw in my reasoning?

There is indeed a flaw in your reasoning - there may be very little CPU 
time consumed, but that just indicates that the discs are busy. Getting 
Postgres to do multiple things at once will cause a more efficient use of 
the disc subsystem, resulting in greater overall throughput. This is 
especially the case if you have multiple discs in your box.

Matthew

-- 
Contrary to popular belief, Unix is user friendly. It just happens to be
very selective about who its friends are.                -- Kyle Hearn



  

Re: [PERFORM] insert/update tps slow with indices on table > 1M rows

2008-06-04 Thread Matthew Wakeling

On Wed, 4 Jun 2008, andrew klassen wrote:

> I am using multiple threads, but only one worker thread for
> inserts/updates to this table.
> I don't mind trying to add multiple threads for this table, but my
> guess is it would not help because basically the overall tps rate is
> decreasing so dramatically. Since the cpu time consumed by the
> corresponding postgres server process for my thread is small, it does
> not seem to be the bottleneck. There has to be a bottleneck somewhere
> else.
> Do you agree or is there some flaw in my reasoning?


There is indeed a flaw in your reasoning - there may be very little CPU 
time consumed, but that just indicates that the discs are busy. Getting 
Postgres to do multiple things at once will cause a more efficient use of 
the disc subsystem, resulting in greater overall throughput. This is 
especially the case if you have multiple discs in your box.


Matthew

--
Contrary to popular belief, Unix is user friendly. It just happens to be
very selective about who its friends are. -- Kyle Hearn


Re: [PERFORM] insert/update tps slow with indices on table > 1M rows

2008-06-04 Thread andrew klassen
I am using multiple threads, but only one worker thread for inserts/updates
to this table.
I don't mind trying to add multiple threads for this table, but my guess is
it would not help because basically the overall tps rate is decreasing so
dramatically. Since the cpu time consumed by the corresponding postgres
server process for my thread is small, it does not seem to be the
bottleneck. There has to be a bottleneck somewhere else.
Do you agree or is there some flaw in my reasoning?

- Original Message 
From: Matthew Wakeling <[EMAIL PROTECTED]>
To: andrew klassen <[EMAIL PROTECTED]>
Cc: pgsql-performance@postgresql.org
Sent: Wednesday, June 4, 2008 5:31:22 AM
Subject: Re: [PERFORM] insert/update tps slow with indices on table > 1M rows

On Tue, 3 Jun 2008, andrew klassen wrote:
> Basically, I have a somewhat constant rate of inserts/updates that go 
> into a work queue and then get passed to postgres.

> The cpu load is not that high, i.e. plenty of idle cpu. I am running an older
> version of freebsd and the iostat output is not very detailed.

If you're running a "work queue" architecture, that probably means you 
only have one thread doing all the updates/inserts? It might be worth 
going multi-threaded, and issuing inserts and updates through more than 
one connection. Postgres is designed pretty well to scale performance by 
the number of simultaneous connections.

Matthew

-- 
Contrary to popular belief, Unix is user friendly. It just happens to be
very selective about who its friends are.                -- Kyle Hearn



  

Re: [PERFORM] insert/update tps slow with indices on table > 1M rows

2008-06-04 Thread Matthew Wakeling

On Tue, 3 Jun 2008, andrew klassen wrote:
> Basically, I have a somewhat constant rate of inserts/updates that go
> into a work queue and then get passed to postgres.
>
> The cpu load is not that high, i.e. plenty of idle cpu. I am running
> an older version of freebsd and the iostat output is not very detailed.


If you're running a "work queue" architecture, that probably means you 
only have one thread doing all the updates/inserts? It might be worth 
going multi-threaded, and issuing inserts and updates through more than 
one connection. Postgres is designed pretty well to scale performance by 
the number of simultaneous connections.
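
For illustration, a minimal sketch of that idea in libpq, one connection
per thread (libpq is thread-safe as long as each PGconn is used by a
single thread); connection string, thread count, table, and batch size
are all hypothetical:

    #include <pthread.h>
    #include <libpq-fe.h>

    #define NTHREADS 4

    static void *worker(void *arg)
    {
        (void) arg;

        /* Each thread gets its own connection. */
        PGconn *conn = PQconnectdb("dbname=test");
        if (PQstatus(conn) != CONNECTION_OK)
            return NULL;

        for (int batch = 0; batch < 100; batch++)
        {
            PQclear(PQexec(conn, "BEGIN"));
            for (int i = 0; i < 300; i++)
                PQclear(PQexec(conn,
                    "INSERT INTO mytable VALUES (1, 'row')"));
            PQclear(PQexec(conn, "COMMIT"));
        }

        PQfinish(conn);
        return NULL;
    }

    int main(void)
    {
        pthread_t tid[NTHREADS];
        for (int i = 0; i < NTHREADS; i++)
            pthread_create(&tid[i], NULL, worker, NULL);
        for (int i = 0; i < NTHREADS; i++)
            pthread_join(tid[i], NULL);
        return 0;
    }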


Matthew

--
Contrary to popular belief, Unix is user friendly. It just happens to be
very selective about who its friends are. -- Kyle Hearn


Re: [PERFORM] insert/update tps slow with indices on table > 1M rows

2008-06-03 Thread Scott Marlowe
On Tue, Jun 3, 2008 at 4:36 PM, andrew klassen <[EMAIL PROTECTED]> wrote:

> The cpu load is not that high, i.e. plenty of idle cpu. I am running
> an older version of freebsd and the iostat output is not very
> detailed. During this time, the throughput is low (< 10Mbs). The
> system has an LSI Logic MegaRAID controller with 2 disks.
>
> Any ideas on how to find the bottleneck/decrease the overhead of
> index usage?

Older versions of BSD can be pretty pokey compared to the 6.x and 7.x
branches.  I'd seriously consider upgrading to 7 if possible.

The cost of maintaining indexes is always an issue.  There are a few
things you can do to help out.

Partitioning and having fewer indexes are what I'd recommend.
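
For illustration, a hedged sketch of the partitioning route as it looks
on 8.2 (inheritance plus constraint exclusion; table, column, and date
range are hypothetical). Rows are inserted directly into the current
child, so each child's indexes stay small:

    #include <libpq-fe.h>

    /* One child table per month, inheriting from the parent. */
    void make_partition(PGconn *conn)
    {
        PQclear(PQexec(conn,
            "CREATE TABLE mytable_2008_06 ("
            "  CHECK (created >= DATE '2008-06-01'"
            "     AND created <  DATE '2008-07-01')"
            ") INHERITS (mytable)"));
        PQclear(PQexec(conn,
            "CREATE INDEX mytable_2008_06_created_idx "
            "ON mytable_2008_06 (created)"));
        /* Let the planner skip children whose CHECK can't match. */
        PQclear(PQexec(conn, "SET constraint_exclusion = on"));
    }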



Re: [PERFORM] insert/update tps slow with indices on table > 1M rows

2008-06-03 Thread andrew klassen
I am not currently using COPY, but I am using prepared statements
for table inserts/updates, so the overhead for the actual data transfer
should be pretty good. I am sending at most 300 inserts/updates
per transaction, but that is just an arbitrary value. When the queue
grows, I could easily send more per transaction. I did experiment
a little, but it did not seem to help significantly at the time.

The system has 4G total memory. Shared memory is locked by the OS,
i.e. not paged, so I am only using shared_buffers=28MB.

The maximum data per row is 324 bytes, assuming the maximum expected
length of the two text fields. There are 5 indices in total: one on an
8-byte field, two on 4-byte fields, and two on text fields. As mentioned,
all indices are btree.

 
- Original Message 
From: PFC <[EMAIL PROTECTED]>
To: andrew klassen <[EMAIL PROTECTED]>; pgsql-performance@postgresql.org
Sent: Tuesday, June 3, 2008 7:15:10 PM
Subject: Re: [PERFORM] insert/update tps slow with indices on table > 1M rows

On Wed, 04 Jun 2008 00:36:09 +0200, andrew klassen <[EMAIL PROTECTED]>  
wrote:

> Running postgres 8.2.5
>
> I have a table that has 5 indices, no foreign keys or any
> dependency on any other table. If I delete the database and
> start entering entries, everything works very well until I get
> to some point (let's say 1M rows). Basically, I have a somewhat
> constant rate of inserts/updates that go into a work queue and then
> get passed to postgres. The work queue starts filling up as the
> responsiveness slows down. For example, at 1.5M
> rows it takes >2 seconds for 300 inserts issued in one transaction.
>
> Prior to this point I had added a regular VACUUM ANALYZE on
> the table and it did help. I increased maintenance_work_mem to
> 128MB. I also set the fillfactor on the table indices to 50% (not sure
> if that made any difference; I have to study the results more closely).
>
> In an effort to figure out the bottleneck, I DROPed 4 of the indices
> on the table and the tps increased to over 1000. I don't really know
> which index removal gave the best performance improvement. I
> dropped 2 32-bit indices and 2 text indices, all using btree.
>
> The cpu load is not that high, i.e. plenty of idle cpu. I am running
> an older version of freebsd and the iostat output is not very
> detailed. During this time, the throughput is low (< 10Mbs). The
> system has an LSI Logic MegaRAID controller with 2 disks.
>
> Any ideas on how to find the bottleneck/decrease the overhead of
> index usage?
>
> Thanks.

    If you are filling an empty table, it is generally faster to create
the indexes after the data import.
    Of course, if this is a live table or you need the indexes during the
import, this is not an option.
    I find it generally faster to lightly preprocess the data and generate
text files that I then import using COPY, then do the rest of the
processing in SQL.

    How much RAM is in the box? Size of the data & indexes?




  

Re: [PERFORM] insert/update tps slow with indices on table > 1M rows

2008-06-03 Thread PFC
On Wed, 04 Jun 2008 00:36:09 +0200, andrew klassen <[EMAIL PROTECTED]>  
wrote:



> Running postgres 8.2.5
>
> I have a table that has 5 indices, no foreign keys or any
> dependency on any other table. If I delete the database and
> start entering entries, everything works very well until I get
> to some point (let's say 1M rows). Basically, I have a somewhat
> constant rate of inserts/updates that go into a work queue and then
> get passed to postgres. The work queue starts filling up as the
> responsiveness slows down. For example, at 1.5M
> rows it takes >2 seconds for 300 inserts issued in one transaction.
>
> Prior to this point I had added a regular VACUUM ANALYZE on
> the table and it did help. I increased maintenance_work_mem to
> 128MB. I also set the fillfactor on the table indices to 50% (not sure
> if that made any difference; I have to study the results more closely).
>
> In an effort to figure out the bottleneck, I DROPed 4 of the indices
> on the table and the tps increased to over 1000. I don't really know
> which index removal gave the best performance improvement. I
> dropped 2 32-bit indices and 2 text indices, all using btree.
>
> The cpu load is not that high, i.e. plenty of idle cpu. I am running
> an older version of freebsd and the iostat output is not very
> detailed. During this time, the throughput is low (< 10Mbs). The
> system has an LSI Logic MegaRAID controller with 2 disks.
>
> Any ideas on how to find the bottleneck/decrease the overhead of
> index usage?
>
> Thanks.


	If you are filling an empty table, it is generally faster to create
the indexes after the data import.
	Of course, if this is a live table or you need the indexes during the
import, this is not an option.
	I find it generally faster to lightly preprocess the data and generate
text files that I then import using COPY, then do the rest of the
processing in SQL.
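
	A hedged sketch of that load-then-index pattern (index, column, and
table names are hypothetical; the COPY step would stream rows as in the
PQputCopyData example earlier in the thread):

    #include <libpq-fe.h>

    /* Drop the secondary index, bulk-load, then rebuild and analyze.
     * Only an option if the table can live without the index while
     * the load runs. */
    void bulk_load(PGconn *conn)
    {
        PQclear(PQexec(conn, "DROP INDEX IF EXISTS mytable_note_idx"));

        /* ... stream the rows in with COPY here ... */

        PQclear(PQexec(conn,
            "CREATE INDEX mytable_note_idx ON mytable (note)"));
        PQclear(PQexec(conn, "ANALYZE mytable"));
    }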


How much RAM is in the box? Size of the data & indexes?



[PERFORM] insert/update tps slow with indices on table > 1M rows

2008-06-03 Thread andrew klassen
Running postgres 8.2.5

I have a table that has 5 indices, no foreign keys or any
dependency on any other table. If I delete the database and
start entering entries, everything works very well until I get
to some point (let's say 1M rows). Basically, I have a somewhat
constant rate of inserts/updates that go into a work queue and then
get passed to postgres. The work queue starts filling up as the
responsiveness slows down. For example, at 1.5M
rows it takes >2 seconds for 300 inserts issued in one transaction.

Prior to this point I had added a regular VACUUM ANALYZE on
the table and it did help. I increased maintenance_work_mem to
128MB. I also set the fillfactor on the table indices to 50% (not sure
if that made any difference; I have to study the results more closely).

In an effort to figure out the bottleneck, I DROPed 4 of the indices
on the table and the tps increased to over 1000. I don't really know
which index removal gave the best performance improvement. I
dropped 2 32-bit indices and 2 text indices, all using btree.

The cpu load is not that high, i.e. plenty of idle cpu. I am running an older
version of freebsd and the iostat output is not very detailed.
During this time, the throughput is low (< 10Mbs). The system has an
LSI Logic MegaRAID controller with 2 disks.

Any ideas on how to find the bottleneck/decrease the overhead of
index usage?

Thanks.