Re: [PERFORM] Insert performance, what should I expect?

2004-10-23 Thread Gaetano Mendola
Brock Henry wrote:
> Any comments/suggestions would be appreciated.
Tune also the disk I/O elevator.
look at this: http://www.varlena.com/varlena/GeneralBits/49.php
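(A rough illustration only, not taken from that article: on kernels where the old elevator is in use, elvtune can show and adjust the per-device latency settings. The device name and numbers below are just placeholders.)

  elvtune /dev/sda                  # show the current read/write latency settings
  elvtune -r 1024 -w 2048 /dev/sda  # set read/write latency limits (values purely illustrative)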
Regards
Gaetano Mendola



Re: [PERFORM] Insert performance, what should I expect?

2004-10-23 Thread Steinar H. Gunderson
On Sat, Oct 23, 2004 at 12:31:32PM +0200, Gaetano Mendola wrote:
>> Any comments/suggestions would be appreciated.
> Tune also the disk I/O elevator.
>
> look at this: http://www.varlena.com/varlena/GeneralBits/49.php

Mm, interesting. I've heard somewhere that the best choice for database-like
loads on Linux is to disable the anticipatory I/O scheduler
(http://kerneltrap.org/node/view/567), which should probably also
influence the numbers for elvtune -- does anybody know whether this is true
for PostgreSQL?
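(For anyone who wants to experiment: one way to avoid the anticipatory scheduler on a 2.6 kernel is to pick the deadline elevator at boot time; the device name below is just an example, and the /sys interface is only present on kernels that support runtime switching.)

  # on the bootloader kernel command line (grub kernel line / lilo append=):
  elevator=deadline
  # on kernels that expose it, check which scheduler is active:
  cat /sys/block/sda/queue/scheduler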

/* Steinar */
-- 
Homepage: http://www.sesse.net/



Re: [PERFORM] Insert performance, what should I expect?

2004-10-20 Thread Andrew McMillan
On Wed, 2004-10-20 at 11:53 +1000, Brock Henry wrote:
 
> Test 1: For each import, I'm dropping all indexes and pkeys/fkeys,
> then importing, then adding keys and indexes. Then I've got successive
> runs. I figure the reindexing will get more expensive as the database
> grows?

Sounds like the right approach to me, if the tables are empty before the
import.


> Successive Imports: 44,49,50,57,55,61,72 (seconds)
> = average 1051 inserts/second (which now that I've written this seems
> fairly good)

(A) Are you doing the whole thing inside a transaction?  This will be
significantly quicker.  COPY would probably be quicker still, but the
biggest difference will come from using a single transaction.
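(A minimal DBI sketch of the single-transaction approach; the DSN, table and columns are made up for the example.)

  use DBI;
  my $dbh = DBI->connect('dbi:Pg:dbname=mydb', '', '',
                         { AutoCommit => 0, RaiseError => 1 });
  my $sth = $dbh->prepare('INSERT INTO import_table (a, b) VALUES (?, ?)');
  $sth->execute(@$_) for @rows;   # all rows inserted inside one transaction
  $dbh->commit;                   # one commit instead of one per INSERT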

(B) If you are starting with empty tables, are you ensuring that the dead
records are vacuumed before you start?  I would recommend a VACUUM FULL
on the affected tables prior to the first import run (i.e. when the
tables are empty).  Not doing so is likely to be the reason that the timing
on your successive imports increases so much.
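(For example, assuming the affected tables are called import_a and import_b:)

  VACUUM FULL VERBOSE import_a;
  VACUUM FULL VERBOSE import_b;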



> sort_mem = 4096

You probably want to increase this - if you have 1GB of RAM then there is
probably some spare.  But sort_mem is allocated per sort, so if you actually
expect to use all 32 connections then even the current 32 * 4MB = 128MB means
a careful calculation is needed.  If you are really only likely to have 1-2
connections running concurrently, increase it to (e.g.) 32768.
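(sort_mem is in kilobytes; the figure below is only an example of the trade-off.)

  sort_mem = 32768    # 32MB per sort - fine for 1-2 concurrent connections,
                      # but 32 connections x 32MB would be 1GB, i.e. all of RAM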

> max_fsm_relations = 300

If you do a "VACUUM FULL VERBOSE;" the last lines of output will give you some
clues as to what to set this (and max_fsm_pages) to.
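(A sketch of the workflow; the settings below are placeholders to be replaced with whatever the vacuum output suggests.)

  VACUUM FULL VERBOSE;
  -- then size the free space map in postgresql.conf accordingly, e.g.:
  --   max_fsm_relations = 1000
  --   max_fsm_pages = 20000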


> effective_cache_size = 16000

16000 * 8kB = 128MB seems low for a 1GB machine - you could probably say
64000 without fear of being wrong.  What does free show as cached?
Depending on how dedicated the machine is to the database, the effective
cache size may be as much as 80-90% of that.
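(Illustrative arithmetic only: if free reports, say, 700MB in its cached column, then 80% of that is about 560MB, and 560MB / 8kB per page is roughly 70000.)

  free -m                          # look at the "cached" column
  effective_cache_size = 70000     # ~560MB of assumed cache / 8kB pages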


> Can I expect it to go faster than this? I'll see where I can make my
> script itself go faster, but I don't think I'll be able to do much.
> I'll do some pre-prepare type stuff, but I don't expect significant
> gains, maybe 5-10%. I could happily turn off fsync for this job, but
> not for some other databases the server is hosting.

You can probably double the speed - maybe more.

Cheers,
Andrew,
-
Andrew @ Catalyst .Net .NZ  Ltd,  PO Box 11-053, Manners St,  Wellington
WEB: http://catalyst.net.nz/          PHYS: Level 2, 150-154 Willis St
DDI: +64(4)803-2201  MOB: +64(272)DEBIAN  OFFICE: +64(4)499-2267
  How many things I can do without! -- Socrates
-





Re: [PERFORM] Insert performance, what should I expect?

2004-10-20 Thread Robert Creager
When grilled further on (Tue, 19 Oct 2004 22:12:28 -0400),
Rod Taylor [EMAIL PROTECTED] confessed:

>> I've done some manual benchmarking by running my script with 'time script.pl'.
>> I realise my script uses some of the time; benchmarking shows that
>> 50% of the time is spent in dbd:execute.
>
> 1) Drop DBD::Pg and switch to the Pg driver for Perl instead (non-DBI
> compliant) which has functions similar to putline() that allow COPY to
> be used.

COPY can be used with DBD::Pg, per a script I use:

$dbh->do( "COPY temp_obs_$band ( $col_list ) FROM stdin" );
$dbh->func( join( "\t", @data ) . "\n", 'putline' );
$dbh->func( "\\.\n", 'putline' );
$dbh->func( 'endcopy' );
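(For context, a fuller but hypothetical version of the same pattern, looping over a set of rows; the table, columns and variables are invented for the example.)

  $dbh->do( "COPY obs_import ( ra, dec, mag ) FROM stdin" );
  for my $row (@rows) {                    # @rows holds array refs of column values
      $dbh->func( join( "\t", @$row ) . "\n", 'putline' );
  }
  $dbh->func( "\\.\n", 'putline' );        # end-of-data marker
  $dbh->func( 'endcopy' );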

With sets of data from 1000 to 8000 records, my COPY performance is consistent
at ~1 records per second.

Cheers,
Rob

-- 
 10:39:31 up 2 days, 16:25,  2 users,  load average: 2.15, 2.77, 3.06
Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004




Re: [PERFORM] Insert performance, what should I expect?

2004-10-20 Thread Rod Taylor
On Wed, 2004-10-20 at 12:45, Robert Creager wrote:
> When grilled further on (Tue, 19 Oct 2004 22:12:28 -0400),
> Rod Taylor [EMAIL PROTECTED] confessed:
>
>>> I've done some manual benchmarking by running my script with 'time script.pl'.
>>> I realise my script uses some of the time; benchmarking shows that
>>> 50% of the time is spent in dbd:execute.
>>
>> 1) Drop DBD::Pg and switch to the Pg driver for Perl instead (non-DBI
>> compliant) which has functions similar to putline() that allow COPY to
>> be used.
>
> COPY can be used with DBD::Pg, per a script I use:
>
> $dbh->do( "COPY temp_obs_$band ( $col_list ) FROM stdin" );
> $dbh->func( join( "\t", @data ) . "\n", 'putline' );
> $dbh->func( "\\.\n", 'putline' );
> $dbh->func( 'endcopy' );

Thanks for that. All of the conversations I've seen on the subject
stated that DBD::Pg only supported standard DB features -- copy not
amongst them.

> With sets of data from 1000 to 8000 records, my COPY performance is consistent
> at ~1 records per second.

Well done.





[PERFORM] Insert performance, what should I expect?

2004-10-19 Thread Brock Henry
Hi, 

I'm after some opinions about insert performance.

I'm importing a file with 13,002 lines into a database that ends up with
75,703 records across 6 tables. This is a partial file; the real data is
4 files totalling 95,174 lines. I'll be loading these files each morning,
and then running a number of queries on them.

The select queries run fast enough (mostly - 2 queries are slow, but I'll
look into them later), but importing is slower than I'd like it to be, and
I'm wondering what I should expect.

I've done some manual benchmarking by running my script with 'time script.pl'.
I realise my script uses some of the time; benchmarking shows that
50% of the time is spent in dbd:execute.

Test 1: For each import, I'm dropping all indexes and pkeys/fkeys,
then importing, then adding keys and indexes. Then I've got successive
runs. I figure the reindexing will get more expensive as the database
grows?

Successive Imports: 44,49,50,57,55,61,72 (seconds)
= average 1051 inserts/second (which now that I've written this seems
fairly good)

Test 2: no dropping etc. of indexes, just INSERTs
Imports: 61, 62, 73, 68, 78, 74 (seconds)
= average 1091 inserts/second

Machine is Linux 2.6.4, 1GB RAM, 3.something GHz Xeon processor, SCSI
HDDs (RAID 1). PostgreSQL 7.4.2. Lightly loaded machine, not doing
much other than my script. Script and DB are on the same machine.

sysctl -a | grep shm
kernel.shmmni = 4096
kernel.shmall = 134217728 (pages or bytes? Anyway)
kernel.shmmax = 134217728

postgresql.conf
tcpip_socket = true
max_connections = 32
superuser_reserved_connections = 2
shared_buffers = 8192   
sort_mem = 4096   
vacuum_mem = 16384  
max_fsm_relations = 300  
fsync = true
wal_buffers = 64
checkpoint_segments = 10 
effective_cache_size = 16000
syslog = 1  
silent_mode = false  
log_connections = true
log_pid = true
log_timestamp = true
stats_start_collector = true
stats_row_level = true

Can I expect it to go faster than this? I'll see where I can make my
script itself go faster, but I don't think I'll be able to do much.
I'll do some pre-prepare type stuff, but I don't expect significant
gains, maybe 5-10%. I could happily turn off fsync for this job, but
not for some other databases the server is hosting.

Any comments/suggestions would be appreciated.

Thanks :)

Brock Henry



Re: [PERFORM] Insert performance, what should I expect?

2004-10-19 Thread Rod Taylor
> I've done some manual benchmarking by running my script with 'time script.pl'.
> I realise my script uses some of the time; benchmarking shows that
> 50% of the time is spent in dbd:execute.

The Perl drivers don't currently use database-level prepared statements,
which would give a small boost.

But your best bet is to switch to using COPY instead of INSERT. Two ways
to do this.

1) Drop DBD::Pg and switch to the Pg driver for Perl instead (non-DBI
compliant) which has functions similar to putline() that allow COPY to
be used.

2) Have your perl script output a .sql file with the data prepared (COPY
statements) which you feed into the database via psql.
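(A rough sketch of option 2; the file name, table and columns are only examples.)

  open my $out, '>', 'load.sql' or die $!;
  print $out "BEGIN;\n";
  print $out "COPY import_table (a, b, c) FROM stdin;\n";
  print $out join("\t", @$_), "\n" for @rows;   # one tab-separated line per record
  print $out "\\.\n";                           # terminates the COPY data
  print $out "COMMIT;\n";
  close $out;
  # then:  psql -f load.sql yourdb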

You can probably achieve a 50% increase in throughput.

