[PERFORM] Improve BULK insertion

2004-12-04 Thread Grupos
Hi !
I need to insert 500,000 records into a table frequently. It's a bulk 
insertion from my application.
I am getting very poor performance. PostgreSQL inserts very fast until 
around tuple 200,000, and after that the insertion becomes really slow.
Looking at the log, I see a lot of transaction log recycling, something 
like:

2004-12-04 11:08:59 LOG:  recycled transaction log file "00060012"
2004-12-04 11:08:59 LOG:  recycled transaction log file "00060013"
2004-12-04 11:08:59 LOG:  recycled transaction log file "00060011"
2004-12-04 11:14:04 LOG:  recycled transaction log file "00060015"
2004-12-04 11:14:04 LOG:  recycled transaction log file "00060014"
2004-12-04 11:19:08 LOG:  recycled transaction log file "00060016"
2004-12-04 11:19:08 LOG:  recycled transaction log file "00060017"
2004-12-04 11:24:10 LOG:  recycled transaction log file "00060018"
How can I configure PostgreSQL to get better performance for these bulk 
insertions? I have already increased the memory settings.

My data:
Conectiva Linux, kernel 2.6.9
PostgreSQL 7.4.6 - 1.5 GB memory
max_connections = 30
shared_buffers = 3
sort_mem = 32768
vacuum_mem = 32768
max_fsm_pages = 3
max_fsm_relations = 1500
The other configurations are default.
Cheers,
Rodrigo Carvalhaes  




Re: [PERFORM] Improve BULK insertion

2004-12-04 Thread Christopher Browne
In the last exciting episode, [EMAIL PROTECTED] (Grupos) wrote:
> Hi !
>
> I need to insert 500,000 records into a table frequently. It's a bulk
> insertion from my application.
> I am getting very poor performance. PostgreSQL inserts very fast until
> around tuple 200,000, and after that the insertion becomes really slow.
> Looking at the log, I see a lot of transaction log recycling, something
> like:
>
> 2004-12-04 11:08:59 LOG:  recycled transaction log file "00060012"
> 2004-12-04 11:08:59 LOG:  recycled transaction log file "00060013"
> 2004-12-04 11:08:59 LOG:  recycled transaction log file "00060011"
> 2004-12-04 11:14:04 LOG:  recycled transaction log file "00060015"
> 2004-12-04 11:14:04 LOG:  recycled transaction log file "00060014"
> 2004-12-04 11:19:08 LOG:  recycled transaction log file "00060016"
> 2004-12-04 11:19:08 LOG:  recycled transaction log file "00060017"
> 2004-12-04 11:24:10 LOG:  recycled transaction log file "00060018"

It is entirely normal for there to be a lot of transaction log file
recycling when bulk inserts are taking place; that goes through a lot
of transaction logs.

> How can I configure PostgreSQL to get better performance for these
> bulk insertions? I have already increased the memory settings.

Memory is, as likely as not, NOT the issue.

Two questions:

 1.  How are you doing the inserts?  Via INSERT statements?  Or
     via COPY statements?  What sort of transaction grouping
     is involved?

     COPY is way faster than INSERT, and grouping plenty of updates
     into a single transaction is generally a "win" (see the sketch
     after these two points).

 2.  What is the schema like?  Does the table have a foreign key
     constraint?  Does it have a bunch of indices?

     If there should eventually be lots of indices, it tends to be
     faster to create the table with none/minimal indices, and add
     indexes afterwards, as long as your "load" process can be trusted
     to not break "unique" constraints...

     If there is some secondary table with a foreign key constraint,
     and _that_ table is growing, it is possible that a sequential
     scan is being used to search the secondary table where, if you
     did an ANALYZE on that table, an index scan would be preferred
     once it grew to larger size...
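
As a rough illustration of both points, here is a minimal sketch; the table,
column, and file names are invented for the example and are not from the
original post:

   -- many individual INSERTs: commit them as one transaction instead of
   -- one transaction per row
   BEGIN;
   INSERT INTO bulk_target VALUES (1, 'first row');
   INSERT INTO bulk_target VALUES (2, 'second row');
   -- ... remaining rows ...
   COMMIT;

   -- COPY loads the whole batch in a single statement and is considerably
   -- faster than row-at-a-time INSERTs
   COPY bulk_target FROM '/tmp/batch.dat' WITH DELIMITER ',';

   -- if the loaded table has a foreign key into a growing secondary table,
   -- ANALYZE that table so the per-row checks can switch to an index scan
   ANALYZE secondary_table;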

There isn't a particular reason for PostgreSQL to "hit a wall" upon
seeing 200K records; I and coworkers routinely load database dumps
that have millions of (sometimes pretty fat) records, and they don't
"choke."  That's true whether talking about loading things onto my
(somewhat wimpy) desktop PC, or an SMP Xeon system with a small RAID
array, or higher-end stuff involving high-end SMP and EMC disk arrays.
The latter is obviously orders of magnitude faster than desktop
equipment :-).
-- 
(format nil "[EMAIL PROTECTED]" "cbbrowne" "acm.org")
http://www3.sympatico.ca/cbbrowne/unix.html
Rules of the Evil Overlord #207. "Employees will have conjugal visit
trailers which they may use provided they call in a replacement and
sign out on the timesheet. Given this, anyone caught making out in a
closet while leaving their station unmonitored will be shot."
http://www.eviloverlord.com/



Re: [PERFORM] Improve BULK insertion

2004-12-04 Thread Patrick Hatcher
I do mass inserts daily into PG. I drop all the indexes except my primary
key and then use the COPY FROM command. This usually takes less than 30
seconds. I spend more time waiting for indexes to recreate.

Patrick Hatcher
Macys.Com
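
A rough sketch of that drop / COPY / recreate cycle (the table and index
names are made up for illustration; Patrick's actual schema is not shown):

   -- drop the secondary indexes, keeping the primary key
   DROP INDEX idx_bulk_target_status;
   DROP INDEX idx_bulk_target_created;

   -- load the batch in one statement
   COPY bulk_target FROM '/tmp/batch.dat' WITH DELIMITER ',';

   -- rebuild the indexes once, over the complete data set
   CREATE INDEX idx_bulk_target_status ON bulk_target (status);
   CREATE INDEX idx_bulk_target_created ON bulk_target (created);

Building each index once over the full table is generally much cheaper than
maintaining it incrementally for every inserted row.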

[PERFORM] lock problem

2004-12-04 Thread sarlav kumar
Hi All,

Thanks for the information on replication tools!!

Now, I have a question regarding locking tables and updating tables that
have a relationship to the locked table.

I opened up two psql windows, logged in using the same userid.
Let's say I lock a table "customerdata" in one window:

   begin;
   lock table customerdata;

Then in the other window, I want to make an update to table "customer":

   begin;
   update customer set status=0 where id=111;

The relationship between the two tables is as follows: customerdata.uid is
a FK on customer.id. There are no triggers that will try to update the
customerdata table when the above update statement is issued.

My problem is that the update does not continue unless the lock on
customerdata is released. Is it because the lock statement takes a lock on
all related tables? Is it possible to lock only the particular table we
want to lock and not the related tables?

Any help would be appreciated. Thanks in advance.

Thanks,
Saranya

Re: [PERFORM] Alternatives to Dell?

2004-12-04 Thread Cott Lang
On Fri, 2004-12-03 at 20:53 -0500, Mike Rylander wrote:
> On Fri, 03 Dec 2004 06:38:50 -0800, Joshua D. Drake
> [EMAIL PROTECTED] wrote:
> > That's true :) One of the reasons the Compaqs are expensive
> > is they supposedly use a quad board, even for the dual machine.
> > Which means a different Opteron chip as well.
>
> I can confirm that.  You have a choice of CPUs, but all the DL585s are
> expandable to 4 procs if you get the 800 series Opterons.  Each CPU
> sits on its own daughter board that links up the HyperTransport
> busses between all the others.  Each CPU card has (I think...) 8 slots
> for DIMMs, for a max of 64G.

Why would I want that giant beast when a 1U will do for dual
opterons? :)

The V40zs have dual procs on the main board with a daughter board for
the other two procs. Each CPU has 4 DIMM slots. Sun has the daughter
boards for an outrageous price, but you can buy white box Newisys
daughter boards for a lot less.

I am jealous of the 64GB of 2GB DIMMs; other than that, the DL585 is so
outrageously priced I never considered it.




Re: [PERFORM] lock problem

2004-12-04 Thread Stephan Szabo
On Sat, 4 Dec 2004, sarlav kumar wrote:

> Thanks for the information on replication tools!!
>
> Now, I have a question regarding locking tables and updating tables
> that have a relationship to the locked table.
>
> I opened up two psql windows, logged in using the same userid.
> Let's say I lock a table "customerdata" in one window:
> begin;
> lock table customerdata;
>
> Then in the other window, I want to make an update to table "customer":
> begin;
> update customer set status=0 where id=111;
>
> The relationship between the two tables is as follows:
> customerdata.uid is a FK on customer.id. There are no triggers that will
> try to update the customerdata table when the above update statement is
> issued.
>
> My problem is that the update does not continue unless the lock on
> customerdata is released. Is it because the lock statement takes a lock
> on all related tables? Is it possible to lock only the particular table
> we want to lock and not the related tables?

The no action foreign key triggers grab a Row Share on the referencing
table which conflicts with the Exclusive lock that LOCK TABLE takes by
default. Depending on what you're trying to prevent, you may be able to
ask lock table for a lesser lock (see the list and descriptions here:
http://www.postgresql.org/docs/7.4/static/explicit-locking.html#LOCKING-TABLES
).
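
For example, if the intent is only to keep other sessions from writing to
customerdata while still allowing the foreign key check to proceed, a weaker
mode may be enough. This is just a sketch; whether SHARE MODE is strong
enough depends on what the lock is meant to guarantee:

   begin;
   -- SHARE MODE blocks INSERT/UPDATE/DELETE on customerdata, but does not
   -- conflict with the ROW SHARE lock taken by the FK check, so the UPDATE
   -- on customer in the other session is not blocked
   lock table customerdata in share mode;
   -- ... work that needs customerdata to stay unchanged ...
   commit;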



Re: [PERFORM] Improve BULK insertion

2004-12-04 Thread Josh Berkus
Rodrigo,

> I need to insert 500,000 records into a table frequently. It's a bulk
> insertion from my application.
> I am getting very poor performance. PostgreSQL inserts very fast until
> around tuple 200,000, and after that the insertion becomes really slow.
> Looking at the log, I see a lot of transaction log recycling, something

In addition to what Chris Browne asked:
What's your transaction log setup?  Are your database transaction logs on a
separate disk resource?  What are checkpoint_segments and checkpoint_timeout
set to?
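
For reference, a hypothetical postgresql.conf fragment along the lines Josh
is asking about; the numbers are illustrative examples, not a recommendation
for this particular machine:

   checkpoint_segments = 16    # default is 3; more segments mean fewer
                               # forced checkpoints during a large load
   checkpoint_timeout = 300    # seconds between automatic checkpoints

Raising checkpoint_segments is a common first step when a bulk load shows
heavy transaction log recycling, at the cost of more space in pg_xlog and a
longer crash recovery.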

-- 
Josh Berkus
Aglio Database Solutions
San Francisco



[PERFORM] Partitioned table performance

2004-12-04 Thread Stacy White
We're working with a Postgres database that includes a fairly large table
(100M rows, increasing at around 2M per day).

In some cases we've seen increased performance in tests by splitting
the table into several smaller tables.  Both 'UNION ALL' views and the
superclass/subclass scheme work well at pruning down the set of rows a query
uses, but they seem to introduce a large performance hit in the time to
process each row (~50% for superclass/subclass, and ~150% for union views).

Is this to be expected?  Or is this a problem with our test setup?

I've listed details on our tests at the end of this message.  The results
are similar with our larger tables; the overhead appears to be per record
returned from the subquery/subclass; it's not a constant overhead per query.
Our production instance is running 7.4.2, but the results are the same on
8.0.

For reference, I tested with this setup (for the superclass/subclass
partitioning scheme):

   CREATE TABLE super_foo ( partition NUMERIC, bar NUMERIC );
   ANALYZE super_foo ;

   CREATE TABLE sub_foo1 () INHERITS ( super_foo );
   INSERT INTO sub_foo1 VALUES ( 1, 1 );
   -- repeat insert until sub_foo1 has 1,000,000 rows
   CREATE INDEX idx_subfoo1_partition ON sub_foo1 ( partition );
   ANALYZE sub_foo1 ;

   CREATE TABLE sub_foo2 () INHERITS ( super_foo );
   INSERT INTO sub_foo2 VALUES ( 2, 1 );
   -- repeat insert until sub_foo2 has 1,000,000 rows
   CREATE INDEX idx_subfoo2_partition ON sub_foo2 ( partition );
   ANALYZE sub_foo2 ;

and this setup for the union all scheme:

   CREATE TABLE union_foo1 ( bar NUMERIC );
   INSERT INTO union_foo1 VALUES ( 1 ) ;
   -- repeat insert until union_foo1 has 1,000,000 rows
   ANALYZE union_foo1 ;

   CREATE TABLE union_foo2 ( bar NUMERIC );
   INSERT INTO union_foo2 VALUES ( 1 ) ;
   -- repeat insert until union_foo2 has 1,000,000 rows
   ANALYZE union_foo2 ;

   CREATE VIEW union_foo AS
  SELECT 1 AS partition, * FROM union_foo1
  UNION ALL
  SELECT 2 AS partition, * FROM union_foo2 ;
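
(The "repeat insert" comments above stand for repeated single-row inserts.
On the 8.0 test build the tables could also be populated in one statement
each with generate_series(), which is not available in 7.4. This is only a
sketch of an equivalent setup, not necessarily how the timed tests were
loaded:)

   INSERT INTO sub_foo1   SELECT 1, 1 FROM generate_series(1, 1000000);
   INSERT INTO sub_foo2   SELECT 2, 1 FROM generate_series(1, 1000000);
   INSERT INTO union_foo1 SELECT 1    FROM generate_series(1, 1000000);
   INSERT INTO union_foo2 SELECT 1    FROM generate_series(1, 1000000);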

The partition pruning works marvelously:

  EXPLAIN SELECT SUM(bar) FROM super_foo WHERE partition = 2 ;
                                   QUERY PLAN
 -----------------------------------------------------------------------------
 Aggregate  (cost=21899.02..21899.02 rows=1 width=32)
   ->  Append  (cost=0.00..19399.01 rows=102 width=32)
         ->  Seq Scan on super_foo  (cost=0.00..0.00 rows=1 width=32)
               Filter: (partition = 2::numeric)
         ->  Index Scan using idx_subfoo1_partition on sub_foo1 super_foo  (cost=0.00..2.01 rows=1 width=10)
               Index Cond: (partition = 2::numeric)
         ->  Seq Scan on sub_foo2 super_foo  (cost=0.00..19397.00 rows=100 width=10)
               Filter: (partition = 2::numeric)

and

  EXPLAIN SELECT SUM(bar) FROM union_foo WHERE partition = 2 ;
                                   QUERY PLAN
 -----------------------------------------------------------------------------
 Aggregate  (cost=75819.15..75819.15 rows=1 width=32)
   ->  Subquery Scan union_foo  (cost=0.00..70818.60 rows=2000220 width=32)
         ->  Append  (cost=0.00..50816.40 rows=2000220 width=10)
               ->  Subquery Scan "*SELECT* 1"  (cost=0.00..25408.20 rows=1000110 width=10)
                     ->  Result  (cost=0.00..15407.10 rows=1000110 width=10)
                           One-Time Filter: false
                           ->  Seq Scan on union_foo1  (cost=0.00..15407.10 rows=1000110 width=10)
               ->  Subquery Scan "*SELECT* 2"  (cost=0.00..25408.20 rows=1000110 width=10)
                     ->  Seq Scan on union_foo2  (cost=0.00..15407.10 rows=1000110 width=10)


But you can see a fair amount of overhead, especially in the case of the
union view:

  SELECT SUM(bar) FROM sub_foo1 UNION ALL SELECT SUM(bar) FROM sub_foo2 ;
  Time: 2291.637 ms

  SELECT SUM(bar) FROM union_foo1 UNION ALL SELECT SUM(bar) FROM union_foo2 ;
  Time: 2248.225 ms

  SELECT SUM(bar) FROM super_foo ;
  Time: 3329.953 ms

  SELECT SUM(bar) FROM union_foo ;
  Time: 5267.742 ms


  SELECT SUM(bar) FROM sub_foo2 ;
  Time: 1124.496 ms

  SELECT SUM(bar) FROM union_foo2 ;
  Time: 1090.616 ms

  SELECT SUM(bar) FROM super_foo WHERE partition = 2 ;
  Time: 2137.767 ms

  SELECT SUM(bar) FROM union_foo WHERE partition = 2 ;
  Time: 2774.887 ms

