[PERFORM] loading increase into huge table with 50.000.000 records

2006-07-26 Thread nuggets72
Hello,
Sorry for my poor English.

My problem:

I am running into performance problems as the load increases:

a massive weekly update of 50.000.000 records plus 2.000.000 inserts into a huge
table (more than 50.000.000 records, ten fields, 12 GB on disk).

Current performance: 120 records/s.
At the beginning I got much better speed: 1400 records/s.


CPU: dual Xeon 2.40 GHz (512 KB cache)
PostgreSQL version: 8.1.4
OS: Debian Linux, kernel 2.6.17-mm2
Disks: SCSI U320 drives on a U160 SCSI card, software RAID 1
Memory: only 1 GB at this time.


My database contains fewer than ten tables, but the main table takes more than
12 GB on disk. It has ten text columns and two date columns.

Only a few connections are used on this database.

I have tried many things:
- grouping several thousand operations into a single transaction (with BEGIN and COMMIT)
- changing parameters in postgresql.conf, such as:
shared_buffers (several tests with 3, 5, 75000)
fsync = off
checkpoint_segments = 10 (also tested 20 and 30)
checkpoint_timeout = 1000 (tested values from 30 to 1800)
stats_start_collector = off

Unfortunately, I can't put the pg_xlog files on a separate disk.


But I have not obtained convincing results.



My program runs quite simple queries. For each record it does:
UPDATE table SET dat_update = current_date WHERE id = ... ;
and if no row is updated, it does an INSERT into the table.
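
For illustration, a minimal sketch of that per-row pattern batched into a single
transaction ("big_table", "id" and "dat_update" are placeholder names; the
"if no row updated, then insert" decision is made on the client side):

    BEGIN;
    -- one pair of statements per incoming record, repeated a few thousand times
    UPDATE big_table SET dat_update = current_date WHERE id = 12345;
    -- only issued by the client when the UPDATE reports "UPDATE 0"
    INSERT INTO big_table (id, dat_update) VALUES (12345, current_date);
    COMMIT;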


My sysadmin tells me that disk reads/writes aren't the problem (checked with iostat).


Do you have any ideas for improving performance here?

Thanks.

Larry.



Re: [PERFORM] loading increase into huge table with 50.000.000 records

2006-07-26 Thread Sven Geisler
Hi Larry,

Do you run vacuum and analyze frequently?
Did you check PowerPostgresql.com for hints about PostgreSQL tuning?


You can set wal_buffers, checkpoint_segments and checkpoint_timeout
much higher.

Here is a sample which works for me.
wal_buffers = 128
checkpoint_segments = 256
checkpoint_timeout = 3600
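
If you are not already doing it, a simple maintenance step after each weekly run
could look like this ("big_table" is just a placeholder for your 12 GB table):

    -- reclaim space left by the mass UPDATE and refresh planner statistics
    VACUUM ANALYZE big_table;

    -- a database-wide VACUUM VERBOSE additionally reports free-space-map usage
    -- at the end, which tells you whether max_fsm_pages is set high enough
    VACUUM VERBOSE ANALYZE;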

Cheers
Sven.


[EMAIL PROTECTED] wrote:
> a massive weekly update of 50.000.000 records plus 2.000.000 inserts into a huge
> table (more than 50.000.000 records, ten fields, 12 GB on disk).
>
> Current performance: 120 records/s.
> At the beginning I got much better speed: 1400 records/s.
> [...]
> - changing parameters in postgresql.conf, such as:
>   checkpoint_segments = 10 (also tested 20 and 30)
>   checkpoint_timeout = 1000 (tested values from 30 to 1800)
> [...]

-- 
Sven Geisler <[EMAIL PROTECTED]> Tel +49.30.5362.1627 Fax .1638
Senior Developer, AEC/communications GmbH, Berlin, Germany



Re: [PERFORM] loading increase into huge table with 50.000.000 records

2006-07-26 Thread Markus Schaber
Hi, Larry,
Hi, Sven,

Sven Geisler wrote:

> You can increase wal_buffers, checkpoint_segments and checkpoint_timeout
> much higher.

You should also increase the free space map settings; the FSM must be large
enough to cope with your weekly batch of updates.
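
For example (the numbers below are only a rough starting point; a 12 GB table is
roughly 1.5 million 8 kB pages, and the FSM has to cover every page that the
weekly UPDATE leaves with dead tuples):

    # postgresql.conf, free space map sizing (PostgreSQL 8.1)
    max_fsm_pages = 2000000      # page slots available for tracking free space
    max_fsm_relations = 1000     # number of tables and indexes tracked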


Markus


-- 
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org



Re: [PERFORM] loading increase into huge table with 50.000.000 records

2006-07-26 Thread Merlin Moncure

On 7/26/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:

> My program runs quite simple queries. For each record it does:
> UPDATE table SET dat_update = current_date WHERE id = ... ;
> and if no row is updated, it does an INSERT into the table.
> [...]
> My sysadmin tells me that disk reads/writes aren't the problem (checked with iostat).


Your sysadmin is probably wrong. Random queries across a 50M-row table on a
machine with 1 GB of memory are going to cause a lot of seeking. Take a look at
your pg data folder and you will see it is much larger than 1 GB. A lookup of a
cached tuple via a cached index might take 0.2 ms, but might take 200 ms if it
has to go all the way to disk on a 50M-row table. Normal reality is somewhere in
between, depending on various factors. My guess is that as you add more memory,
the timing will drift from the slow case (120/sec) toward the fast case (1400/sec).

You may consider the following alternative (see the sketch below):
1. bulk load your 2M update set into a scratch table via the COPY interface
2. UPDATE table SET dat_update = current_date FROM scratch
   WHERE table.id = scratch.id
3. INSERT INTO table SELECT [...], current_date FROM scratch
   WHERE NOT EXISTS (SELECT 1 FROM table WHERE table.id = scratch.id)

You may also experiment with a boolean form of #3 using EXCEPT. While running
these monster queries, definitely crank up work_mem at the expense of
shared_buffers.
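
A rough sketch of that approach, with made-up table, column and file names (the
real schema obviously has more columns; work_mem is per session and expressed in
kB on 8.1):

    -- 1. load the weekly delta into a scratch table
    CREATE TEMP TABLE scratch (id integer PRIMARY KEY, dat_update date);
    COPY scratch FROM '/tmp/weekly_delta.csv' WITH CSV;   -- or \copy from psql

    -- give the two set-based statements plenty of sort/hash memory
    SET work_mem = 262144;   -- 256 MB, expressed in kB

    -- 2. touch the rows that already exist
    UPDATE big_table
       SET dat_update = current_date
      FROM scratch
     WHERE big_table.id = scratch.id;

    -- 3. add the rows that do not exist yet
    INSERT INTO big_table (id, dat_update)
    SELECT s.id, current_date
      FROM scratch s
     WHERE NOT EXISTS (SELECT 1 FROM big_table t WHERE t.id = s.id);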

merlin


I am guessing you are bottlenecked at the lookup, not the update.



[PERFORM] Is it possible to speed this query up?

2006-07-26 Thread Arnau

Hi all,

  I execute the following query on PostgreSQL 8.1.0:

SELECT
  u.telephone_number
  , u.telecom_operator_id
  , u.name
FROM
  campanas_subcampaign AS sub
  , agenda_users AS u
  , agenda_users_groups ug
WHERE
  sub.customer_app_config_id = 19362
  AND sub.subcampaign_id = 9723
  AND ug.agenda_user_group_id >= sub.ini_user_group_id
  AND ug.user_id=u.user_id
  AND ug.group_id IN ( SELECT group_id FROM campanas_groups WHERE 
customer_app_config_id = 19362 )

  ORDER BY ug.agenda_user_group_id ASC LIMIT 150

EXPLAIN ANALYZE shows the following:



 Limit  (cost=1.20..4600.56 rows=150 width=74) (actual time=76516.312..76853.191 rows=150 loops=1)
   ->  Nested Loop  (cost=1.20..333424.31 rows=10874 width=74) (actual time=76516.307..76852.896 rows=150 loops=1)
         ->  Nested Loop  (cost=1.20..299653.89 rows=10874 width=20) (actual time=76506.926..76512.608 rows=150 loops=1)
               Join Filter: ("outer".agenda_user_group_id >= "inner".ini_user_group_id)
               ->  Nested Loop IN Join  (cost=1.20..189802.77 rows=32623 width=20) (actual time=75938.659..76353.748 rows=16200 loops=1)
                     Join Filter: ("outer".group_id = "inner".group_id)
                     ->  Index Scan using pk_agndusrgrp_usergroup on agenda_users_groups ug  (cost=0.00..123740.26 rows=2936058 width=30) (actual time=0.101..61921.260 rows=2836638 loops=1)
                     ->  Materialize  (cost=1.20..1.21 rows=1 width=10) (actual time=0.001..0.002 rows=1 loops=2836638)
                           ->  Seq Scan on campanas_groups  (cost=0.00..1.20 rows=1 width=10) (actual time=0.052..0.053 rows=1 loops=1)
                                 Filter: (customer_app_config_id = 19362::numeric)
               ->  Index Scan using pk_cmpnssubc_subcmpnid on campanas_subcampaign sub  (cost=0.00..3.35 rows=1 width=8) (actual time=0.005..0.006 rows=1 loops=16200)
                     Index Cond: (subcampaign_id = 9723)
                     Filter: (customer_app_config_id = 19362::numeric)
         ->  Index Scan using pk_agenda_uid on agenda_users u  (cost=0.00..3.09 rows=1 width=78) (actual time=2.262..2.264 rows=1 loops=150)
               Index Cond: ("outer".user_id = u.user_id)
 Total runtime: 76853.504 ms
(16 rows)



Do you think I could do anything to speed it up?


Cheers!!
--
Arnau



Re: [PERFORM] Is it possible to speed this query up?

2006-07-26 Thread Tom Lane
Arnau <[EMAIL PROTECTED]> writes:
> the explain analyze shouts the following:

The expensive part appears to be this indexscan:

>  ->  Index Scan using pk_agndusrgrp_usergroup on agenda_users_groups ug
>        (cost=0.00..123740.26 rows=2936058 width=30)
>        (actual time=0.101..61921.260 rows=2836638 loops=1)

Since there's no index condition, the planner is evidently using this
scan just to obtain sort order.  I think ordinarily it would use a
seqscan and then sort the final result, which'd be a lot faster if the
whole result were being selected.  But you have a LIMIT and it's
mistakenly guessing that only a small part of the table will need to be
scanned before the LIMIT is satisfied.

Bottom line: try dropping the LIMIT.  If you really need the limit to be
enforced on the SQL side, you could try declaring the query as a cursor
and only fetching 150 rows from it.
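
A sketch of the cursor approach, reusing the query from above (the cursor name is
arbitrary; note that the LIMIT clause is gone):

    BEGIN;

    DECLARE first_150 CURSOR FOR
        SELECT u.telephone_number, u.telecom_operator_id, u.name
          FROM campanas_subcampaign AS sub, agenda_users AS u, agenda_users_groups ug
         WHERE sub.customer_app_config_id = 19362
           AND sub.subcampaign_id = 9723
           AND ug.agenda_user_group_id >= sub.ini_user_group_id
           AND ug.user_id = u.user_id
           AND ug.group_id IN (SELECT group_id FROM campanas_groups
                               WHERE customer_app_config_id = 19362)
         ORDER BY ug.agenda_user_group_id ASC;

    FETCH 150 FROM first_150;

    CLOSE first_150;
    COMMIT;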

regards, tom lane
