Re: [PERFORM] really quick multiple inserts can use COPY?

2006-12-12 Thread nicky





Jens Schipkowski wrote:

Thanks a lot to all for your tips.

Of course, I am now doing all the INSERTs inside a single transaction,
so the cost per INSERT dropped from 30 ms to 3 ms.

The improvement factor matches the hint from Brian Hurt.
Sorry, I forgot to mention we are using PostgreSQL 8.1.4.
Thanks for the code snippet posted by mallah. It looks like you are
using prepared statements, which are not available to us.
But I will check our database access layer to see whether a workaround
is possible, because this approach looks clean and quick to me.
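
For reference, a minimal sketch of the prepared-statement pattern being
discussed (assuming an open java.sql.Connection "conn" and the HashMap
"data" from the original post; this is an illustration, not mallah's
actual snippet):

    // batch the rows through one PreparedStatement in one transaction
    conn.setAutoCommit(false);
    PreparedStatement ps = conn.prepareStatement(
            "INSERT INTO tblfoo(foo,bar) VALUES (?, ?)");
    for (Object value : data.values()) {
        ps.setObject(1, value);
        ps.setObject(2, CONST.BAR);
        ps.addBatch();             // queue the row instead of sending it
    }
    ps.executeBatch();             // send all queued rows together
    conn.commit();
    ps.close();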


regards
Jens Schipkowski


On Mon, 11 Dec 2006 17:53:52 +0100, Guillaume Cottenceau <[EMAIL PROTECTED]> 
wrote:



"Jens Schipkowski"  writes:


Hello!

In our Java application we do multiple inserts into a table with data
from a HashMap. Due to a poor database access implementation - done by
another company (we got the job to enhance the software) - we cannot
use prepared statements. (We are not allowed to change the database
access code!)
First, we tried to fire one INSERT statement per record. This costs
3 ms per row, which is too slow because we normally insert 10,000
records, which results in 30,000 ms just for the inserts.

// stmt is a java.sql.Statement; without an explicit transaction each
// statement runs (and commits) in its own implicit transaction
for (Iterator it = data.values().iterator(); it.hasNext();) {
    String sql = "INSERT INTO tblfoo(foo,bar) VALUES(" + it.next() + "," + CONST.BAR + ");";
    stmt.executeUpdate(sql);
}


You should try to wrap that into a single transaction. PostgreSQL
waits for I/O write completion after each INSERT, as each one
implicitly runs in its own transaction. Maybe the added performance
will be satisfactory for you.
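
A minimal sketch of that advice (an editor's illustration, assuming an
open java.sql.Connection "conn" and the HashMap "data" from the
original post):

    conn.setAutoCommit(false);            // begin one explicit transaction
    Statement stmt = conn.createStatement();
    try {
        for (Object value : data.values()) {
            stmt.executeUpdate("INSERT INTO tblfoo(foo,bar) VALUES("
                    + value + "," + CONST.BAR + ")");
        }
        conn.commit();                    // one WAL flush for all rows
    } catch (SQLException e) {
        conn.rollback();                  // drop the whole batch on failure
        throw e;
    } finally {
        stmt.close();
    }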





--
APUS Software GmbH







This link might be what you are looking for; it has some information
about implementing COPY in the JDBC driver. Check the reply message as
well.


http://archives.postgresql.org/pgsql-jdbc/2005-04/msg00134.php
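
Current versions of the PostgreSQL JDBC driver expose COPY through
org.postgresql.copy.CopyManager (this did not exist in the 8.1-era
driver that thread patches); a rough sketch, reusing the table and data
names from the first post:

    // requires a pgjdbc build that ships org.postgresql.copy.CopyManager
    CopyManager cm = ((org.postgresql.PGConnection) conn).getCopyAPI();
    StringBuilder sb = new StringBuilder();
    for (Object value : data.values()) {
        sb.append(value).append(',').append(CONST.BAR).append('\n');
    }
    long rows = cm.copyIn("COPY tblfoo(foo,bar) FROM STDIN WITH CSV",
                          new java.io.StringReader(sb.toString()));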


Another solution might be to have Java dump the contents of the HashMap
to a CSV file and load it through psql with COPY commands.
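
A hedged sketch of that route (file path illustrative; assumes the
values need no CSV quoting):

    // dump the HashMap to a CSV file that COPY can read
    PrintWriter out = new PrintWriter(new FileWriter("/tmp/tblfoo.csv"));
    for (Object value : data.values()) {
        out.println(value + "," + CONST.BAR);
    }
    out.close();

followed by a server-side load (or psql's \copy if the file lives on
the client):

    COPY tblfoo(foo,bar) FROM '/tmp/tblfoo.csv' WITH CSV;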


Good luck,

Nick




[PERFORM] Speeding up query, Joining 55mil and 43mil records.

2006-06-21 Thread nicky




Hello People, 

I'm trying to solve what I feel is a performance/configuration/query
problem on my side. I'm fairly new to configuring PostgreSQL, so I
might be completely wrong with my configuration.

My database consists of 44 tables, about 20GB. Two of those tables are
'big/huge'. Table src.src_faktuur_verricht contains 43million records
(9GB) and table src.src_faktuur_verrsec contains 55million records
(6GB). 

Below is the 'slow' query. 

INSERT INTO rpt.rpt_verrichting
(verrichting_id
,verrichting_secid
,fout_status
,patientnr
,verrichtingsdatum
,locatie_code
,afdeling_code
,uitvoerder_code
,aanvrager_code
,verrichting_code
,dbcnr
,aantal_uitgevoerd
,kostenplaats_code
,vc_patientnr
,vc_verrichting_code
,vc_dbcnr
)
SELECT  t1.id
,   t0.secid
,   t1.status
,   t1.patientnr
,   t1.datum
,   t1.locatie
,   t1.afdeling
,   t1.uitvoerder
,   t1.aanvrager
,   t0.code
,   t1.casenr
,   t0.aantal
,   t0.kostplaats
,   null
,   null
,   null
FROM    src.src_faktuur_verrsec t0 JOIN
    src.src_faktuur_verricht t1 ON
    t0.id = t1.id
WHERE   substr(t0.code,1,2) not in ('14','15','16','17')
AND (substr(t0.correctie,4,1) <> '1' OR t0.correctie is null)
AND EXTRACT(YEAR from t1.datum) > 2004;
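
One rewrite worth trying (an editor's sketch, not part of the original
post): expressing the year test as a plain range over datum makes the
predicate usable by an ordinary btree index on the column itself,
instead of requiring the functional index on date_part:

    -- equivalent to EXTRACT(YEAR FROM t1.datum) > 2004
    AND t1.datum >= DATE '2005-01-01'

With roughly 9 of the 43 million rows qualifying, though, the bitmap
scan in the plan below is already a reasonable choice.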


Output from explain

Hash Join  (cost=1328360.12..6167462.76 rows=7197568 width=118)
  Hash Cond: (("outer".id)::text = ("inner".id)::text)

  ->  Seq Scan on src_faktuur_verrsec t0  (cost=0.00..2773789.90
rows=40902852 width=52)
    Filter: ((substr((code)::text, 1, 2) <> '14'::text) AND
(substr((code)::text, 1, 2) <> '15'::text) AND
(substr((code)::text, 1, 2) <> '16'::text) AND
(substr((code)::text, 1, 2) <> '17'::text) AND
((substr((correctie)::text, 4, 1) <> '1'::text) OR (correctie IS
NULL)))
  ->  Hash  (cost=1188102.97..1188102.97 rows=8942863 width=80)
    ->  Bitmap Heap Scan on src_faktuur_verricht t1 
(cost=62392.02..1188102.97 rows=8942863 width=80)
  Recheck Cond: (date_part('year'::text, datum) >
2004::double precision)
  ->  Bitmap Index Scan on src_faktuur_verricht_idx1 
(cost=0.00..62392.02 rows=8942863 width=0)
    Index Cond: (date_part('year'::text, datum) >
2004::double precision)


The db server runs PostgreSQL 8.1.4 on FreeBSD 6.1-Stable and has 2 GB
of RAM. It contains two SATA150 disks: one holds PostgreSQL and the
operating system, the other holds the pg_xlog directory.

Changed lines from my postgresql.conf file

shared_buffers = 8192            # 8 kB pages, so 64 MB
temp_buffers = 4096              # 32 MB
work_mem = 65536                 # kB, so 64 MB per sort/hash operation
maintenance_work_mem = 1048576   # kB, so 1 GB
max_fsm_pages = 4
fsync = off                      # unsafe: risks data loss on a crash
wal_buffers = 64                 # 8 kB pages, so 512 kB
effective_cache_size = 174848    # 8 kB pages, so roughly 1.3 GB

The query above takes around 42 minutes. 

However, I also have a wimpy desktop machine with 1 GB RAM, running
Windows with MSSQL 2000 (default installation), same database
structure, same indexes, same query, etc., and there it takes 17
minutes. The big difference makes me think that I've made an error in
my PostgreSQL configuration. I just can't seem to figure it out.

Could someone perhaps give me some pointers or advice?

Thanks in advance. 

Nicky









Re: [PERFORM] Speeding up query, Joining 55mil and 43mil records.

2006-06-22 Thread nicky
=8761024 loops=1)
 ->  Bitmap Heap Scan on src_faktuur_verricht t1  
(cost=62392.02..1188102.97 rows=8942863 width=14) (actual 
time=74713.092..216206.478 rows=8761024 loops=1)
   Recheck Cond: (date_part('year'::text, datum) > 
2004::double precision)
   ->  Bitmap Index Scan on src_faktuur_verricht_idx1  
(cost=0.00..62392.02 rows=8942863 width=0) (actual 
time=73892.153..73892.153 rows=8761024 loops=1)
 Index Cond: (date_part('year'::text, datum) > 
2004::double precision)

Total runtime: 631994.172 ms

A lot of improvement also in the select count: 33 minutes vs 10 minutes.


To us, the speeds are good. Very happy with the performance increase on 
that select with join, since 90% of the queries are SELECT based.


The query results in 7551616 records, so that's about 4500 inserts per 
second. I'm not sure if that is fast or not. Any further tips would be 
welcome.


Thanks everyone.
Nicky



Re: [PERFORM] Speeding up query, Joining 55mil and 43mil records.

2006-06-22 Thread nicky

Hello Sven,

We have the following indexes on src_faktuur_verrsec:

   CREATE INDEX src_faktuur_verrsec_idx0
     ON src.src_faktuur_verrsec
     USING btree
     (id);

   CREATE INDEX src_faktuur_verrsec_idx1
     ON src.src_faktuur_verrsec
     USING btree
     (substr(code::text, 1, 2));

   CREATE INDEX src_faktuur_verrsec_idx2
     ON src.src_faktuur_verrsec
     USING btree
     (substr(correctie::text, 4, 1));

and another two on src_faktuur_verricht

   CREATE INDEX src_faktuur_verricht_idx0
     ON src.src_faktuur_verricht
     USING btree
     (id);

   CREATE INDEX src_faktuur_verricht_idx1
     ON src.src_faktuur_verricht
     USING btree
     (date_part('year'::text, datum))
     TABLESPACE src_index;

PostgreSQL elects not to use them, I assume because it most likely
needs to traverse the entire table anyway.


If I change

   substr(t0.code,1,2) not in ('14','15','16','17')

to (removing the NOT)

   substr(t0.code,1,2) in ('14','15','16','17')

it uses the index, but that is no longer the query that needs to be run.
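
For completeness, a partial index matching the NOT IN condition is also
possible (an editor's sketch, not something from the thread):

    CREATE INDEX src_faktuur_verrsec_idx3
      ON src.src_faktuur_verrsec (id)
      WHERE substr(code, 1, 2) NOT IN ('14','15','16','17');

Since the plan estimates that about 41 of the 55 million rows pass that
filter, though, a sequential scan stays the cheaper choice, exactly as
suspected above.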

Greetings,
Nick




Sven Geisler wrote:

Hi Nicky,

Did you try to create an index to avoid the sequential scans?

Seq Scan on src_faktuur_verrsec t0...

I think you should try

CREATE INDEX src_faktuur_verrsec_codesubstr
  ON src.src_faktuur_verrsec (substr(code, 1, 2));


Cheers
Sven.

nicky schrieb:

Hello again,

thanks for all the quick replies.

It seems I wasn't entirely correct in my previous post; I mixed up some
times/numbers.


Below are the correct numbers:

MSSQL:      SELECT COUNT(*) from JOIN (without insert)   17 minutes
PostgreSQL: SELECT COUNT(*) from JOIN (without insert)   33 minutes
PostgreSQL: complete query                               55 minutes

A lot of improvement also in the select count: 33 minutes vs 10 minutes.


To us, the speeds are good. Very happy with the performance increase 
on that select with join, since 90% of the queries are SELECT based.


The query results in 7551616 records, so that's about 4500 inserts 
per second. I'm not sure if that is fast or not. Any further tips 
would be welcome.








[PERFORM] Out of Memory Problem.

2006-07-12 Thread nicky

Hello Everyone,

I'm trying to find out/understand what causes my 'out of memory' error.
I do not have enough experience with such logs to understand what is
wrong or how to fix it, so I hope someone can point me in the right
direction.


The 'rpt.rpt_verrichting' table contains about 8.5 million records and 
the 'rpt.rpt_dbc_traject' table contains 700k records.

It's part of a nightly process, so there is only 1 user active.

The server that PostgreSQL 8.1.4 is running on has 4 GB of RAM; the OS
is FreeBSD 6.1-Stable.


postgresql.conf
shared_buffers = 8192            # 8 kB pages, so 64 MB
work_mem = 524288                # kB, so 512 MB per sort/hash operation
maintenance_work_mem = 524288    # kB, so 512 MB
effective_cache_size = 104858    # 8 kB pages, so roughly 820 MB


Resource limits (current):
 cputime      infinity secs
 filesize     infinity kB
 datasize      1048576 kB  <-- could this be a problem?
 stacksize      131072 kB  <-- could this be a problem?
 coredumpsize infinity kB
 memoryuse    infinity kB
 memorylocked infinity kB
 maxprocesses     5547
 openfiles       11095
 sbsize       infinity bytes
 vmemoryuse   infinity kB
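
The datasize line does indeed look like the culprit: adding up the
largest contexts in the dump below (ExecutorState at ~562 MB,
AggContext at ~399 MB, TupleHashTable at ~109 MB) lands almost exactly
at the 1 GB cap. On FreeBSD that limit comes from the login class; a
hedged sketch of raising it (the login class name is illustrative, and
on i386 the kern.maxdsiz loader tunable may also cap it):

    # /etc/login.conf -- raise the data segment limit for the class
    # the postgres user belongs to, then run: cap_mkdb /etc/login.conf
    postgres:\
            :datasize=infinity:\
            :tc=default:

Lowering work_mem would attack the problem from the other side, since
each sort or hash operation may claim that much memory on its own.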


Thanks in advance.

The query that is causing the out of memory error:
LOG:  statement: insert into rpt.rpt_verrichting_dbc
   (
  verrichting_id
   ,  verrichting_secid
   ,  dbcnr
   ,  vc_dbcnr
   )
   select
 t1.verrichting_id
   , t1.verrichting_secid
   , t1.dbcnr
   , max(t1.vc_dbcnr) as vc_dbcnr
   from
 rpt.rpt_verrichting t1
   , rpt.rpt_dbc_traject t00
   where
 t1.vc_patientnr = t00.vc_patientnr
   and
 t1.vc_agb_specialisme_nr_toek = t00.agb_specialisme_nr
   and
 t1.verrichtingsdatum between t00.begindat_dbc and 
COALESCE(t00.einddat_dbc, t00.begindat_dbc + interval '365 days')

   group by
 t1.verrichting_id
   , t1.verrichting_secid
   , t1.dbcnr
   ;

An EXPLAIN for the query:
Subquery Scan "*SELECT*"  (cost=1837154.04..1839811.72 rows=106307 
width=74)

  ->  HashAggregate  (cost=1837154.04..1838482.88 rows=106307 width=56)
->  Merge Join  (cost=1668759.55..1836090.97 rows=106307 width=56)
  Merge Cond: ((("outer".vc_patientnr)::text = 
"inner"."?column8?") AND ("outer".agb_specialisme_nr = 
"inner".vc_agb_specialisme_nr_toek))
  Join Filter: (("inner".verrichtingsdatum >= 
"outer".begindat_dbc) AND ("inner".verrichtingsdatum <= 
COALESCE("outer".einddat_dbc, ("outer".begindat_dbc + '365 
days'::interval
  ->  Index Scan using rpt_dbc_traject_idx1 on 
rpt_dbc_traject t00  (cost=0.00..84556.01 rows=578274 width=37)

  ->  Sort  (cost=1668759.55..1689806.46 rows=8418765 width=79)
Sort Key: (t1.vc_patientnr)::text, 
t1.vc_agb_specialisme_nr_toek
->  Seq Scan on rpt_verrichting t1  
(cost=0.00..302720.65 rows=8418765 width=79)
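
The HashAggregate in this plan keeps every group in memory; in 8.1 it
cannot spill to disk (only sort-based grouping can). A hedged
workaround, not from the thread, is to force a GroupAggregate for this
statement:

    SET enable_hashagg = off;   -- session-local planner hint
    -- run the INSERT ... SELECT ... GROUP BY from above
    SET enable_hashagg = on;

The plan also estimates only 106307 groups, while the memory dump
suggests far more were found; a misestimate like that is exactly how a
hash aggregate ends up much larger than the planner expected.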


Out of memory log:
TopMemoryContext: 16384 total in 2 blocks; 3824 free (4 chunks); 12560 used
Type information cache: 8192 total in 1 blocks; 1864 free (0 chunks); 
6328 used
Operator class cache: 8192 total in 1 blocks; 4936 free (0 chunks); 3256 
used
TopTransactionContext: 8192 total in 1 blocks; 7856 free (0 chunks); 336 
used

MessageContext: 122880 total in 4 blocks; 64568 free (4 chunks); 58312 used
smgr relation table: 8192 total in 1 blocks; 2872 free (0 chunks); 5320 used
Portal hash: 8192 total in 1 blocks; 3912 free (0 chunks); 4280 used
PortalMemory: 8192 total in 1 blocks; 8040 free (0 chunks); 152 used
PortalHeapMemory: 1024 total in 1 blocks; 896 free (0 chunks); 128 used
ExecutorState: 8192 total in 1 blocks; 5304 free (1 chunks); 2888 used
ExecutorState: 562316108 total in 94 blocks; 528452720 free (2593154 
chunks); 33863388 used

ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 8192 total in 1 blocks; 8176 free (0 chunks); 16 used
AggContext: 399499264 total in 58 blocks; 5928 free (110 chunks); 
399493336 used
TupleHashTable: 109109272 total in 23 blocks; 2468576 free (70 chunks); 
106640696 used

ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
Relcache by OID: 8192 total in 1 blocks; 3376 free (0 chunks); 4816 used
CacheMemoryContext: 516096 total in 6 blocks; 83448 free (0 chunks); 
432648 used

rpt_dbc_traject_idx1: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
rpt_dbc_traject_pk: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
rpt_verrichting_idx2: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
rpt_verrichting_idx1: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_index_indrelid_index: 1024 total in 1 blocks; 392 free (0 chunks); 
632 used
pg_attrdef_adrelid_adnum_index: 1024 total in 1 blocks; 328 free 

Re: [PERFORM] Opteron vs. Xeon "benchmark"

2006-09-22 Thread nicky

Hello Hannes,

The text above the pictures on page 13. Translated in my crappy english.

The confrontation between the Opteron and the Woodcrest was inevitable
in this article, but anyone who can add 1 and 1 will have known from
the previous two pages that it doesn't look good for AMD. Under loads
of 25 to 100 simultaneous visitors, the Xeon performs 24% better in
MySQL 4.1.20, 30% better in MySQL 5.0.20a and 37% better in PostgreSQL
8.2-dev. In short, the Socket F Opteron doesn't stand a chance: the
Woodcrest scales better and starts from such a high point with a single
core that there is no way of beating it. We can imagine that the
Opteron would be a few percent faster with more memory and production
hardware, but the difference with the Woodcrest is so large that we
have a hard time believing the overall picture would change much.



Regards,
Nick

Hannes Dorbath wrote:

A colleague pointed me to this site today:

http://tweakers.net/reviews/642/13

I can't read the language, so I can't get a grip on what exactly the
"benchmark" was about.


Their diagrams show 'Requests per second'. What is that supposed to
mean? How many connections PG accepted per second? Did they measure the
OS fork performance? Should that value be of any interest? Anyone with
a heavy OLTP workload will use persistent connections or a connection
pool in front.


Do they mean TPS? That wouldn't make much sense in a CPU benchmark, as
OLTP workload is typically limited by the disk subsystem.


Can someone enlighten me what this site is about?



