Re: [PERFORM] really quick multiple inserts can use COPY?
Jens Schipkowski wrote:

Thanks a lot to all for your tips. Of course, I am now doing all the INSERTs inside a single transaction, so the cost per INSERT dropped from 30 ms to 3 ms. The improvement factor matches the hint from Brian Hurt. Sorry, I forgot to mention we are using PostgreSQL 8.1.4. Thanks for the code snippet posted by mallah. It looks like you are using prepared statements, which are not available to us, but I will check our database access layer to see whether a workaround is possible, because this approach looks clean and quick to me.

regards,
Jens Schipkowski

On Mon, 11 Dec 2006 17:53:52 +0100, Guillaume Cottenceau <[EMAIL PROTECTED]> wrote:

> "Jens Schipkowski" writes:
>
>> Hello!
>> In our Java application we do multiple inserts into a table using data from a HashMap. Due to a poor database access implementation, done by another company (we got the job to enhance the software), we cannot use prepared statements. (We are not allowed to change the database access code!) First, we tried to fire one INSERT statement per record. This costs 3 ms per row, which is too slow, because we normally insert 10,000 records, which results in 30,000 ms just for the inserts.
>>
>>     for (...) {
>>         sql = "INSERT INTO tblfoo(foo,bar) VALUES(" + it.next() + "," + CONST.BAR + ");";
>>         // execute sql ...
>>     }
>
> You should try to wrap that into a single transaction. PostgreSQL waits for I/O write completion after each INSERT, as each one is implicitly its own transaction. Maybe the added performance would be satisfactory for you.
>
> --
> APUS Software GmbH

This link might be what you are looking for; it has some information about implementing COPY in the JDBC driver. Check the reply message as well.

http://archives.postgresql.org/pgsql-jdbc/2005-04/msg00134.php

Another solution might be to have Java dump the contents of the HashMap to a CSV file and load it through psql with COPY commands.

Good luck,
Nick
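Following up on Nick's CSV suggestion, here is a minimal sketch of the loading side (the file paths are hypothetical; table and column names are taken from the snippet above). The client-side \copy variant reads the file on the machine psql runs on; the server-side COPY needs the file to be readable by the postgres server process:

    -- client-side load via psql
    \copy tblfoo (foo, bar) FROM '/tmp/tblfoo.csv' WITH CSV

    -- server-side equivalent
    COPY tblfoo (foo, bar) FROM '/srv/data/tblfoo.csv' WITH CSV;

COPY parses the whole stream within a single statement and so avoids per-row statement overhead entirely, which is why it usually beats even batched INSERTs by a wide margin.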
[PERFORM] Speeding up query, Joining 55mil and 43mil records.
Hello People,

I'm trying to solve what I feel is a performance/configuration/query error on my side. I'm fairly new to configuring PostgreSQL, so I might be completely wrong with my configuration. My database consists of 44 tables, about 20GB in total. Two of those tables are big/huge: src.src_faktuur_verricht contains 43 million records (9GB) and src.src_faktuur_verrsec contains 55 million records (6GB).

Below is the 'slow' query.

INSERT INTO rpt.rpt_verrichting
( verrichting_id
, verrichting_secid
, fout_status
, patientnr
, verrichtingsdatum
, locatie_code
, afdeling_code
, uitvoerder_code
, aanvrager_code
, verrichting_code
, dbcnr
, aantal_uitgevoerd
, kostenplaats_code
, vc_patientnr
, vc_verrichting_code
, vc_dbcnr
)
SELECT t1.id
     , t0.secid
     , t1.status
     , t1.patientnr
     , t1.datum
     , t1.locatie
     , t1.afdeling
     , t1.uitvoerder
     , t1.aanvrager
     , t0.code
     , t1.casenr
     , t0.aantal
     , t0.kostplaats
     , null
     , null
     , null
FROM src.src_faktuur_verrsec t0
JOIN src.src_faktuur_verricht t1 ON t0.id = t1.id
WHERE substr(t0.code,1,2) NOT IN ('14','15','16','17')
  AND (substr(t0.correctie,4,1) <> '1' OR t0.correctie IS NULL)
  AND EXTRACT(YEAR FROM t1.datum) > 2004;

Output from EXPLAIN:

Hash Join  (cost=1328360.12..6167462.76 rows=7197568 width=118)
  Hash Cond: (("outer".id)::text = ("inner".id)::text)
  ->  Seq Scan on src_faktuur_verrsec t0  (cost=0.00..2773789.90 rows=40902852 width=52)
        Filter: ((substr((code)::text, 1, 2) <> '14'::text) AND (substr((code)::text, 1, 2) <> '15'::text) AND (substr((code)::text, 1, 2) <> '16'::text) AND (substr((code)::text, 1, 2) <> '17'::text) AND ((substr((correctie)::text, 4, 1) <> '1'::text) OR (correctie IS NULL)))
  ->  Hash  (cost=1188102.97..1188102.97 rows=8942863 width=80)
        ->  Bitmap Heap Scan on src_faktuur_verricht t1  (cost=62392.02..1188102.97 rows=8942863 width=80)
              Recheck Cond: (date_part('year'::text, datum) > 2004::double precision)
              ->  Bitmap Index Scan on src_faktuur_verricht_idx1  (cost=0.00..62392.02 rows=8942863 width=0)
                    Index Cond: (date_part('year'::text, datum) > 2004::double precision)

The db server runs PostgreSQL 8.1.4 on FreeBSD 6.1-STABLE with 2GB of RAM. It contains two SATA150 disks; one holds PostgreSQL and the rest of the operating system, the other holds the pg_xlog directory.

Changed lines from my postgresql.conf file:

shared_buffers = 8192
temp_buffers = 4096
work_mem = 65536
maintenance_work_mem = 1048576
max_fsm_pages = 4
fsync = off
wal_buffers = 64
effective_cache_size = 174848

The query above takes around 42 minutes. However, I also have a wimpy desktop machine with 1GB of RAM running Windows with MSSQL 2000 (default installation); with the same database structure, same indexes, same query, etc., it takes 17 minutes. The big difference makes me think that I've made an error in my PostgreSQL configuration; I just can't seem to figure it out. Could someone perhaps give me some pointers or advice?

Thanks in advance.

Nicky
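Not from the original post, but one rewrite that might be worth testing (a sketch, assuming datum is a DATE or TIMESTAMP column; the index name is hypothetical): the plan above already uses the expression index for the year filter, but EXTRACT(YEAR FROM t1.datum) > 2004 is equivalent to a plain range predicate for non-null values, and the range form gives the planner more options, such as an ordinary btree index on the raw column:

    CREATE INDEX src_faktuur_verricht_datum_idx
        ON src.src_faktuur_verricht (datum);

    -- in the WHERE clause, replace
    --   EXTRACT(YEAR FROM t1.datum) > 2004
    -- with
    AND t1.datum >= DATE '2005-01-01'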
Re: [PERFORM] Speeding up query, Joining 55mil and 43mil records.
=8761024 loops=1)
  ->  Bitmap Heap Scan on src_faktuur_verricht t1  (cost=62392.02..1188102.97 rows=8942863 width=14) (actual time=74713.092..216206.478 rows=8761024 loops=1)
        Recheck Cond: (date_part('year'::text, datum) > 2004::double precision)
        ->  Bitmap Index Scan on src_faktuur_verricht_idx1  (cost=0.00..62392.02 rows=8942863 width=0) (actual time=73892.153..73892.153 rows=8761024 loops=1)
              Index Cond: (date_part('year'::text, datum) > 2004::double precision)
Total runtime: 631994.172 ms

A lot of improvement in the SELECT COUNT as well: 33 minutes before vs 10 minutes now. To us, these speeds are good. We're very happy with the performance increase on that SELECT with the join, since 90% of the queries are SELECT based. The query results in 7551616 records, so that's about 4500 inserts per second; I'm not sure whether that is fast or not. Any further tips would be welcome.

Thanks everyone.

Nicky
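If it helps to pin down where the remaining time goes, psql's \timing can measure the join alone and the full INSERT ... SELECT separately (a sketch; both statements are abbreviated from the query posted earlier in the thread):

    \timing
    -- cost of the join alone
    SELECT count(*)
    FROM src.src_faktuur_verrsec t0
    JOIN src.src_faktuur_verricht t1 ON t0.id = t1.id
    WHERE substr(t0.code,1,2) NOT IN ('14','15','16','17')
      AND (substr(t0.correctie,4,1) <> '1' OR t0.correctie IS NULL)
      AND EXTRACT(YEAR FROM t1.datum) > 2004;
    -- full statement as posted above; the difference is the pure insert cost
    INSERT INTO rpt.rpt_verrichting ( ... ) SELECT ... ;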
Re: [PERFORM] Speeding up query, Joining 55mil and 43mil records.
Hello Sven,

We have the following indexes on src_faktuur_verrsec:

CREATE INDEX src_faktuur_verrsec_idx0
  ON src.src_faktuur_verrsec
  USING btree (id);
CREATE INDEX src_faktuur_verrsec_idx1
  ON src.src_faktuur_verrsec
  USING btree (substr(code::text, 1, 2));
CREATE INDEX src_faktuur_verrsec_idx2
  ON src.src_faktuur_verrsec
  USING btree (substr(correctie::text, 4, 1));

and another two on src_faktuur_verricht:

CREATE INDEX src_faktuur_verricht_idx0
  ON src.src_faktuur_verricht
  USING btree (id);
CREATE INDEX src_faktuur_verricht_idx1
  ON src.src_faktuur_verricht
  USING btree (date_part('year'::text, datum)) TABLESPACE src_index;

PostgreSQL elects not to use them, I assume because it most likely needs to traverse the entire table anyway. If I change

  substr(t0.code,1,2) NOT IN ('14','15','16','17')

to (removing the NOT)

  substr(t0.code,1,2) IN ('14','15','16','17')

it uses the index, but then it's no longer the query that needs to be run.

Greetings,
Nick

Sven Geisler wrote:

> Hi Nicky,
>
> Did you try to create an index to avoid the sequential scans?
>
>   Seq Scan on src_faktuur_verrsec t0 ...
>
> I think you should try
>
>   CREATE INDEX src_faktuur_verrsec_codesubstr
>     ON src.src_faktuur_verrsec (substr(src.src_faktuur_verrsec.code,1,2));
>
> Cheers
> Sven.
>
> nicky schrieb:
>> Hello again, thanks for all the quick replies. It seems I wasn't entirely correct in my previous post; I've mixed up some times/numbers. Below are the correct numbers.
>>
>> MSSQL:      SELECT COUNT(*) from JOIN (without insert)  17 minutes
>> PostgreSQL: SELECT COUNT(*) from JOIN (without insert)  33 minutes
>> PostgreSQL: complete query                              55 minutes
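For completeness, the NOT IN form could in principle be served by a partial index (a sketch, hypothetical index name). Note the caveat: per the plan earlier in the thread, roughly 41 of the 55 million rows pass the filter, so the planner will usually still prefer a sequential scan; this only pays off if the filter becomes much more selective:

    CREATE INDEX src_faktuur_verrsec_notcode_idx
        ON src.src_faktuur_verrsec (id)
        WHERE substr(code::text, 1, 2) NOT IN ('14','15','16','17')
          AND (substr(correctie::text, 4, 1) <> '1' OR correctie IS NULL);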
[PERFORM] Out of Memory Problem.
Hello Everyone,

I'm trying to find out/understand what causes my 'out of memory' error. I don't have enough experience with such logs to understand what is wrong or how to fix it, so I hope someone can point me in the right direction. The rpt.rpt_verrichting table contains about 8.5 million records and the rpt.rpt_dbc_traject table contains 700k records. It's part of a nightly process, so there is only one user active. The server PostgreSQL 8.1.4 is running on has 4GB RAM; the OS is FreeBSD 6.1-STABLE.

postgresql.conf:

shared_buffers = 8192
work_mem = 524288
maintenance_work_mem = 524288
effective_cache_size = 104858

Resource limits (current):

cputime        infinity secs
filesize       infinity kB
datasize       1048576 kB   <-- could this be a problem?
stacksize      131072 kB    <-- could this be a problem?
coredumpsize   infinity kB
memoryuse      infinity kB
memorylocked   infinity kB
maxprocesses   5547
openfiles      11095
sbsize         infinity bytes
vmemoryuse     infinity kB

Thanks in advance.

The query that is causing the out of memory error:

LOG:  statement: insert into rpt.rpt_verrichting_dbc
( verrichting_id
, verrichting_secid
, dbcnr
, vc_dbcnr
)
select t1.verrichting_id
     , t1.verrichting_secid
     , t1.dbcnr
     , max(t1.vc_dbcnr) as vc_dbcnr
from rpt.rpt_verrichting t1
   , rpt.rpt_dbc_traject t00
where t1.vc_patientnr = t00.vc_patientnr
  and t1.vc_agb_specialisme_nr_toek = t00.agb_specialisme_nr
  and t1.verrichtingsdatum between t00.begindat_dbc
      and COALESCE(t00.einddat_dbc, t00.begindat_dbc + interval '365 days')
group by t1.verrichting_id
       , t1.verrichting_secid
       , t1.dbcnr;

An EXPLAIN for the query:

Subquery Scan "*SELECT*"  (cost=1837154.04..1839811.72 rows=106307 width=74)
  ->  HashAggregate  (cost=1837154.04..1838482.88 rows=106307 width=56)
        ->  Merge Join  (cost=1668759.55..1836090.97 rows=106307 width=56)
              Merge Cond: ((("outer".vc_patientnr)::text = "inner"."?column8?") AND ("outer".agb_specialisme_nr = "inner".vc_agb_specialisme_nr_toek))
              Join Filter: (("inner".verrichtingsdatum >= "outer".begindat_dbc) AND ("inner".verrichtingsdatum <= COALESCE("outer".einddat_dbc, ("outer".begindat_dbc + '365 days'::interval
              ->  Index Scan using rpt_dbc_traject_idx1 on rpt_dbc_traject t00  (cost=0.00..84556.01 rows=578274 width=37)
              ->  Sort  (cost=1668759.55..1689806.46 rows=8418765 width=79)
                    Sort Key: (t1.vc_patientnr)::text, t1.vc_agb_specialisme_nr_toek
                    ->  Seq Scan on rpt_verrichting t1  (cost=0.00..302720.65 rows=8418765 width=79)

Out of memory log:

TopMemoryContext: 16384 total in 2 blocks; 3824 free (4 chunks); 12560 used
Type information cache: 8192 total in 1 blocks; 1864 free (0 chunks); 6328 used
Operator class cache: 8192 total in 1 blocks; 4936 free (0 chunks); 3256 used
TopTransactionContext: 8192 total in 1 blocks; 7856 free (0 chunks); 336 used
MessageContext: 122880 total in 4 blocks; 64568 free (4 chunks); 58312 used
smgr relation table: 8192 total in 1 blocks; 2872 free (0 chunks); 5320 used
Portal hash: 8192 total in 1 blocks; 3912 free (0 chunks); 4280 used
PortalMemory: 8192 total in 1 blocks; 8040 free (0 chunks); 152 used
PortalHeapMemory: 1024 total in 1 blocks; 896 free (0 chunks); 128 used
ExecutorState: 8192 total in 1 blocks; 5304 free (1 chunks); 2888 used
ExecutorState: 562316108 total in 94 blocks; 528452720 free (2593154 chunks); 33863388 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 8192 total in 1 blocks; 8176 free (0 chunks); 16 used
AggContext: 399499264 total in 58 blocks; 5928 free (110 chunks); 399493336 used
TupleHashTable: 109109272 total in 23 blocks; 2468576 free (70 chunks); 106640696 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
Relcache by OID: 8192 total in 1 blocks; 3376 free (0 chunks); 4816 used
CacheMemoryContext: 516096 total in 6 blocks; 83448 free (0 chunks); 432648 used
rpt_dbc_traject_idx1: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
rpt_dbc_traject_pk: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
rpt_verrichting_idx2: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
rpt_verrichting_idx1: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_index_indrelid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_attrdef_adrelid_adnum_index: 1024 total in 1 blocks; 328 free
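Not part of the original post, but two things in the dump stand out: the large ExecutorState block (roughly 536 MB allocated, mostly held by the merge join's sort, which is sized by work_mem = 524288 kB) and the AggContext (roughly 400 MB). In 8.1 a HashAggregate does not spill to disk, so when the planner underestimates the number of groups (the plan expects 106307) its hash table grows past work_mem unchecked; together the two regions approach the 1 GB datasize process limit shown above. A hedged sketch of a per-session workaround (the value is illustrative), which shrinks the sort's allocation and leaves more headroom for the hash table:

    -- before the nightly insert: shrink per-node memory for this session only
    SET work_mem = 131072;   -- in kB, i.e. 128 MB instead of the global 512 MB

    INSERT INTO rpt.rpt_verrichting_dbc ( ... ) SELECT ... ;  -- statement as posted above

    RESET work_mem;

Alternatively, raising the FreeBSD datasize limit would give the backend more headroom, but the per-backend memory budget still has to fit inside the process limit.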
Re: [PERFORM] Opteron vs. Xeon "benchmark"
Hello Hannes,

The text above the pictures on page 13, translated in my crappy English:

"The confrontation between the Opteron and Woodcrest was inevitable in this article, but anyone who can add 1 and 1 should have known from the previous two pages that it doesn't look good for AMD. Under loads of 25 to 100 simultaneous visitors, the Xeon performs 24% better in MySQL 4.1.20, 30% better in MySQL 5.0.20a and 37% better in PostgreSQL 8.2-dev. In short, the Socket F Opteron doesn't stand a chance; the Woodcrest scales better and has such a high starting point with one core that there is no chance of beating it. We can imagine that the Opteron, with more memory and production hardware, would be a few percent faster, but the gap to the Woodcrest is so large that we have a hard time believing the overall picture would change much."

Regards,
Nick

Hannes Dorbath wrote:

> A colleague pointed me to this site today:
>
> http://tweakers.net/reviews/642/13
>
> I can't read the language, so I can't get a grip on what exactly the "benchmark" was about. Their diagrams show 'Requests per second'. What should that mean? How many connections PG accepted per second? So they measured the OS fork performance? Should that value be of any interest? Anyone with a heavy OLTP workload will use persistent connections or a connection pool in front.
>
> Do they mean TPS? That wouldn't make much sense in a CPU benchmark, as OLTP workload is typically limited by the disc subsystem.
>
> Can someone enlighten me as to what this site is about?