Re: [PERFORM] performance libpq vs JDBC
Can you try writing the libpq program using the COPY functions? I expect it will be better than prepared statements.

Best Regards, Divakar

From: Werner Scholtes werner.schol...@heuboe.de
To: pgsql-performance@postgresql.org
Sent: Wed, December 15, 2010 8:21:55 PM
Subject: [PERFORM] performance libpq vs JDBC

I wrote a test program in C++ using libpq. It works as follows (pseudo code):

    for (int loop = 0; loop < 1000; ++loop) {
        PQexec(m_conn, "BEGIN");
        const char* sql = "INSERT INTO pg_perf_test (id, text) VALUES ($1, $2)";
        PQprepare(m_conn, "stmtid", sql, 0, NULL);
        for (int i = 0; i < 1000; ++i) {
            // Set values etc.
            PQexecPrepared(m_conn, ...);
        }
        PQexec(m_conn, "DEALLOCATE stmtid");
        PQexec(m_conn, "COMMIT");
    }

I measured the duration of every iteration of the outer for-loop, resulting in an average of 450 ms (per 1000 rows inserted).

After that, I wrote a test program in Java using JDBC. It works as follows:

    for (int loops = 0; loops < 1000; ++loops) {
        String sql = "INSERT INTO pg_perf_test (id, text) VALUES (?, ?)";
        PreparedStatement stmt = con.prepareStatement(sql);
        for (int i = 0; i < 1000; ++i) {
            // Set values etc.
            stmt.addBatch();
        }
        stmt.executeBatch();
        con.commit();
        stmt.close();
    }

I measured the duration of every iteration of the outer for-loop, resulting in an average of 100 ms (per 1000 rows inserted).

This means that accessing PostgreSQL via JDBC is about 4-5 times faster than using libpq. Comparable results were measured with analogous UPDATE and DELETE statements. I need to improve the performance of my C++ code. Is there any way in libpq to reach the performance of JDBC for INSERT, UPDATE and DELETE statements (I have no option to use COPY)? I didn't find anything comparable to PreparedStatement.executeBatch() in libpq.

Best regards, Werner Scholtes
Re: [PERFORM] performance libpq vs JDBC
Unfortunately I cannot use the COPY function, since I need JDBC-level performance for UPDATE and DELETE statements in the C++ libpq program as well.

I wonder how JDBC PreparedStatement.addBatch() and PreparedStatement.executeBatch() work. They must have a more efficient protocol to send bulks of parameter sets for one prepared statement as a batch in one network transmission to the server. As far as I can see, PQexecPrepared does not allow sending more than one parameter set (the parameters for one row) per call. So libpq sends a single row to the server 1000 times, where JDBC sends 1000 rows at once, which is much more efficient. I assume that the wire protocol of PostgreSQL allows transmitting multiple rows at once, but libpq doesn't have an interface to access it. Is that right?
Re: [PERFORM] performance libpq vs JDBC
If you have all records before issuing the INSERT, you can do it like:

    INSERT INTO xxx VALUES (a,b,c), (d,e,f), ...;

An example: http://kaiv.wordpress.com/2007/07/19/faster-insert-for-multiple-rows

Best Regards, Divakar
Re: [PERFORM] performance libpq vs JDBC
What about UPDATE and DELETE? In the case of an update I have all the records to be updated, and in the case of a delete I have all the primary key values of the records to be deleted.
Re: [PERFORM] performance libpq vs JDBC
UPDATE and DELETE are operations which in general affect more than one row. The only restriction is that the criteria have to be the same for all rows. If you have different criteria for different rows in the case of an update or delete, you will have to fire two queries. I mean, if you want to do

    1. DELETE FROM xyz WHERE a = 1 and
    2. DELETE FROM xyz WHERE a = 2

then you will have to run the query two times.

Best Regards, Divakar
Re: [PERFORM] performance libpq vs JDBC
On 16/12/10 09:21, Werner Scholtes wrote:
> I assume that the wire protocol of PostgreSQL allows to transmit
> multiple rows at once, but libpq doesn't have an interface to access
> it. Is that right?

Sounds wrong to me. The libpq client is the default reference implementation of the protocol. If there were large efficiencies that could be copied, they would be.

Anyway - you don't need to assume what's in the protocol. It's documented here: http://www.postgresql.org/docs/9.0/static/protocol.html

I'd stick wireshark or some other network analyser on the two sessions - see exactly what is different.

-- Richard Huxton, Archonet Ltd

-- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] How to get FK to use new index without restarting the database
Is there a way to force the db to re-evaluate its execution plan for a FK without bouncing the DB?

PostgreSQL 8.1.17

In our latest release our developers implemented some new foreign keys but forgot to create indexes on those keys. The problem surfaced at one of our client installs, where a maintenance DELETE query was running for over 24 hrs. We have since identified the missing indexes and have sent the client a script to create them, but in our testing we have not been able to get postgres to use the new index for the FK cascade delete without bouncing the database.

Here is an example of an added FK with a missing index:

    ALTER TABLE scheduled_job_arg ADD CONSTRAINT sjr_scheduled_job_id_fk
      FOREIGN KEY (scheduled_job_id) REFERENCES scheduled_job (id)
      ON UPDATE CASCADE ON DELETE CASCADE;

Thanks in Advance, Eric
Re: [PERFORM] performance libpq vs JDBC
Thanks a lot for your advice. I found the difference: my Java program sends one huge SQL string containing 1000 INSERT statements separated by ';' (without using prepared statements at all!), whereas my C++ program sends one INSERT statement to be prepared and after that the parameters 1000 times. Now I have refactored my C++ program to also send 1000 INSERT statements in one call to PQexec, and it reaches the same performance as my Java program.

I just wonder why anyone should use prepared statements at all?
Re: [PERFORM] How to get FK to use new index without restarting the database
Hello,

> Is there a way force the db to re-evaluate its execution plan for a FK
> without bouncing the DB? PostgreSQL 8.1.17

Did you try ANALYZE? Maybe it will help. http://www.postgresql.org/docs/9.0/static/sql-analyze.html

Regards, Jayadevan

DISCLAIMER: The information in this e-mail and any attachment is intended only for the person to whom it is addressed and may contain confidential and/or privileged material. If you have received this e-mail in error, kindly contact the sender and destroy all copies of the original communication. IBS makes no warranty, express or implied, nor guarantees the accuracy, adequacy or completeness of the information contained in this email or any attachment and is not liable for any errors, defects, omissions, viruses or for resultant loss or damage, if any, direct or indirect.
Re: [PERFORM] performance libpq vs JDBC
On 16/12/10 12:28, Werner Scholtes wrote:
> Thanks a lot for your advice. I found the difference: My Java program
> sends one huge SQL string containing 1000 INSERT statements separated
> by ';' (without using prepared statements at all!) ... Now I refactored
> my C++ program to send also 1000 INSERT statements in one call to
> PQexec and reached the same performance as my Java program.

So - it was the network round-trip overhead. Like Divakar suggested, COPY or VALUES (),(),() would work too.

You mention multiple updates/deletes too. Perhaps the cleanest and fastest method would be to build a TEMP table containing the IDs/values required and join against that for your updates/deletes.

> I just wonder why anyone should use prepared statements at all?

Not everything is a simple INSERT. Preparing saves planning time on repeated SELECTs. It also provides some SQL-injection safety, since you provide parameters rather than building a SQL string.

-- Richard Huxton, Archonet Ltd
Re: [PERFORM] How to get FK to use new index without restarting the database
On 16/12/10 12:12, Eric Comeau wrote:
> The problem surfaced at one of our client installs where a maintenance
> DELETE query was running for over 24 hrs. We have since then identified
> the missing indexes and have sent the client a script to create them,
> but in our testing we could not been able to get postgres to use the
> new index for the FK cascade delete without bouncing the database.

Well, an ongoing DELETE isn't going to see a new index. I'd have thought a new connection should, though.

-- Richard Huxton, Archonet Ltd
Re: [PERFORM] How to get FK to use new index without restarting the database
On 10-12-16 07:34 AM, Jayadevan M wrote:
> Did you try analyze? May be it will help.
> http://www.postgresql.org/docs/9.0/static/sql-analyze.html

Yes we did. Thanks for the suggestion.
Re: [PERFORM] Help with bulk read performance
Hello Daniel,

We have the same scenario for the native Java arrays, so we are storing bytea and doing the conversion on the client side, but for server-side SQL, plJava comes in very handy.

Not sure how you want to create stored procedures to convert internally, but this is how we do it: one has to define conversion routines in Java, then deploy them to plJava. Scanning through this field would still be CPU bound - around 2x slower than with native arrays and 6x slower than with blobs - but at least one has this ability. It's even possible to pass them to plR to do some statistical processing directly, so depending on the operations you do it may still be cheaper than streaming out over the wire to a regular JDBC client.

1. Deploy a class like this within plJava (null handling left out for brevity):

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.io.ObjectInputStream;
    import java.io.ObjectOutputStream;

    public class myArrayConversion {

        public myArrayConversion() {}

        /** Serialize double array to bytea */
        public static byte[] convertDoubleArrayToBytea(double[] obj) throws IOException {
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            ObjectOutputStream oos = new ObjectOutputStream(baos);
            oos.writeObject(obj);
            return baos.toByteArray();
        }

        /** Serialize int array to bytea */
        public static byte[] convertIntToBytea(int[] obj) throws IOException {
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            ObjectOutputStream oos = new ObjectOutputStream(baos);
            oos.writeObject(obj);
            return baos.toByteArray();
        }

        /** Deserialize bytea to double array */
        public static double[] convertToDoubleArray(byte[] obj) throws IOException, ClassNotFoundException {
            // Deserialize from a byte array
            ObjectInputStream ios = new ObjectInputStream(new ByteArrayInputStream(obj));
            return (double[]) ios.readObject();
        }

        /** Deserialize bytea to int array */
        public static int[] convertIntToArray(byte[] obj) throws IOException, ClassNotFoundException {
            // Deserialize from a byte array
            ObjectInputStream ios = new ObjectInputStream(new ByteArrayInputStream(obj));
            return (int[]) ios.readObject();
        }

        // other array types' streaming...
    }

2. Then create mapping functions as the db owner:

    CREATE OR REPLACE FUNCTION public.convertDoubleArrayToBytea(double precision[])
      RETURNS bytea
      AS 'mappingPkg.convertDoubleArrayToBytea(double[])'
      LANGUAGE 'javau' IMMUTABLE COST 50;
    GRANT EXECUTE ON FUNCTION public.convertDoubleArrayToBytea(double precision[]) TO public;

    CREATE OR REPLACE FUNCTION public.convertToDoubleArray(bytea)
      RETURNS double precision[]
      AS 'mappingPkg.convertToDoubleArray(byte[])'
      LANGUAGE 'javau' IMMUTABLE COST 50;
    GRANT EXECUTE ON FUNCTION public.convertToDoubleArray(bytea) TO public;

Then you can have the conversion either way:

    SELECT convertToDoubleArray(convertDoubleArrayToBytea(array[i::float8,1.1,100.1,i*0.1]::float8[]))
    FROM generate_series(1,100) i;

so you'd also be able to create bytea objects from native SQL arrays within SQL.

plJava seems to be enjoying a revival these days thanks to Johann 'Myrkraverk' Oskarsson, who fixed several long-standing bugs. Check out the plJava list for details.

Krzysztof

On Dec 16, 2010, at 10:22 AM, pgsql-performance-ow...@postgresql.org wrote:

From: Dan Schaffer daniel.s.schaf...@noaa.gov
Date: December 15, 2010 9:15:14 PM GMT+01:00
To: Andy Colson a...@squeakycode.net
Cc: Jim Nasby j...@nasby.net, pgsql-performance@postgresql.org, Nick Matheson nick.d.mathe...@noaa.gov
Subject: Re: Help with bulk read performance
Reply-To: daniel.s.schaf...@noaa.gov

Hi, my name is Dan and I'm a co-worker of Nick Matheson, who initially submitted this question (because the mail group had me blacklisted for a while for some reason). Thank you for all of the suggestions. We were able to improve our bulk read performance from 3 MB/s to 60 MB/s (assuming the data are NOT in cache in both cases) by doing the following:

1. Storing the data in a bytea column instead of an array column.
2. Retrieving the data via the PostgreSQL 9 CopyManager#copyOut(String sql, OutputStream stream) method.

The key to the dramatic improvement appears to be the reduction in packing and unpacking time on the server and client, respectively. The server packing occurs when the retrieved data are packed into a bytestream for sending across the network. Storing the data as a simple byte array reduces this time substantially. The client-side unpacking time is spent generating a ResultSet object. By unpacking the bytestream into the desired arrays of floats by hand instead, this time became close to
Re: [PERFORM] performance libpq vs JDBC
On Thu, Dec 16, 2010 at 7:14 AM, Richard Huxton d...@archonet.com wrote:
> On 16/12/10 09:21, Werner Scholtes wrote:
>> I assume that the wire protocol of PostgreSQL allows to transmit
>> multiple rows at once, but libpq doesn't have an interface to access
>> it. Is that right?
>
> Sounds wrong to me. The libpq client is the default reference
> implementation of the protocol. If there were large efficiencies that
> could be copied, they would be.
>
> I'd stick wireshark or some other network analyser on the two sessions
> - see exactly what is different.

There is only one explanation for the difference: they are slamming data across the wire without waiting for the result. libpq queries are synchronous: you send a query, then wait for the result. This means that for very simple queries like the above you can become network bound.

In C/C++ you can work around this using a couple of different methods. COPY is of course the fastest, but extremely limiting in what it can do. We developed libpqtypes (I love talking about libpqtypes) to deal with this problem. In the attached example, it stacks data into an array on the client and sends it to the server, which unnests and inserts it. The attached example inserts a million rows in about 11 seconds on my workstation (client-side prepare could knock that down to 8 or so). If you need to do something fancy, we typically create a receiving function on the server in plpgsql which unnests() the result and makes decisions, etc. This is extremely powerful: you can compose and send very rich data to/from postgres in a single query.

merlin

    #include <stdio.h>
    #include "libpq-fe.h"
    #include "libpqtypes.h"

    #define INS_COUNT 100

    int main()
    {
        int i;
        PGconn *conn = PQconnectdb("dbname=pg9");
        PGresult *res;

        if (PQstatus(conn) != CONNECTION_OK)
        {
            printf("bad connection");
            return -1;
        }

        PQtypesRegister(conn);

        PGregisterType type = {"ins_test", NULL, NULL};
        PQregisterComposites(conn, &type, 1);

        PGparam *p = PQparamCreate(conn);
        PGarray arr;
        arr.param = PQparamCreate(conn);
        arr.ndims = 0;

        PGparam *t = PQparamCreate(conn);

        for (i = 0; i < INS_COUNT; i++)
        {
            PGint4 a = i;
            PGtext b = "some_text";
            PGtimestamp c;
            PGbytea d;

            d.len = 8;
            d.data = b;

            c.date.isbc = 0;
            c.date.year = 2000;
            c.date.mon = 0;
            c.date.mday = 19;
            c.time.hour = 10;
            c.time.min = 41;
            c.time.sec = 6;
            c.time.usec = 0;
            c.time.gmtoff = -18000;

            PQputf(t, "%int4 %text %timestamptz %bytea", a, b, &c, &d);
            PQputf(arr.param, "%ins_test", t);
            PQparamReset(t);
        }

        if (!PQputf(p, "%ins_test[]", &arr))
        {
            printf("putf failed: %s\n", PQgeterror());
            return -1;
        }

        res = PQparamExec(conn, p, "insert into ins_test select * from unnest($1) r(a, b, c, d)", 1);
        if (!res)
        {
            printf("got %s\n", PQgeterror());
            return -1;
        }

        PQclear(res);
        PQparamClear(p);
        PQfinish(conn);
    }
Re: [PERFORM] Help with bulk read performance
> If the data are stored as a byte array but retrieved into a ResultSet,
> the unpacking time goes up by an order of magnitude and the observed
> total throughput is 25 MB/s. If the data are stored in a Postgres float
> array and unpacked into a byte stream, the observed throughput is
> 20 MB/s.

float -> text conversions are very slow. This is in fact due to the mismatch between the base-2 (IEEE 754) and base-10 (text) floating point representations, which makes the conversion require fairly complex calculations.
Re: [PERFORM] How to get FK to use new index without restarting the database
Eric Comeau ecom...@signiant.com writes:
> Is there a way to force the db to re-evaluate its execution plan for a
> FK without bouncing the DB? PostgreSQL 8.1.17

You don't need to bounce the whole DB, but you will need to start fresh sessions. We didn't add automatic invalidation of those plans until 8.3.

regards, tom lane
Re: [PERFORM] Index Bloat - how to tell?
Dave Crooke wrote:
> There is a plugin called pgstattuple which can be quite informative ...
> however, it actually does a full scan of the table / index files, which
> may be a bit invasive depending on your environment and load.
>
> http://www.postgresql.org/docs/current/static/pgstattuple.html
>
> It's in contrib (at least for 8.4), so you have to import its functions
> into your schema using the script in the contrib directory.
>
> Cheers
> Dave

I tried it with one of my databases:

testtrack=# select * from pgstatindex('public.defects_pkey');
 version | tree_level | index_size | root_block_no | internal_pages | leaf_pages | empty_pages | deleted_pages | avg_leaf_density | leaf_fragmentation
---------+------------+------------+---------------+----------------+------------+-------------+---------------+------------------+--------------------
       2 |          1 |     827392 |             3 |              0 |        100 |           0 |             0 |            70.12 |                 22
(1 row)

What is leaf_fragmentation? How is it defined, and how is it calculated? I wasn't able to find any definition of that number. I verified that running REINDEX brings it back to 0:

testtrack=# reindex table public.defects;
REINDEX
testtrack=# select * from pgstatindex('public.defects_pkey');
 version | tree_level | index_size | root_block_no | internal_pages | leaf_pages | empty_pages | deleted_pages | avg_leaf_density | leaf_fragmentation
---------+------------+------------+---------------+----------------+------------+-------------+---------------+------------------+--------------------
       2 |          1 |     647168 |             3 |              0 |         78 |           0 |             0 |            89.67 |                  0
(1 row)

--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com
The Leader in Integrated Media Intelligence Solutions
Re: [PERFORM] How to get FK to use new index without restarting the database
On 10-12-16 11:27 AM, Tom Lane wrote:
> Eric Comeau ecom...@signiant.com writes:
>> Is there a way to force the db to re-evaluate its execution plan for a
>> FK without bouncing the DB? PostgreSQL 8.1.17
>
> You don't need to bounce the whole DB, but you will need to start fresh
> sessions. We didn't add automatic invalidation of those plans until 8.3.
>
> regards, tom lane

We confirmed that disconnecting and reconnecting resolves the issue. Thanks to all who helped. I replied to Tom and the list yesterday from my e-mail, but I don't see my reply here, so it must be stuck in the ether somewhere ...

Eric