Re: [JDBC] SOLVED ... Re: Getting rid of a cursor from JDBC ... Re: [PERFORM] Re: HELP: How to tame the 8.3.x JDBC driver with a big query result set
On Wed, 21 Apr 2010, Robert Haas wrote:

> On Tue, Apr 20, 2010 at 5:05 PM, Kris Jurka <bo...@ejurka.com> wrote:
>> b) Using the parameter values for statistics, but not making any stronger
>> guarantees about them. So the parameters will be used for evaluating the
>> selectivity, but not to perform other optimizations like
>> constraint_exclusion or transforming a LIKE operation to a range query.
>> This is the default plan type the JDBC driver uses.
>
> Hmm. I didn't think this was possible. How are you doing this?

This is only possible at the protocol level; it's not available using SQL
commands alone. You do it by creating an unnamed statement instead of a
named one:

http://www.postgresql.org/docs/8.4/static/protocol-flow.html#PROTOCOL-FLOW-EXT-QUERY

    Query planning for named prepared-statement objects occurs when the
    Parse message is processed. If a query will be repeatedly executed with
    different parameters, it might be beneficial to send a single Parse
    message containing a parameterized query, followed by multiple Bind and
    Execute messages. This will avoid replanning the query on each
    execution.

    The unnamed prepared statement is likewise planned during Parse
    processing if the Parse message defines no parameters. But if there are
    parameters, query planning occurs during Bind processing instead. This
    allows the planner to make use of the actual values of the parameters
    provided in the Bind message when planning the query.

Kris Jurka

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Re: [JDBC] SOLVED ... Re: Getting rid of a cursor from JDBC ... Re: [PERFORM] Re: HELP: How to tame the 8.3.x JDBC driver with a big query result set
On Wed, 21 Apr 2010, Nikolas Everett wrote:

> More to the point, is there some option that can shift you into method a?
> I'm thinking of warehousing type applications where you want to re-plan a
> good portion of your queries.

This can be done by connecting to the database using the V2 protocol (use
the URL option protocolVersion=2). This does remove some functionality of
the driver that is only available with the V3 protocol, but it will work
just fine for query execution.

Kris Jurka
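As a sketch, the option can also be supplied as a connection property rather
than in the URL (the user name here is hypothetical; in a real application
these properties would be handed to DriverManager.getConnection):

```java
import java.util.Properties;

public class V2Connection {
    // Connection properties requesting the V2 protocol, under which every
    // statement is planned with its actual parameter values (method (a)
    // above). The user name is a placeholder for illustration.
    static Properties v2Properties(String user) {
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("protocolVersion", "2"); // fall back to V2
        return props;
    }
}
```

Equivalently, `jdbc:postgresql://host/db?protocolVersion=2` in the URL
itself selects the same behavior.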
Re: [JDBC] SOLVED ... Re: Getting rid of a cursor from JDBC ... Re: [PERFORM] Re: HELP: How to tame the 8.3.x JDBC driver with a big query result set
On Mon, 19 Apr 2010, Dave Crooke wrote:

> Statement.close() appears to get the job done (in my environment, PG's
> driver never sees a Connection.close() because of DBCP).
>
> I'd consider the fact that ResultSet.close() does not release the implicit
> cursor to be something of a bug, but it may well have been fixed already.

PG doesn't release the locks acquired by the query until transaction end.
So closing a cursor will release some backend memory, but it won't release
the locks. The way the driver implements ResultSet.close() is to put the
close message into a queue, so that the next time a message is sent to the
backend we'll also send the cursor close message. This avoids an extra
network roundtrip for the close action.

In any case, Statement.close() isn't helping you here either. It's really
Connection.commit()/rollback() that's releasing the locks.

Kris Jurka
Re: [PERFORM] Very high effective_cache_size == worse performance?
On Tue, 20 Apr 2010, Nikolas Everett wrote:

> You can absolutely use copy if you like but you need to use a non-standard
> jdbc driver: kato.iki.fi/sw/db/postgresql/jdbc/copy/. I've used it in the
> past and it worked.

Copy support has been added to the 8.4 driver.

Kris Jurka
Re: [JDBC] SOLVED ... Re: Getting rid of a cursor from JDBC ... Re: [PERFORM] Re: HELP: How to tame the 8.3.x JDBC driver with a big query result set
On Tue, 20 Apr 2010, Dave Crooke wrote:

> a. Make setFetchSize(1) the default

The reason this is not done is that the mechanism used for fetching a piece
of the results at a time can change the query plan used if using a
PreparedStatement. There are three ways to plan a PreparedStatement:

a) Using the exact parameter values by substituting them directly into the
query. This isn't really planned, as you can't re-use it at all. This is
only available using the V2 protocol.

b) Using the parameter values for statistics, but not making any stronger
guarantees about them. So the parameters will be used for evaluating the
selectivity, but not to perform other optimizations like
constraint_exclusion or transforming a LIKE operation to a range query.
This is the default plan type the JDBC driver uses.

c) Planning the query with no regard for the parameters passed to it. This
is the plan type the JDBC driver uses when it sees the same
PreparedStatement being re-used multiple times or when it is respecting
setFetchSize and allowing for partial results.

We must use (c) for partial results instead of (b) because of some
limitations of the server. Currently you cannot have two statements of type
(b) open on the same connection. So since the driver can't know whether the
user will issue another query before fetching the remainder of the first
query's results, it must set up the first query to be of type (c) so that
multiple statements can exist simultaneously.

Switching the default plan type to (c) would cause a significant number of
complaints, as performance on some queries would go into the tank. Perhaps
we could have a default fetchSize for plain Statements, as that won't
affect the plan. I could also see making this a URL parameter, though, so
it could be set as the default with only a configuration change, not a code
change.

> b. If someone does call rs.close() before the end of the ResultSet, and
> has not created an explicit cursor at the JDBC level, flag the query /
> lock / virtual transaction in some way in the JDBC driver that tells it
> that it can just dump the cursor on a subsequent stmt.close(),
> conn.commit() or conn.close() call without sucking down the rest of the
> data.

This is already true. The JDBC driver only asks the server for more of the
ResultSet when a next() call requires it. So the server isn't constantly
spewing out rows that the driver must deal with; the driver only gets the
rows it asks for. Once the ResultSet is closed, it won't ask for any more.

Kris Jurka
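The driver's choice among these plan types can be modeled roughly as a
function of the re-use count, the prepareThreshold setting (the driver's
documented default is 5), and whether partial fetching is in effect. This
is an illustrative sketch only, not the driver's actual code:

```java
public class PlanChoice {
    // Illustrative model of which plan type (b or c above) pgjdbc ends up
    // with for a PreparedStatement. A threshold <= 0 disables named
    // server-prepared statements entirely.
    static char planType(int timesExecuted, int prepareThreshold,
                         boolean partialFetch) {
        if (partialFetch) {
            // setFetchSize in effect: must use a named statement so other
            // queries can run while the portal is suspended -> type (c).
            return 'c';
        }
        if (prepareThreshold > 0 && timesExecuted >= prepareThreshold) {
            // Re-used enough times: promoted to a named statement,
            // planned once without parameter values -> type (c).
            return 'c';
        }
        // Unnamed statement, planned at Bind time with the actual
        // parameter values used for statistics -> type (b).
        return 'b';
    }
}
```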
Re: [PERFORM] Query slowing down significantly??
On Mon, 1 Mar 2010, Rainer Pruy wrote:

> It is a Java app, using jdbc, but through a proprietary persistence
> framework. I'm just busy evaluating the effects on the app of prohibiting
> prepared statements via jdbc. If this is not worthwhile, I'm bound to some
> expensive reorganizations, sigh.

You can disable the named statement by adding the parameter
prepareThreshold=0 to your connection URL.

Kris Jurka
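A minimal sketch of such a URL (the host and database names here are
hypothetical placeholders):

```java
public class NoNamedStatements {
    // Build a pgjdbc connection URL with named server-prepared statements
    // disabled, so every query is planned with its actual parameter
    // values at Bind time.
    static String buildUrl(String host, String db) {
        return "jdbc:postgresql://" + host + "/" + db
                + "?prepareThreshold=0";
    }
}
```

The same value can also be supplied as a `prepareThreshold` entry in the
Properties passed to DriverManager.getConnection.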
Re: [PERFORM] performance for high-volume log insertion
Scott Marlowe wrote:

> On Sun, Apr 26, 2009 at 11:07 AM, Kris Jurka <bo...@ejurka.com> wrote:
>> As a note for non-JDBC users, the JDBC driver's batch interface allows
>> executing multiple statements in a single network roundtrip. This is
>> something you can't get in libpq, so beware of this for comparison's
>> sake.
>
> Really? I thought that executing statements like so:
>
> select * from a;insert ...;delete;
>
> in psql / libpq would execute them all in one trip.

Right, but those aren't prepared. I suppose it's possible to issue a
prepare and then issue a batch of semicolon-separated execute statements,
but that's not exactly a natural interface.

Kris Jurka
Re: [PERFORM] performance for high-volume log insertion
On Thu, 23 Apr 2009, Thomas Kellerer wrote:

> Out of curiosity I did some tests through JDBC. Using a single-column
> (integer) table, re-using a prepared statement took about 7 seconds to
> insert 10 rows with JDBC's batch interface and a batch size of 1000.

As a note for non-JDBC users, the JDBC driver's batch interface allows
executing multiple statements in a single network roundtrip. This is
something you can't get in libpq, so beware of this for comparison's sake.

> I also played around with batch size. Going beyond 200 didn't make a big
> difference.

Despite the size of the batch passed to the JDBC driver, the driver breaks
it up into internal sub-batches of 256 statements to send to the server. It
does this to avoid network deadlocks from sending too much data to the
server without reading any in return. If the driver were written
differently it could handle this better and send the full batch size, but
at the moment that's not possible, and we're hoping the gains beyond this
size aren't too large.

Kris Jurka
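The splitting described above can be sketched as follows (a simplified
model of the driver's behavior, not its internal code; 256 matches the
sub-batch limit described above):

```java
import java.util.ArrayList;
import java.util.List;

public class SubBatch {
    // Internal sub-batch size: the most statements the driver will send
    // before stopping to read the server's replies.
    static final int MAX_IN_FLIGHT = 256;

    // Given a user-supplied batch size, return the sizes of the chunks
    // the driver would actually put on the wire at once.
    static List<Integer> subBatchSizes(int batchSize) {
        List<Integer> sizes = new ArrayList<>();
        for (int remaining = batchSize; remaining > 0;
             remaining -= MAX_IN_FLIGHT) {
            sizes.add(Math.min(remaining, MAX_IN_FLIGHT));
        }
        return sizes;
    }
}
```

This is why batch sizes beyond a couple hundred stop paying off: a batch of
1000 still costs four waits on the server, not one.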
Re: [PERFORM] No hash join across partitioned tables?
Tom Lane wrote:

> Is there another issue here besides that one? I think you were hoping that
> the hash join would be faster than the alternatives, but the cost estimate
> says it's a lot slower. Is that actually the case?

The hash join takes less than twenty seconds; the other two joins I killed
after five minutes. I can try to collect explain analyze results later
today if you'd like.

Kris Jurka
Re: [PERFORM] No hash join across partitioned tables?
Tom Lane wrote:

> Kris Jurka <bo...@ejurka.com> writes:
>> The hash join takes less than twenty seconds; the other two joins I
>> killed after five minutes. I can try to collect explain analyze results
>> later today if you'd like.

Attached are the explain analyze results. The analyze part hits the hash
join worst of all, so I've also included the timings without analyzing.

Method     Time (ms)   Time w/Analyze (ms)
nestloop      304853                319060
merge         514517                683757
hash           18957                143731

Kris Jurka

 Aggregate  (cost=116546928.75..116546928.76 rows=1 width=0) (actual time=143731.602..143731.604 rows=1 loops=1)
   ->  Hash Join  (cost=1470.48..108802907.84 rows=3097608361 width=0) (actual time=308.015..143724.055 rows=2437 loops=1)
         Hash Cond: ((l.vin)::text = (i.vin)::text)
         ->  Append  (cost=0.00..332098.75 rows=18450775 width=18) (actual time=0.069..99984.899 rows=18449996 loops=1)
               ->  Seq Scan on liens l  (cost=0.00..14.00 rows=400 width=21) (actual time=0.003..0.003 rows=0 loops=1)
               ->  Seq Scan on liens_s1 l  (cost=0.00..18633.44 rows=917444 width=18) (actual time=0.060..1828.740 rows=917444 loops=1)
               ->  Seq Scan on liens_s2 l  (cost=0.00..20.92 rows=1192 width=18) (actual time=0.010..2.274 rows=1192 loops=1)
               ->  Seq Scan on liens_s3 l  (cost=0.00..53793.79 rows=2934179 width=18) (actual time=0.054..5777.782 rows=2934179 loops=1)
               ->  Seq Scan on liens_s4 l  (cost=0.00..21069.39 rows=1214139 width=18) (actual time=0.065..2413.429 rows=1214139 loops=1)
               ->  Seq Scan on liens_s5 l  (cost=0.00..29966.37 rows=1726837 width=18) (actual time=0.046..3394.974 rows=1726837 loops=1)
               ->  Seq Scan on liens_s6 l  (cost=0.00..10587.18 rows=462918 width=18) (actual time=0.053..936.379 rows=462918 loops=1)
               ->  Seq Scan on liens_s7 l  (cost=0.00..14.00 rows=400 width=21) (actual time=0.003..0.003 rows=0 loops=1)
               ->  Seq Scan on liens_s8 l  (cost=0.00..86004.68 rows=4956168 width=18) (actual time=0.045..9729.965 rows=4956182 loops=1)
               ->  Seq Scan on liens_s9 l  (cost=0.00..320.29 rows=18429 width=18) (actual time=0.010..34.880 rows=18429 loops=1)
               ->  Seq Scan on liens_s10 l  (cost=0.00..18398.16 rows=951016 width=18) (actual time=0.055..1889.948 rows=951016 loops=1)
               ->  Seq Scan on liens_s11 l  (cost=0.00..9956.22 rows=543022 width=18) (actual time=0.055..1070.156 rows=543022 loops=1)
               ->  Seq Scan on liens_s12 l  (cost=0.00..78813.85 rows=4541785 width=18) (actual time=0.012..9431.035 rows=4541792 loops=1)
               ->  Seq Scan on liens_s13 l  (cost=0.00..4506.46 rows=182846 width=18) (actual time=0.049..374.788 rows=182846 loops=1)
         ->  Hash  (cost=1050.77..1050.77 rows=33577 width=18) (actual time=256.374..256.374 rows=33297 loops=1)
               ->  Append  (cost=0.00..1050.77 rows=33577 width=18) (actual time=0.019..188.152 rows=33297 loops=1)
                     ->  Seq Scan on impounds i  (cost=0.00..11.40 rows=140 width=21) (actual time=0.002..0.002 rows=0 loops=1)
                     ->  Seq Scan on impounds_s1 i  (cost=0.00..11.40 rows=140 width=21) (actual time=0.002..0.002 rows=0 loops=1)
                     ->  Seq Scan on impounds_s2 i  (cost=0.00..913.87 rows=29587 width=18) (actual time=0.008..60.728 rows=29587 loops=1)
                     ->  Seq Scan on impounds_s3 i  (cost=0.00..18.14 rows=414 width=18) (actual time=0.009..0.848 rows=414 loops=1)
                     ->  Seq Scan on impounds_s4 i  (cost=0.00..95.96 rows=3296 width=18) (actual time=0.012..6.894 rows=3296 loops=1)
 Total runtime: 143731.788 ms
(26 rows)

                                  QUERY PLAN
---------------------------------------------------------------------------
 Aggregate  (cost=57241210.61..57241210.62 rows=1 width=0) (actual time=683467.350..683467.352 rows=1 loops=1)
   ->  Merge Join  (cost=2940810.41..49497189.70 rows=3097608361 width=0) (actual time=434026.342..683460.545 rows=2437 loops=1)
         Merge Cond: ((l.vin)::text = (i.vin)::text)
         ->  Sort  (cost=2937235.46..2983362.40 rows=18450775 width=18) (actual time=433519.957..637389.755 rows=18449961 loops=1)
               Sort Key: l.vin
               Sort Method:  external merge  Disk: 504728kB
               ->  Append  (cost=0.00..332098.75 rows=18450775 width=18) (actual time=14.764..102905.170 rows=18449996 loops=1)
                     ->  Seq Scan on liens l  (cost=0.00..14.00 rows=400 width=21) (actual time=0.003..0.003 rows=0 loops=1)
                     ->  Seq Scan on liens_s1 l  (cost=0.00..18633.44 rows=917444 width=18) (actual time=14.755..2167.668 rows=917444 loops=1)
                     ->  Seq Scan on liens_s2 l  (cost=0.00..20.92 rows=1192 width=18) (actual time=0.012..2.304 rows
[PERFORM] No hash join across partitioned tables?
   (cost=4073102.86..4303737.81 rows=18450796 width=21)
   ->  Sort  (cost=4073102.86..4119229.85 rows=18450796 width=21)
         Sort Key: l.vin
         ->  Append  (cost=0.00..332797.96 rows=18450796 width=21)
               ->  Seq Scan on liens l  (cost=0.00..14.00 rows=400 width=21)
               ->  [Seq Scans on other partitions]

Disabling mergejoin pushes it back to a nestloop join. Why can't it hash
join these two together?

Kris Jurka
Re: [PERFORM] No hash join across partitioned tables?
On Thu, 16 Apr 2009, Tom Lane wrote:

> Kris Jurka <bo...@ejurka.com> writes:
>> PG (8.3.7) doesn't seem to want to do a hash join across two partitioned
>> tables.
>
> Could we see the whole declaration of these tables? (pg_dump -s output
> would be convenient)

The attached table definition with no data wants to mergejoin first, but
after disabling mergejoin it does indeed do a hashjoin.

Looking back at the cost estimates for the merge and nestloop joins, it
seems to be selecting the number of rows in the cartesian product * .005,
while the number of output rows in this case is 2437 (cartesian product *
4e-9). Perhaps the cost estimates for the real data are so high because of
this bogus row count that the fudge factor to disable mergejoin isn't
enough?

Kris Jurka

CREATE TABLE impounds (
    vin character varying(17) NOT NULL,
    impounddate date NOT NULL,
    eventtypeid integer NOT NULL,
    ori character varying(9),
    platenumber character varying(8),
    platestate character varying(2),
    plateyear integer,
    platetype character varying(2),
    vehicleyear integer,
    vehiclemake text,
    vehiclemodel text,
    vehiclestyle text,
    vehiclecolor text,
    townotes text,
    damagenotes text,
    platenotes text,
    custodynotes text,
    sourcenotes text,
    referencenumber1 text,
    referencenumber2 text,
    referencenumber3 text,
    contactname text,
    contactphone text
);

ALTER TABLE ONLY impounds ADD CONSTRAINT impounds_pk PRIMARY KEY (vin, impounddate, eventtypeid);

CREATE TABLE impounds_s1 () INHERITS (impounds);
CREATE TABLE impounds_s2 () INHERITS (impounds);
CREATE TABLE impounds_s3 () INHERITS (impounds);
CREATE TABLE impounds_s4 () INHERITS (impounds);

ALTER TABLE impounds_s1 ADD CONSTRAINT impounds_s1_pk PRIMARY KEY (vin, impounddate);
ALTER TABLE impounds_s2 ADD CONSTRAINT impounds_s2_pk PRIMARY KEY (vin, impounddate);
ALTER TABLE impounds_s3 ADD CONSTRAINT impounds_s3_pk PRIMARY KEY (vin, impounddate);
ALTER TABLE impounds_s4 ADD CONSTRAINT impounds_s4_pk PRIMARY KEY (vin, impounddate);

CREATE TABLE liens (
    agentid integer NOT NULL,
    vin character varying(17) NOT NULL,
    liendate date,
    accountnumber character varying(50),
    customername character varying(50),
    state character varying(2),
    vehiclemake character varying(20),
    vehiclemodel character varying(20),
    vehicleyear integer
);

ALTER TABLE ONLY liens ADD CONSTRAINT liens_pk PRIMARY KEY (vin, agentid);

CREATE TABLE liens_s1 () INHERITS (liens);
CREATE TABLE liens_s2 () INHERITS (liens);
CREATE TABLE liens_s3 () INHERITS (liens);
CREATE TABLE liens_s4 () INHERITS (liens);
CREATE TABLE liens_s5 () INHERITS (liens);
CREATE TABLE liens_s6 () INHERITS (liens);
CREATE TABLE liens_s7 () INHERITS (liens);

ALTER TABLE liens_s1 ADD CONSTRAINT lines_s1_pk PRIMARY KEY (vin);
ALTER TABLE liens_s2 ADD CONSTRAINT lines_s2_pk PRIMARY KEY (vin);
ALTER TABLE liens_s3 ADD CONSTRAINT lines_s3_pk PRIMARY KEY (vin);
ALTER TABLE liens_s4 ADD CONSTRAINT lines_s4_pk PRIMARY KEY (vin);
ALTER TABLE liens_s5 ADD CONSTRAINT lines_s5_pk PRIMARY KEY (vin);
ALTER TABLE liens_s6 ADD CONSTRAINT lines_s6_pk PRIMARY KEY (vin);
ALTER TABLE liens_s7 ADD CONSTRAINT lines_s7_pk PRIMARY KEY (vin);
Re: [PERFORM] No hash join across partitioned tables?
On Thu, 16 Apr 2009, Kris Jurka wrote:

> Perhaps the cost estimates for the real data are so high because of this
> bogus row count that the fudge factor to disable mergejoin isn't enough?

Indeed, I get these cost estimates on 8.4b1 with an increased disable_cost
value:

nestloop:  11171206.18
merge:     58377401.39
hash:     116763544.76

So the default disable_cost isn't enough to push it to the hash join plan,
and it goes back to nestloop. Since disable_cost hasn't been touched since
January 2000, perhaps it's time to bump it up to match today's hardware and
problem sizes? This isn't even a particularly big problem; it's joining 18M
rows against 30k.

The real problem is getting reasonable stats to pass through the partition
Append step, so it can make a reasonable estimate of the join output size.

Kris Jurka
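The arithmetic can be checked directly. Assuming disable_cost is still the
1.0e8 value from that era's costsize.c (an assumption worth verifying
against the source), the penalty is smaller than the gap between the
inflated hash estimate and the other two plans:

```java
public class DisableCost {
    // Cost estimates quoted above (8.4b1, with disable_cost backed out).
    static final double NESTLOOP = 11171206.18;
    static final double MERGE    = 58377401.39;
    static final double HASH     = 116763544.76;

    // Assumed historical disable_cost: a fixed penalty added to the cost
    // of every "disabled" plan type.
    static final double DISABLE_COST = 1.0e8;

    // With mergejoin and nestloop both disabled, does the planner still
    // prefer one of them over the hash join?
    static boolean hashWins() {
        return HASH < NESTLOOP + DISABLE_COST
            && HASH < MERGE + DISABLE_COST;
    }
}
```

Here nestloop plus the penalty (about 111.2M) still undercuts the bogus
hash estimate (116.8M), which matches the observed fallback to nestloop.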
Re: [PERFORM] PG writes a lot to the disk
On Thu, 20 Mar 2008, Albe Laurenz wrote:

> PostgreSQL doesn't write into the table files when it SELECTs data.

It could easily be hint bit updates, which are set by selects, getting
written.

Kris Jurka
Re: [PERFORM] viewing source code
On Mon, 17 Dec 2007, Merlin Moncure wrote:

> the table is pg_proc. you have to revoke select rights from public and the
> user of interest. be aware this will make it very difficult for that user
> to do certain things in psql and (especially) pgadmin. it works.
>
> a better solution to this problem is to make a language wrapper for
> pl/pgsql that encrypts the source on disk. afaik, no one is working on
> this. it would secure the code from remote users but not necessarily from
> people logged in to the server. the pg_proc hack works ok though.

Another enhancement that would improve this situation would be to implement
per-column permissions as the SQL spec has, so that you could revoke select
on just the prosrc column and allow clients to retrieve the metadata they
need.

Kris Jurka

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] [JDBC] does prepareThreshold work? forced to use old driver
On Mon, 26 Feb 2007, Gene wrote:

> I've been having some serious performance issues with
> postgresql 8.2 / hibernate / jdbc due to postgres reusing bad cached
> query plans. It doesn't look at the parameter values and therefore does
> not use any partial indexes. Setting prepareThreshold=0 in the connection
> string didn't work, and even modifying the jdbc driver to force it to 0
> didn't work, so I realized it must be being ignored. After pretty much
> giving up, I tried a much older driver which doesn't use server prepared
> statements at all; the problem has gone away and it is once again using
> the partial indexes. How can I get this to work properly on the new jdbc
> driver? I don't really like having to use a 2 year old driver to get good
> performance, as you can imagine :)

Something must be going wrong in the setting to zero, or your code may be
setting it to non-zero at some later point. I believe prepareThreshold=0
should work. Do you have a test case showing it doesn't?

Kris Jurka
Re: [PERFORM] in memory views
On Wed, 10 May 2006, Thomas Vatter wrote:

> Yes, the difference between the psql command line and the application is
> 6 seconds to 40 seconds. It is exactly the step resultSet =
> executeQuery() that needs 40 seconds. I use next() as a cursor through
> the resultSet, but I fear this is not enough. Do I have to use
> createStatement(resultSetType, resultSetConcurrency) respectively
> prepareStatement(resultSetType, resultSetConcurrency) to achieve the
> cursor behaviour?

http://jdbc.postgresql.org/documentation/81/query.html#query-with-cursor

Kris Jurka
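The conditions the linked page sets out for cursor-based fetching can be
sketched as a predicate. This is a simplified model for illustration, not
driver code; it omits two further documented requirements (the V3 protocol
and a single-statement query):

```java
public class CursorMode {
    // Simplified model of when pgjdbc fetches a ResultSet in chunks via a
    // server-side portal instead of reading every row up front.
    static boolean usesCursor(boolean autoCommit, int resultSetType,
                              int fetchSize) {
        return !autoCommit                    // must run inside a transaction
            && resultSetType
               == java.sql.ResultSet.TYPE_FORWARD_ONLY  // no scrolling
            && fetchSize > 0;                 // a chunk size was requested
    }
}
```

In other words: conn.setAutoCommit(false), a forward-only Statement, and
stmt.setFetchSize(n) together are what turn the 40-second up-front read
into incremental fetches.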
Re: [PERFORM] WAL logging of SELECT ... INTO command
On Fri, 24 Mar 2006, Jim C. Nasby wrote:

> On Wed, Mar 22, 2006 at 02:37:28PM -0500, Kris Jurka wrote:
>> On Wed, 22 Mar 2006, Jim C. Nasby wrote:
>>> Ok, I saw disk activity on the base directory and assumed it was
>>> pg_xlog stuff. Turns out that both SELECT INTO and CREATE TABLE AS
>>> ignore default_tablespace and create the new tables in the base
>>> directory. I'm guessing that's a bug... (this is on 8.1.2, btw).
>>
>> This has been fixed in CVS HEAD as part of a patch to allow additional
>> options to CREATE TABLE AS.
>>
>> http://archives.postgresql.org/pgsql-patches/2006-02/msg00211.php
>
> I'll argue that the current behavior is still a bug and should be fixed.
> Would it be difficult to patch 8.1 (and 8.0 if there were tablespaces
> then...) to honor default_tablespace?

Here are patches that fix this for 8.0 and 8.1.

Kris Jurka

Index: src/backend/executor/execMain.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/executor/execMain.c,v
retrieving revision 1.241.4.2
diff -c -r1.241.4.2 execMain.c
*** src/backend/executor/execMain.c	12 Jan 2006 21:49:17 -0000	1.241.4.2
--- src/backend/executor/execMain.c	24 Mar 2006 18:05:53 -0000
***************
*** 36,41 ****
--- 36,42 ----
  #include "catalog/heap.h"
  #include "catalog/namespace.h"
  #include "commands/tablecmds.h"
+ #include "commands/tablespace.h"
  #include "commands/trigger.h"
  #include "executor/execdebug.h"
  #include "executor/execdefs.h"
***************
*** 731,736 ****
--- 732,738 ----
  {
      char       *intoName;
      Oid         namespaceId;
+     Oid         tablespaceId;
      AclResult   aclresult;
      Oid         intoRelationId;
      TupleDesc   tupdesc;
***************
*** 747,752 ****
--- 749,764 ----
          aclcheck_error(aclresult, ACL_KIND_NAMESPACE,
                         get_namespace_name(namespaceId));
+     tablespaceId = GetDefaultTablespace();
+     if (OidIsValid(tablespaceId)) {
+         aclresult = pg_tablespace_aclcheck(tablespaceId, GetUserId(),
+                                            ACL_CREATE);
+
+         if (aclresult != ACLCHECK_OK)
+             aclcheck_error(aclresult, ACL_KIND_TABLESPACE,
+                            get_tablespace_name(tablespaceId));
+     }
+
      /*
       * have to copy tupType to get rid of constraints
       */
***************
*** 754,760 ****
      intoRelationId = heap_create_with_catalog(intoName,
                                                namespaceId,
!                                               InvalidOid,
                                                tupdesc,
                                                RELKIND_RELATION,
                                                false,
--- 766,772 ----
      intoRelationId = heap_create_with_catalog(intoName,
                                                namespaceId,
!                                               tablespaceId,
                                                tupdesc,
                                                RELKIND_RELATION,
                                                false,

Index: src/backend/executor/execMain.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/executor/execMain.c,v
retrieving revision 1.256.2.5
diff -c -r1.256.2.5 execMain.c
*** src/backend/executor/execMain.c	12 Jan 2006 21:49:06 -0000	1.256.2.5
--- src/backend/executor/execMain.c	24 Mar 2006 17:57:11 -0000
***************
*** 37,42 ****
--- 37,43 ----
  #include "catalog/heap.h"
  #include "catalog/namespace.h"
  #include "commands/tablecmds.h"
+ #include "commands/tablespace.h"
  #include "commands/trigger.h"
  #include "executor/execdebug.h"
  #include "executor/execdefs.h"
***************
*** 737,742 ****
--- 738,744 ----
  {
      char       *intoName;
      Oid         namespaceId;
+     Oid         tablespaceId;
      AclResult   aclresult;
      Oid         intoRelationId;
      TupleDesc   tupdesc
Re: [PERFORM] WAL logging of SELECT ... INTO command
On Wed, 22 Mar 2006, Jim C. Nasby wrote:

> Ok, I saw disk activity on the base directory and assumed it was pg_xlog
> stuff. Turns out that both SELECT INTO and CREATE TABLE AS ignore
> default_tablespace and create the new tables in the base directory. I'm
> guessing that's a bug... (this is on 8.1.2, btw).

This has been fixed in CVS HEAD as part of a patch to allow additional
options to CREATE TABLE AS.

http://archives.postgresql.org/pgsql-patches/2006-02/msg00211.php

Kris Jurka
Re: [Bizgres-general] Re: [PERFORM] faster INSERT with possible
On Wed, 27 Jul 2005, Josh Berkus wrote:

> b) you can't index a temp table.

jurka# create temp table t (a int);
CREATE
jurka# create index myi on t(a);
CREATE
Re: [PERFORM] COPY vs INSERT
On Wed, 4 May 2005, Mischa Sandberg wrote:

> Quoting Kris Jurka [EMAIL PROTECTED]:
>> Not true. A client may send any number of Bind/Execute messages on a
>> prepared statement before a Sync message.
>
> Hunh. Interesting optimization in the JDBC driver. I gather it is sending
> a string of (;)-separated inserts.

No, it uses the V3 protocol and a prepared statement and uses Bind/Execute,
as I mentioned.

> Sounds like efficient-but-risky stuff we did for ODBC drivers at Simba ...
> gets interesting when one of the insert statements in the middle fails.

When running inside a transaction (as you really want to do anyway when
bulk loading), it is well defined; it is a little odd for autocommit mode,
though. In autocommit mode the transaction boundary is at the Sync message,
not the individual Execute messages, so you will get some rollback on
error. The JDBC spec is poorly defined in this area, so we can get away
with this.

> Good to know. Hope that the batch size is parametric, given that you can
> have inserts with rather large strings bound to 'text' columns in PG ---
> harder to identify BLOBs when talking to PG than when talking to
> MSSQL/Oracle/Sybase.

The batch size is not a parameter, and I don't think it needs to be. The
issue of filling both sides of the network buffers and deadlocking only
needs to be avoided on one side. The response to an insert request is small
and not dependent on the size of the data sent, so we can send as much as
we want as long as the server doesn't send much back to us.

Kris Jurka
Re: [PERFORM] COPY vs INSERT
On Wed, 4 May 2005, Mischa Sandberg wrote:

> Copy makes better use of the TCP connection for transmission. COPY uses
> the TCP connection like a one-way pipe. INSERT is like an RPC: the sender
> has to wait until the insert's return status roundtrips.

Not true. A client may send any number of Bind/Execute messages on a
prepared statement before a Sync message. So multiple inserts may be sent
in one network roundtrip. This is exactly how the JDBC driver implements
batch statements.

There is some limit to the number of queries in flight at any given moment,
because there is the potential to deadlock if both sides' network buffers
fill up and each side blocks waiting on a write. The JDBC driver has
conservatively selected 256 as the maximum number of queries to send at
once.

Kris Jurka
Re: [PERFORM] batch inserts are slow
On Tue, 3 May 2005, Josh Berkus wrote:

>> There are several hacks floating around that add COPY capabilities to
>> the pgjdbc driver. As they all are rather simple hacks, they have not
>> been included in the cvs yet, but they tend to work fine.
>
> FWIW, Dave Cramer just added beta COPY capability to JDBC. Contact him on
> the JDBC list for details; I think he needs testers.

I believe Dave has remerged a patch for COPY I posted over a year ago, but
he has not yet published it. I would guess it has the same bugs as the
original (transaction + error handling) and will meet the same objections
that kept the original patch out of the driver in the first place (we want
a friendlier API than just a data stream).

Kris Jurka
Re: [PERFORM] [JDBC] 8.0.1 performance question.
On Tue, 5 Apr 2005, [EMAIL PROTECTED] wrote:

> I see the statements below being executed non-stop. Who is triggering
> these statements? Is this normal? What am I doing wrong?
>
> 2005-04-04 18:05:00 CST PARSELOG: statement: SELECT attnotnull FROM
> pg_catalog.pg_attribute WHERE attrelid = $1 AND attnum = $2
>
> 2005-04-04 18:05:00 CST PARSELOG: statement: SELECT def.adsrc FROM
> pg_catalog.pg_class c JOIN pg_catalog.pg_attribute a ON
> (a.attrelid=c.oid) LEFT JOIN pg_catalog.pg_attrdef def ON
> (a.attrelid=def.adrelid AND a.attnum = def.adnum) WHERE c.oid = $1 and
> a.attnum = $2 AND def.adsrc LIKE '%nextval(%'

These are the results of ResultSetMetaData.isNullable() and
isAutoIncrement(), which your code is apparently calling. The results of
these calls are cached on a per-ResultSet basis. We have discussed caching
them at a higher level, but couldn't find a way to know when to flush that
cache.

Kris Jurka
Re: [PERFORM] [JDBC] 8.0.1 performance question.
On Tue, 5 Apr 2005, [EMAIL PROTECTED] wrote:

> Thank you for the quick response. To help me debug what's happening, can
> you tell me what's the difference between the 7.4 and 8.0 jdbc drivers in
> this regard? Is this something that is newly introduced in 8.0? Or is this
> something that has always been happening?

8.0 is the first driver version to take advantage of the V3 protocol's
ability to return the base tables and columns of a ResultSet. Previously,
isNullable() was hardcoded to always return columnNullableUnknown and
isAutoIncrement() always returned false. I guess the question is why you
are calling these methods if they didn't work previously?

Kris Jurka
Re: [PERFORM] JDBC best practice
On Mon, 28 Mar 2005, Dave Held wrote: I'm using a Postgres table as the data source for a JTable in a Java app. Where rs_ is a RecordSet object. What I'm wondering is whether it's better to call absolute() or relative() or next()/previous(). If absolute() is the slowest call, then I can cache the last row fetched and move relative to that. My suspicion is that next()/previous() is much faster than absolute() when the record to be fetched is very near the last record fetched. I haven't actually tried it, but I'd like some insight if others can already answer this question based on knowledge of the server side and/or the JDBC driver. There are two types of ResultSets that can be returned by the JDBC driver. One is backed by a cursor and can only be used for TYPE_FORWARD_ONLY ResultSets so it is not really applicable to you. The other method retrieves all results at once and stashes them in a Vector. This makes next, absolute, and relative positioning all equal cost. Kris Jurka ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
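[Editor's note: since the non-cursor ResultSet is fetched in full and held client-side, the positioning calls really are interchangeable. A minimal sketch, with hypothetical connection details and table names:]

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class ScrollDemo {
    public static void main(String[] args) throws SQLException {
        Connection conn = DriverManager.getConnection(
            "jdbc:postgresql://localhost/test", "user", "pass");
        // A scrollable ResultSet is retrieved all at once and stashed
        // in memory, so every positioning call is an in-memory move.
        Statement stmt = conn.createStatement(
            ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_READ_ONLY);
        ResultSet rs = stmt.executeQuery(
            "SELECT id, name FROM items ORDER BY id");
        rs.absolute(50);   // jump straight to row 50: no server round trip
        rs.relative(-10);  // back to row 40: same cost
        rs.next();         // forward to row 41: same cost
        rs.close();
        stmt.close();
        conn.close();
    }
}
```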
Re: [PERFORM] mis-estimation on data-warehouse aggregate creation
On Tue, 16 Nov 2004, Simon Riggs wrote: The join condition has so many ANDed predicates that we assume that this will reduce the selectivity considerably. It does not, and so you pay the cost dearly later on. Yes, that makes a lot of sense. Without some incredibly good cross-column statistics there is no way it could expect all of the rows to match. Thanks for the analysis. Kris Jurka ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] mis-estimation on data-warehouse aggregate creation
On Tue, 16 Nov 2004, F. Senault wrote: Let me guess... You've never run analyze on your tables ? No, I have. I mentioned that I did in my email, but you can also tell by the exactly correct guesses for some other plan steps: - Seq Scan on period (cost=0.00..90.88 rows=3288 width=54) (actual time=0.118..12.126 rows=3288 loops=1) Kris Jurka ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [PERFORM] execute cursor fetch
On Tue, 12 Oct 2004, Stef wrote: Pierre-Frédéric Caillaud mentioned: http://www.postgresql.org/docs/7.4/static/jdbc-query.html#AEN24298 My question is: Is this only true for postgres versions >= 7.4? I see the section about setting fetch size to turn cursors on and off is not in the postgres 7.3.7 docs. Does this mean the JDBC driver for postgres 7.3 doesn't support this? You need the 7.4 JDBC driver, but can run it against a 7.3 (or 7.2) database. Also note the 8.0 JDBC driver can only do this against a 7.4 or 8.0 database and not older versions. Kris Jurka ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Normal case or bad query plan?
On Mon, 11 Oct 2004, Gabriele Bartolini wrote: - Seq Scan on ip2location (cost=0.00..30490.65 rows=124781 width=8) (actual time=5338.120..40237.283 rows=1 loops=1) Filter: ((1040878301::bigint >= ip_address_from) AND (1040878301::bigint <= ip_address_to)) Total runtime: 40237.424 ms I believe the problem is that pg's lack of cross-column statistics is producing the poor number of rows estimate. The number of rows matching just the first 1040878301::bigint >= ip_address_from condition is 122774, which is roughly 10% of the table. I imagine the query planner believes that the other condition alone will match the other 90% of the table. The problem is that it doesn't know that these two ranges' intersection is actually tiny. The planner assumes a complete or nearly complete overlap so it thinks it will need to fetch 10% of the rows from both the index and the heap and chooses a seqscan. Kris Jurka ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PERFORM] Large # of rows in query extremely slow, not using
On Tue, 14 Sep 2004, Stephen Crowley wrote: Problem solved.. I set the fetchSize to a reasonable value instead of the default of unlimited in the PreparedStatement and now the query is fast. After some searching it seems this is a common problem; would it make sense to change the default value to something other than 0 in the JDBC driver? In the JDBC driver, setting the fetch size to a non-zero value means that the query will be run using what the frontend/backend protocol calls a named statement. What this means on the backend is that the planner will not be able to use the values from the query parameters to generate the optimum query plan and must use generic placeholders and create a generic plan. For this reason we have decided not to default to a non-zero fetch size. This is something whose default value could be set by a URL parameter if you think that is something that is really required. Kris Jurka ---(end of broadcast)--- TIP 8: explain analyze is your friend
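[Editor's note: turning on cursor-based fetching by hand looks like this in pgjdbc. Note it only takes effect inside a transaction block (autocommit off) with a forward-only ResultSet; connection details and the table name are hypothetical:]

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class StreamRows {
    public static void main(String[] args) throws SQLException {
        Connection conn = DriverManager.getConnection(
            "jdbc:postgresql://localhost/test", "user", "pass");
        // Cursor-based fetching requires being inside a transaction.
        conn.setAutoCommit(false);
        Statement stmt = conn.createStatement(); // TYPE_FORWARD_ONLY default
        stmt.setFetchSize(5000); // non-zero: fetch 5000 rows per round trip
        ResultSet rs = stmt.executeQuery("SELECT * FROM big_table");
        while (rs.next()) {
            // Process one row; at most 5000 rows are held in client
            // memory at a time instead of the whole result set.
        }
        rs.close();
        stmt.close();
        conn.commit();
        conn.close();
    }
}
```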
Re: [PERFORM] Large # of rows in query extremely slow, not using
On Thu, 23 Sep 2004, Stephen Crowley wrote: Thanks for the explanation. So what sort of changes need to be made to the client/server protocol to fix this problem? The problem is that there is no way to indicate why you are using a particular statement in the extended query protocol. For the JDBC driver there are two potential reasons, streaming a ResultSet and using a server prepared statement. For the streaming as default case you desire there needs to be a way to indicate that you don't want to create a generic server prepared statement and that this query is really just for one time use, so it can generate the best plan possible. Additionally you can only stream ResultSets that are of type FORWARD_ONLY. It would also be nice to be able to specify scrollability and holdability when creating a statement and the offset/direction when streaming data from a scrollable one. Kris Jurka ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [JDBC] Cursors performance (was: Re: [PERFORM] Terrible performance
On Fri, 9 Jul 2004, Bill Chandler wrote: Thanks to all who have responded. I now think my problem is not related to deleting/recreating indexes. Somehow it is related to JDBC cursors. It appears that what is happening is that since I'm using a fetch size of 5000, the command: FETCH FORWARD 5000 FROM JDBC_CURS_1 If the top level node of your execution plan is a sort step, it should take essentially no time to retrieve additional rows after the first fetch. The sort step materializes the results so that future fetches simply need to spit this data back to the client. I would agree with Dave's suggestion to use log_duration and compare the values for the first and subsequent fetches. Kris Jurka ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [PERFORM] [JDBC] Using a COPY...FROM through JDBC?
On Sat, 5 Jun 2004, Steve Wampler wrote: [I want to use copy from JDBC] I made a patch to the driver to support COPY as a PG extension. The patch required properly encoded and formatted copy data available from an InputStream. Following some feedback from others I began adding the ability to handle different encodings and the ability to read and write objects without requiring any knowledge of the copy data format. I got hung up on the object read/write part because of some issues with how type conversions are done in the driver. At the moment there is a big push being made by Oliver Jowett to get true V3 protocol support implemented which is currently my highest priority. Getting copy support into the JDBC driver is something I'd like to see for 7.5, but I couldn't say if that will happen or how complete it may be. Depending on your needs perhaps the initial patch is sufficient. http://archives.postgresql.org/pgsql-jdbc/2003-12/msg00186.php Kris Jurka ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
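[Editor's note: pgjdbc did eventually grow the "friendlier API" discussed here, org.postgresql.copy.CopyManager. A sketch assuming a modern driver version; connection details and the table are hypothetical:]

```java
import java.io.StringReader;
import java.sql.Connection;
import java.sql.DriverManager;
import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;

public class CopyDemo {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
            "jdbc:postgresql://localhost/test", "user", "pass");
        // CopyManager is a PostgreSQL extension, reached by unwrapping
        // the vendor connection from any pooled/wrapped Connection.
        CopyManager cm = conn.unwrap(PGConnection.class).getCopyAPI();
        // Stream delimiter-separated rows straight into the table,
        // bypassing per-row INSERT overhead.
        long rows = cm.copyIn(
            "COPY items (id, name) FROM STDIN WITH DELIMITER ','",
            new StringReader("1,apple\n2,banana\n"));
        System.out.println(rows + " rows copied");
        conn.close();
    }
}
```

As noted in the following message, COPY does not take a table lock any stronger than INSERT does, so concurrent writers are not blocked.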
Re: [PERFORM] [JDBC] Using a COPY...FROM through JDBC?
On Mon, 7 Jun 2004, Steve Wampler wrote: I do have a little concern about what's happening in the back end during the copy - I suspect the entire table is locked, which may impact the performance when multiple clients are saving entries into the table. Anyone know if that's how COPY works? (For that matter, would that also be true of a transaction consisting of a set of inserts?) The table is not locked in either the copy or the insert case. Kris Jurka ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] planner/optimizer question
On Fri, 30 Apr 2004, Gary Doades wrote: Yes, you're right and I have done this just to prove to myself that it is the index scan that is the bottleneck. I have some complex SQL that executes very quickly with Postgres, similar to MSSQL, but the index scans in most of those only touch a few rows for a few loops. It seems to be a problem when the index scan is scanning very many rows and for each of these it has to go to the table just to find out if the index entry it just looked at is still valid. Another way to speed this up is the TODO item: Use bitmaps to fetch heap pages in sequential order For an indexscan that fetches a number of rows, those rows may be anywhere in the base table, so as each index entry is found it fetches the corresponding table row from the heap. This is not ideal because you can be jumping around in the heap and end up fetching the same page multiple times when table rows are in the same page, but were found in different places in the index. Kris Jurka ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] severe performance issue with planner (fwd)
I sent this message to the list and although it shows up in the archives, I did not receive a copy of it through the list, so I'm resending as I suspect others did not see it either. -- Forwarded message -- Date: Sat, 13 Mar 2004 22:48:01 -0500 (EST) From: Kris Jurka [EMAIL PROTECTED] To: Tom Lane [EMAIL PROTECTED] Cc: Eric Brown [EMAIL PROTECTED], [EMAIL PROTECTED] Subject: Re: [PERFORM] severe performance issue with planner On Thu, 11 Mar 2004, Tom Lane wrote: Eric Brown [EMAIL PROTECTED] writes: [ planning a 9-table query takes too long ] See http://www.postgresql.org/docs/7.4/static/explicit-joins.html for some useful tips. Is this the best answer we've got? For me with an empty table this query takes 4 seconds to plan, is that the expected planning time? I know I've got nine table queries that don't take that long. Setting geqo_threshold less than 9, it takes 1 second to plan. Does this indicate that geqo_threshold is set too high, or is it a tradeoff between planning time and plan quality? If the planning time is so high because there are a large number of possible join orders, should geqo_threshold be based on the number of possible plans somehow instead of the number of tables involved? Kris Jurka ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])