Re: [JDBC] SOLVED ... Re: Getting rid of a cursor from JDBC .... Re: [PERFORM] Re: HELP: How to tame the 8.3.x JDBC driver with a big query result set

2010-04-21 Thread Kris Jurka



On Wed, 21 Apr 2010, Robert Haas wrote:


On Tue, Apr 20, 2010 at 5:05 PM, Kris Jurka bo...@ejurka.com wrote:


b) Using the parameter values for statistics, but not making any stronger
guarantees about them.  So the parameters will be used for evaluating the
selectivity, but not to perform other optimizations like constraint_exclusion
or transforming a LIKE operation to a range query. This is the default plan
type the JDBC driver uses.


Hmm.  I didn't think this was possible.  How are you doing this?


This is only possible at the protocol level; it's not available using SQL 
commands alone.  You do this by creating an unnamed statement instead of a 
named statement:


http://www.postgresql.org/docs/8.4/static/protocol-flow.html#PROTOCOL-FLOW-EXT-QUERY

Query planning for named prepared-statement objects occurs when
the Parse message is processed. If a query will be repeatedly
executed with different parameters, it might be beneficial to send
a single Parse message containing a parameterized query, followed
by multiple Bind and Execute messages. This will avoid replanning
the query on each execution.

The unnamed prepared statement is likewise planned during Parse
processing if the Parse message defines no parameters. But if
there are parameters, query planning occurs during Bind processing
instead. This allows the planner to make use of the actual values
of the parameters provided in the Bind message when planning the
query.
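
To make the plan types concrete, here is a minimal JDBC sketch of the
behavior described above.  It assumes the driver's documented
prepareThreshold switch from the unnamed statement to a named one; the
host, database, credentials, and table are hypothetical.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class PlanTypeDemo {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection details; prepareThreshold=5 is the default.
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/test?prepareThreshold=5",
                "user", "secret");
        PreparedStatement ps = conn.prepareStatement(
                "SELECT * FROM items WHERE name LIKE ?");
        ps.setString(1, "abc%");
        for (int i = 0; i < 10; i++) {
            // Early executions go through the unnamed statement, so the
            // parameter value feeds the selectivity estimates (plan type b).
            // Once the statement has been used prepareThreshold times, the
            // driver switches to a named statement planned without the
            // parameter values (plan type c).
            ResultSet rs = ps.executeQuery();
            while (rs.next()) {
                // consume rows
            }
            rs.close();
        }
        ps.close();
        conn.close();
    }
}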


Kris Jurka


Re: [JDBC] SOLVED ... Re: Getting rid of a cursor from JDBC .... Re: [PERFORM] Re: HELP: How to tame the 8.3.x JDBC driver with a big query result set

2010-04-21 Thread Kris Jurka



On Wed, 21 Apr 2010, Nikolas Everett wrote:

More to the point, is there some option that can shift you into method (a)? 
 I'm thinking of warehousing-type applications where you want to re-plan 
a good portion of your queries.




This can be done by connecting to the database using the V2 protocol (use 
the URL option protocolVersion=2).  This does remove some driver 
functionality that is only available with the V3 protocol, but it will work 
just fine for query execution.
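
For example (the host and database here are hypothetical):

jdbc:postgresql://localhost/test?protocolVersion=2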


Kris Jurka



Re: [JDBC] SOLVED ... Re: Getting rid of a cursor from JDBC .... Re: [PERFORM] Re: HELP: How to tame the 8.3.x JDBC driver with a big query result set

2010-04-20 Thread Kris Jurka



On Mon, 19 Apr 2010, Dave Crooke wrote:


Statement.close() appears to get the job done (in my environment, PG's
driver never sees a Connection.close() because of DBCP).

I'd consider the fact that ResultSet.close() does not release the implicit
cursor to be something of a bug, but it may well have been fixed already.


PG doesn't release the locks acquired by the query until transaction end. 
So closing a cursor will release some backend memory, but it won't release 
the locks.  The way the driver implements ResultSet.close() is to put 
the close message into a queue so that the next time a message is sent to 
the backend we'll also send the cursor close message.  This avoids an 
extra network roundtrip for the close action.


In any case, Statement.close() isn't helping you here either.  It's really 
Connection.commit()/rollback() that's releasing the locks.
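
A minimal sketch of the resulting usage pattern, assuming an open
Connection conn (the table is hypothetical):

conn.setAutoCommit(false);
Statement stmt = conn.createStatement();
ResultSet rs = stmt.executeQuery("SELECT * FROM big_table");
while (rs.next()) {
    // process rows
}
rs.close();    // queues the cursor close; backend memory freed on the next message
stmt.close();  // statement resources released, but locks are still held
conn.commit(); // transaction end: this is what actually releases the locks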


Kris Jurka



Re: [PERFORM] Very high effective_cache_size == worse performance?

2010-04-20 Thread Kris Jurka



On Tue, 20 Apr 2010, Nikolas Everett wrote:


You can absolutely use copy if you like but you need to use a non-standard
jdbc driver:  kato.iki.fi/sw/db/postgresql/jdbc/copy/.  I've used it in the
past and it worked.


Copy support has been added to the 8.4 driver.
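
For reference, a minimal sketch of the 8.4 driver's CopyManager API,
assuming it is obtained through the PGConnection.getCopyAPI() extension
point; the connection details, table, and file are hypothetical:

import java.io.FileReader;
import java.sql.Connection;
import java.sql.DriverManager;
import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;

public class CopyDemo {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/test", "user", "secret");
        CopyManager copy = ((PGConnection) conn).getCopyAPI();
        // Stream a client-side CSV file into a table in one COPY operation.
        long rows = copy.copyIn("COPY log_entries FROM STDIN WITH CSV",
                new FileReader("/tmp/log_entries.csv"));
        System.out.println("loaded " + rows + " rows");
        conn.close();
    }
}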

Kris Jurka



Re: [JDBC] SOLVED ... Re: Getting rid of a cursor from JDBC .... Re: [PERFORM] Re: HELP: How to tame the 8.3.x JDBC driver with a big query result set

2010-04-20 Thread Kris Jurka



On Tue, 20 Apr 2010, Dave Crooke wrote:


a. Make setFetchSize(1) the default


The reason this is not done is that the mechanism used for fetching a 
piece of the results at a time can change the query plan used if using a 
PreparedStatement.  There are three ways to plan a PreparedStatement:


a) Using the exact parameter values by substituting them directly into the 
query.  This isn't really a prepared plan, as it can't be re-used at all. 
This is only available using the V2 protocol.


b) Using the parameter values for statistics, but not making any stronger
guarantees about them.  So the parameters will be used for evaluating the 
selectivity, but not to perform other optimizations like 
constraint_exclusion or transforming a LIKE operation to a range query. 
This is the default plan type the JDBC driver uses.


c) Planning the query with no regard for the parameters passed to it. 
This is the plan type the JDBC driver uses when it sees the same 
PreparedStatement being re-used multiple times or when it is respecting 
setFetchSize and allowing for partial results.


We must use (c) for partial results instead of (b) because of some 
limitations of the server.  Currently you cannot have two statements of 
type (b) open on the same connection.  Since the driver can't know whether 
the user will issue another query before fetching the remainder of the 
first query's results, it must set up the first query as type (c) so 
that multiple statements can exist simultaneously.


Switching the default plan type to (c) would cause a significant number of 
complaints, as performance on some queries would go into the tank.  Perhaps 
we could have a default fetchSize for plain Statements, as it won't affect 
the plan.  I could also see making this a URL parameter, so it could be set 
as the default with only a configuration change, not a code change.
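
For reference, a sketch of what opting into partial results looks like
from user code today; this is the path that forces plan type (c).  It
assumes an open Connection conn, and cursor-based fetching requires
autocommit off; the table and fetch size are hypothetical.

conn.setAutoCommit(false);            // required for cursor-based fetching
PreparedStatement ps = conn.prepareStatement(
        "SELECT * FROM big_table WHERE groupid = ?");
ps.setFetchSize(1000);                // non-zero: stream 1000 rows per round trip
ps.setInt(1, 42);
ResultSet rs = ps.executeQuery();     // planned as type (c)
while (rs.next()) {
    // rows arrive in chunks instead of all at once
}
rs.close();
conn.commit();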



b. If someone does call rs.close() before the end of the ResultSet, and has
not created an explicit cursor at the JDBC level, flag the query / lock /
virtual transaction in some way in the JDBC driver that tells it that it can
just dump the cursor on a subsequent stmt.close(), conn.commit() or
conn.close() call without sucking down the rest of the data.


This is already true.  The JDBC driver only asks the server for more of 
the ResultSet when a next() call requires it.  So the server isn't 
constantly spewing out rows that the driver must deal with; the driver 
only gets the rows it asks for.  Once the ResultSet is closed, it won't 
ask for any more.


Kris Jurka




Re: [PERFORM] Query slowing down significantly??

2010-03-02 Thread Kris Jurka



On Mon, 1 Mar 2010, Rainer Pruy wrote:

It is a Java app, using jdbc, but through a proprietary persistence 
framework. I'm just busy evaluating the effects on the app of 
prohibiting prepared statements via jdbc. If this is not worthwhile, I'm 
bound to some expensive reorganizations, sigh.


You can disable the named statement by adding the parameter 
prepareThreshold=0 to your connection URL.
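
A sketch of both ways to apply this; the per-statement variant uses the
driver's PGStatement extension interface, and the host, database, and
query are hypothetical.

// Connection-wide, via the URL:
Connection conn = DriverManager.getConnection(
        "jdbc:postgresql://localhost/test?prepareThreshold=0",
        "user", "secret");

// Or per statement, if only some queries suffer from the named plan:
PreparedStatement ps = conn.prepareStatement("SELECT * FROM t WHERE a = ?");
((org.postgresql.PGStatement) ps).setPrepareThreshold(0);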


Kris Jurka



Re: [PERFORM] performance for high-volume log insertion

2009-04-27 Thread Kris Jurka

Scott Marlowe wrote:

On Sun, Apr 26, 2009 at 11:07 AM, Kris Jurka bo...@ejurka.com wrote:


As a note for non-JDBC users, the JDBC driver's batch interface allows
executing multiple statements in a single network roundtrip.  This is
something you can't get in libpq, so beware of this for comparison's sake.


Really?  I thought that executing statements like so:

select * from a;insert ...;delete;

in psql / libpq would execute them all in one trip.


Right, but those aren't prepared.  I suppose it's possible to issue a 
prepare and then issue a batch of semicolon-separated execute statements, 
but that's not exactly a natural interface.
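
For non-JDBC readers, a sketch of the batch interface in question, with a
hypothetical table: one prepared statement, many parameter sets, and the
queued executes flushed to the server together.

PreparedStatement ps = conn.prepareStatement(
        "INSERT INTO log_entries (id, msg) VALUES (?, ?)");
for (int i = 0; i < 1000; i++) {
    ps.setInt(1, i);
    ps.setString(2, "message " + i);
    ps.addBatch();                // queues a Bind/Execute pair client-side
}
int[] counts = ps.executeBatch(); // sends the queued executes together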


Kris Jurka



Re: [PERFORM] performance for high-volume log insertion

2009-04-26 Thread Kris Jurka



On Thu, 23 Apr 2009, Thomas Kellerer wrote:


Out of curiosity I did some tests through JDBC.

Using a single-column (integer) table, re-using a prepared statement 
took about 7 seconds to insert 10 rows with JDBC's batch interface 
and a batch size of 1000




As a note for non-JDBC users, the JDBC driver's batch interface allows 
executing multiple statements in a single network roundtrip.  This is 
something you can't get in libpq, so beware of this for comparison's sake.


I also played around with batch size. Going beyond 200 didn't make a big 
difference.




Regardless of the size of the batch passed to the JDBC driver, the driver 
breaks it up into internal sub-batches of 256 statements to send to the 
server.  It does this to avoid network deadlocks from sending too much data 
to the server without reading any in return.  If the driver were written 
differently it could handle this better and send the full batch size, but at 
the moment that's not possible, and we're hoping the gains beyond this size 
aren't too large.


Kris Jurka



Re: [PERFORM] No hash join across partitioned tables?

2009-04-17 Thread Kris Jurka

Tom Lane wrote:


Is there another issue here besides that one?  I think you were hoping
that the hash join would be faster than the alternatives, but the cost
estimate says it's a lot slower.  Is that actually the case?



The hash join takes less than twenty seconds; the other two joins I 
killed after five minutes.  I can try to collect explain analyze results 
later today if you'd like.


Kris Jurka



Re: [PERFORM] No hash join across partitioned tables?

2009-04-17 Thread Kris Jurka

Tom Lane wrote:

Kris Jurka bo...@ejurka.com writes:
The hash join takes less than twenty seconds; the other two joins I 
killed after five minutes.  I can try to collect explain analyze results 
later today if you'd like.




Attached are the explain analyze results.  The analyze part hits the 
hash join worst of all, so I've also included the timings without analyzing.


Method     Time (ms)   Time w/Analyze (ms)
nestloop      304853                319060
merge         514517                683757
hash           18957                143731

Kris Jurka
 Aggregate  (cost=116546928.75..116546928.76 rows=1 width=0) (actual time=143731.602..143731.604 rows=1 loops=1)
   ->  Hash Join  (cost=1470.48..108802907.84 rows=3097608361 width=0) (actual time=308.015..143724.055 rows=2437 loops=1)
         Hash Cond: ((l.vin)::text = (i.vin)::text)
         ->  Append  (cost=0.00..332098.75 rows=18450775 width=18) (actual time=0.069..99984.899 rows=18449996 loops=1)
               ->  Seq Scan on liens l  (cost=0.00..14.00 rows=400 width=21) (actual time=0.003..0.003 rows=0 loops=1)
               ->  Seq Scan on liens_s1 l  (cost=0.00..18633.44 rows=917444 width=18) (actual time=0.060..1828.740 rows=917444 loops=1)
               ->  Seq Scan on liens_s2 l  (cost=0.00..20.92 rows=1192 width=18) (actual time=0.010..2.274 rows=1192 loops=1)
               ->  Seq Scan on liens_s3 l  (cost=0.00..53793.79 rows=2934179 width=18) (actual time=0.054..5777.782 rows=2934179 loops=1)
               ->  Seq Scan on liens_s4 l  (cost=0.00..21069.39 rows=1214139 width=18) (actual time=0.065..2413.429 rows=1214139 loops=1)
               ->  Seq Scan on liens_s5 l  (cost=0.00..29966.37 rows=1726837 width=18) (actual time=0.046..3394.974 rows=1726837 loops=1)
               ->  Seq Scan on liens_s6 l  (cost=0.00..10587.18 rows=462918 width=18) (actual time=0.053..936.379 rows=462918 loops=1)
               ->  Seq Scan on liens_s7 l  (cost=0.00..14.00 rows=400 width=21) (actual time=0.003..0.003 rows=0 loops=1)
               ->  Seq Scan on liens_s8 l  (cost=0.00..86004.68 rows=4956168 width=18) (actual time=0.045..9729.965 rows=4956182 loops=1)
               ->  Seq Scan on liens_s9 l  (cost=0.00..320.29 rows=18429 width=18) (actual time=0.010..34.880 rows=18429 loops=1)
               ->  Seq Scan on liens_s10 l  (cost=0.00..18398.16 rows=951016 width=18) (actual time=0.055..1889.948 rows=951016 loops=1)
               ->  Seq Scan on liens_s11 l  (cost=0.00..9956.22 rows=543022 width=18) (actual time=0.055..1070.156 rows=543022 loops=1)
               ->  Seq Scan on liens_s12 l  (cost=0.00..78813.85 rows=4541785 width=18) (actual time=0.012..9431.035 rows=4541792 loops=1)
               ->  Seq Scan on liens_s13 l  (cost=0.00..4506.46 rows=182846 width=18) (actual time=0.049..374.788 rows=182846 loops=1)
         ->  Hash  (cost=1050.77..1050.77 rows=33577 width=18) (actual time=256.374..256.374 rows=33297 loops=1)
               ->  Append  (cost=0.00..1050.77 rows=33577 width=18) (actual time=0.019..188.152 rows=33297 loops=1)
                     ->  Seq Scan on impounds i  (cost=0.00..11.40 rows=140 width=21) (actual time=0.002..0.002 rows=0 loops=1)
                     ->  Seq Scan on impounds_s1 i  (cost=0.00..11.40 rows=140 width=21) (actual time=0.002..0.002 rows=0 loops=1)
                     ->  Seq Scan on impounds_s2 i  (cost=0.00..913.87 rows=29587 width=18) (actual time=0.008..60.728 rows=29587 loops=1)
                     ->  Seq Scan on impounds_s3 i  (cost=0.00..18.14 rows=414 width=18) (actual time=0.009..0.848 rows=414 loops=1)
                     ->  Seq Scan on impounds_s4 i  (cost=0.00..95.96 rows=3296 width=18) (actual time=0.012..6.894 rows=3296 loops=1)
 Total runtime: 143731.788 ms
(26 rows)

QUERY PLAN
--------------------------------------------------------------------------------
 Aggregate  (cost=57241210.61..57241210.62 rows=1 width=0) (actual time=683467.350..683467.352 rows=1 loops=1)
   ->  Merge Join  (cost=2940810.41..49497189.70 rows=3097608361 width=0) (actual time=434026.342..683460.545 rows=2437 loops=1)
         Merge Cond: ((l.vin)::text = (i.vin)::text)
         ->  Sort  (cost=2937235.46..2983362.40 rows=18450775 width=18) (actual time=433519.957..637389.755 rows=18449961 loops=1)
               Sort Key: l.vin
               Sort Method:  external merge  Disk: 504728kB
               ->  Append  (cost=0.00..332098.75 rows=18450775 width=18) (actual time=14.764..102905.170 rows=18449996 loops=1)
                     ->  Seq Scan on liens l  (cost=0.00..14.00 rows=400 width=21) (actual time=0.003..0.003 rows=0 loops=1)
                     ->  Seq Scan on liens_s1 l  (cost=0.00..18633.44 rows=917444 width=18) (actual time=14.755..2167.668 rows=917444 loops=1)
                     ->  Seq Scan on liens_s2 l  (cost=0.00..20.92 rows=1192 width=18) (actual time=0.012..2.304 rows

[PERFORM] No hash join across partitioned tables?

2009-04-16 Thread Kris Jurka
  (cost=4073102.86..4303737.81 rows=18450796 width=21)
   ->  Sort  (cost=4073102.86..4119229.85 rows=18450796 width=21)
         Sort Key: l.vin
         ->  Append  (cost=0.00..332797.96 rows=18450796 width=21)
               ->  Seq Scan on liens l  (cost=0.00..14.00 rows=400 width=21)
               ->  [Seq Scans on other partitions]


Disabling mergejoin pushes it back to a nestloop join.  Why can't it hash 
join these two together?


Kris Jurka



Re: [PERFORM] No hash join across partitioned tables?

2009-04-16 Thread Kris Jurka



On Thu, 16 Apr 2009, Tom Lane wrote:


Kris Jurka bo...@ejurka.com writes:

PG (8.3.7) doesn't seem to want to do a hash join across two partitioned
tables.


Could we see the whole declaration of these tables?  (pg_dump -s output
would be convenient)



The attached table definition with no data wants to mergejoin first, but 
after disabling mergejoin it does indeed do a hashjoin.


Looking back at the cost estimates for the merge and nestloop joins, the 
planner seems to be estimating the join size as the number of rows in the 
cartesian product * .005, while the number of output rows in this case is 
2437 (cartesian product * 4e-9).  Perhaps the cost estimates for the real 
data are so high because of this bogus row count that the fudge factor to 
disable mergejoin isn't enough?


Kris Jurka
CREATE TABLE impounds (
vin character varying(17) NOT NULL,
impounddate date NOT NULL,
eventtypeid integer NOT NULL,
ori character varying(9),
platenumber character varying(8),
platestate character varying(2),
plateyear integer,
platetype character varying(2),
vehicleyear integer,
vehiclemake text,
vehiclemodel text,
vehiclestyle text,
vehiclecolor text,
townotes text,
damagenotes text,
platenotes text,
custodynotes text,
sourcenotes text,
referencenumber1 text,
referencenumber2 text,
referencenumber3 text,
contactname text,
contactphone text
);



ALTER TABLE ONLY impounds
ADD CONSTRAINT impounds_pk PRIMARY KEY (vin, impounddate, eventtypeid);

CREATE TABLE impounds_s1 () INHERITS (impounds);
CREATE TABLE impounds_s2 () INHERITS (impounds);
CREATE TABLE impounds_s3 () INHERITS (impounds);
CREATE TABLE impounds_s4 () INHERITS (impounds);

ALTER TABLE impounds_s1 ADD CONSTRAINT impounds_s1_pk PRIMARY KEY (vin, 
impounddate);
ALTER TABLE impounds_s2 ADD CONSTRAINT impounds_s2_pk PRIMARY KEY (vin, 
impounddate);
ALTER TABLE impounds_s3 ADD CONSTRAINT impounds_s3_pk PRIMARY KEY (vin, 
impounddate);
ALTER TABLE impounds_s4 ADD CONSTRAINT impounds_s4_pk PRIMARY KEY (vin, 
impounddate);



CREATE TABLE liens (
agentid integer NOT NULL,
vin character varying(17) NOT NULL,
liendate date,
accountnumber character varying(50),
customername character varying(50),
state character varying(2),
vehiclemake character varying(20),
vehiclemodel character varying(20),
vehicleyear integer
);

ALTER TABLE ONLY liens
ADD CONSTRAINT liens_pk PRIMARY KEY (vin, agentid);

CREATE TABLE liens_s1 () INHERITS (liens);
CREATE TABLE liens_s2 () INHERITS (liens);
CREATE TABLE liens_s3 () INHERITS (liens);
CREATE TABLE liens_s4 () INHERITS (liens);
CREATE TABLE liens_s5 () INHERITS (liens);
CREATE TABLE liens_s6 () INHERITS (liens);
CREATE TABLE liens_s7 () INHERITS (liens);

ALTER TABLE liens_s1 ADD CONSTRAINT lines_s1_pk PRIMARY KEY (vin);
ALTER TABLE liens_s2 ADD CONSTRAINT lines_s2_pk PRIMARY KEY (vin);
ALTER TABLE liens_s3 ADD CONSTRAINT lines_s3_pk PRIMARY KEY (vin);
ALTER TABLE liens_s4 ADD CONSTRAINT lines_s4_pk PRIMARY KEY (vin);
ALTER TABLE liens_s5 ADD CONSTRAINT lines_s5_pk PRIMARY KEY (vin);
ALTER TABLE liens_s6 ADD CONSTRAINT lines_s6_pk PRIMARY KEY (vin);
ALTER TABLE liens_s7 ADD CONSTRAINT lines_s7_pk PRIMARY KEY (vin);




Re: [PERFORM] No hash join across partitioned tables?

2009-04-16 Thread Kris Jurka



On Thu, 16 Apr 2009, Kris Jurka wrote:

Perhaps the cost estimates for the real data are so high because of this 
bogus row count that the fudge factor to disable mergejoin isn't enough?




Indeed, I get these cost estimates on 8.4b1 with an increased 
disable_cost value:


nestloop:   11171206.18
merge:      58377401.39
hash:      116763544.76

So the default disable_cost isn't enough to push it to use the hash join 
plan, and it goes back to nestloop.  Since disable_cost hasn't been touched 
since January 2000, perhaps it's time to bump that up to match today's 
hardware and problem sizes?  This isn't even a particularly big problem; 
it's joining 18M rows against 30k.


The real problem is getting reasonable stats to pass through the partition 
Append step, so it can make a reasonable estimate of the join output size.


Kris Jurka




Re: [PERFORM] PG writes a lot to the disk

2008-03-20 Thread Kris Jurka



On Thu, 20 Mar 2008, Albe Laurenz wrote:


PostgreSQL doesn't write into the table files when it SELECTs data.



It could easily be hint bit updates that are set by selects getting 
written.


Kris Jurka




Re: [PERFORM] viewing source code

2007-12-17 Thread Kris Jurka



On Mon, 17 Dec 2007, Merlin Moncure wrote:


the table is pg_proc.  you have to revoke select rights from public
and the user of interest.  be aware this will make it very difficult
for that user to do certain things in psql and (especially) pgadmin.
it works.

a better solution to this problem is to make a language wrapper for
pl/pgsql that encrypts the source on disk. afaik, no one is working on
this.  it would secure the code from remote users but not necessarily
from people logged in to the server.  the pg_proc hack works ok
though.



Another enhancement that would improve this situation would be to 
implement per-column permissions as the SQL spec specifies, so that you 
could revoke select on just the prosrc column and allow clients to retrieve 
the metadata they need.


Kris Jurka



Re: [PERFORM] [JDBC] does prepareThreshold work? forced to use old driver

2007-02-26 Thread Kris Jurka



On Mon, 26 Feb 2007, Gene wrote:


I've been having some serious performance issues with
postgresql 8.2/hibernate/jdbc due to postgres reusing bad cached query
plans. It doesn't look at the parameter values and therefore does not
use any partial indexes.

After trying to set prepareThreshold=0 in the connection string, which
didn't work, and even after modifying the jdbc driver and forcing it to 0
without success, I realized that it must be being ignored. After pretty
much giving up, I tried a much older driver which doesn't use server
prepared statements at all, and the problem has gone away and it is once
again using the partial indexes. How can I get this to work properly
on the new jdbc driver? I don't really like having to use a 2 year old
driver to get good performance as you can imagine :)


Something must be going wrong in the setting to zero or your code may be 
setting it to non-zero at some later point.  I believe prepareThreshold=0 
should work.  Do you have a test case showing it doesn't?


Kris Jurka




Re: [PERFORM] in memory views

2006-05-10 Thread Kris Jurka



On Wed, 10 May 2006, Thomas Vatter wrote:

Yes, the difference between the psql command line and the application is 6 
seconds to 40 seconds. It is exactly the step resultSet = 
executeQuery() that needs 40 seconds. I use next() as a cursor through 
the resultSet, but I fear this is not enough; do I have to use 
createStatement(resultSetType, resultSetConcurrency) or 
prepareStatement(resultSetType, resultSetConcurrency) to achieve the 
cursor behaviour?


http://jdbc.postgresql.org/documentation/81/query.html#query-with-cursor

Kris Jurka




Re: [PERFORM] WAL logging of SELECT ... INTO command

2006-03-24 Thread Kris Jurka



On Fri, 24 Mar 2006, Jim C. Nasby wrote:


On Wed, Mar 22, 2006 at 02:37:28PM -0500, Kris Jurka wrote:


On Wed, 22 Mar 2006, Jim C. Nasby wrote:


Ok, I saw disk activity on the base directory and assumed it was pg_xlog
stuff. Turns out that both SELECT INTO and CREATE TABLE AS ignore
default_tablespace and create the new tables in the base directory. I'm
guessing that's a bug... (this is on 8.1.2, btw).


This has been fixed in CVS HEAD as part of a patch to allow additional
options to CREATE TABLE AS.

http://archives.postgresql.org/pgsql-patches/2006-02/msg00211.php


I'll argue that the current behavior is still a bug and should be fixed.
Would it be difficult to patch 8.1 (and 8.0 if there were tablespaces
then...) to honor default_tablespace?


Here are patches that fix this for 8.0 and 8.1.

Kris Jurka

Index: src/backend/executor/execMain.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/executor/execMain.c,v
retrieving revision 1.241.4.2
diff -c -r1.241.4.2 execMain.c
*** src/backend/executor/execMain.c     12 Jan 2006 21:49:17 -0000      1.241.4.2
--- src/backend/executor/execMain.c     24 Mar 2006 18:05:53 -0000
***************
*** 36,41 ****
--- 36,42 ----
  #include "catalog/heap.h"
  #include "catalog/namespace.h"
  #include "commands/tablecmds.h"
+ #include "commands/tablespace.h"
  #include "commands/trigger.h"
  #include "executor/execdebug.h"
  #include "executor/execdefs.h"
***************
*** 731,736 ****
--- 732,738 ----
  {
      char       *intoName;
      Oid         namespaceId;
+     Oid         tablespaceId;
      AclResult   aclresult;
      Oid         intoRelationId;
      TupleDesc   tupdesc;
***************
*** 747,752 ****
--- 749,764 ----
          aclcheck_error(aclresult, ACL_KIND_NAMESPACE,
                         get_namespace_name(namespaceId));
  
+         tablespaceId = GetDefaultTablespace();
+         if (OidIsValid(tablespaceId)) {
+             aclresult = pg_tablespace_aclcheck(tablespaceId, GetUserId(),
+                                                ACL_CREATE);
+ 
+             if (aclresult != ACLCHECK_OK)
+                 aclcheck_error(aclresult, ACL_KIND_TABLESPACE,
+                                get_tablespace_name(tablespaceId));
+         }
+ 
      /*
       * have to copy tupType to get rid of constraints
       */
***************
*** 754,760 ****
  
      intoRelationId = heap_create_with_catalog(intoName,
                                                namespaceId,
!                                               InvalidOid,
                                                tupdesc,
                                                RELKIND_RELATION,
                                                false,
--- 766,772 ----
  
      intoRelationId = heap_create_with_catalog(intoName,
                                                namespaceId,
!                                               tablespaceId,
                                                tupdesc,
                                                RELKIND_RELATION,
                                                false,

Index: src/backend/executor/execMain.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/executor/execMain.c,v
retrieving revision 1.256.2.5
diff -c -r1.256.2.5 execMain.c
*** src/backend/executor/execMain.c     12 Jan 2006 21:49:06 -0000      1.256.2.5
--- src/backend/executor/execMain.c     24 Mar 2006 17:57:11 -0000
***************
*** 37,42 ****
--- 37,43 ----
  #include "catalog/heap.h"
  #include "catalog/namespace.h"
  #include "commands/tablecmds.h"
+ #include "commands/tablespace.h"
  #include "commands/trigger.h"
  #include "executor/execdebug.h"
  #include "executor/execdefs.h"
***************
*** 737,742 ****
--- 738,744 ----
  {
      char       *intoName;
      Oid         namespaceId;
+     Oid         tablespaceId;
      AclResult   aclresult;
      Oid         intoRelationId;
      TupleDesc   tupdesc

Re: [PERFORM] WAL logging of SELECT ... INTO command

2006-03-22 Thread Kris Jurka



On Wed, 22 Mar 2006, Jim C. Nasby wrote:


Ok, I saw disk activity on the base directory and assumed it was pg_xlog
stuff. Turns out that both SELECT INTO and CREATE TABLE AS ignore
default_tablespace and create the new tables in the base directory. I'm
guessing that's a bug... (this is on 8.1.2, btw).


This has been fixed in CVS HEAD as part of a patch to allow additional 
options to CREATE TABLE AS.


http://archives.postgresql.org/pgsql-patches/2006-02/msg00211.php

Kris Jurka




Re: [Bizgres-general] Re: [PERFORM] faster INSERT with possible

2005-07-27 Thread Kris Jurka


On Wed, 27 Jul 2005, Josh Berkus wrote:

 b) you can't index a temp table.
 

jurka# create temp table t (a int);
CREATE
jurka# create index myi on t(a);
CREATE



Re: [PERFORM] COPY vs INSERT

2005-05-05 Thread Kris Jurka


On Wed, 4 May 2005, Mischa Sandberg wrote:

 Quoting Kris Jurka [EMAIL PROTECTED]: 
  
  Not true.  A client may send any number of Bind/Execute messages on 
  a prepared statement before a Sync message.

 Hunh. Interesting optimization in the JDBC driver. I gather it is 
 sending a string of (;)-separated inserts.

No, it uses the V3 protocol and a prepared statement and uses 
Bind/Execute, as I mentioned.

 Sounds like efficient-but-risky stuff we did for ODBC drivers at Simba
 ... gets interesting when one of the insert statements in the middle
 fails.

When running inside a transaction (as you really want to do anyway when
bulk loading) it is well defined, it is a little odd for auto commit mode
though.  In autocommit mode the transaction boundary is at the Sync
message, not the individual Execute messages, so you will get some
rollback on error.  The JDBC spec is poorly defined in this area, so we
can get away with this.

 Good to know. Hope that the batch size is parametric, given that
 you can have inserts with rather large strings bound to 'text' columns
 in PG --- harder to identify BLOBs when talking to PG, than when talking
 to MSSQL/Oracle/Sybase.

The batch size is not a parameter and I don't think it needs to be.  The 
issue of filling both sides of network buffers and deadlocking only needs 
to be avoided on one side.  The response to an insert request is small and 
not dependent on the size of the data sent, so we can send as much as we 
want as long as the server doesn't send much back to us.

Kris Jurka



Re: [PERFORM] COPY vs INSERT

2005-05-04 Thread Kris Jurka


On Wed, 4 May 2005, Mischa Sandberg wrote:

 Copy makes better use of the TCP connection for transmission. COPY uses
 the TCP connection like a one-way pipe. INSERT is like an RPC: the
 sender has to wait until the insert's return status roundtrips.

Not true.  A client may send any number of Bind/Execute messages on a 
prepared statement before a Sync message.  So multiple inserts may be sent 
in one network roundtrip.  This is exactly how the JDBC driver 
implements batch statements.  There is some limit to the number of queries 
in flight at any given moment because there is the potential to deadlock 
if both sides of network buffers are filled up and each side is blocked 
waiting on a write.  The JDBC driver has conservatively selected 256 as 
the maximum number of queries to send at once.

Kris Jurka



Re: [PERFORM] batch inserts are slow

2005-05-03 Thread Kris Jurka


On Tue, 3 May 2005, Josh Berkus wrote:

  There are several hacks floating around that add COPY capabilities to
  the pgjdbc driver. As they all are rather simple hacks, they have not
  been included in the cvs yet, but they tend to work fine.
 
 FWIW, Dave Cramer just added beta COPY capability to JDBC.   Contact him on 
 the JDBC list for details; I think he needs testers.
 

I believe Dave has remerged a patch for COPY I posted over a year ago, but 
he has not yet published it.  I would guess it has the same bugs as the 
original (transaction + error handling) and will meet the same objections 
that kept the original patch out of the driver in the first place (we want 
a friendlier API than just a data stream).

Kris Jurka



Re: [PERFORM] [JDBC] 8.0.1 performance question.

2005-04-04 Thread Kris Jurka


On Tue, 5 Apr 2005 [EMAIL PROTECTED] wrote:

 I see statements below being executed non-stop.  Who is triggering these
 statements?  Is this normal?  What am I doing wrong?
 
 
 2005-04-04 18:05:00 CST PARSELOG:  statement: SELECT attnotnull FROM
 pg_catalog.pg_attribute WHERE attrelid = $1 AND attnum = $2
 2005-04-04 18:05:00 CST PARSELOG:  statement: SELECT def.adsrc FROM
 pg_catalog.pg_class c JOIN pg_catalog.pg_attribute a ON (a.attrelid=c.oid)
 LEFT JOIN pg_catalog.pg_attrdef def ON (a.attrelid=def.adrelid AND
 a.attnum = def.adnum) WHERE c.oid = $1 and a.attnum = $2 AND def.adsrc
 LIKE '%nextval(%'

These are the results of ResultSetMetaData.isNullable() and 
isAutoIncrement(), which your code is apparently calling.  The results of 
these calls are cached on a per-ResultSet basis.  We have discussed 
caching them at a higher level, but couldn't find a way to know when to 
flush that cache.

Kris Jurka



Re: [PERFORM] [JDBC] 8.0.1 performance question.

2005-04-04 Thread Kris Jurka


On Tue, 5 Apr 2005 [EMAIL PROTECTED] wrote:

 Thank you for the quick response.  To help me debug what's happening,
 can you tell me what's the difference between the 7.4 and 8.0 jdbc
 drivers in this regard?  Is this something that is newly introduced in
 8.0?  Or is this something that has always been happening?
 

8.0 is the first driver version to take advantage of the V3 protocol's 
ability to return the base tables and columns of a ResultSet.  
Previously isNullable was hardcoded to always return 
columnNullableUnknown and isAutoIncrement always returned false.

I guess the question is why are you calling these methods if they didn't 
work previously?

Kris Jurka



Re: [PERFORM] JDBC best practice

2005-03-28 Thread Kris Jurka


On Mon, 28 Mar 2005, Dave Held wrote:

 I'm using a Postgres table as the data source for a JTable in a Java
 app.  Where rs_ is a RecordSet object.  What I'm wondering is whether
 it's better to call absolute() or relative() or next()/previous().  If
 absolute() is the slowest call, then I can cache the last row fetched
 and move relative to that.
 
 My suspicion is that next()/previous() is much faster than absolute()
 when the record to be fetched is very near the last record fetched.  I
 haven't actually tried it, but I'd like some insight if others can
 already answer this question based on knowledge of the server side
 and/or the JDBC driver.

There are two types of ResultSets that can be returned by the JDBC driver.  
One is backed by a cursor and can only be used for TYPE_FORWARD_ONLY
ResultSets so it is not really applicable to you.  The other method
retrieves all results at once and stashes them in a Vector.  This makes
next, absolute, and relative positioning all equal cost.
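
A sketch of the two flavors, with a hypothetical query:

// Cursor-backed streaming: only available for forward-only ResultSets.
conn.setAutoCommit(false);
Statement fwd = conn.createStatement();   // TYPE_FORWARD_ONLY by default
fwd.setFetchSize(1000);
ResultSet stream = fwd.executeQuery("SELECT * FROM items ORDER BY id");

// Scrollable: all rows are fetched up front and held client-side, so
// next(), absolute(), and relative() cost the same.
Statement scroll = conn.createStatement(
        ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_READ_ONLY);
ResultSet rs = scroll.executeQuery("SELECT * FROM items ORDER BY id");
rs.absolute(500);
rs.relative(-10);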

Kris Jurka



Re: [PERFORM] mis-estimation on data-warehouse aggregate creation

2004-11-17 Thread Kris Jurka


On Tue, 16 Nov 2004, Simon Riggs wrote:

 The join condition has so many ANDed predicates that we assume that this
 will reduce the selectivity considerably. It does not, and so you pay
 the cost dearly later on.
 

Yes, that makes a lot of sense.  Without some incredibly good cross-column
statistics there is no way it could expect all of the rows to match.  
Thanks for the analysis.

Kris Jurka



Re: [PERFORM] mis-estimation on data-warehouse aggregate creation

2004-11-16 Thread Kris Jurka


On Tue, 16 Nov 2004, F. Senault wrote:

 Let me guess...  You've never run analyze on your tables ?
 

No, I have.  I mentioned that I did in my email, but you can also tell by
the exactly correct guesses for some other plan steps:

->  Seq Scan on period  (cost=0.00..90.88 rows=3288 width=54) (actual time=0.118..12.126 rows=3288 loops=1)

Kris Jurka



Re: [PERFORM] execute cursor fetch

2004-10-15 Thread Kris Jurka


On Tue, 12 Oct 2004, Stef wrote:

 Pierre-Frédéric Caillaud mentioned :
 => http://www.postgresql.org/docs/7.4/static/jdbc-query.html#AEN24298
 
 My question is :
 Is this only true for postgres versions >= 7.4 ?
 
 I see the same section about Setting fetch size to turn cursors on and off
 is not in the postgres 7.3.7 docs. Does this mean the 7.3 JDBC driver
 for postgres < 7.4 doesn't support this ?
 

You need the 7.4 JDBC driver, but can run it against a 7.3 (or 7.2) 
database.  Also note the 8.0 JDBC driver can only do this against a 7.4 or 
8.0 database and not older versions.

Kris Jurka



Re: [PERFORM] Normal case or bad query plan?

2004-10-11 Thread Kris Jurka


On Mon, 11 Oct 2004, Gabriele Bartolini wrote:

 ------------------------------------------------------------------------------
   Seq Scan on ip2location  (cost=0.00..30490.65 rows=124781 width=8) (actual time=5338.120..40237.283 rows=1 loops=1)
     Filter: ((1040878301::bigint >= ip_address_from) AND (1040878301::bigint <= ip_address_to))
   Total runtime: 40237.424 ms
 

I believe the problem is that pg's lack of cross-column statistics is 
producing the poor number of rows estimate.  The number of rows matching 
just the first 1040878301::bigint >= ip_address_from condition is 122774, 
which is roughly 10% of the table.  I imagine the query planner 
believes that the other condition alone will match the other 90% of the 
table.  The problem is that it doesn't know that these two ranges' 
intersection is actually tiny.  The planner assumes a complete or nearly 
complete overlap, so it thinks it will need to fetch 10% of the rows from 
both the index and the heap and chooses a seqscan.

Kris Jurka



Re: [PERFORM] Large # of rows in query extremely slow, not using

2004-09-23 Thread Kris Jurka


On Tue, 14 Sep 2004, Stephen Crowley wrote:

 Problem solved. I set the fetchSize to a reasonable value instead of
 the default of unlimited in the PreparedStatement and now the query
 is fast. After some searching it seems this is a common problem; would it
 make sense to change the default value to something other than 0 in
 the JDBC driver?

In the JDBC driver, setting the fetch size to a non-zero value means that 
the query will be run using what the frontend/backend protocol calls a 
named statement.  What this means on the backend is that the planner will 
not be able to use the values from the query parameters to generate the 
optimum query plan and must use generic placeholders and create a generic 
plan.  For this reason we have decided not to default to a non-zero 
fetch size.  This is something whose default value could be set by a URL 
parameter if you think that is something that is really required.

Kris Jurka




Re: [PERFORM] Large # of rows in query extremely slow, not using

2004-09-23 Thread Kris Jurka


On Thu, 23 Sep 2004, Stephen Crowley wrote:

 Thanks for the explanation. So what sort of changes need to be made to
 the client/server protocol to fix this problem?

The problem is that there is no way to indicate why you are using a 
particular statement in the extended query protocol.  For the JDBC driver 
there are two potential reasons: streaming a ResultSet and using a server 
prepared statement.  For the streaming-as-default case you desire, there 
needs to be a way to indicate that you don't want to create a generic 
server prepared statement and that this query is really just for one-time 
use, so it can generate the best plan possible.

Additionally you can only stream ResultSets that are of type FORWARD_ONLY.  
It would also be nice to be able to specify scrollability and holdability 
when creating a statement and the offset/direction when streaming data 
from a scrollable one.

Kris Jurka



Re: [JDBC] Cursors performance (was: Re: [PERFORM] Terrible performance

2004-07-09 Thread Kris Jurka


On Fri, 9 Jul 2004, Bill Chandler wrote:

 Thanks to all who have responded.  I now think my
 problem is not related to deleting/recreating indexes.
 Somehow it is related to JDBC cursors.  It appears
 that what is happening is that since I'm using 
 a fetch size of 5000, the command:
 
 FETCH FORWARD 5000 FROM JDBC_CURS_1
 

If the top level node of your execution plan is a sort step, it should 
take essentially no time to retrieve additional rows after the first 
fetch.  The sort step materializes the results so that future fetches 
simply need to spit this data back to the client.

I would agree with Dave's suggestion to use log_duration and compare the 
values for the first and subsequent fetches.

Kris Jurka




Re: [PERFORM] [JDBC] Using a COPY...FROM through JDBC?

2004-06-07 Thread Kris Jurka


On Sat, 5 Jun 2004, Steve Wampler wrote:

 
 [I want to use copy from JDBC]
 

I made a patch to the driver to support COPY as a PG extension.  The
patch required properly encoded and formatted copy data available
from an InputStream.  Following some feedback from others I began adding
the ability to handle different encodings and the ability to read and
write objects without requiring any knowledge of the copy data format.  I
got hung up on the object read/write part because of some issues with how
type conversions are done in the driver.

At the moment there is a big push being made by Oliver Jowett to get true 
V3 protocol support implemented which is currently my highest priority.  
Getting copy support into the JDBC driver is something I'd like to see for 
7.5, but I couldn't say if that will happen or how complete it may be.  
Depending on your needs perhaps the initial patch is sufficient.

http://archives.postgresql.org/pgsql-jdbc/2003-12/msg00186.php

Kris Jurka




Re: [PERFORM] [JDBC] Using a COPY...FROM through JDBC?

2004-06-07 Thread Kris Jurka


On Mon, 7 Jun 2004, Steve Wampler wrote:

 I do have a little concern about what's happening in the
 back end during the copy - I suspect the entire table is
 locked, which may impact the performance when multiple
 clients are saving entries into the table.  Anyone know
 if that's how COPY works?  (For that matter, would that
 also be true of a transaction consisting of a set of
 inserts?)
 

The table is not locked in either the copy or the insert case.

Kris Jurka



Re: [PERFORM] planner/optimizer question

2004-04-30 Thread Kris Jurka


On Fri, 30 Apr 2004, Gary Doades wrote:

 Yes, you're right and I have done this just to prove to myself that it
 is the index scan that is the bottleneck. I have some complex SQL that
 executes very quickly with Postgres, similar to MSSQL, but the index
 scans in most of those only touch a few rows for a few loops. It seems
 to be a problem when the index scan is scanning very many rows, and for
 each of these it has to go to the table just to find out if the index
 entry it just looked at is still valid.
 

Another way to speed this up is the TODO item: "Use bitmaps to fetch 
heap pages in sequential order."  For an indexscan that fetches a number 
of rows, those rows may be anywhere in the base table, so as each index 
entry is found it fetches the corresponding table row from the heap.  This 
is not ideal because you can be jumping around in the heap and end up 
fetching the same page multiple times when table rows are in the same 
page but were found in different places in the index.

Kris Jurka



Re: [PERFORM] severe performance issue with planner (fwd)

2004-03-16 Thread Kris Jurka

I sent this message to the list and although it shows up in the archives,
I did not receive a copy of it through the list, so I'm resending as I
suspect others did not see it either.

-- Forwarded message --
Date: Sat, 13 Mar 2004 22:48:01 -0500 (EST)
From: Kris Jurka [EMAIL PROTECTED]
To: Tom Lane [EMAIL PROTECTED]
Cc: Eric Brown [EMAIL PROTECTED], [EMAIL PROTECTED]
Subject: Re: [PERFORM] severe performance issue with planner 

On Thu, 11 Mar 2004, Tom Lane wrote:

 Eric Brown [EMAIL PROTECTED] writes:
  [ planning a 9-table query takes too long ]
 
 See http://www.postgresql.org/docs/7.4/static/explicit-joins.html
 for some useful tips.
 

Is this the best answer we've got?  For me with an empty table this query 
takes 4 seconds to plan; is that the expected planning time?  I know I've 
got nine-table queries that don't take that long.

Setting geqo_threshold less than 9, it takes 1 second to plan.  Does this 
indicate that geqo_threshold is set too high, or is it a tradeoff between 
planning time and plan quality?  If the planning time is so high because 
there are a large number of possible join orders, should geqo_threshold be 
based on the number of possible plans somehow instead of the number of 
tables involved?

Kris Jurka

