Re: [PERFORM] How to speed-up inserts with jdbc

2004-11-11 Thread Edwin Eyan Moragas
On Wed, 10 Nov 2004 14:51:57 +0100, Michael Kleiser [EMAIL PROTECTED] wrote:
Statement  st = con.createStatement();
java.sql.Timestamp datum = new java.sql.Timestamp(new 
 Date().getTime());
Date start = new Date();
System.out.println(start);
for (int i=0; i100; ++i) {
st.executeUpdate(insert into 
 history(uuid,coni,date,direction,partner,type) 
 values('uuid','content','+datum+','dir','partner','type'));
}

how about using PreparedStatment? that's on the java end.
on the pg end, maybe do a BEGIN before the for loop and 
END at the end of the for loop.
-- 
i'm not flying. i'm falling... in style.

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] seqscan strikes again

2004-11-11 Thread Gaetano Mendola
Jim C. Nasby wrote:
 I'm wondering if there's any way I can tweak things so that the estimate
 for the query is more accurate (I have run analyze):
Can you post your configuration file ? I'd like to see for example your
settings about: random_page_cost and effective_cache_size.

Regards
Gaetano Mendola
---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org


Re: [PERFORM] How to speed-up inserts with jdbc

2004-11-11 Thread Steinar H. Gunderson
On Thu, Nov 11, 2004 at 04:04:06PM +0800, Edwin Eyan Moragas wrote:
 how about using PreparedStatment? that's on the java end.
 on the pg end, maybe do a BEGIN before the for loop and 
 END at the end of the for loop.

You don't even need a BEGIN and END; his code has a setAutoComit(true)
before the for loop, which just has to be changed to setAutoCommit(false)
(and add an explicit commit() after the for loop, of course).

/* Steinar */
-- 
Homepage: http://www.sesse.net/

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] vacuum analyze slows sql query

2004-11-11 Thread Gaetano Mendola
patrick ~ wrote:
--- John Meinel [EMAIL PROTECTED] wrote:

If you are trying to establish existence, we also had a whole thread on 
this. Basically what we found was that adding an ORDER BY clause, helped 
tremendously in getting the planner to switch to an Index scan. You 
might try something like:

SELECT column FROM mytable WHERE column='myval' ORDER BY column LIMIT 1;
There seems to be a big difference between the above statement and:
SELECT column FROM mytable WHERE column='myval' LIMIT 1;

The ORDER BY trick worked beautifully!  I just hope it'll
continue to work consistently in production code.
For sure it will not break the goal: check the existence.

Regards
Gaetano Mendola


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly


Re: [PERFORM] Strange (?) Index behavior?

2004-11-11 Thread Allen Landsidel
Ok, you thought maybe this thread died or got abandoned in the face of
all the senseless trolling and spam going on.. you were wrong.. ;)

I thought though I'd start over trying to explain what's going on. 
I've gone through some dumps, and recreation of the database with some
different filesystem options and whatnot, and starting over fresh
here's the situation.

First, the structure.

CREATE TABLE testtable (
  nid serial UNIQUE NOT NULL,
  sname text NOT NULL,
  iother int4
);

CREATE UNIQUE INDEX idx_sname_unique ON testtable (sname);

-

With the above, the query SELECT sname FROM testtable WHERE sname
LIKE 'A%'; DOES use an index scan on idx_sname_unique -- sometimes. 
Other times, the planner thinks a sequential scan would be better.

The index is large.  There are over 70 million rows in this table. 
The estimated cost and so forth from EXPLAIN on the above query is way
off as well, but I expect that to be the case considering the size of
the table -- perhaps there is a tunable in the statistics gathering
backend ot fix this?

My goal was to obviously make queries of the above type, as well as
more refined ones such as ... LIKE 'AB%'; faster.

This goal in mind, I thought that creating several indexes (36 of
them) would speed things up -- one index per alphanumeric start
character, via..

CREATE INDEX idx_sname_suba ON testtable (sname) WHERE sname LIKE 'A%';
CREATE INDEX idx_sname_subb ON testtable (sname) WHERE sname LIKE 'B%';
...
CREATE INDEX idx_sname_subz ON testtable (sname) WHERE sname LIKE 'Z%';

(also including 0..9)

I've wracked my brain trying to come up with other ways of doing this,
including partitioning the table, and trying the suggestions here such
as substr(1,1) in the index creation instead of creating many
distinct indexes.

None of these seems to speed up the queries enough to make them
acceptable when it comes to runtimes.  My data from before was
somehow in error.. not sure why.  At this point, using one index vs.
the other the runtimes are about the same.

search=# explain analyze
search-# SELECT sname FROM
search-# (SELECT sname FROM testtable WHERE sname LIKE 'A%') AS subq
search-# WHERE sname LIKE 'AA%';
   QUERY
PLAN
-
 Index Scan using sname_a on testtable  (cost=0.00..189.41 rows=47
width=20) (actual time=16.219..547053.251 rows=74612 loops=1)
   Index Cond: ((sname = 'A'::text) AND (sname  'B'::text) AND
(sname = 'AA'::text) AND (sname  'AB'::text))
   Filter: ((sname ~~ 'A%'::text) AND (sname ~~ 'AA%'::text))
 Total runtime: 547454.939 ms
(4 rows)

Time: 547458.216 ms


search=# explain analyze
search-# SELECT sname FROM testtable WHERE sname LIKE 'AA%';
 
QUERY PLAN
---
 Index Scan using sname_unique on testtable  (cost=0.00..34453.74
rows=8620 width=20) (actual time=77.004..537065.079 rows=74612
loops=1)
   Index Cond: ((sname = 'AA'::text) AND (sname  'AB'::text))
   Filter: (sname ~~ 'AA%'::text)
 Total runtime: 537477.737 ms
(4 rows)

Time: 537480.571 ms

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Strange (?) Index behavior?

2004-11-11 Thread Josh Berkus
Allen,

 Ok, you thought maybe this thread died or got abandoned in the face of
 all the senseless trolling and spam going on.. you were wrong.. ;)

 I thought though I'd start over trying to explain what's going on.
 I've gone through some dumps, and recreation of the database with some
 different filesystem options and whatnot, and starting over fresh
 here's the situation.

I can't find the beginning of this thread.  What's your sort_mem?  
Shared_buffers?

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Strange (?) Index behavior?

2004-11-11 Thread Allen Landsidel
On Thu, 11 Nov 2004 10:52:43 -0800, Josh Berkus [EMAIL PROTECTED] wrote:
 Allen,
 
  Ok, you thought maybe this thread died or got abandoned in the face of
  all the senseless trolling and spam going on.. you were wrong.. ;)
 
  I thought though I'd start over trying to explain what's going on.
  I've gone through some dumps, and recreation of the database with some
  different filesystem options and whatnot, and starting over fresh
  here's the situation.
 
 I can't find the beginning of this thread.  What's your sort_mem?
 Shared_buffers?

Currently sort_mem is 64MB and shared_buffers is 256MB.

The box is a dual 800 with 2GB physical, running FreeBSD 4.10-STABLE,
single U2W SCSI hdd.

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Strange (?) Index behavior?

2004-11-11 Thread Greg Stark
Allen Landsidel [EMAIL PROTECTED] writes:

 QUERY PLAN
 ---
  Index Scan using sname_unique on testtable  (cost=0.00..34453.74
 rows=8620 width=20) (actual time=77.004..537065.079 rows=74612
 loops=1)
Index Cond: ((sname = 'AA'::text) AND (sname  'AB'::text))
Filter: (sname ~~ 'AA%'::text)
  Total runtime: 537477.737 ms
 (4 rows)
 
 Time: 537480.571 ms

Nothing you're going to do to the query is going to come up with a more
effective plan than this. It's using the index after all. It's never going to
be lightning fast because it has to process 75k rows.

However 75k rows shouldn't be taking nearly 10 minutes. It should be taking
about 10 seconds.

The 77ms before finding the first record is a bit suspicious. Have you
vacuumed this table regularly? Try a VACUUM FULL VERBOSE, and send the
results. You might try to REINDEX it as well, though I doubt that would help.

Actually you might consider clustering the table on sname_unique. That would
accomplish the same thing as the VACUUM FULL command and also speed up the
index scan. And the optimizer knows (if you analyze afterwards) it so it
should be more likely to pick the index scan. But currently you have to rerun
cluster periodically.

-- 
greg


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] How to speed-up inserts with jdbc

2004-11-11 Thread Edwin Eyan Moragas
On Thu, 11 Nov 2004 11:04:18 +0100, Steinar H. Gunderson
[EMAIL PROTECTED] wrote:
 You don't even need a BEGIN and END; his code has a setAutoComit(true)
 before the for loop, which just has to be changed to setAutoCommit(false)
 (and add an explicit commit() after the for loop, of course).

amen. i stand corrected.

-eem

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Strange (?) Index behavior?

2004-11-11 Thread Allen Landsidel
On 11 Nov 2004 15:49:46 -0500, Greg Stark [EMAIL PROTECTED] wrote:
 Allen Landsidel [EMAIL PROTECTED] writes:
 
 
 
  QUERY PLAN
  ---
   Index Scan using sname_unique on testtable  (cost=0.00..34453.74
  rows=8620 width=20) (actual time=77.004..537065.079 rows=74612
  loops=1)
 Index Cond: ((sname = 'AA'::text) AND (sname  'AB'::text))
 Filter: (sname ~~ 'AA%'::text)
   Total runtime: 537477.737 ms
  (4 rows)
 
  Time: 537480.571 ms
 
 Nothing you're going to do to the query is going to come up with a more
 effective plan than this. It's using the index after all. It's never going to
 be lightning fast because it has to process 75k rows.
 
 However 75k rows shouldn't be taking nearly 10 minutes. It should be taking
 about 10 seconds.

That's my feeling as well, I thought the index was to blame because it
will be quite large, possibly large enough to not fit in memory nor be
quickly bursted up.

 The 77ms before finding the first record is a bit suspicious. Have you
 vacuumed this table regularly? Try a VACUUM FULL VERBOSE, and send the
 results. You might try to REINDEX it as well, though I doubt that would help.

This table is *brand spanking new* for lack of a better term.  I have
the data for it in a CSV.  I load the CSV up which takes a bit, then
create the indexes, do a vacuum analyze verbose, and then posted the
results above.  I don't think running vacuum a more times is going to
change things, at least not without tweaking config settings that
affect vacuum. Not a single row has been inserted or altered since the
initial load.. it's just a test.

I can't give vacuum stats right now because the thing is reloading
(again) with different newfs settings -- something I figure I have the
time to fiddle with now, and seldom do at other times.  These numbers
though don't change much between 8K on up to 64K 'cluster' sizes.  I'm
trying it now with 8K page sizes, with 8K minimum fragment sizes. 
Should speed things up a tiny bit but not enough to really affect this
query.

Do you still see a need to have the output from the vacuum?

 Actually you might consider clustering the table on sname_unique. That would
 accomplish the same thing as the VACUUM FULL command and also speed up the
 index scan. And the optimizer knows (if you analyze afterwards) it so it
 should be more likely to pick the index scan. But currently you have to rerun
 cluster periodically.

Clustering is really unworkable in this situation.  It would work now,
in this limited test case, but using it if this were to go into
production is unrealistic.  It would have to happen fairly often since
this table is updated frequently, which will break the clustering
quickly with MVCC.

Running it often.. well.. it has 70M+ rows, and the entire table is
copied, reordered, and rewritten.. so that's a lot of 'scratch space'
needed.  Finally, clustering locks the table..

Something I'd already considered but quickly ruled out because of
these reasons..

More ideas are welcome though. ;)

-Allen

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Strange (?) Index behavior?

2004-11-11 Thread Tom Lane
Allen Landsidel [EMAIL PROTECTED] writes:
 Clustering is really unworkable in this situation.

Nonetheless, please do it in your test scenario, so we can see if it has
any effect or not.

The speed you're getting works out to about 7.2 msec/row, which would be
about right if every single row fetch caused a disk seek, which seems
improbable unless the table is just huge compared to your available RAM.

regards, tom lane

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Strange (?) Index behavior?

2004-11-11 Thread Joshua D. Drake
---
Index Scan using sname_unique on testtable  (cost=0.00..34453.74
rows=8620 width=20) (actual time=77.004..537065.079 rows=74612
loops=1)
  Index Cond: ((sname = 'AA'::text) AND (sname  'AB'::text))
  Filter: (sname ~~ 'AA%'::text)
Total runtime: 537477.737 ms
(4 rows)
Time: 537480.571 ms
Nothing you're going to do to the query is going to come up with a more
effective plan than this. It's using the index after all. It's never going to
be lightning fast because it has to process 75k rows.
However 75k rows shouldn't be taking nearly 10 minutes. It should be taking
about 10 seconds.
I am confused about this statement. I have a table with 1.77 million 
rows that I use gist indexes on (TSearch) and I can pull out of it in 
less than 2 seconds.

Are you saying it should be taking 10 seconds because of the type of 
plan? 10 seconds seems like an awfullong time for this.

Sincerely,
Joshua D. Drake


That's my feeling as well, I thought the index was to blame because it
will be quite large, possibly large enough to not fit in memory nor be
quickly bursted up.

The 77ms before finding the first record is a bit suspicious. Have you
vacuumed this table regularly? Try a VACUUM FULL VERBOSE, and send the
results. You might try to REINDEX it as well, though I doubt that would help.

This table is *brand spanking new* for lack of a better term.  I have
the data for it in a CSV.  I load the CSV up which takes a bit, then
create the indexes, do a vacuum analyze verbose, and then posted the
results above.  I don't think running vacuum a more times is going to
change things, at least not without tweaking config settings that
affect vacuum. Not a single row has been inserted or altered since the
initial load.. it's just a test.
I can't give vacuum stats right now because the thing is reloading
(again) with different newfs settings -- something I figure I have the
time to fiddle with now, and seldom do at other times.  These numbers
though don't change much between 8K on up to 64K 'cluster' sizes.  I'm
trying it now with 8K page sizes, with 8K minimum fragment sizes. 
Should speed things up a tiny bit but not enough to really affect this
query.

Do you still see a need to have the output from the vacuum?

Actually you might consider clustering the table on sname_unique. That would
accomplish the same thing as the VACUUM FULL command and also speed up the
index scan. And the optimizer knows (if you analyze afterwards) it so it
should be more likely to pick the index scan. But currently you have to rerun
cluster periodically.

Clustering is really unworkable in this situation.  It would work now,
in this limited test case, but using it if this were to go into
production is unrealistic.  It would have to happen fairly often since
this table is updated frequently, which will break the clustering
quickly with MVCC.
Running it often.. well.. it has 70M+ rows, and the entire table is
copied, reordered, and rewritten.. so that's a lot of 'scratch space'
needed.  Finally, clustering locks the table..
Something I'd already considered but quickly ruled out because of
these reasons..
More ideas are welcome though. ;)
-Allen
---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

--
Command Prompt, Inc., home of PostgreSQL Replication, and plPHP.
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - [EMAIL PROTECTED] - http://www.commandprompt.com
Mammoth PostgreSQL Replicator. Integrated Replication for PostgreSQL
begin:vcard
fn:Joshua D. Drake
n:Drake;Joshua D.
org:Command Prompt, Inc.
adr:;;PO Box 215;Cascade Locks;Oregon;97014;USA
email;internet:[EMAIL PROTECTED]
title:Consultant
tel;work:503-667-4564
tel;fax:503-210-0334
note:Command Prompt, Inc. is the largest and oldest US based commercial PostgreSQL support provider. We  provide the only commercially viable integrated PostgreSQL replication solution, but also custom programming, and support. We authored  the book Practical PostgreSQL, the procedural language plPHP, and adding trigger capability to plPerl.
x-mozilla-html:FALSE
url:http://www.commandprompt.com/
version:2.1
end:vcard


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly