[PERFORM] Relation of cpu_*_costs?

2004-06-07 Thread SZŰCS Gábor
Dear Gurus,

Please feel free to show me to the archives if my question has already been
answered.

Last week I fumbled with CPU_TUPLE_COST and revealed that 4 out of 4 tested
queries improved by 10-60% if I changed it from 0.01 (default) to 0.40
(ugh). Setting it higher did not bring any improvement.

%--- cut here ---%
QUESTION1: is there a (theoretical or practical) relation between this one
and the other cpu costs? Should I also increase those values by the same
rate and find a balance that way?

As far as I can guess, there should be a linear relation, i.e. the ratio
cpu_tuple_cost : cpu_index_tuple_cost : cpu_operator_cost should stay
constant, but then again, I suspect there is a reason they have separate
entries in the config file ;)
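
For instance, what I have in mind is scaling all three together, something
like this (assuming I remember the stock defaults of 0.001 and 0.0025 for
the other two correctly):

  SET cpu_tuple_cost = 0.4;
  SET cpu_index_tuple_cost = 0.04;  -- 40x its default, keeping the ratio
  SET cpu_operator_cost = 0.1;      -- 40x its default, keeping the ratio

and then re-running EXPLAIN ANALYZE on the four queries.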


%--- cut here ---%
The queries were, or contained, something like:

  SELECT s.qty FROM a, s WHERE a.id = s.a_id AND a.b_id = 1234;

where
* "a" and "s" are in 1:N relation,
* "b" and "a" are in 1:N relation,
* a.id is pkey in "a" and b.id is pkey in "b".

These queries usually return up to 6-10% of the tuples in s (about 16k of
220k), and the planner chose seq scans on s. Disabling seq scans (and some
other settings) finally brought up a plan containing index scans that
improved two of the queries. (I tested the other two after finding the
solution for these, to see whether they improved or got worse.)
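
The kind of experiment I mean (the real queries are more complex, but along
the lines of the example above):

  SET enable_seqscan TO off;
  EXPLAIN ANALYZE
    SELECT s.qty FROM a, s WHERE a.id = s.a_id AND a.b_id = 1234;
  SET enable_seqscan TO on;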

I also noticed that the largest gain came from the smallest change to
cpu_tuple_cost: the query with the largest improvement (to 32% of the
original time) chose the better plan from 0.03 upward, but the other one
improved (to 79%) only if I set cpu_tuple_cost to 0.40 or higher.


%--- cut here ---%
QUESTION2: am I right setting cpu_tuple_cost, or may there be another cause
of poor plan selection? Also tried lowering random_page_cost, but even 1.0
didn't yield any improvement.


%--- cut here ---%
CONFIGURATION: PostgreSQL 7.3.4, IBM Xeon 2x2.4GHz HT, 5x36GB 10krpm HW
RAID-5.

We found out quite early that random page cost is quite low (we now have it
at 1.5 -- maybe it's still too high), and it's true that tasks requiring raw
CPU power aren't much faster than on a PIII-800. Unfortunately I can't test
the same hw on 7.4 yet, since it's a production server.

TIA,
G.
%--- cut here ---%
\end


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] [JDBC] Using a COPY...FROM through JDBC?

2004-06-07 Thread Markus Schaber
Hi, Steve,

Here are the results of some benchmarks we did inserting 30k rows into a
table, using "time psql -f blubb.sql -q dbname":

File              kingfisher             skate
30kinsert.sql     39.359s    762 r/s     335.024s     90 r/s
30kcommonce.sql   11.402s   2631 r/s       7.086s   4233 r/s
30kwithids.sql    10.138s   2959 r/s       6.936s   4325 r/s
30kprepare.sql     8.173s   3670 r/s       5.189s   5781 r/s
30kdump.sql        1.286s  23328 r/s       0.785s  38216 r/s
30kdumpseq.sql     1.498s  20026 r/s       0.927s  32362 r/s


Kingfisher is the single-processor machine I mentioned last night; skate is
the SMP machine.

The table has five columns (bigint, bigint, double, double, timestamp
without time zone). The first of them has a "default nextval('sequence'
::text)" definition, and there are no further constraints or indices.

30kinsert.sql uses simple insert statements with autocommit on, and we
insert all but the first column, which is filled by the default sequence.
In this test, kingfisher seems to have an unrealistically high number of
commits (syncs) per second (see what I wrote yesterday) [1], while skate
has a more realistic value.

30kcommonce.sql, as suspected, gives a rather large boost by wrapping
everything into a single transaction with one commit.

30kwithids gives a small boost by inserting pre-calculated sequence
numbers, so it seems not worth the effort to move this logic into the
application.

30kprepare prepares the insert statement and then issues 30k EXECUTE
statements within one transaction; the speedup is noticeable.
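
Roughly, the generated SQL looks like this (table and values are invented
here for illustration, not the literal benchmark file):

BEGIN;
PREPARE ins (bigint, bigint, double precision, double precision, timestamp)
    AS INSERT INTO testtable VALUES ($1, $2, $3, $4, $5);
EXECUTE ins (1, 42, 1.5, 2.5, '2004-06-07 12:00:00');
-- ... 30,000 EXECUTE lines ...
COMMIT;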

30kdump simply inserts the 30k rows as a dump via COPY FROM STDIN (as with
30kwithids, the first column is contained in the copy data, so the default
sequence is not used). Naturally, this is by far the fastest method.
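
For completeness, the dump format is just tab-separated rows terminated by
a backslash-period line (values invented again):

COPY testtable FROM stdin;
1	42	1.5	2.5	2004-06-07 12:00:00
\.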

30kdumpseq.sql uses COPY, too, but omits the first column and thus uses the
sequence generation again. This gives a noticeable 15% slowdown, but still
seems fast enough for our purposes. Sadly, COPY is not available through
JDBC.

Thanks for your patience.

Footnotes: 
[1] We suspect this to be some strange interaction between IDE, cryptoloop
and ext3fs, such that the sync() call somehow does not really wait for the
data to be physically written to the disk. (I really can't imagine a
crypto-looped notebook hard disk doing more syncs/second than a SCSI-based
RAID in a server machine. We did some small benchmarks on the sync() /
fsync() calls that seem to support this conclusion.)

-- 
markus schaber | dipl. informatiker
logi-track ag | rennweg 14-16 | ch 8001 zürich
phone +41-43-888 62 52 | fax +41-43-888 62 53
mailto:[EMAIL PROTECTED] | www.logi-track.com

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] [JDBC] Using a COPY...FROM through JDBC?

2004-06-07 Thread Kris Jurka


On Sat, 5 Jun 2004, Steve Wampler wrote:

> 
> [I want to use copy from JDBC]
> 

I made a patch to the driver to support COPY as a PG extension.  The
patch required properly encoded and formatted copy data available
from an InputStream.  Following some feedback from others I began adding
the ability to handle different encodings and the ability to read and
write objects without requiring any knowledge of the copy data format.  I
got hung up on the object read/write part because of some issues with how
type conversions are done in the driver.

At the moment there is a big push being made by Oliver Jowett to get true 
V3 protocol support implemented which is currently my highest priority.  
Getting copy support into the JDBC driver is something I'd like to see for 
7.5, but I couldn't say if that will happen or how complete it may be.  
Depending on your needs perhaps the initial patch is sufficient.

http://archives.postgresql.org/pgsql-jdbc/2003-12/msg00186.php

Kris Jurka


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


[PERFORM] Postgres function use makes machine crash.

2004-06-07 Thread Stef
Hi all,

I've been dealing with a problem for the past two days
where a certain sql statement works 2 out of 5 times, and
the other 3 times, it causes the machine (quad Xeon 2.8GHz
+ 792543232 bytes mem, linux kernel  2.4.26-custom, pg ver 7.3.4)
to slow down, and finally grind to a halt. It looks like postgres
gets itself into an insane loop, because no matter how much
shared memory I give it, it uses it all, and then
the kernel starts swapping. 

I'm pretty sure it's not the kernel, because I've tried four different 
2.4.2* stable kernels, and the same happens.

I've attached the query, and the functions used inside the query,
as well as the table structure and an explain. (I haven't been
able to get explain analyze)

It seems that when I replace the functions used in the query,
with the actual values returned by them (one date in each case),
the query runs in 10 seconds.

I did vacuum analyze, and reindex seemed to  work at one
stage, but now it doesn't anymore. 

Is there some limitation in using functions that I do not know 
about, or is it a bug? 

(It seems to be hanging on the max_fpp()
function call from inside the fpp_max_ms() function.)

Please help.

Kind Regards
Stefan


query.sql
Description: Binary data
EXPLAIN
SELECT group_code::text          AS group_code,
       sku::text                 AS sku,
       stktype_code::varchar(2)  AS stktype_code,
       brn_code::text            AS brn_code,
       SUM(overdue)::int4        AS overdue,
       SUM(current)::int4        AS current,
       SUM(future)::int4         AS future
FROM (
    SELECT group_code,
           sku,
           stktype_code,
           brn_code,
           CASE WHEN to_date <= max_fpp_ms()
                THEN SUM(out_qty)
                ELSE 0
           END AS overdue,
           CASE WHEN to_date > max_fpp_ms()
                 AND to_date <= max_fpp_me()
                THEN SUM(out_qty)
                ELSE 0
           END AS current,
           CASE WHEN to_date > max_fpp_me()
                THEN SUM(out_qty)
                ELSE 0
           END AS future
    FROM gir_outstanding
    GROUP BY group_code,
             sku,
             stktype_code,
             brn_code,
             to_date
) AS sub
GROUP BY group_code,
         sku,
         stktype_code,
         brn_code;
                                        QUERY PLAN
------------------------------------------------------------------------------------------
 Aggregate  (cost=15880.41..16055.62 rows=876 width=44)
   ->  Group  (cost=15880.41..15989.92 rows=8761 width=44)
         ->  Sort  (cost=15880.41..15902.31 rows=8761 width=44)
               Sort Key: group_code, sku, stktype_code, brn_code
               ->  Subquery Scan sub  (cost=13335.57..15306.72 rows=8761 width=44)
                     ->  Aggregate  (cost=13335.57..15306.72 rows=8761 width=44)
                           ->  Group  (cost=13335.57..14649.67 rows=87607 width=44)
                                 ->  Sort  (cost=13335.57..13554.58 rows=87607 width=44)
                                       Sort Key: group_code, sku, stktype_code, brn_code, to_date
                                       ->  Seq Scan on gir_outstanding  (cost=0.00..4687.07 rows=87607 width=44)
(10 rows)



functions.sql
Description: Binary data
=# \d gir_outstanding
  Table "public.gir_outstanding"
 Column |Type | Modifiers 
+-+---
 supp_code  | text| 
 supp_name  | text| 
 supp_brn   | text| 
 ord_no | text| 
 due_date   | timestamp without time zone | 
 to_date| timestamp without time zone | 
 group_code | text| 
 brn_code   | text| 
 desc_short | text| 
 cluster_brn| text| 
 country_code   | text| 
 req_doc_no | integer | 
 ops_code   | text| 
 sku| text| 
 std_descr  | text| 
 acde_code  | text| 
 req_qty| double precision| 
 grv_qty| double precision| 
 skul_qty   | double precision| 
 pref_date  | timestamp without time zone | 
 skul_grv_qty   | double precision| 
 out_qty| double precision| 
 skul_or

Re: [PERFORM] Slow join using network address function

2004-06-07 Thread Josh Berkus
Eric,

>  Nested Loop
>  (cost=189.00..27359887.76 rows=607947200 width=22)
>Join Filter: ("outer".address <<= "inner".address)
>->  Seq Scan on clients c
>(cost=0.00..2074.76 rows=102176 width=11)
>->  Materialize
>(cost=189.00..308.00 rows=11900 width=11)
>  ->  Seq Scan on clients_commercial cc
>  (cost=0.00..189.00 rows=11900 width=11)

To help you, we need EXPLAIN ANALYZE, not just EXPLAIN.   Thanks!

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] Relation of cpu_*_costs?

2004-06-07 Thread Tom Lane
"=?iso-8859-2?B?U1rbQ1MgR+Fib3I=?=" <[EMAIL PROTECTED]> writes:
> Last week I fumbled with CPU_TUPLE_COST and revealed that 4 out of 4 tested
> queries improved by 10-60% if I changed it from 0.01 (default) to 0.40
> (ugh). Setting it higher did not bring any improvement.

That's pretty hard to believe; particularly on modern machines, I'd
think that moving it down would make more sense than moving it up.
You're essentially asserting that the CPU time to process one tuple
is almost half of the time needed to bring a page in from disk.

I suspect that your test cases were toy cases small enough to be
fully cached and thus not incur any actual I/O ...

> [ trying to get a nestloop indexscan plan to be generated ]

I believe that the planner's cost model for nestloops with inner
indexscan is wrong: it costs each inner iteration independently, when
in fact there should be some savings, because at least the topmost
levels of the index will soon be fully cached.  However, when I tried
to work out a proper model of this effect, I ended up with equations
that gave higher indexscan costs than what's in there now :-(.  So that
didn't seem like it would make anyone happy.

regards, tom lane

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] Postgres function use makes machine crash.

2004-06-07 Thread Tom Lane
Stef <[EMAIL PROTECTED]> writes:
> I've been dealing with a problem for the past two days
> where a certain sql statement works 2 out of 5 times, and
> the other 3 times, it causes the machine (quad Xeon 2.8GHz
> + 792543232 bytes mem, linux kernel  2.4.26-custom, pg ver 7.3.4)
> to slow down, and finally grind to a halt.

IIRC, PG prior to 7.4 had some problems with memory leaks in repeated
execution of SQL-language functions ... and your query sure looks like
it's going to be doing a lot of repeated execution of those functions.

Please try it on 7.4.2 and see if you still have a problem.

It seems somewhat interesting that you see the problem only sometimes
and not every time, but there's not much point in investigating further
if it turns out the problem is already fixed.

regards, tom lane

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Postgres function use makes machine crash.

2004-06-07 Thread Stef
Tom Lane mentioned :
=> Please try it on 7.4.2 and see if you still have a problem.

Will do, and I'll post the results

Thanks!

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


[PERFORM] PREPAREing statements versus compiling PLPGSQL

2004-06-07 Thread Mark Aufflick
Hi all,
I am optimizing some code that does a lot of iterative selects and 
inserts within loops. Because of the exception handling limitations in 
postgres and with no ability to twiddle autocommit, just about every 
operation is standalone.

Over 5000-odd lines this gets very slow (5-10 minutes including processing).
In seeking to speed it up I am PREPARing the most common inserts and
selects. I have a few operations already inside plpgsql functions.
EXECUTE means something different within a plpgsql function, so I am
wondering if there is a way to execute a pre-prepared query inside a
function.
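
To illustrate the name clash I mean (statement, table and column names are
invented here):

-- at the SQL level, EXECUTE runs a previously PREPAREd statement:
PREPARE count_items (integer) AS
    SELECT count(*) FROM items WHERE owner_id = $1;
EXECUTE count_items (42);

-- inside plpgsql, EXECUTE instead takes a query *string* and plans it
-- on the fly each time:
CREATE OR REPLACE FUNCTION count_rows(text) RETURNS bigint AS '
DECLARE
    r RECORD;
BEGIN
    FOR r IN EXECUTE ''SELECT count(*) AS n FROM '' || quote_ident($1) LOOP
        RETURN r.n;
    END LOOP;
    RETURN 0;
END;
' LANGUAGE plpgsql;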

Or is this even necessary - are queries within plpgsql functions 
automatically prepared when the function is first compiled? On a similar 
note, is there any benefit in PREPAREing a select from a plpgsql function?

Or does anyone have any smart ways to turn off autocommit? (I have 
already played with commit_delay and commit_siblings).

My empirical testing has proven inconclusive (other than turning off
fsync, which makes a huge difference but is not possible on the live
system, or using a fat Compaq RAID card).

Thanks for any help,
Mark.
--
Mark Aufflick
e: [EMAIL PROTECTED]
w: www.pumptheory.com (business)
w: mark.aufflick.com  (personal)
p: +61 438 700 647
---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


[PERFORM] pl/pgsql and Transaction Isolation

2004-06-07 Thread Marcus Whitney
Hello,

  I have an instance where a series of pl/pgsql calls report stat results
to a common table.  When other queries try to hit the stat table (with DML
commands: SELECT, INSERT, UPDATE, DELETE, etc.) they are forced to wait in
a queue until the pl/pgsql has finished executing.

will:

 SET TRANSACTION ISOLATION LEVEL SERIALIZABLE; 

before these DML queries eliminate the locking?


-- 
marcus whitney

chief architect : cold feet creative

www.coldfeetcreative.com

800.595.4401
 


cold feet presents emma

email marketing for discriminating

organizations everywhere

visit www.myemma.com

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Pl/Pgsql Functions running simultaneously

2004-06-07 Thread Marcus Whitney
Thanks for your reply.  My comments are below.

On Friday 04 June 2004 16:39, you wrote:
> Uh... I don't think this is necessarily the wrong list, sometimes people
> don't have much to chime in. You could try reposting to -sql or -general
> I suppose.

I'll try one of those.

>
> As for my take on your questions, I wasn't exactly clear on what the
> problem is. If its just that things seem slow, make sure you have done
> the appropriate vacuum/analyze/reindex tech and then try adding some
> debug info to the function to determine where in the function it is
> slowing down.

Yeah, I do a fair amount of vacuum/analyze, but I am unclear as to when I
should run REINDEX.  Is there a way to tell that indexes have become corrupt,
or need to be reindexed?
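
E.g., is eyeballing raw index sizes in pg_class, along these lines, a
reasonable heuristic, or is there something better?

SELECT relname, relpages, reltuples
FROM pg_class
WHERE relkind = 'i'
ORDER BY relpages DESC
LIMIT 20;

-- and then, for a suspiciously large index:
REINDEX INDEX some_bloated_index;   -- hypothetical name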

>
> queries inside plpgsql functions will take locks as needed, but they are
> no different than regular statements, just keep in mind that the queries
> inside the function will work like an implicit transaction.

I've noticed.  Thanks for the info.

>
> Robert Treat
>
> On Thu, 2004-06-03 at 17:38, Marcus Whitney wrote:
> > Am I on the wrong list to ask this question, or does this list usually
> > have low activity?  Just asking because I am new and I need to know where
> > to ask this question.  Thanks.
> >
> > On Wednesday 02 June 2004 16:08, Marcus Whitney wrote:
> > > Hello all,
> > >
> > >   I have an import function that I have been working on for some time
> > > now, and it performed well up until recently.  It is doing a lot, and
> > > because the queries are not cached, I am not sure if that is what the
> > > problem is. If a function takes a while, does it lock any of the tables
> > > it is accessing, even for SELECT?
> > >
> > > Below is the bulk of the function:
> > >
> > > -- set sql statement variables
> > > create_import_file_sql := ''COPY '' || container_table || ''
> > > ('' || filtered_container_columns  || '') TO '' ||
> > > quote_literal(formatted_import_file) || '' WITH NULL AS '' ||
> > > null_single_quotes;
> > > upload_to_import_table_sql := ''COPY '' || import_table || ''
> > > (''
> > >
> > > || field_names || '') FROM '' || quote_literal(formatted_import_file)
> > > || || ''
> > >
> > > WITH NULL AS '' ||  null_single_quotes;
> > > clean_personalization_fields_sql := ''UPDATE '' || import_table
> > > || '' SET emma_member_email = btrim(emma_member_email, '' ||
> > > quote_literal(quoted_single_quote) || '') , emma_member_name_first =
> > > btrim(emma_member_name_first, '' || quote_literal(quoted_single_quote)
> > > || '') ,   emma_member_name_last = btrim(emma_member_name_last, '' ||
> > > quote_literal(quoted_single_quote) || '') ;'';
> > > clean_personalization_fields_sql2 := ''UPDATE '' ||
> > > import_table || '' SET emma_member_email = btrim(emma_member_email) ,
> > > emma_member_name_first = btrim(emma_member_name_first) ,
> > > emma_member_name_last =
> > > btrim(emma_member_name_last) ;'';
> > > set_account_id_sql := ''UPDATE '' || import_table || '' SET
> > > emma_account_id = '' || account_id;
> > > set_default_active_status_sql := ''UPDATE '' || import_table ||
> > > '' SET emma_member_status_id = 1'';
> > > set_errors_for_null_email_sql := ''UPDATE '' || import_table ||
> > > '' SET emma_member_status_id = 2 WHERE emma_member_email IS NULL'';
> > > record_null_email_count_sql := ''UPDATE '' || import_history_table
> > >
> > > || '' SET emma_import_null_email_count = (SELECT COUNT(*) FROM '' ||
> > >
> > > import_table || '' WHERE emma_member_email IS NULL) WHERE
> > > emma_import_history_id ='' || import_history_id;
> > > set_errors_for_invalid_email_sql := ''UPDATE '' || import_table
> > > || '' SET emma_member_status_id = 2  WHERE emma_member_email !~* '' ||
> > > email_regex; record_invalid_email_count_sql := ''UPDATE '' ||
> > > import_history_table
> > >
> > > || ''  SET emma_import_invalid_email_count = ( SELECT COUNT(*) FROM ''
> > > || ||
> > >
> > > import_table || ''  WHERE emma_member_email !~* '' || email_regex || ''
> > > ) WHERE emma_import_history_id ='' || import_history_id;
> > > get_dupes_in_import_sql := ''SELECT emma_member_email,
> > > emma_member_status_id FROM '' || import_table || '' GROUP BY
> > > emma_member_email, emma_member_status_id having count(*) > 1'';
> > > insert_dupes_sql := ''INSERT  INTO '' || dupe_table || ''
> > > SELECT * FROM '' || import_table || '' WHERE LOWER(emma_member_email) =
> > > LOWER('' || member_table || ''.emma_member_email)'';
> > > record_table_dupe_count_sql := ''UPDATE '' ||
> > > import_history_table
> > >
> > > || '' SET emma_import_table_dupe_email_count = (SELECT COUNT(*) FROM ''
> > > || ||
> > >
> > > import_table || '' WHERE emma_member_email = LOWER('' || member_table
> > > || ''.emma_member_email)) WHERE emma_import_history_id ='' ||
> > > import_history_id; remove_dupes_from_import_table_sql := ''DELETE FROM
> > > ''
> > >
> > > || import_table
> > > ||
> > > || ''

Re: [PERFORM] Relation of cpu_*_costs?

2004-06-07 Thread SZŰCS Gábor
Dear Tom,

Thanks for your response.

- Original Message - 
From: "Tom Lane" <[EMAIL PROTECTED]>
Sent: Monday, June 07, 2004 3:51 PM


> That's pretty hard to believe; particularly on modern machines, I'd
> think that moving it down would make more sense than moving it up.
> You're essentially asserting that the CPU time to process one tuple
> is almost half of the time needed to bring a page in from disk.

That is exactly what I had in mind. We found the 5x 10krpm HW RAID-5 array
blazing fast, while we were really disappointed by the CPU. E.g.:
* tar'ing 600MB took seconds; gzip'ing it took minutes.
* initdb ran so fast that I didn't have time to hit Ctrl+C because
  I forgot a switch ;)
* dumping the DB in or out was far faster than adddepend between 7.2 and 7.3
* iirc index scans returning ~26k rows of ~64k were faster than seq scan.
  (most suspicious case of disk cache)

But whatever is the case with my hardware -- could you tell me something
(even a search keyword ;) ) about my theoretical question: i.e. relation of
cpu_*_costs?

> I suspect that your test cases were toy cases small enough to be
> fully cached and thus not incur any actual I/O ...

Dunno. The server has 1GB RAM; full DB is ~100MB; largest query was ~7k
which moved at least 2 tables of >200k rows and several smaller ones. If it
is a "toy case" for such hw, I humbly accept your opinion.

BTW, its runtime improved from 53 to 48 sec -- all due to changing
cpu_tuple_cost. I ran the query at different costs, in fast succession:

 run  cost  sec
 #1   0.01   53
 #2   0.4    50
 #3   1.0    48
 #4   1.0    48
 #5   0.4    48
 #6   0.01   53

For the second run, I'd say disk cache, yes -- but what about the last one?
It's exactly the same as the first. There must have been some plan change
(I can send the EXPLAIN ANALYZE results if you wish).

G.
%--- cut here ---%
\end


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


[PERFORM] is it possible to for the planner to optimize this form?

2004-06-07 Thread Merlin Moncure
Right now, I am having trouble getting the planner to optimize queries
in the form of 

select t.key, t.field from t a
where 
(
select count(*) from t b
where b.field > a.field
) = k

The subplan (either index or seq. scan) executes once for each row in t,
which of course takes forever.

This query is a way of achieving LIMIT type results (substitute n-1
desired rows for k) using standard SQL, which is desirable in some
circumstances.  Is it theoretically possible for this to be optimized?
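
For reference, the non-standard form I'm effectively trying to reproduce
(assuming t.field values are distinct, so exactly one row has k larger
values; k = 5 here):

select key, field
from t
order by field desc
limit 1 offset 5;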

Merlin


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] [JDBC] Using a COPY...FROM through JDBC?

2004-06-07 Thread Steve Wampler
On Mon, 2004-06-07 at 02:26, Kris Jurka wrote:
> On Sat, 5 Jun 2004, Steve Wampler wrote:
> 
> > 
> > [I want to use copy from JDBC]
> > 
> 
> I made a patch to the driver to support COPY as a PG extension.
...
> http://archives.postgresql.org/pgsql-jdbc/2003-12/msg00186.php

Thanks Kris - that patch worked beautifully and bumped the
insert rate from ~1000 entries/second to ~9000 e/s in my
test code.

Here's hoping it makes it into 7.5.

I do have a little concern about what's happening in the
back end during the copy - I suspect the entire table is
locked, which may impact the performance when multiple
clients are saving entries into the table.  Anyone know
if that's how COPY works?  (For that matter, would that
also be true of a transaction consisting of a set of
inserts?)

Thanks again!
-- 
Steve Wampler -- [EMAIL PROTECTED]
The gods that smiled on your birth are now laughing out loud.

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] [JDBC] Using a COPY...FROM through JDBC?

2004-06-07 Thread Steve Wampler
On Mon, 2004-06-07 at 10:40, Steve Wampler wrote:

> Thanks Kris - that patch worked beautifully and bumped the
> insert rate from ~1000 entries/second to ~9000 e/s in my
> test code.

As a followup - that 9000 e/s becomes ~21,000 e/s if I don't
have the java code also dump the message to standard output!

-- 
Steve Wampler -- [EMAIL PROTECTED]
The gods that smiled on your birth are now laughing out loud.

---(end of broadcast)---
TIP 8: explain analyze is your friend


[PERFORM] seq scan woes

2004-06-07 Thread Dan Langille
A production system has had a query recently degrade in performance.  
What once took < 1s now takes over 1s.  I have tracked down the 
problem to a working example.

Compare  http://rafb.net/paste/results/itZIx891.html
with   http://rafb.net/paste/results/fbUTNF95.html

The first shows the query as is, without much change (actually, this 
query is nested within a larger query, but it demonstrates the 
problem).  The query time is about 1 second.

In the second URL, a "SET ENABLE_SEQSCAN TO OFF;" is done, and the 
time drops to 151ms, which is acceptable.

What I don't understand is why the ports table is scanned in the 
first place.  Clues please?
-- 
Dan Langille : http://www.langille.org/
BSDCan - http://www.bsdcan.org/


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] [PERFORMANCE] slow small delete on large table

2004-06-07 Thread Christopher Browne
[EMAIL PROTECTED] ("Ed L.") wrote:
> A 7.3.4 question...
>
> I want to "expire" some data after 90 days, but not delete too
> much at once so as not to overwhelm a system with precariously
> balanced disk I/O and on a table with millions of rows.  If I 
> could say it the way I think for a simple example, it'd be
> like this:
>
>   delete from mytable
>   where posteddatetime < now() - '90 days'
>   limit 100;
>
> Of course, that's not legal 7.3.4 syntax.  These are both too
> slow due to sequential scan of table:
>
>   delete from mytable where key in (
>   select key
>   from mytable
>   where posteddatetime < now() - '90 days'
>   limit 100);
> or 
>   delete from mytable where exists (
>   select m.key
>   from mytable m
>   where m.key = mytable.key
> and m.posteddatetime < now() - '90 days'
>   limit 100);
>
> Tried to use a cursor, but couldn't figure out the syntax
> for select-for-delete yet, or find appropriate example on
> google.  Any clues?

I'm hoping that there's an index on posteddatetime, right?

There are several approaches that would be quite sensible to consider...

1.  Delete records as often as possible, so that the number deleted at
any given time stays small.

2.  Or find an hour at which the system isn't busy, and blow through a
lot of them then.

3.  Open a cursor querying records in your acceptable range, e.g.

  declare nukem cursor for select key from mytable where posteddate <
  now() - '90 days'::interval;

  Fetch 100 entries from the cursor, and submit, across another
  connection, delete requests for the 100 entries, all as one
  transaction, which you commit.

  Sleep a bit, and fetch another 100.

  Note that the cursor will draw groups of 100 entries into memory;
  it's good to immediately delete them, as they'll be in buffers.
  Keeping the number of rows deleted small, and sleeping a bit, means
  you're not trashing buffers too badly.  The query doesn't enforce
  any particular order on things; it effect chews out old entries in
  any order the query finds them.  If you can't keep up with
  insertions, there could be rather old entries that would linger
  around...

  This parallels the "sleepy vacuum" that takes a similar strategy to
  keeping vacuums from destroying performance.

4.  Rotor tables.

Have "mytable" be a view on a sequence of tables.

create view mytable as 
  select * from mytable1 
   union all 
  select * from mytable2
   union all 
  select * from mytable3 
   union all 
  select * from mytable4 
   union all 
  select * from mytable5 
   union all 
  select * from mytable6 
   union all 
  select * from mytable7 
   union all 
  select * from mytable8
   union all 
  select * from mytable9 
   union all
  select * from mytable10

A rule can choose an appropriate table from the ten to _actually_ insert
into; a sketch of such a rule follows after this list.

Every 3 days, you truncate the eldest table and rotate on to insert
into the next table.  

That will take mere moments, which is real helpful to save you I/O on
the deletes.

There is an unfortunate other problem with this; joins against mytable
are pretty bad, and self-joins effectively turn into a union all
across 100 joins.  (Table 1 against 1-10, Table 2 against 1-10, and so
forth...)

For this not to suck rather incredibly requires fairly carefully
structuring queries on the table.  That may or may not be compatible
with your needs...
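
The kind of insert-redirecting rule I mean, re-created each time you rotate
(column names here are invented):

create rule mytable_ins as on insert to mytable
    do instead
    insert into mytable4 (key, posteddatetime, payload)
    values (new.key, new.posteddatetime, new.payload);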
-- 
let name="cbbrowne" and tld="cbbrowne.com" in name ^ "@" ^ tld;;
http://cbbrowne.com/info/x.html
A Linux  machine!  because  a 486  is a terrible  thing to  waste!  
-- <[EMAIL PROTECTED]> Joe Sloan

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] seq scan woes

2004-06-07 Thread Dan Langille
On 7 Jun 2004 at 16:00, Rod Taylor wrote:

> On Mon, 2004-06-07 at 15:45, Dan Langille wrote:
> > A production system has had a query recently degrade in performance.  
> > What once took < 1s now takes over 1s.  I have tracked down the 
> > problem to a working example.
> 
> What changes have you made to postgresql.conf?

Nothing recently (ie. past few months). Nothing at all really.  
Perhaps I need to start tuning that.

> Could you send explain analyse again with SEQ_SCAN enabled but with
> nested loops disabled?

See http://rafb.net/paste/results/zpJEvb28.html

13s

> Off the cuff? I might hazard a guess that effective_cache is too low or
> random_page_cost is a touch too high. Probably the former.

I grep'd postgresql.conf:

#effective_cache_size = 1000# typically 8KB each
#random_page_cost = 4   # units are one sequential page fetch cost

NOTE: both above are commented out.

Thank you
-- 
Dan Langille : http://www.langille.org/
BSDCan - http://www.bsdcan.org/


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] is it possible to for the planner to optimize this form?

2004-06-07 Thread Josh Berkus
Merlin,

> select t.key, t.field from t a
> where
> (
> select count(*) from t b
> where b.field > a.field
> ) = k
>
> The subplan (either index or seq. scan) executes once for each row in t,
> which of course takes forever.
>
> This query is a way of achieving LIMIT type results (substitute n-1
> desired rows for k) using standard SQL, which is desirable in some
> circumstances.  Is it theoretically possible for this to be optimized?

I don't think so, no.   PostgreSQL does have some issues using indexes for 
count() queires which makes the situation worse.   However, with the query 
you presented, I don't see any way around the planner executing the subquery 
once for every row in t.

Except, of course, for some kind of scheme involving materialized views, if 
you don't need up-to-the minute data.   In that case, you could store in a 
table the count(*)s of t for each threshold value of b.field.  But, 
dynamically, that would be even slower.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] seq scan woes

2004-06-07 Thread Dan Langille
On 7 Jun 2004 at 16:38, Rod Taylor wrote:

> On Mon, 2004-06-07 at 16:12, Dan Langille wrote:
> > On 7 Jun 2004 at 16:00, Rod Taylor wrote:
> > 
> > > On Mon, 2004-06-07 at 15:45, Dan Langille wrote:
> > > > A production system has had a query recently degrade in performance.  
> > > > What once took < 1s now takes over 1s.  I have tracked down the 
> > > > problem to a working example.
> > > 
> > > What changes have you made to postgresql.conf?
> > 
> > Nothing recently (ie. past few months). Nothing at all really.  
> > Perhaps I need to start tuning that.
> > 
> > > Could you send explain analyse again with SEQ_SCAN enabled but with
> > > nested loops disabled?
> > 
> > See http://rafb.net/paste/results/zpJEvb28.html
> 
> This doesn't appear to be the same query as we were shown earlier.

My apologies. I should try to cook dinner and paste at the same time. 
 ;)

http://rafb.net/paste/results/rVr3To35.html is the right query.

> > > Off the cuff? I might hazard a guess that effective_cache is too low or
> > > random_page_cost is a touch too high. Probably the former.
> > 
> > I grep'd postgresql.conf:
> > 
> > #effective_cache_size = 1000# typically 8KB each
> > #random_page_cost = 4   # units are one sequential page fetch cost
> 
> This would be the issue. You haven't told PostgreSQL anything about your
> hardware. The defaults are somewhat modest.
> 
> http://www.postgresql.org/docs/7.4/static/runtime-config.html
> 
> Skim through the run-time configuration parameters that can be set in
> postgresql.conf.
> 
> Pay particular attention to:
>   * shared_buffers (you may be best with 2000 or 4000)
>   * effective_cache_size (set to 50% of ram size if dedicated db
> machine)
>   * random_page_cost (good disks will bring this down to a 2 from a
> 4)

I'll have a play with that and report back.

Thanks.
-- 
Dan Langille : http://www.langille.org/
BSDCan - http://www.bsdcan.org/


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] [JDBC] Using a COPY...FROM through JDBC?

2004-06-07 Thread Kris Jurka


On Mon, 7 Jun 2004, Steve Wampler wrote:

> I do have a little concern about what's happening in the
> back end during the copy - I suspect the entire table is
> locked, which may impact the performance when multiple
> clients are saving entries into the table.  Anyone know
> if that's how COPY works?  (For that matter, would that
> also be true of a transaction consisting of a set of
> inserts?)
> 

The table is not locked in either the copy or the insert case.

Kris Jurka

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


[PERFORM] Join slow on "large" tables

2004-06-07 Thread Josué Maldonado
Hello list,
Server is a dual Xeon 2.4 with 2GB RAM; PostgreSQL is running on this partition:
/dev/sda9  29G  8.9G   20G  31% /home2
/dev/sda9 on /home2 type jfs (rw)
Version()
PostgreSQL 7.4 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.2 
20020903 (Red Hat Linux 8.0 3.2-7)

I have a view that joins two tables, inventory details (pkardex) and
inventory document headers (pmdoc). This view usually runs pretty slowly, as
indicated in the explain analyze below. pkardex is 1943465 rows and approx.
659MB in size; pmdoc is 1183520 rows and approx. 314MB. The view definition
is:

SELECT pkd_pk AS kpk, (pkd_stamp)::date AS kfecha, pkd_docto AS kdocto,
    ((((pdc_custid)::text || ' '::text) || (pdc_custname)::text))::character varying(50) AS kclpv,
    pkd_saldo AS ksaldo, pkd_es AS kes,
    CASE WHEN (pkd_es = 'E'::bpchar) THEN pkd_qtyinv ELSE (0)::numeric END AS kentrada,
    CASE WHEN (pkd_es = 'S'::bpchar) THEN pkd_qtyinv ELSE (0)::numeric END AS ksalida,
    pkd_pcode AS kprocode, pkd_price AS kvalor, pdc_tipdoc AS ktipdoc
FROM (pkardex JOIN pmdoc ON ((pmdoc.pdc_pk = pkardex.doctofk)));

Shared memory is:
/root: cat /proc/sys/kernel/shmmax
1073741824
and postgresql.conf have this settings:
tcpip_socket = true
sort_mem = 8190 # min 64, size in KB
vacuum_mem = 262144 # min 1024, size in KB
checkpoint_segments = 10
max_connections = 256
shared_buffers = 32000
effective_cache_size = 16   # typically 8KB each
random_page_cost = 2# units are one sequ
The explain analyze is:
dbmund=# explain analyze select * from vkardex where kprocode='1017';
 Nested Loop  (cost=0.00..32155.66 rows=5831 width=114) (actual 
time=18.223..47983.157 rows=4553 loops=1)
   ->  Index Scan using pkd_pcode_idx on pkardex  (cost=0.00..11292.52 
rows=5831 width=72) (actual time=18.152..39520.406 rows=5049 loops=1)
 Index Cond: ((pkd_pcode)::text = '1017'::text)
   ->  Index Scan using pdc_pk_idx on pmdoc  (cost=0.00..3.55 rows=1 
width=50) (actual time=1.659..1.661 rows=1 loops=5049)
 Index Cond: (pmdoc.pdc_pk = "outer".doctofk)
 Total runtime: 47988.067 ms
(6 rows)

Can anyone help me tune PostgreSQL properly to gain some speed on such
queries? Some people have mentioned that a RAM increase is necessary, about
8GB or more, to have PostgreSQL run smoothly. I really appreciate any
comment or suggestion.

Regards,
--
Sincerely,
Josué Maldonado.
"Give me six lines written by the hand of the most honest man, and I will
find in them something with which to have him hanged."
--Cardinal Richelieu (French cardinal and statesman, 1585-1642)

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Join slow on "large" tables

2004-06-07 Thread Josh Berkus
Josue'

> ->  Index Scan using pkd_pcode_idx on pkardex  (cost=0.00..11292.52 
> rows=5831 width=72) (actual time=18.152..39520.406 rows=5049 loops=1)

Looks to me like there's a problem with index bloat on pkd_pcode_idx.   Try 
REINDEXing it, and if that doesn't help, VACUUM FULL on pkardex.
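
That is, roughly:

REINDEX INDEX pkd_pcode_idx;
VACUUM FULL pkardex;
ANALYZE pkardex;   -- refresh the stats afterwards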

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Join slow on "large" tables

2004-06-07 Thread Scott Marlowe
On Mon, 2004-06-07 at 16:19, Josué Maldonado wrote:
> Hello list,
> 
> Server is dual Xeon 2.4, 2GBRAM, Postgresql is running on partition:
> /dev/sda9  29G  8.9G   20G  31% /home2
> /dev/sda9 on /home2 type jfs (rw)
> 
> Version()
> PostgreSQL 7.4 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.2 
> 20020903 (Red Hat Linux 8.0 3.2-7)
> 
> I have a view to join two tables inventory details (pkardex) and 
> inventory documents header (pmdoc) this view usually runs pretty slow as 
> indicated in the explain analyze, pkardex is 1943465 rows and its size 
> aprox 659MB, pmdoc is 1183520 rows and its size is aprox 314MB. The view 
> definition is:
> 
> SELECT pkd_pk AS kpk, (pkd_stamp)::date AS kfecha, pkd_docto AS kdocto,
>  pdc_custid)::text || ' '::text) || 
> (pdc_custname)::text))::character
>  varying(50) AS kclpv, pkd_saldo AS ksaldo, pkd_es AS kes, CASE WHEN 
> (pkd_es
>  = 'E'::bpchar) THEN pkd_qtyinv ELSE (0)::numeric END AS kentrada, 
> CASE WHEN
>  (pkd_es = 'S'::bpchar) THEN pkd_qtyinv ELSE (0)::numeric END AS 
> ksalida,
>  pkd_pcode AS kprocode, pkd_price AS kvalor, pdc_tipdoc AS ktipdoc
> FROM (pkardex JOIN pmdoc ON ((pmdoc.pdc_pk = pkardex.doctofk)));
> 
> 
> Shared memory is:
> /root: cat /proc/sys/kernel/shmmax
> 1073741824
> 
> and postgresql.conf have this settings:
> tcpip_socket = true
> sort_mem = 8190 # min 64, size in KB
> vacuum_mem = 262144 # min 1024, size in KB
> checkpoint_segments = 10
> max_connections = 256
> shared_buffers = 32000
> effective_cache_size = 16   # typically 8KB each
> random_page_cost = 2# units are one sequ
> 
> The explain analyze is:
> dbmund=# explain analyze select * from vkardex where kprocode='1017';
>   Nested Loop  (cost=0.00..32155.66 rows=5831 width=114) (actual 
> time=18.223..47983.157 rows=4553 loops=1)
> ->  Index Scan using pkd_pcode_idx on pkardex  (cost=0.00..11292.52 
> rows=5831 width=72) (actual time=18.152..39520.406 rows=5049 loops=1)
>   Index Cond: ((pkd_pcode)::text = '1017'::text)
> ->  Index Scan using pdc_pk_idx on pmdoc  (cost=0.00..3.55 rows=1 
> width=50) (actual time=1.659..1.661 rows=1 loops=5049)
>   Index Cond: (pmdoc.pdc_pk = "outer".doctofk)
>   Total runtime: 47988.067 ms
> (6 rows)

OK, you have to ask yourself a question here: do I have enough memory to
let both postgresql and the kernel cache this data, or enough memory for
only one?  Then you pick one and try it out.  But there are some issues
here.  PostgreSQL's shared buffers are not, and should not generally be,
thought of as "cache".  A cache's job is to hold the whole working set, or
as much as possible, ready for access.  A buffer's job is to hold all the
data we're tossing around right this second.  Once we're done with the
data, the buffers can and do just drop whatever was in them.  PostgreSQL
does not have caching in the classical sense.  That may or may not change.

The kernel, on the other hand, has both cache and buffer.  Ever notice
that a Linux top shows the cache usually being much bigger than the
buffers?  My 512 Meg home box right now has 252968k for cache, and
43276k for buffers.  

Now, you're tossing around enough data to actually maybe have a use for
a huge set of buffers, but this means you'll need to starve your cache
to get enough buffers.  Which means that if one process does this kind
of join, drops connection, and two seconds later, another process
connects and does nearly the same thing, it's likely to have to read it
all from the hard drives again, as it's not in the postgresql buffer,
and not in the kernel cache.

Starting a separate connection, doing a simple "select * from table1;
select * from table2", dropping the result set returned, and staying
connected seems to be enough to get 7.4 to hold onto the data.

PostgreSQL's current buffer management algorithm is dirt simple, while the
algorithms behind the kernel's cache are quite good.  So you can quickly
reach a point where PostgreSQL is chasing its tail where the kernel would
have done OK.

Your numbers show that you are tossing 659M and 314M against each other,
but I don't know if you're harvesting the whole set at once, or just a
couple of rows from each.  Does indexing help, or is this always gonna be a
big seq scan of 90% of both tables?

If you are getting the whole thing all the time, and want postgresql to
buffer the whole thing (I recommend against it, although a very few
circumstances seem to support it) you need to have 973M of buffer.  That
would be 124544 or we'll just call it 13.  This high of a number
means you will be getting more than 50% of the RAM for postgreSQL.  At
that point, it seems you might as well go for broke and grab most of it,
~20 or so.

If you're not always mushing the two things against each other, and
you've got other datasets to interact with, index it.

Oh, in your reply you might want to include an explain analyze of the query,
and maybe an output of top while the query is runn

Re: [PERFORM] seq scan woes

2004-06-07 Thread Dan Langille
On 7 Jun 2004 at 16:38, Rod Taylor wrote:

> On Mon, 2004-06-07 at 16:12, Dan Langille wrote:
> > I grep'd postgresql.conf:
> > 
> > #effective_cache_size = 1000# typically 8KB each
> > #random_page_cost = 4   # units are one sequential page fetch cost
> 
> This would be the issue. You haven't told PostgreSQL anything about your
> hardware. The defaults are somewhat modest.
> 
> http://www.postgresql.org/docs/7.4/static/runtime-config.html
> 
> Skim through the run-time configuration parameters that can be set in
> postgresql.conf.
> 
> Pay particular attention to:
>   * shared_buffers (you may be best with 2000 or 4000)

I do remember increasing this in the past.  It was at 1000 and is now at
2000.

see http://rafb.net/paste/results/VbXQcZ87.html

>   * effective_cache_size (set to 50% of ram size if dedicated db
> machine)

The machine has 512MB RAM.  effective_cache_size was at 1000, so let's try
a 256MB cache: does that match a setting of 32000?  I tried it, and the
query went to 1.5s.  At 8000, the query took 1s.  At 2000, the query took
about 950ms.

This machine is a webserver/database/mail server, but the FreshPorts 
database is by far its biggest task.  

>   * random_page_cost (good disks will bring this down to a 2 from a
> 4)

I've got mine set at 4.  Increasing it to 6 gave me a 1971ms query.
At 3, it was 995ms.  Setting it to 2 gave me a 153ms query.

How interesting.

For comparison, I reset shared_buffers and effective_cache_size back
to their original values (both at 1000).  This gave me a 130-140ms
query.

The disks in question is:

ad0: 19623MB  [39870/16/63] at ata0-master UDMA100

I guess that might be this disk: 
http://www.harddrives4less.com/ibmdes6020ua2.html

I invite comments upon my findings.

Rod: thanks for the suggestions.




>   
> 
> -- 
> Rod Taylor 
> 
> Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
> PGP Key: http://www.rbt.ca/signature.asc
> 
> 


-- 
Dan Langille : http://www.langille.org/
BSDCan - http://www.bsdcan.org/


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] seq scan woes

2004-06-07 Thread Dan Langille
On 7 Jun 2004 at 18:49, Dan Langille wrote:

> On 7 Jun 2004 at 16:38, Rod Taylor wrote:
> >   * random_page_cost (good disks will bring this down to a 2 from a
> > 4)
> 
> I've got mine set at 4.  Increasing it to 6 gave me a 1971ms query.  
> At 3, it was a 995ms.  Setting it to 2 gave me a 153ms query.
> 
> How interesting.

The explain analyse: http://rafb.net/paste/results/pWhHsL86.html
-- 
Dan Langille : http://www.langille.org/
BSDCan - http://www.bsdcan.org/


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Join slow on "large" tables

2004-06-07 Thread Josué Maldonado
Hi Josh and thanks for your response,
On 07/06/2004 4:31 PM, Josh Berkus wrote:
Josue'

   ->  Index Scan using pkd_pcode_idx on pkardex  (cost=0.00..11292.52 
rows=5831 width=72) (actual time=18.152..39520.406 rows=5049 loops=1)

Looks to me like there's a problem with index bloat on pkd_pcode_idx.   Try 
REINDEXing it, and if that doesn't help, VACUUM FULL on pkardex.

I recreated the index (drop then create) and did the VACUUM FULL on pkardex,
and the behavior seems to be the same:

dbmund=# explain analyze select * from vkardex where kprocode='1013';
 Nested Loop  (cost=0.00..2248.19 rows=403 width=114) (actual 
time=846.318..16030.633 rows=3145 loops=1)
   ->  Index Scan using pkd_pcode_idx on pkardex  (cost=0.00..806.27 
rows=403 width=72) (actual time=0.054..87.393 rows=3544 loops=1)
 Index Cond: ((pkd_pcode)::text = '1013'::text)
   ->  Index Scan using pdc_pk_idx on pmdoc  (cost=0.00..3.55 rows=1 
width=50) (actual time=4.482..4.484 rows=1 loops=3544)
 Index Cond: (pmdoc.pdc_pk = "outer".doctofk)
 Total runtime: 16033.807 ms
(6 rows)

At the time the query was running, top returned:
5:11pm  up  1:28,  3 users,  load average: 0.19, 0.97, 1.41
69 processes: 66 sleeping, 1 running, 2 zombie, 0 stopped
CPU0 states:  0.0% user,  0.0% system,  0.0% nice, 100.0% idle
CPU1 states:  0.0% user,  0.0% system,  0.0% nice, 100.0% idle
CPU2 states:  0.1% user,  0.4% system,  0.0% nice, 98.4% idle
CPU3 states:  0.0% user,  0.0% system,  0.0% nice, 100.0% idle
Mem:  2069596K av, 1477784K used,  591812K free,    0K shrd,   2336K buff
Swap: 2096440K av,    9028K used, 2087412K free                1388372K cached

  PID USER PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
 1225 postgres  17   0  257M 257M  255M S 0.6 12.7   7:14 postmaster
 1978 postgres  11   0  1044 1044   860 R 0.2  0.0   0:00 top
1 root   9   0   472  444   428 S 0.0  0.0   0:04 init
2 root   8   0 00 0 SW0.0  0.0   0:00 keventd
and free returned:
/root: free
             total       used       free     shared    buffers     cached
Mem:       2069596    1477832     591764          0       2320    1388372
-/+ buffers/cache:      87140    1982456
Swap:      2096440       9028    2087412
I'm not a Linux guru, but to me it looks like a memory leak.
--
Sincerely,
Josué Maldonado.
"Words of encouragement after criticism are like the sun after a
downpour."

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Join slow on "large" tables

2004-06-07 Thread Josh Berkus
Josue'

> dbmund=# explain analyze select * from vkardex where kprocode='1013';
>   Nested Loop  (cost=0.00..2248.19 rows=403 width=114) (actual 
> time=846.318..16030.633 rows=3145 loops=1)
> ->  Index Scan using pkd_pcode_idx on pkardex  (cost=0.00..806.27 
> rows=403 width=72) (actual time=0.054..87.393 rows=3544 loops=1)
>   Index Cond: ((pkd_pcode)::text = '1013'::text)
> ->  Index Scan using pdc_pk_idx on pmdoc  (cost=0.00..3.55 rows=1 
> width=50) (actual time=4.482..4.484 rows=1 loops=3544)
>   Index Cond: (pmdoc.pdc_pk = "outer".doctofk)
>   Total runtime: 16033.807 ms

Huh?   It is not at all the same.   Your index scan is down to 87ms from 
27,000!   And the total query is down to 16seconds from 47 seconds.   Don't 
you consider that an improvement?

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Join slow on "large" tables

2004-06-07 Thread Josué Maldonado
Josh,
On 07/06/2004 5:21 PM, Josh Berkus wrote:
Huh?   It is not at all the same.   Your index scan is down to 87ms from 
27,000!   And the total query is down to 16seconds from 47 seconds.   Don't 
you consider that an improvement?
Yes, there was an improvement with respect to the previous query, but 16
seconds is still too slow for this query. It usually takes more than 10
seconds even when small data sets are returned.

Thanks,
--
Sincerely,
Josué Maldonado.
"Culture is capable of deciphering the enigmas in which life wraps us."
---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] is it possible to for the planner to optimize this

2004-06-07 Thread Duane Lee - EGOVX
I didn't really look that closely at the problem but have you thought of
trying:

select t.key, t.field
from t a,
     (select count(*) as cntb
      from t b
      where b.field > a.field) as dmytbl
where cntb = k

This is called an inline view or sometimes a nested table.  You would be
joining table t to this inline view with the join criteria being "cntb = k",
where k is in t.


-Original Message-
From: Josh Berkus [mailto:[EMAIL PROTECTED]]
Sent: Monday, June 07, 2004 1:32 PM
To: Merlin Moncure; [EMAIL PROTECTED]
Subject: Re: [PERFORM] is it possible to for the planner to optimize
this form?



Merlin,


> select t.key, t.field from t a
> where
> (
> select count(*) from t b
> where b.field > a.field
> ) = k
>
> The subplan (either index or seq. scan) executes once for each row in t,
> which of course takes forever.
>
> This query is a way of achieving LIMIT type results (substitute n-1
> desired rows for k) using standard SQL, which is desirable in some
> circumstances.  Is it theoretically possible for this to be optimized?


I don't think so, no.   PostgreSQL does have some issues using indexes for 
count() queires which makes the situation worse.   However, with the query 
you presented, I don't see any way around the planner executing the subquery 
once for every row in t.


Except, of course, for some kind of scheme involving materialized views, if 
you don't need up-to-the minute data.   In that case, you could store in a 
table the count(*)s of t for each threshold value of b.field.  But, 
dynamically, that would be even slower.


-- 
Josh Berkus
Aglio Database Solutions
San Francisco


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly





Re: [PERFORM] [SQL] Materialized View Summary

2004-06-07 Thread Richard Huxton
On Tuesday 24 February 2004 16:11, Jonathan M. Gardner wrote:
>
> I've written a summary of my findings on implementing and using
> materialized views in PostgreSQL. I've already deployed eagerly updating
> materialized views on several views in a production environment for a
> company called RedWeek: http://redweek.com/. As a result, some queries
> that were taking longer than 30 seconds to run now run in a fraction of a
> millisecond.
>
> You can view my summary at
> http://jonathangardner.net/PostgreSQL/materialized_views/matviews.html

Interesting (and well written) summary. Even if not a "built in" feature, I'm 
sure that plenty of people will find this useful. Make sure it gets linked to 
from techdocs.

If you could identify candidate keys on a view, you could conceivably automate 
the process even more. That's got to be possible in some cases, but I'm not 
sure how difficult it is to do in all cases.

-- 
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] [PERFORMANCE] slow small delete on large table

2004-06-07 Thread Christopher Browne
After a long battle with technology, [EMAIL PROTECTED] ("Ed L."), an earthling, wrote:
> On Monday February 23 2004 10:23, Tom Lane wrote:
>> "Ed L." <[EMAIL PROTECTED]> writes:
>> Depending on the size of mytable, you might need an "ANALYZE doomed"
>> in there, but I'm suspecting not.  A quick experiment suggests that
>> you'll get a plan with an inner indexscan on mytable.key, which is
>> exactly what you need.
>
> I didn't mention I'd written a trigger to do delete N rows on each new 
> insert (with a delay governor preventing deletion avalanches).  The 
> approach looks a little heavy to be done from within a trigger with the 
> response time I need, but I'll try it.  Cantchajust toss in that "limit N" 
> functionality to delete clauses?  How hard could that be?  ;)

It's nonstandard, which will get you a certain amount of opposition
"for free;" the problem with nonstandard behaviour is that sometimes
the implications haven't been thought out...

>> See also Chris Browne's excellent suggestions nearby, if you are willing
>> to make larger readjustments in your thinking...
>
> I did a search for articles by Chris Browne, didn't see one that
> appeared relevant.  What is the thread subject to which you refer?

It's in the same thread.  I suggested having a daemon running a cursor
(amounting to a slightly more expensive version of Tom's "doomed temp
table" approach), or using "rotor" tables where you could TRUNCATE a
table every few days which would be _really_ cheap...
-- 
output = ("cbbrowne" "@" "cbbrowne.com")
http://cbbrowne.com/info/emacs.html
Expect the unexpected.
-- The Hitchhiker's Guide to the Galaxy, page 7023

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] [HACKERS] [SQL] Materialized View Summary

2004-06-07 Thread Robert Treat
On Tue, 2004-02-24 at 12:11, Richard Huxton wrote:
> On Tuesday 24 February 2004 16:11, Jonathan M. Gardner wrote:
> >
> > I've written a summary of my findings on implementing and using
> > materialized views in PostgreSQL. I've already deployed eagerly updating
> > materialized views on several views in a production environment for a
> > company called RedWeek: http://redweek.com/. As a result, some queries
> > that were taking longer than 30 seconds to run now run in a fraction of a
> > millisecond.
> >
> > You can view my summary at
> > http://jonathangardner.net/PostgreSQL/materialized_views/matviews.html


Have you done much concurrency testing on your snapshot views? I
implemented a similar scheme in one of my databases but found problems when
I had concurrent "refresh attempts".  I ended up serializing the calls via
LOCKing, which was OK for my needs, but I thought it potentially problematic
in other cases.

> 
> Interesting (and well written) summary. Even if not a "built in" feature, I'm 
> sure that plenty of people will find this useful. Make sure it gets linked to 
> from techdocs.

Done. :-)

> 
> If you could identify candidate keys on a view, you could conceivably automate 
> the process even more. That's got to be possible in some cases, but I'm not 
> sure how difficult it is to do in all cases.
>

It seems that, somewhere between Joe Conway's work on arrays and the
polymorphic functions in 7.4, this should be feasible.

 
Robert Treat
-- 
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


[PERFORM] Speed up a function?

2004-06-07 Thread Robert Creager

Hey All,

I've implemented a couple of functions ala date_trunc (listed at the bottom).  These
functions are executed every 5 minutes (date_trunc_minute) and every week
(date_trunc_week) across 16 different values.  The problem is that they take way
too long to execute (nearly 7x the 'regular' date_trunc function).  What might
be the best way to fix the problem?  Use a different function language?  Re-write
the functions?  Re-write the queries?  The gist of what I'm doing is filling
schema tables with weather summary information for the time period in question.

Currently I don't have indexes on these tables.  Would indexes on readings."when"
and minute.barometer."time" be used with date_trunc?  Functional indexes maybe?
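
Something like this is what I have in mind (guessing at the expression-index
syntax; index names are made up):

CREATE INDEX readings_minute_idx
    ON readings ((date_trunc_minute('minute'::text, "when")));

CREATE INDEX minute_barometer_time_idx
    ON minute.barometer ("time");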

CREATE TABLE readings
   (
   "when"      TIMESTAMP DEFAULT timeofday()::timestamp
               NOT NULL PRIMARY KEY,
   "barometer" FLOAT DEFAULT NULL
   );

CREATE SCHEMA minute;
CREATE TABLE minute.barometer
   (
   "time"   TIMESTAMP NOT NULL,
   min_reading  FLOAT NOT NULL,
   max_reading  FLOAT NOT NULL,
   avg_reading  FLOAT NOT NULL
   );

The "hour" schema is identical to the "minute" schema.

weather=# EXPLAIN ANALYZE
weather-# SELECT p.period, p.min, p.max, p.avg
weather-# FROM (SELECT date_trunc_minute( 'minute'::text, "when" ) AS period,
weather(#  min( barometer ), max( barometer ), avg( barometer )
weather(#   FROM readings
weather(#   WHERE barometer NOTNULL
weather(#   GROUP BY period) AS p
weather-# WHERE p.period
weather-#   NOT IN (SELECT "time" FROM minute.barometer )
weather-#   AND p.period != date_trunc_minute( 'minute'::text, now()::timestamp );
                                        QUERY PLAN
--------------------------------------------------------------------------------------------
 Subquery Scan p  (cost=1665.63..2282.47 rows=13708 width=32) (actual time=3318.758..3318.758 rows=0 loops=1)
   Filter: (NOT (hashed subplan))
   ->  HashAggregate  (cost=1501.61..1775.76 rows=27415 width=16) (actual time=3227.409..3263.367 rows=13918 loops=1)
         ->  Seq Scan on readings  (cost=0.00..1227.46 rows=27415 width=16) (actual time=1.075..3028.673 rows=69398 loops=1)
               Filter: ((barometer IS NOT NULL) AND (date_trunc_minute('minute'::text, "when") <> date_trunc_minute('minute'::text, (now())::timestamp without time zone)))
   SubPlan
     ->  Seq Scan on barometer  (cost=0.00..144.02 rows=8002 width=8) (actual time=0.008..15.406 rows=13918 loops=1)
 Total runtime: 3320.146 ms
(8 rows)

weather=# EXPLAIN ANALYZE
weather-# SELECT p.period, p.min, p.max, p.avg
weather-# FROM (SELECT date_trunc( 'hour'::text, "when" ) AS period,
weather(#  min( barometer ), max( barometer ), avg( barometer )
weather(#   FROM readings
weather(#   WHERE barometer NOTNULL
weather(#   GROUP BY period) AS p
weather-# WHERE p.period
weather-#   NOT IN (SELECT "time" FROM hour.barometer )
weather-#   AND p.period != date_trunc( 'hour'::text, now()::timestamp );
                                        QUERY PLAN
--------------------------------------------------------------------------------------------
 Subquery Scan p  (cost=1524.11..2140.95 rows=13708 width=32) (actual time=551.516..551.516 rows=0 loops=1)
   Filter: (NOT (hashed subplan))
   ->  HashAggregate  (cost=1501.61..1775.76 rows=27415 width=16) (actual time=544.859..547.605 rows=1173 loops=1)
         ->  Seq Scan on readings  (cost=0.00..1227.46 rows=27415 width=16) (actual time=0.596..399.344 rows=69353 loops=1)
               Filter: ((barometer IS NOT NULL) AND (date_trunc('hour'::text, "when") <> date_trunc('hour'::text, (now())::timestamp without time zone)))
   SubPlan
     ->  Seq Scan on barometer  (cost=0.00..20.00 rows=1000 width=8) (actual time=0.007..1.268 rows=1173 loops=1)
 Total runtime: 552.137 ms


CREATE FUNCTION date_trunc_week( text, timestamp )
RETURNS timestamp AS '
DECLARE
   reading_time ALIAS FOR $2;
   year timestamp;
   dow integer;
   adjust text;
   week text;
BEGIN
   year := date_trunc( ''year''::text, reading_time );
   week := date_part( ''week'', reading_time ) - 1 || '' week'';
   dow := date_part( ''dow'', year );
   -- If the dow is less than Thursday, then the start week is last year
   IF dow <= 4 THEN
  adjust := 1 - dow || '' day'';
   ELSE
  adjust := 8 - dow || '' day'';
   END IF;
   RETURN year + adjust::interval + week::interval;
END;
' LANGUAGE plpgsql IMMUTABLE STRICT;

CREATE OR REPLACE FUNCTION date_trunc_minute( text, timestamp )
RETURNS timestam

Re: [PERFORM] pl/pgsql and Transaction Isolation

2004-06-07 Thread Tom Lane
Marcus Whitney <[EMAIL PROTECTED]> writes:
>   I have an instance where I have a series of pl/pgsql calls, that report stat 
> results to a common table.  When other queries try to hit the stat table 
> (with DML commands; SELECT, INSERT, UPDATE, DELETE etc.) they are forced to 
> wait in a queue until the pl/pgsql has finished executing.  

This is quite hard to believe, unless your pl/pgsql is doing something
as unfriendly as LOCKing the table.
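
That is, something along these lines inside the function (names invented),
which would make every other query on the table queue up until the calling
transaction ends:

CREATE OR REPLACE FUNCTION update_stats() RETURNS integer AS '
BEGIN
    LOCK TABLE stat_table IN ACCESS EXCLUSIVE MODE;
    -- ... long-running work while everyone else waits ...
    UPDATE stat_table SET hits = hits + 1 WHERE id = 1;
    RETURN 1;
END;
' LANGUAGE plpgsql;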

Do you want to post a more complete description of your problem?

regards, tom lane

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Join slow on "large" tables

2004-06-07 Thread Tom Lane
"Josué Maldonado" <[EMAIL PROTECTED]> writes:
> Recreated the index (drop then create) and did the vacuum full pkardex 
> and the behavior seems to be the same:

Well, there was a pretty huge improvement in the pkardex scan time,
whether you noticed it or not: 39520.406 to 87.393 msec.  This
definitely suggests that you've been lax about vacuuming this table.

I'm wondering whether pmdoc might not be overdue for vacuuming as
well.

regards, tom lane

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]