[PERFORM] Temporary Table
Does creating a temporary table in a function and NOT dropping it affect the performance of the database?
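For what it's worth, one pattern that avoids any build-up is to let the temporary table drop itself at commit; this is a generic sketch (table and column names are made up, not from the thread):

```sql
BEGIN;
-- ON COMMIT DROP removes the table automatically at transaction end,
-- so nothing accumulates in the system catalogs between calls
CREATE TEMPORARY TABLE scratch (id integer, val text) ON COMMIT DROP;
-- ... work with scratch here ...
COMMIT;
```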
FW: [PERFORM] Used Memory
Here is the configuration of our database server:

port = 5432
max_connections = 300
superuser_reserved_connections = 10
authentication_timeout = 60
shared_buffers = 48000
sort_mem = 32168
sync = false

Do you think this is enough, or can you recommend a better configuration for my server? The server is also running PHP and Apache, but we're not using them extensively; it's for development purposes only. The database slowdown occurs most of the time (when free memory is low). I don't think it has anything to do with vacuum. We only run a full server vacuum once a day. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
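One thing worth flagging about the settings above: sort_mem is allocated per sort, per backend, so 32168 kB combined with max_connections = 300 can exhaust memory under concurrent load, which matches the "slow when free memory is low" symptom. A more conservative sketch is below; the absolute numbers assume a machine with roughly 2 GB of RAM, which is an assumption, not something stated in the thread:

```
# postgresql.conf sketch -- illustrative values only
max_connections = 100      # raise only if genuinely needed
shared_buffers = 25000     # ~200 MB at 8 kB per buffer
sort_mem = 4096            # per sort, per backend: keep this modest
```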
[PERFORM] Why Index is not working on date columns.
Hi All, Thank you very much for your help in configuring the database. Can you guys please take a look at the following query and let me know why the index is not considered in the plan?

Here is the extract of the condition string of the query that is taking the transaction_date in the index condition:

where (account.id = search_engine.account_fk)
  and (account.status = 't'
       and account.id = '40288a820726362f0107263c55d3')
  and (search_engine.id = conversion.search_engine_fk
       and conversion.event_type = 'daily_spend'
       and conversion.tactic = 'PPC'
       and conversion.transaction_date between '2005-01-01 00:00:00' and '2005-10-31 23:59:59')
group by account.id;

Plan:

 ->  Index Scan using conversion_all on "conversion" (cost=0.00..6.02 rows=1 width=98)
       Index Cond: (((tactic)::text = 'PPC'::text) AND ((event_type)::text = 'daily_spend'::text) AND (transaction_date >= '2005-01-01 00:00:00'::timestamp without time zone) AND (transaction_date <= '2005-10-31 23:59:59'::timestamp without time zon (..)

Here is the extract of the condition string of the query that is not taking the transaction_date in the index condition:

where (account.status = 't'
       and account.id = search_engine.account_fk
       and account.id = '40288a820726362f0107263c55d3')
  and (search_engine.id = conversion.search_engine_fk
       and conversion.tactic = 'PPC'
       and conversion.transaction_date >= '2005-01-01 00:00:00'
       and conversion.transaction_date <= '2005-10-31 23:59:59')
group by account.id;

Plan:

 ->  Index Scan using conv_evnt_tact_trans_date_sefk on "conversion" (cost=0.00..6.02 rows=1 width=132)
       Index Cond: ((("outer".id)::text = ("conversion".search_engine_fk)::text) AND (("conversion".tactic)::text = 'PPC'::text))
       Filter: ((transaction_date >= '2005-01-01 00:00:00'::timestamp without time zone) AND (transaction_date <= '2005-10-31 23:59:59'::timestamp without time zone))

I have the following indexes defined on the columns:

conv_evnt_tact_trans_date_sefk: (search_engine_fk, tactic, event_type, transaction_date)
conv_all: (tactic, event_type, transaction_date)

I am really confused when I saw this plan. In both queries, I am using the same columns in the where condition, but the optimizer is taking different indexes in these two cases. Second, even though I have the transaction_date column specified in the second instance, why is it not taking the constraint as an index condition? Thanks in advance. Thank you, Kishore.
[PERFORM] Strange planner decision on quite simple select
Hello! I've got a table BOARD_MESSAGE (message_id int8, thread_id int8, ...) with a pk on message_id and a non-unique not-null index on thread_id. A count(*) on BOARD_MESSAGE currently yields a total of 1231171 rows; the planner estimated a total of 1232530 rows in this table. I've got pg_autovacuum running on the database and run an additional VACUUM ANALYZE over it every night. I've got a few queries of the following type:

select * from PUBLIC.BOARD_MESSAGE where THREAD_ID = 3354253 order by MESSAGE_ID asc limit 20 offset 0;

There are currently roughly 4500 rows with this thread_id in BOARD_MESSAGE. Explain output is like so:

 Limit (cost=0.00..3927.22 rows=20 width=1148)
   ->  Index Scan using pk_board_message on board_message (cost=0.00..1100800.55 rows=5606 width=1148)
         Filter: (thread_id = 3354253)
(3 rows)

I didn't have the patience to actually complete an explain analyze on that one - I cancelled the query on several attempts after more than 40 minutes runtime. Now I fiddled a little with this statement and tried nudging the planner in the right direction like so:

explain analyze select * from (select * from PUBLIC.BOARD_MESSAGE where THREAD_ID = 3354253 order by MESSAGE_ID asc) as foo limit 20 offset 0;

 Limit (cost=8083.59..8083.84 rows=20 width=464) (actual time=1497.455..1498.466 rows=20 loops=1)
   ->  Subquery Scan foo (cost=8083.59..8153.67 rows=5606 width=464) (actual time=1497.447..1498.408 rows=20 loops=1)
         ->  Sort (cost=8083.59..8097.61 rows=5606 width=1148) (actual time=1497.326..1497.353 rows=20 loops=1)
               Sort Key: message_id
               ->  Index Scan using nidx_bm_thread_id on board_message (cost=0.00..7734.54 rows=5606 width=1148) (actual time=0.283..1431.752 rows=4215 loops=1)
                     Index Cond: (thread_id = 3354253)
 Total runtime: 1502.138 ms

Now this is much more like it.
As far as I interpret the explain output, in the former case the planner decides to just walk the whole table with its 1.2m rows in primary-key order on message_id and filters out the few thousand rows matching the requested thread_id. In the latter case, it selects the few thousand rows with the matching thread_id _first_ and _then_ sorts them according to their message_id. The former attempt involves reading through more than a million rows and filtering the result; the latter just uses the index to retrieve a few thousand rows and sorts those, which is much more efficient. What's more puzzling is that the results vary somewhat depending on the overall load situation. When using the first approach without the subselect, sometimes the planner chooses exactly the same plan as it does with the second approach - with equally satisfying results in regard to total execution time; sometimes it does use the first plan and does complete with a very acceptable execution time, too. But sometimes (when overall load is sufficiently high, I presume) it just runs and runs for minutes on end - I've had this thing running for more than one hour on several occasions until I made some changes to my app which limit the maximum execution time for a query to no more than 55 seconds. With this IMHO quite ugly subselect workaround, performance is reproducibly stable and sufficiently good under either load, so I chose to stick with it for the time being - but I'd still like to know if I could have done anything to have the planner choose the evidently better plan for the first query without such a workaround? Kind regards Markus
Re: [PERFORM] Strange planner decision on quite simple select
Markus Wollny wrote:
> I've got a table BOARD_MESSAGE (message_id int8, thread_id int8, ...) with a pk on message_id and a non-unique not-null index on thread_id. [...] select * from PUBLIC.BOARD_MESSAGE where THREAD_ID = 3354253 order by MESSAGE_ID asc limit 20 offset 0; [...] I didn't have the patience to actually complete an explain analyze on that one - I cancelled the query on several attempts after more than 40 minutes runtime.

Hmm - it shouldn't take that long. If I'm reading this right, it's expecting to have to fetch 5606 rows matching thread_id=3354253 to find the 20 you've asked for. Now, what it probably doesn't know is that thread_id is correlated with message_id quite highly (actually, I don't know that, I'm guessing). So - it starts at message_id=1 and works along, but I'm figuring that it needs to reach message_id's in the 3-4 million range to see any of the required thread. Suggestions: 1. Try "ORDER BY thread_id, message_id" and see if that nudges things your way. 2. Keep #1 and try replacing the index on (thread_id) with (thread_id, message_id) -- Richard Huxton Archonet Ltd
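Richard's two suggestions, written out as SQL against the table described above (the index name is illustrative):

```sql
-- 2. composite index so the ORDER BY below can walk a single index range
CREATE INDEX nidx_bm_thread_msg ON board_message (thread_id, message_id);

-- 1. order by the leading index column first, then message_id
SELECT *
FROM public.board_message
WHERE thread_id = 3354253
ORDER BY thread_id, message_id
LIMIT 20 OFFSET 0;
```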
[PERFORM] Outer join query plans and performance
I tried on pgsql-general but got no reply; re-posting here as it's probably the best place to ask. I'm having some significant performance problems with left join. Can anyone give me any pointers as to why the following 2 query plans are so different?

EXPLAIN SELECT *
FROM tokens.ta_tokens t
     LEFT JOIN tokens.ta_tokenhist h1 ON t.token_id = h1.token_id
     LEFT JOIN tokens.ta_tokenhist h2 ON t.token_id = h2.token_id
WHERE h1.histdate = 'now';

 Nested Loop Left Join (cost=0.00..68778.43 rows=2215 width=1402)
   ->  Nested Loop (cost=0.00..55505.62 rows=2215 width=714)
         ->  Index Scan using idx_tokenhist__histdate on ta_tokenhist h1 (cost=0.00..22970.70 rows=5752 width=688)
               Index Cond: (histdate = '2005-10-24 13:28:38.411844'::timestamp without time zone)
         ->  Index Scan using ta_tokens_pkey on ta_tokens t (cost=0.00..5.64 rows=1 width=26)
               Index Cond: ((t.token_id)::integer = ("outer".token_id)::integer)
   ->  Index Scan using fkx_tokenhist__tokens on ta_tokenhist h2 (cost=0.00..5.98 rows=1 width=688)
         Index Cond: (("outer".token_id)::integer = (h2.token_id)::integer)

Performance is fine for this one and the plan is pretty much as I'd expect. This is where I hit a problem.
EXPLAIN SELECT *
FROM tokens.ta_tokens t
     LEFT JOIN tokens.ta_tokenhist h1 ON t.token_id = h1.token_id
     LEFT JOIN tokens.ta_tokenhist h2 ON t.token_id = h2.token_id
WHERE h2.histdate = 'now';

 Hash Join (cost=1249148.59..9000709.22 rows=2215 width=1402)
   Hash Cond: (("outer".token_id)::integer = ("inner".token_id)::integer)
   ->  Hash Left Join (cost=1225660.51..8181263.40 rows=4045106 width=714)
         Hash Cond: (("outer".token_id)::integer = ("inner".token_id)::integer)
         ->  Seq Scan on ta_tokens t (cost=0.00..71828.06 rows=4045106 width=26)
         ->  Hash (cost=281243.21..281243.21 rows=10504921 width=688)
               ->  Seq Scan on ta_tokenhist h1 (cost=0.00..281243.21 rows=10504921 width=688)
   ->  Hash (cost=22970.70..22970.70 rows=5752 width=688)
         ->  Index Scan using idx_tokenhist__histdate on ta_tokenhist h2 (cost=0.00..22970.70 rows=5752 width=688)
               Index Cond: (histdate = '2005-10-24 13:34:51.371905'::timestamp without time zone)

I would understand if h2 was joined on h1, but it isn't; it only joins on t. Can anyone give any tips on improving the performance of the second query (aside from changing the join order manually)?

select version();
PostgreSQL 8.0.3 on i486-pc-linux-gnu, compiled by GCC cc (GCC) 4.0.2 20050821 (prerelease) (Debian 4.0.1-6)

Thanks -- - Rich Doughty
Re: [PERFORM] Why Index is not working on date columns.
Kishore B <[EMAIL PROTECTED]> writes: > Can you guys please take a look at the following query and let me know why > the index is not considered in the plan? "Considered" and "used" are two different things. The two examples you give have the same estimated cost (within two decimal places) so the planner sees no particular reason to choose one over the other. I surmise that you are testing on toy tables and extrapolating to what will happen on larger tables. This is an unjustified assumption. Create a realistic test data set, ANALYZE it, and then see if the planner chooses indexes you like. regards, tom lane
Re: [PERFORM] Strange planner decision on quite simple select
Hi!

> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On behalf of Richard Huxton
> Sent: Tuesday, 25 October 2005 12:07
> To: Markus Wollny
> Cc: pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] Strange planner decision on quite simple select
>
> Hmm - it shouldn't take that long. If I'm reading this right,
> it's expecting to have to fetch 5606 rows to match
> thread_id=3354253 the 20 times you've asked for. Now, what it
> probably doesn't know is that thread_id is correlated with
> message_id quite highly (actually, I don't know that, I'm
> guessing). So - it starts at message_id=1 and works along,
> but I'm figuring that it needs to reach message_id's in the
> 3-4 million range to see any of the required thread.

Reading this I tried adding "AND MESSAGE_ID >= THREAD_ID" to the WHERE clause; as you've guessed quite correctly, both message_id and thread_id are derived from the same sequence, and thread_id equals the lowest message_id in a thread. This alone did quite a lot to improve things - I got stable execution times down from an average 12 seconds to a mere 2 seconds - just about the same as with the subselect.

> Suggestions:
> 1. Try "ORDER BY thread_id,message_id" and see if that nudges
> things your way.
> 2. Keep #1 and try replacing the index on (thread_id) with
> (thread_id,message_id)

Did both (though adding such an index during ordinary workload took some time, as did the VACUUM ANALYZE afterwards) and that worked like a charm - I've got execution times down to as little as a few milliseconds - wow! Thank you very much for providing such insightful hints! Kind regards Markus
[PERFORM] insertion of bytea
Hi, I have the following test setup:

* PG 8.0.4 on Linux (CentOS 4) compiled from source.
* DB schema: essentially one table with a few int columns and one bytea column that stores blobs of 52000 bytes each, with a primary key on one of the int columns.
* A test client was written in C using libpq to see what rate can be reached (inserting records). The client uses a prepared statement and bundles n inserts into a single transaction (n is variable for testing).
* Hardware: different setups tested, in particular a single-Opteron box with a built-in SATA disk and also an array of SATA disks connected via FC.

From the test run it appears that the insert rate here is essentially CPU bound. I'm getting about 11 MB/s net transfer, regardless of whether I use the built-in disk or the much faster array, and regardless of various settings (like n, shared_mem). vmstat says that disk bo is about 30 MB/s (the array can do much better, I tried with dd and sync!) while the CPU is maxed out at about 90% us and 10% sy. The client accounts for just 2% CPU; most goes into the postmaster. The client inserts random data. I found out that I can improve things by 35% if I use random sequences of bytes that are in the printable range vs. the full range.

Question 1: Am I correct in assuming that even though I'm passing my 52000 bytes as a (char *) to PQexecPrepared(), encoding/decoding is happening (think 0 -> \000) somewhere in the transfer?

Question 2: Is there a better, faster way to do these inserts? I'm unsure about large objects. I'm planning to use some custom server-side functions to do computations on the bytes in these records, and the large objects API doesn't appear to be well suited for this.

Sidequestion: I've tried to profile the server using CFLAGS="-p -DLINUX_PROFILE". I'm getting profiling output, but when I look at it using "gprof bin-somewhere/postgres $PGDATA/gmon.out" I'm only seeing what I think are the calls for the server startup. How can I profile the (forked) process that actually performs all the work on my connection? Sorry for the long post :) Bye, Chris.
[PERFORM] Reindex - Is this necessary after a vacuum?
We are reindexing frequently, and I'm wondering if this is really necessary, given that it appears to take an exclusive lock on the table. Our table in question is vacuumed every 4 minutes, and we are reindexing after each one. I'm not a fan of locking this table that frequently, even if it is only for 5 - 10 seconds depending on load. The vacuum is a standard vacuum. Nightly we do a vacuum analyze. Thanks for any tips, ...Markus
Re: [PERFORM] insertion of bytea
On Tue, Oct 25, 2005 at 03:44:36PM +0200, Chris Mair wrote: Is there a better, faster way to do these inserts? COPY is generally the fastest way to do bulk inserts (see PQputCopyData). Mike Stone
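For illustration, the COPY route looks like this at the SQL level (the table and column names are made up); from a libpq client the same statement is driven by issuing the COPY with PQexec and then streaming the rows with PQputCopyData and finishing with PQputCopyEnd:

```sql
-- binary COPY sidesteps the text escaping of bytea values entirely
COPY blobs (id, payload) FROM STDIN WITH BINARY;
```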
Re: [PERFORM] Outer join query plans and performance
Rich Doughty <[EMAIL PROTECTED]> writes: > EXPLAIN SELECT * > FROM > tokens.ta_tokens t LEFT JOIN > tokens.ta_tokenhist h1 ON t.token_id = h1.token_id LEFT JOIN > tokens.ta_tokenhist h2 ON t.token_id = h2.token_id > WHERE > h1.histdate = 'now'; > EXPLAIN SELECT * > FROM > tokens.ta_tokens t LEFT JOIN > tokens.ta_tokenhist h1 ON t.token_id = h1.token_id LEFT JOIN > tokens.ta_tokenhist h2 ON t.token_id = h2.token_id > WHERE > h2.histdate = 'now'; The reason these are different is that the second case constrains only the last-to-be-joined table, so the full cartesian product of t and h1 has to be formed. If this wasn't what you had in mind, you might be able to rearrange the order of the LEFT JOINs, but bear in mind that in general, changing outer-join ordering changes the results. (This is why the planner won't fix it for you.) regards, tom lane
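Tom's suggestion to rearrange the LEFT JOINs, applied to the slow query, would look like the sketch below: the constrained table is joined first, so the h2 filter can cut the row count before the second join. Since both outer joins here hang directly off ta_tokens this should return the same rows, but as Tom notes, that equivalence has to be checked case by case when reordering outer joins:

```sql
SELECT *
FROM tokens.ta_tokens t
     LEFT JOIN tokens.ta_tokenhist h2 ON t.token_id = h2.token_id
     LEFT JOIN tokens.ta_tokenhist h1 ON t.token_id = h1.token_id
WHERE h2.histdate = 'now';
```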
Re: [PERFORM] Reindex - Is this necessary after a vacuum?
Markus Benne wrote:
> We are reindexing frequently, and I'm wondering if this is really necessary, given that it appears to take an exclusive lock on the table. Our table in question is vacuumed every 4 minutes, and we are reindexing after each one. [...]

At most I'd do a nightly reindex. And in fact, I'd probably drop the index, full vacuum, recreate index. But you only need to reindex at all if you have a specific problem with the index bloating. Are you seeing this? -- Richard Huxton Archonet Ltd
Re: [PERFORM] insertion of bytea
"Chris Mair" <[EMAIL PROTECTED]> writes: > Am I correct in assuming that even though I'm passing my 52000 > bytes as a (char *) to PQexecPrepared(), encoding/decoding is > happening (think 0 -> \000) somewhere in the transfer? Are you specifying it as a text or binary parameter? Have you looked to see if the stored data is what you expect? regards, tom lane
Re: [PERFORM] Reindex - Is this necessary after a vacuum?
Markus Benne <[EMAIL PROTECTED]> writes: > Our table in question is vacuumed every 4 minutes, and > we are reindexing after each one. That's pretty silly. You might need a reindex once in awhile, but not every time you vacuum. The draft 8.1 docs contain some discussion of possible reasons for periodic reindexing: http://developer.postgresql.org/docs/postgres/routine-reindex.html but none of these reasons justify once-per-vacuum reindexes. regards, tom lane
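A quick way to tell whether an index actually has a bloat problem, rather than reindexing on a schedule, is to watch its size in pg_class over time; the index name below is a placeholder:

```sql
-- relpages should stay roughly stable if plain VACUUM is keeping up;
-- steady growth without corresponding data growth suggests bloat
SELECT relname, relpages
FROM pg_class
WHERE relname = 'my_index_name';
```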
Re: [PERFORM] Why Index is not working on date columns.
Hi Tom, Thank you for your response.

> I surmise that you are testing on toy tables and extrapolating to what will happen on larger tables.

The tables participating here contain more than 8 million records as of now, and every day 200K records are added to them. Thank you, Kishore.

On 10/25/05, Tom Lane <[EMAIL PROTECTED]> wrote:
> Kishore B <[EMAIL PROTECTED]> writes:
> > Can you guys please take a look at the following query and let me know why the index is not considered in the plan?
> "Considered" and "used" are two different things. The two examples you give have the same estimated cost (within two decimal places) so the planner sees no particular reason to choose one over the other. I surmise that you are testing on toy tables and extrapolating to what will happen on larger tables. This is an unjustified assumption. Create a realistic test data set, ANALYZE it, and then see if the planner chooses indexes you like. regards, tom lane
Re: [PERFORM] Inefficient escape codes.
Ok, thanks for the limits info, but I have that in the manual. Thanks. But what I really want to know is this: 1) All large objects of all tables inside one DATABASE are kept in only one table. True or false? Thanks =o) Rodrigo

On 10/25/05, Nörder-Tuitje, Marcus <[EMAIL PROTECTED]> wrote:
> oh, btw, no harm, but: having 5000 tables only to gain access via city name is a major design flaw. you might consider putting all into one table, working with a distributed index over your table (city, loc_texdt, blobfield); creating a partitioned index over city. best regards
>
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On behalf of Rodrigo Madera
> Sent: Monday, 24 October 2005 21:12
> To: pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] Inefficient escape codes.
>
> Now this interests me a lot. Please clarify this: I have 5000 tables, one for each city: City1_Photos, City2_Photos, ... City5000_Photos. Each of these tables is: CREATE TABLE CityN_Photos (location text, lo_id largeobectypeiforgot). So, what's the limit for these large objects? I heard I could only have 4 billion records for the whole database (not for each table). Is this true? If this isn't true, then would postgres manage to create all the large objects I ask him to? Also, this would be a performance penalty, wouldn't it? Much thanks for the knowledge shared, Rodrigo
Re: [PERFORM] impact of stats_command_string
On Thu, Oct 20, 2005 at 01:33:07PM -0700, [EMAIL PROTECTED] wrote: > If I turn on stats_command_string, how much impact would it have on > PostgreSQL server's performance during a period of massive data > INSERTs? Do you really need to be doing "massive data INSERTs"? Can you use COPY, which is much more efficient for bulk loads? http://www.postgresql.org/docs/8.0/interactive/populate.html -- Michael Fuhr
Re: [PERFORM] impact of stats_command_string
> If I turn on stats_command_string, how much impact would it have on > PostgreSQL server's performance during a period of massive data > INSERTs? I know that the answer to the question I'm asking will > largely depend upon different factors so I would like to know in which > situations it would be negligible or would have a significant impact. First of all, we have to assume your writes are buffered in some way or you are using transactions, or you will likely be i/o bound (or you have a super fast disk setup). Assuming that, I can tell you from experience on win32 that stats_command_string can be fairly expensive for certain types of access patterns. What patterns? 1. If your ratio of queries to records involved is low. 2. If you are accessing data in a very quick way, for example via prepared statements over a LAN. 3. Your volume of queries is very high. In these cases, the cost is high. stats_command_string can add a fractional millisecond ( ~.2 in my setup ) to statement latency and as much as double cpu time in extreme cases...you are warned. You may want to turn it off before doing bulk loads or lengthy record iterations. Merlin
Re: [PERFORM] Why Index is not working on date columns.
Kishore B <[EMAIL PROTECTED]> writes: >> I surmise that you are testing on toy tables and extrapolating to what >> will happen on larger tables. >> > These tables participating here contain more than 8 million records as of > now, and on every day, 200K records, will add to them. In that case, have you ANALYZEd the tables lately? The planner's cost estimates correspond to awfully small tables ... regards, tom lane
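To illustrate Tom's point, refreshing the statistics is a one-liner per table and then the plan can be re-checked; the table names below are taken from the earlier query in this thread:

```sql
ANALYZE account;
ANALYZE search_engine;
ANALYZE conversion;
-- then re-run EXPLAIN on the query and see whether the row and
-- cost estimates now reflect the real table sizes
```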
Re: [PERFORM] Outer join query plans and performance
Tom Lane wrote:
> Rich Doughty <[EMAIL PROTECTED]> writes: [queries snipped]
> The reason these are different is that the second case constrains only the last-to-be-joined table, so the full cartesian product of t and h1 has to be formed. If this wasn't what you had in mind, you might be able to rearrange the order of the LEFT JOINs, but bear in mind that in general, changing outer-join ordering changes the results. (This is why the planner won't fix it for you.)

FWIW mysql 4.1 (and I'm no fan at all of mysql) completes both these queries in approximately 3 seconds. postgres does the first in 6 seconds and the second in a lot longer (eventually abandoned). -- - Rich Doughty
[PERFORM] Why different execution times for different instances for the same query?
Hi All, We are executing a single query that returned very fast on the first instance. But when I executed the same query multiple times, it is giving strange results: it is not coming back. When I checked the processes running in the system, I observed that multiple instances of postmaster are running and all of them are consuming very high amounts of memory. I could also observe that they are sharing the memory in a uniform distribution across them. Please let me know if anybody has experienced the same and how they resolved it. Thank you, Kishore.
Re: [PERFORM] Why different execution times for different
On Tue, 2005-10-25 at 14:46, Kishore B wrote: > Hi All, > > We are executing a single query that returned very fast on the first > instance. But when I executed the same query for multiple times, it is > giving strange results. It is not coming back. > > When I checked with the processes running in the system, I observed > that multiple instances of postmaster are running and all of them are > consuming very high amounts of memory. I could also observe that they > are sharing the memory in a uniform distribution across them. > > Please let me know if any body has experienced the same and how do > they resolved it. You may or may not have an actual problem. For one, if they're each using 128 megs, but sharing 120 megs of that, then that's not too bad. If they're each using 512 meg and sharing 100 meg of that, then you've got a problem. What is your sort_mem set to? Going too high can cause memory starvation and other problems. Also, when you run top, how much memory is being used for cache and buffer? If you've still got a fair amount used for cache then you're probably ok there. What are your settings in postgresql.conf that aren't default? How's the behaviour as you run 1, then 2, then 3, then 4 queries and so on? Where's the "knee" in this behaviour, and what are you running out of: disk IO, memory, or memory bandwidth? Are you using iostat/vmstat/top/free/ipcs to check resource usage under load?
Re: [PERFORM] insertion of bytea
>>Is there a better, faster way to do these inserts? > > COPY is generally the fastest way to do bulk inserts (see > PQputCopyData). Thanks :) I'll give that a try and report the results here later. Bye, Chris.
Re: [PERFORM] insertion of bytea
>> Am I correct in assuming that even though I'm passing my 52000 >> bytes as a (char *) to PQexecPrepared(), encoding/decoding is >> happening (think 0 -> \000) somewhere in the transfer? > > Are you specifying it as a text or binary parameter? Have you looked to > see if the stored data is what you expect? I'm specifying it as binary (i.e. 1's in PQexecPrepared's format parameter). The stored data is correct. I'll try "copy from stdin with binary" tomorrow and see what I get... Thanks & Bye, Chris.
Re: [PERFORM] Outer join query plans and performance
Rich Doughty <[EMAIL PROTECTED]> writes: > Tom Lane wrote: >> The reason these are different is that the second case constrains only >> the last-to-be-joined table, so the full cartesian product of t and h1 >> has to be formed. If this wasn't what you had in mind, you might be >> able to rearrange the order of the LEFT JOINs, but bear in mind that >> in general, changing outer-join ordering changes the results. (This >> is why the planner won't fix it for you.) > FWIW mysql 4.1 (and i'm no fan at all of mysql) completes both these queries > in approximately 3 seconds. Does mysql get the correct answer, though? It's hard to see how they do this fast unless they (a) are playing fast and loose with the semantics, or (b) have very substantially more analysis logic for OUTER JOIN semantics than we do. Perhaps mysql 5.x is better about this sort of thing, but for 4.x I'd definitely find theory (a) more plausible than (b). The cases that would be interesting are those where rearranging the outer join order actually does change the correct answer --- it may not in this particular case, I haven't thought hard about it. It seems fairly likely to me that they are rearranging the join order here, and I'm just wondering whether they have the logic needed to verify that such a transformation is correct. regards, tom lane
Re: [PERFORM] Outer join query plans and performance
In this particular case both outer joins are to the same table, and the where clause is applied to one or the other, so it's pretty easy to prove that they should generate identical results. I'll grant that this is not generally very useful; but then, simple test cases often don't look very useful. We've had mixed results with PostgreSQL and queries with multiple outer joins when the WHERE clause limits the results based on columns from the optional tables. In at least one case which performs very well, we have enough tables to cause the "genetic" optimizer to kick in. (So I suppose there is a chance that sometimes it won't perform well, although we haven't seen that happen yet.) I can't speak to MySQL, but both Sybase and MaxDB handled such cases accurately, and chose a plan with very fast execution. Sybase, however, spent 5 to 10 seconds in the optimizer finding the sub-second plan. -Kevin >>> Tom Lane <[EMAIL PROTECTED]> >>> Rich Doughty <[EMAIL PROTECTED]> writes: > Tom Lane wrote: >> The reason these are different is that the second case constrains only >> the last-to-be-joined table, so the full cartesian product of t and h1 >> has to be formed. If this wasn't what you had in mind, you might be >> able to rearrange the order of the LEFT JOINs, but bear in mind that >> in general, changing outer-join ordering changes the results. (This >> is why the planner won't fix it for you.) > FWIW mysql 4.1 (and i'm no fan at all of mysql) completes both these queries > in approximately 3 seconds. Does mysql get the correct answer, though? It's hard to see how they do this fast unless they (a) are playing fast and loose with the semantics, or (b) have very substantially more analysis logic for OUTER JOIN semantics than we do. Perhaps mysql 5.x is better about this sort of thing, but for 4.x I'd definitely find theory (a) more plausible than (b). 
Re: [PERFORM] Temporary Table
I am creating a temporary table in every function that I execute, which I think comes to about 100,000 temporary tables a day. What is the command for vacuuming these three tables? Also, I read about PostgreSQL's autovacuum. How can I run this autovacuum, and what are its settings in the configuration?

-Original Message-
From: Alvaro Nunes Melo [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, October 25, 2005 10:58 AM
To: Christian Paul B. Cosinas
Subject: Re: [PERFORM] Temporary Table

Hi Christian,

Christian Paul B. Cosinas wrote:
> Does Creating Temporary table in a function and NOT dropping them
> affects the performance of the database?

I believe it will depend on how many temporary tables you create on a daily basis. We had a performance problem caused by them, and by not monitoring the database size properly. The pg_attribute, pg_class and pg_depend tables grow a lot. When I found out that this was the problem I saw some messages in the list archive, and now the overall performance is great. What I do is run VACUUM FULL and REINDEX daily on these three tables.

Alvaro
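[Editorial note: the daily maintenance Alvaro describes amounts to something like the following, run as a superuser. On 8.0, autovacuum is the separate contrib/pg_autovacuum daemon; 8.1 builds it in and enables it with `autovacuum = on` in postgresql.conf.]

```sql
-- Reclaim catalog space bloated by heavy temporary-table churn,
-- then rebuild the catalogs' indexes.
VACUUM FULL pg_attribute;
VACUUM FULL pg_class;
VACUUM FULL pg_depend;

REINDEX TABLE pg_attribute;
REINDEX TABLE pg_class;
REINDEX TABLE pg_depend;
```

Run this during a quiet window: VACUUM FULL and REINDEX take exclusive locks, and on these system catalogs that briefly blocks most other activity.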
[PERFORM] blue prints please
Where can I find best practices for tuning PostgreSQL?
[PERFORM] zero performance on query
What happened to PostgreSQL 8.1b4 performance on this query? Please help me!

Look at this:

select count(*) from fotos where archivo not in (select archivo from archivos)

Aggregate  (cost=4899037992.36..4899037992.37 rows=1 width=0)
  ->  Seq Scan on fotos  (cost=22598.78..4899037338.07 rows=261716 width=0)
        Filter: (NOT (subplan))
        SubPlan
          ->  Materialize  (cost=22598.78..39304.22 rows=805344 width=58)
                ->  Seq Scan on archivos  (cost=0.00..13141.44 rows=805344 width=58)

I WILL DIE WAITING FOR QUERY RESPONSE !!!

--
CREATE TABLE archivos (archivo varchar(20)) WITHOUT OIDS;
CREATE INDEX archivos_archivo_idx ON archivos USING btree (archivo);
~80 rows
--
CREATE TABLE fotos
(
  cedula varchar(20),
  nombre varchar(100),
  apellido1 varchar(100),
  apellido2 varchar(100),
  archivo varchar(20)
) WITHOUT OIDS;
CREATE INDEX fotos_archivo_idx ON fotos USING btree (archivo);
CREATE INDEX fotos_cedula_idx ON fotos USING btree (cedula);
~50 rows
Re: [PERFORM] zero performance on query
That seems like a pretty horrible way to do that query, given the table sizes. What about something like:

SELECT count(*)
FROM fotos f
LEFT JOIN archivos a USING (archivo)
WHERE a.archivo IS NULL;

Incidentally, can someone explain what the "Materialize" subplan does? Is this new in 8.1?

Dmitri
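[Editorial note: another standard rewrite is NOT EXISTS; and the original NOT IN form can itself go fast if work_mem is large enough for the subquery result to fit in a hashed subplan. A sketch, using the tables from the post; the work_mem value is an illustrative guess, not a recommendation.]

```sql
-- Correlated NOT EXISTS: counts fotos rows with no matching archivos row.
SELECT count(*)
FROM fotos f
WHERE NOT EXISTS (
    SELECT 1 FROM archivos a WHERE a.archivo = f.archivo
);

-- Alternatively, give the NOT IN subplan room to build an in-memory
-- hash table (work_mem is in KB on 8.0/8.1; setting is session-local):
SET work_mem = 65536;
SELECT count(*) FROM fotos WHERE archivo NOT IN (SELECT archivo FROM archivos);
```

One caveat: NOT IN and NOT EXISTS are not equivalent if archivos.archivo can be NULL — a NULL in the subquery makes NOT IN return no rows at all, while NOT EXISTS simply ignores it.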
[PERFORM] Configuration Suggestion
Hi! Here are the specifications of my server. I would really appreciate the best configuration of postgresql.conf for my server. I have tried so many values for the parameters, but it seems that I cannot get the speed I want.

OS: Red Hat Linux
CPU: Dual Xeon
Memory: 6 gigabytes
PostgreSQL version: 8.0

Most of my queries have ORDER BY and GROUP BY clauses, and create temporary tables. The biggest table is about 3-5 million rows, which I query almost every 5 seconds.

I'm just wondering, is it normal to have this result in my memory usage:

             total       used       free     shared    buffers     cached
Mem:       6192460    6172488      19972          0      39904    5890824
-/+ buffers/cache:     241760    5950700
Swap:      2096472          0    2096472

What does this mean?
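[Editorial note: the `free` output above is less alarming than it looks — Linux counts page-cache and buffer memory as "used" but reclaims it on demand. A quick sanity check on the quoted figures (all in KB):]

```python
# Figures (in KB) taken from the `free` output quoted above.
total, used, free = 6192460, 6172488, 19972
buffers, cached = 39904, 5890824

# Memory the kernel can hand back to applications on demand:
available = free + buffers + cached
print(available)      # matches the "-/+ buffers/cache" free column

# Memory genuinely held by processes:
really_used = used - buffers - cached
print(really_used)    # matches the "-/+ buffers/cache" used column
```

So roughly 5.7 GB of the 6 GB is page cache rather than application memory; a low "free" number on its own does not indicate memory pressure.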