Re: [PERFORM] cost-based vacuum

2005-07-11 Thread Simon Riggs
On Fri, 2005-07-08 at 12:25 -0400, Ian Westmacott wrote:
 I am beginning to look at Postgres 8, and am particularly
 interested in cost-based vacuum/analyze.  I'm hoping someone
 can shed some light on the behavior I am seeing.
 
 Suppose there are three threads:
 
 writer_thread
   every 1/15 second do
 BEGIN TRANSACTION
   COPY table1 FROM stdin
   ...
   COPY tableN FROM stdin
   perform several UPDATEs, DELETEs and INSERTs
 COMMIT
 
 reader_thread
   every 1/15 second do
 BEGIN TRANSACTION
   SELECT FROM table1 ...
   ...
   SELECT FROM tableN ...
 COMMIT
 
 analyze_thread
   every 5 minutes do
 ANALYZE table1
 ...
 ANALYZE tableN
 
 
 Now, Postgres 8.0.3 out-of-the-box (all default configs) on a
 particular piece of hardware runs the Postgres connection for
 writer_thread at about 15% CPU (meaningless, I know, but for
 comparison) and runs the Postgres connection for reader_thread
 at about 30% CPU.  Latency for reader_thread seeing updates
 from writer_thread is well under 1/15s.  Impact of
 analyze_thread is negligible.
 
 If I make the single configuration change of setting
 vacuum_cost_delay=1000, each iteration in analyze_thread takes
 much longer, of course.  But what I also see is that the CPU
 usage of the connections for writer_thread and reader_thread
 spike to well over 80% each (this is a dual-CPU machine), and latency
 degrades to 8-10s during the ANALYZEs.
 
 I don't understand why this would be.  I don't think there
 are any lock issues, and I don't see any obvious I/O issues.
 Am I missing something?  Is there any way to get some
 insight into what those connections are doing?

The ANALYZE commands hold read locks on the tables you wish to write to.
If you slow them down, you merely slow down your write transactions
also, and then the read transactions that wait behind them. Every time
the ANALYZE sleeps it wakes up the other transactions, which then
realise they can't move because of locks and then wake up the ANALYZEs
for another shot. The end result is that you introduce more context-
switching, without any chance of doing more useful work while the
ANALYZEs sleep.

Don't use vacuum_cost_delay in this situation. You might try setting
it to 0 for the analyze_thread only.
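
For example, a minimal sketch of what the analyze_thread could run so that
only its own session skips the cost-based delay (table names are the
placeholders from the original post; vacuum_cost_delay is settable per
session in 8.0 and also applies to ANALYZE):

SET vacuum_cost_delay = 0;
ANALYZE table1;
...
ANALYZE tableN;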

Sounds like you could speed things up by splitting everything into two
sets of tables, with writer_thread1 and writer_thread2 etc. That way
your 2 CPUs would each be able to get through more work independently,
without locking each other out.

Best Regards, Simon Riggs




[PERFORM] Question

2005-07-11 Thread Alejandro Lemus
Last week, a guy from a Unix group in Colombia told me: "Postgrest"
in production is bad; if the power goes off at any time, the data is
lost because the data is kept in plain files. "Postgrest" does not
support databases with more than 1 million records.
What can you tell me about this? Is Informix better, as he says?

Ing. Alejandro Lemus G.
Radio Taxi Aeropuerto S.A.
Avenida de las Américas # 51 - 39 Bogotá - Colombia
Tel: 571-4470694 / 571-4202600 Ext. 260 Fax: 571-2624070
email: [EMAIL PROTECTED]




Re: [PERFORM] Question

2005-07-11 Thread Richard Huxton

Perhaps choose a better subject than "Question" next time?

Alejandro Lemus wrote:

 Last week, a guy from a Unix group in Colombia told me: "Postgrest"
 in production is bad; if the power goes off at any time, the data is
 lost

Wrong. And it's called PostgreSQL.

 because the data is kept in plain files. "Postgrest" does not support
 databases with more than 1 million records.

Wrong.

 What can you tell me about this? Is Informix better, as he says?


Your contact in the Unix Group in Colombia obviously talks about subjects 
where he knows little. Perhaps re-evaluate anything else you've heard 
from him.


You can find details on PostgreSQL at http://www.postgresql.org/, 
including the manuals:

 http://www.postgresql.org/docs/8.0/static/index.html
The FAQ:
 http://www.postgresql.org/docs/faq/
Spanish/Brazilian communities, which might prove useful:
 http://www.postgresql.org/community/international

PostgreSQL is licensed under the BSD licence, which means you can freely 
download or deploy it in a commercial setting if you desire.


--
  Richard Huxton
  Archonet Ltd



Re: [PERFORM] Question

2005-07-11 Thread Magnus Hagander
 Last week, a guy from a Unix group in Colombia told me: "Postgrest"
 in production is bad; if the power goes off at any time, the data is
 lost because the data is kept in plain files. "Postgrest" does not
 support databases with more than 1 million records.
 What can you tell me about this? Is Informix better, as he says?

Both these statements are completely incorrect. 

Unlike some other database systems, PostgreSQL *does* survive power
loss without any major problems, assuming you use a metadata-journaling
filesystem and don't run with a non-battery-backed write cache (but no
database can survive that...).

And having a million records is no problem at all. You may run into
considerations when you're talking billions, but you can do that as well
- it just takes a bit more knowledge before you can do it right.

//Magnus



Re: [PERFORM] cost-based vacuum

2005-07-11 Thread Simon Riggs
On Mon, 2005-07-11 at 09:07 -0400, Ian Westmacott wrote:
 On Mon, 2005-07-11 at 07:31, Simon Riggs wrote:
  The ANALYZE commands hold read locks on the tables you wish to write to.
  If you slow them down, you merely slow down your write transactions
  also, and then the read transactions that wait behind them. Every time
  the ANALYZE sleeps it wakes up the other transactions, which then
  realise they can't move because of locks and then wake up the ANALYZEs
  for another shot. The end result is that you introduce more context-
  switching, without any chance of doing more useful work while the
  ANALYZEs sleep.
 
 Let me make sure I understand.  ANALYZE acquires a read
 lock on the table, that it holds until the operation is
 complete (including any sleeps).  That read lock blocks
 the extension of that table via COPY.  Is that right?
 
 According to the 8.0 docs, ANALYZE acquires an ACCESS SHARE
 lock on the table, and that conflicts only with ACCESS
 EXCLUSIVE.  That's why I didn't think I had a lock issue,
 since I think COPY only needs ROW EXCLUSIVE.  Or perhaps
 the transaction needs something more?

The docs are correct, but don't show catalog and buffer locks.

...but on further reading of the code there are no catalog locks or
buffer locks held across the sleep points. So, my explanation doesn't
work as an explanation for the sleep/no sleep difference you have
observed.

Best Regards, Simon Riggs







Re: [PERFORM] Question

2005-07-11 Thread Gregory S. Williamson
As a sometime Informix and PostgreSQL DBA, I disagree with the contentions 
below. We have many tables with tens of millions of rows in Postgres. We have 
had (alas) power issues in our lab on more than one occasion, and the 
afflicted servers have recovered like champs, every time.

This person may not like Postgres (or very much likes Informix), but he 
shouldn't conjure up spurious reasons to support his prejudice.

Informix is an excellent product, but it can be costly for web-related 
applications. PostgreSQL is also an excellent database. Each has differences 
which may make the decision between the two of them clear. But facts are 
necessary to have a real discussion.

Greg Williamson
DBA
GlobeXplorer LLC

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] Behalf Of Alejandro
Lemus
Sent: Monday, July 11, 2005 6:00 AM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Question


Last week, a guy from a Unix group in Colombia told me: "Postgrest"
in production is bad; if the power goes off at any time, the data is
lost because the data is kept in plain files. "Postgrest" does not
support databases with more than 1 million records.
What can you tell me about this? Is Informix better, as he says?

Ing. Alejandro Lemus G.
Radio Taxi Aeropuerto S.A.
Avenida de las Américas # 51 - 39 Bogotá - Colombia
Tel: 571-4470694 / 571-4202600 Ext. 260 Fax: 571-2624070
email: [EMAIL PROTECTED]




[PERFORM] join and query planner

2005-07-11 Thread Dario Pudlo
(First of all, sorry for my English.)
Hi.
   - Does LEFT JOIN restrict the order in which the planner must join
tables? I've read about joins, but I'm not sure about LEFT JOIN...
   - If so: can I avoid this behavior? I mean, make the planner resolve the
query using statistics (uniqueness, data distribution) rather than join
order.

My query looks like:
SELECT ...
  FROM a, b
  LEFT JOIN c ON (c.key = a.key)
  LEFT JOIN d ON (d.key = a.key)
  WHERE (a.key = b.key) AND (b.column = 100)

  b.column has much better selectivity, but the planner insists on
resolving c.key = a.key first.

Of course, I could rewrite it as something like:
SELECT ...
  FROM
    (SELECT ...
       FROM a, b
       LEFT JOIN d ON (d.key = a.key)
      WHERE (b.column = 100)
    ) AS aa
  LEFT JOIN c ON (c.key = aa.key)

  but this query is constructed by an application with a multicolumn
filter, and it's dynamic. That means a user could also choose to filter on
c.column = 1000, or on combinations of filters.

So, I need the planner to choose the best plan...

I've already changed the statistics, clustered the tables with CLUSTER, ran
VACUUM ANALYZE, and changed work_mem and shared_buffers...
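
(For illustration only, a sketch of one way to raise the per-column
statistics target, in case that is what "changed the statistics" refers to;
the names are the placeholders from the query above, and "column" needs
quoting because it is a reserved word:)

ALTER TABLE b ALTER COLUMN "column" SET STATISTICS 200;
ANALYZE b;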

Greetings. TIA.




Re: [PERFORM] join and query planner

2005-07-11 Thread John A Meinel
Dario Pudlo wrote:
 (First of all, sorry for my English.)
 Hi.
    - Does LEFT JOIN restrict the order in which the planner must join
 tables? I've read about joins, but I'm not sure about LEFT JOIN...
    - If so: can I avoid this behavior? I mean, make the planner resolve
 the query using statistics (uniqueness, data distribution) rather than
 join order.

   My query looks like:
   SELECT ...
     FROM a, b
     LEFT JOIN c ON (c.key = a.key)
     LEFT JOIN d ON (d.key = a.key)
     WHERE (a.key = b.key) AND (b.column = 100)

   b.column has much better selectivity, but the planner insists on
 resolving c.key = a.key first.

   Of course, I could rewrite it as something like:
   SELECT ...
     FROM
       (SELECT ...
          FROM a, b
          LEFT JOIN d ON (d.key = a.key)
         WHERE (b.column = 100)
       ) AS aa
     LEFT JOIN c ON (c.key = aa.key)

   but this query is constructed by an application with a multicolumn
 filter, and it's dynamic. That means a user could also choose to filter
 on c.column = 1000, or on combinations of filters.

   So, I need the planner to choose the best plan...

Probably forcing the other join to happen earlier would help:
SELECT ...
  FROM a JOIN b ON (a.key = b.key)
  LEFT JOIN c ON (c.key = a.key)
...

I think the problem is that PostgreSQL can't easily reorder joins written
with explicit JOIN syntax, but you can write the important JOIN earlier.
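
To make that concrete, a sketch of the whole query with the selective inner
join written first (same placeholder tables and columns as in the original
post); running it under EXPLAIN ANALYZE should show whether the b filter is
now applied first:

EXPLAIN ANALYZE
SELECT ...
  FROM a
  JOIN b ON (a.key = b.key)
  LEFT JOIN c ON (c.key = a.key)
  LEFT JOIN d ON (d.key = a.key)
 WHERE b.column = 100;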

John
=:-

 I've already changed the statistics, clustered the tables with CLUSTER, ran
 VACUUM ANALYZE, and changed work_mem and shared_buffers...

 Greetings. TIA.








[PERFORM] Sorting on longer key is faster ?

2005-07-11 Thread jobapply
The two queries are almost the same, but ORDER BY x||t is FASTER than ORDER BY x...

How can that be possible?

Btw: x and x||t have the same ordering.

phoeniks=> explain analyze SELECT * FROM test WHERE i < 20 ORDER BY x || t;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Sort  (cost=2282.65..2284.92 rows=907 width=946) (actual
time=74.982..79.114 rows=950 loops=1)
   Sort Key: (x || t)
   ->  Index Scan using i_i on test  (cost=0.00..2238.09 rows=907 width=946)
(actual time=0.077..51.015 rows=950 loops=1)
         Index Cond: (i < 20)
 Total runtime: 85.944 ms
(5 rows)

phoeniks=> explain analyze SELECT * FROM test WHERE i < 20 ORDER BY x;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Sort  (cost=2280.38..2282.65 rows=907 width=946) (actual
time=175.431..179.239 rows=950 loops=1)
   Sort Key: x
   ->  Index Scan using i_i on test  (cost=0.00..2235.82 rows=907 width=946)
(actual time=0.024..5.378 rows=950 loops=1)
         Index Cond: (i < 20)
 Total runtime: 183.317 ms
(5 rows)





phoeniks=> \d+ test
           Table "public.test"
 Column |  Type   | Modifiers | Description
--------+---------+-----------+-------------
 i      | integer |           |
 t      | text    |           |
 x      | text    |           |
Indexes:
    "i_i" btree (i)
    "x_i" btree (xpath_string(x, 'data'::text))
    "x_ii" btree (xpath_string(x, 'movie/characters/character'::text))
Has OIDs: no




Re: [PERFORM] Sorting on longer key is faster ?

2005-07-11 Thread John A Meinel
Chris Travers wrote:
 John A Meinel wrote:

 jobapply wrote:


 The two queries are almost the same, but ORDER BY x||t is FASTER than
 ORDER BY x...

 How can that be possible?

 Btw: x and x||t have the same ordering.

 phoeniks=> explain analyze SELECT * FROM test WHERE i < 20 ORDER BY x || t;
 QUERY PLAN

 What types are x and t? I have the feeling x || t is actually a
 boolean, so it is only a true/false sort, while ORDER BY x has to do
 some sort of string comparison (which might actually be a
 locale-dependent comparison, and strcoll can be very slow in some locales)



 Am I reading this wrong?  I would think that x || t would mean
 concatenating x and t.

Sorry, I think you are right. I was getting my operators mixed up.

 This is interesting.  I never thought of writing a multicolumn sort this
 way.

I'm also surprised that the sort is faster with the concatenation. Are
you using UNICODE as the database encoding? I'm just wondering if it is
doing something funny like casting to an easier-to-sort type.
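
A quick way to check the encoding and locale in play (a sketch; both are
existing read-only settings that psql can simply display):

SHOW server_encoding;   -- e.g. UNICODE/UTF8 vs SQL_ASCII
SHOW lc_collate;        -- collation locale; "C" compares bytewise and sorts fastest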


 Best Wishes,
 Chris Travers
 Metatron Technology Consulting

PS Don't forget to Reply All so that your messages go back to the list.




Re: [PERFORM] Sorting on longer key is faster ?

2005-07-11 Thread John A Meinel
jobapply wrote:
 The two queries are almost the same, but ORDER BY x||t is FASTER than ORDER BY x...

 How can that be possible?

 Btw: x and x||t have the same ordering.

 phoeniks=> explain analyze SELECT * FROM test WHERE i < 20 ORDER BY x || t;
 QUERY PLAN

I also thought of another possibility. Are there a lot of similar
entries in x, meaning the same value is repeated over and over? (A quick
way to check is sketched below.) It is possible that the sort code has a
weakness when sorting equal values.

For instance, if it was doing a hash aggregation, you would have the
same hash repeated. (It isn't; I'm just mentioning a case where it might
affect something.)

If it is creating a tree representation, it might cause some sort of
pathological worst-case behavior, where all entries keep adding to the
same side of the tree, rather than being more balanced.

I don't know the internals of PostgreSQL's sorting; these are just some ideas.
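
A sketch of the duplicate check mentioned above, using the table and column
from the original post (the WHERE clause matches the test queries):

SELECT x, count(*) AS repeats
  FROM test
 WHERE i < 20
 GROUP BY x
 ORDER BY repeats DESC
 LIMIT 10;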

John
=:-


