Re: [PERFORM] Is There Any Way ....

2005-10-05 Thread Dario
I'm sure there will be cases where some human-assisted caching algorithm
will perform better than a mathematical, statistics-based design, but it
will also depend on the human. And it will probably make things worse when
the workload changes and the human doesn't notice. It must be considered
that, today, hardware is not the 90% of the budget that it used to be.
Throwing hardware at the system can be as expensive as throwing certified
IT staff at it.
(just think of the coffee budget! :-) )

If you need to improve user perception, you can do other things, like
caching a table in your client: with a trigger on table X that updates a
table called timestamp_table_change, a small select against that table
tells you cheaply when you must refresh your client-side copy. If it is an
application server serving HTTP requests, then user perception will be tied
to bandwidth AND to the application server (some of them cache requests).
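
A minimal sketch of the idea (all object names here are illustrative, and
it assumes PL/pgSQL is installed):

CREATE TABLE timestamp_table_change (
    table_name   text PRIMARY KEY,
    last_change  timestamptz NOT NULL DEFAULT now()
);
INSERT INTO timestamp_table_change VALUES ('x', now());

CREATE FUNCTION note_table_change() RETURNS trigger AS '
BEGIN
    -- record when the table last changed; clients re-read the cached
    -- table only when this timestamp moves
    UPDATE timestamp_table_change
       SET last_change = now()
     WHERE table_name = TG_RELNAME;
    RETURN NULL;
END;
' LANGUAGE plpgsql;

CREATE TRIGGER x_changed
    AFTER INSERT OR UPDATE OR DELETE ON x
    FOR EACH STATEMENT
    EXECUTE PROCEDURE note_table_change();

The client then polls with a cheap
SELECT last_change FROM timestamp_table_change WHERE table_name = 'x';
and refreshes its cached copy only when the value has moved.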

FYI, I don't recall any mechanism in MSSQL to pin a table in the buffers.
Oracle has some structures to allow that (you know); it uses its own buffer
pool. Since version 9i you can set up three different data buffers: one
(the recycle cache) for low-usage tables (I mean tables whose blocks have
little chance of being queried again, like a very large historical table),
one for high-usage tables (the keep cache), and the regular one (the
difference is in the replacement algorithm). And you must also set a buffer
cache size for tablespaces with a different block size. But there is no
such thing as "create table x keep entirely in buffer". And above all
things, the Oracle docs always state: first tune the design, then tune the
queries, then start tuning the engine.
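
For example (the table names are made up, and this assumes the KEEP and
RECYCLE pools have been given a size through the corresponding instance
parameters):

ALTER TABLE small_hot_lookup  STORAGE (BUFFER_POOL KEEP);
ALTER TABLE big_history_table STORAGE (BUFFER_POOL RECYCLE);

Even then, KEEP is just a separately sized cache with its own LRU, not a
pin.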

greetings.




---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Comparative performance

2005-10-04 Thread Dario
PostgreSQL uses MVCC to ensure data integrity. The server must choose the
right version of each tuple according to the transaction ID of the
statement. Even a select (of PostgreSQL's ACID features, I think C and I
apply here) must accomplish some extra work.
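
You can even peek at the versioning data the server consults: every tuple
carries the IDs of the transactions that created and deleted it, and a
statement compares those against its own snapshot. A rough illustration
(any table will do, 'x' is just an example):

SELECT xmin, xmax, * FROM x LIMIT 5;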

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]] On Behalf Of Joe
Sent: Tuesday, October 04, 2005 18:11
To: Jim C. Nasby
CC: Andreas Pflug; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Comparative performance


Hi Jim,

Jim C. Nasby wrote:
> Also, just because no one else has mentioned it, remember that it's very
> easy to get MySQL into a mode where you have no data integrity. If
> that's the case it's going to be faster than PostgreSQL (though I'm not
> sure how much that affects the performance of SELECTs).

Yes indeed.  When I added the REFERENCES to the schema and reran the
conversion scripts, aside from having to reorder the table creation and
loading (they used to be in alphabetical order), I also found a few
referential integrity errors in the MySQL data.

Joe


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] How to Trigger An Automatic Vacuum on Selected Tables

2005-09-29 Thread Dario
Autovacuum does exactly what I understand you want :-)
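
For a first approximation, pg_class itself records sizes as of the last
VACUUM/ANALYZE (pages are 8 kB), so something like this shows what the
system knew back then:

SELECT relname, relpages, relpages * 8 AS last_known_kb, reltuples
  FROM pg_class
 WHERE relkind = 'r'
 ORDER BY relpages DESC;

For the CURRENT physical size you would have to look at the relation's
files under the data directory (via relfilenode) or install the
contrib/dbsize functions, and compare that against relpages to see how
much the table has grown since its last vacuum.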

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]] On Behalf Of Lane Van Ingen
Sent: Thursday, September 29, 2005 20:06
To: pgsql-performance@postgresql.org
Subject: [PERFORM] How to Trigger An Automatic Vacuum on Selected Tables


I am running version 8.0.1 on Windows 2003. I have an application that
subjects PostgreSQL to sudden bursts of activity at times which cannot be
predicted. The bursts are significant enough to cause performance
degradation, which can be fixed by a 'vacuum analyze'.

I am aware of the existence and contents of tables like pg_class.

QUESTION: I would like to trigger a vacuum analyze process on a table
whenever it gets a large enough burst of activity to warrant it. Using the
data in pg_class (like the number of pages the system found the last time it
was vacuumed / analyzed), I would like to compare those statistics to
current size, and trigger a vacuum/analyze on a table if needed.

Does anyone know of any available tools, or an approach I could use, to
determine what the CURRENT SIZE is?



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Is There Any Way ....

2005-09-29 Thread Dario
1) AFAIK, no. Just in case you are thinking "there should be a way, because
I know it will be used all the time", you must know that the postgresql
philosophy is "I'm smarter than you": if the table is used all the time, it
will be in memory; if not, it won't waste memory.
2) Don't know.
3) See number 1). Of course, you could run into a pathological case where
the table is queried just before being taken out of memory. But that would
mean the table isn't queried all the time...
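
(A crude workaround some people use for 3) is to read the whole table
right after startup, so it at least begins life in the buffers and the OS
cache; 'x' is hypothetical:

SELECT count(*) FROM x;

Nothing stops the LRU from evicting it later, though, which is exactly the
point above.)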

Greetings...



-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]] On Behalf Of Lane Van Ingen
Sent: Thursday, September 29, 2005 20:21
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Is There Any Way ....


... to do the following:
  (1) Make a table memory-resident only?
  (2) Set up user variables in memory that are persistent across all
      sessions, for as long as the database is up and running?
  (3) Assure that a disk-based table is always in memory (outside of
      keeping it in memory buffers as a result of frequent activity, which
      would prevent LRU operations from taking it out)?



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] slow database, queries accumulating

2005-09-27 Thread Dario
I have read that 600 connections are a LOT (somebody please correct me if
I'm wrong), since each connection requires a process and your server must
serve them all. Besides the overhead involved, you could end up with 1200
megabytes of sort_mem allocated (probably idle most of the time)...

pgpool allows you to reuse processes (similar to Oracle shared servers).
For the record, I haven't needed it myself, but AFAICS it's easy to use.
(I'll try to make it work and I'll share the tests, but I don't know when.)
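
The 1200 MB figure is simply max_connections * sort_mem = 600 * 2048 kB,
and every sort step in a query may claim that much. With a pool in front,
a sketch of the tradeoff in postgresql.conf (numbers illustrative only):

max_connections = 150   # real backends behind the pool
sort_mem = 4096         # kB per sort; worst case now ~150 * 4 MB

Fewer, busier backends usually beat 600 mostly idle ones.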

long life, little spam and prosperity


-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]] On Behalf Of Anjan Dave
Sent: Friday, September 23, 2005 13:02
To: pgsql-performance@postgresql.org
Subject: [PERFORM] slow database, queries accumulating


Hi

We are experiencing consistent slowness on the database for one
application. This is more a reporting type of application, heavy on bytea
usage (it gets rendered into PDFs in the app server). A lot of queries,
mostly selects and a few random updates, accumulate on the server as the
volume of users on the application increases. Below is a snapshot of top,
with about 80 selects and 3 or 4 updates. Things get better eventually if I
cancel (SIGINT) some of the oldest queries. I also see a few instances of
shared locks not being granted during this time. I don't even see high
iowait or memory starvation during these times, as indicated by top.

-bash-2.05b$ psql -c "select * from pg_locks;" dbname | grep f
   |          |  77922136 | 16761 | ShareLock | f



We (development) are looking into query optimization (explain analyze,
indexes, etc), and my understanding is that the queries execute fast when
run under explain analyze, but during busy times they become quite slow,
taking from a few seconds to a few minutes to execute. I do see in the log
that almost all queries have either ORDER BY, GROUP BY, or DISTINCT. Does
it hurt to up sort_mem to 3MB or 4MB? Should I up effective_cache_size to
5 or 6GB? The app does not need a lot of connections to the database, so I
can reduce it from 600.

Based on the description above and the configuration below, does anything
appear bad in the config? Is there anything I can try in the configuration
to improve performance?


The database size is about 4GB.
This is PG 7.4.7, RHAS3.0 (u5), Local 4 spindle RAID10 (15KRPM), and logs on
a separate set of drives, RAID10. 6650 server, 4 x XEON, 12GB RAM.
Vacuum is done every night, full vacuum done once a week.
I had increased shared_buffers and sort_mem recently, which didn't
help.

Thanks,
Anjan




10:44:51  up 14 days, 13:38,  2 users,  load average: 0.98, 1.14, 1.12
264 processes: 257 sleeping, 7 running, 0 zombie, 0 stopped
CPU states:   cpu    user   nice  system    irq  softirq  iowait    idle
            total   14.4%   0.0%    7.4%   0.0%     0.0%    0.0%   77.9%
            cpu00   15.7%   0.0%    5.7%   0.0%     0.1%    0.0%   78.2%
            cpu01   15.1%   0.0%    7.5%   0.0%     0.0%    0.1%   77.0%
            cpu02   10.5%   0.0%    5.9%   0.0%     0.0%    0.0%   83.4%
            cpu03    9.9%   0.0%    5.9%   0.0%     0.0%    0.0%   84.0%
            cpu04    7.9%   0.0%    3.7%   0.0%     0.0%    0.0%   88.2%
            cpu05   19.3%   0.0%   12.3%   0.0%     0.0%    0.0%   68.3%
            cpu06   20.5%   0.0%    9.5%   0.0%     0.0%    0.1%   69.7%
            cpu07   16.1%   0.0%    8.5%   0.0%     0.1%    0.3%   74.7%
Mem:  12081736k av, 7881972k used, 4199764k free, 0k shrd, 82372k buff
      4823496k actv, 2066260k in_d, 2036k in_c
Swap: 4096532k av, 0k used, 4096532k free, 6888900k cached

  PID USER PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
16773 postgres  15   0  245M 245M  240M S 0.0  2.0   1:16   7 postmaster
16880 postgres  15   0  245M 245M  240M S 0.1  2.0   0:49   6 postmaster
16765 postgres  15   0  245M 245M  240M S 0.0  2.0   1:16   0 postmaster
16825 postgres  15   0  245M 245M  240M S 0.0  2.0   1:02   5 postmaster
16774 postgres  15   0  245M 245M  240M S 0.1  2.0   1:16   0 postmaster
16748 postgres  15   0  245M 245M  240M S 0.0  2.0   1:19   5 postmaster
16881 postgres  15   0  245M 245M  240M S 0.1  2.0   0:50   7 postmaster
16762 postgres  15   0  245M 245M  240M S 0.0  2.0   1:14   4 postmaster
.
.


max_connections = 600

shared_buffers = 30000      # =234MB, up from 21760 (=170MB); min 16,
                            # at least max_connections*2, 8KB each
sort_mem = 2048             # min 64, size in KB
vacuum_mem = 32768          # up from 16384; min 1024, size in KB

# - Free Space Map -

#max_fsm_pages = 20000      # min max_fsm_relations*16, 6 bytes each
#max_fsm_relations = 1000   # min 100, ~50 bytes each

#fsync = true               # turns forced synchronization on or off
#wal_sync_method = fsync    # the default varies across platforms:
                            # fsync, 

Re: [PERFORM] VACUUM FULL vs CLUSTER

2005-09-27 Thread Dario
You can see it at the end of the vacuum log (sorry for my English):
...
INFO:  free space map: 930 relations, 48827 pages stored; 60240 total
pages needed   -- NEEDED!
DETAIL:  Allocated FSM size: 1000 relations + 70000 pages = 475 kB shared
memory.   -- ALLOCATED according to max_fsm_pages, etc., which I have
already configured in postgresql.conf (you can see it below)
VACUUM

You probably must adjust your shared memory, because the database needs
it, but it depends on your database...

(I could be wrong, I'm learning postgresql; please feel free to correct me)
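
In practice (using the numbers from the log above, plus some headroom):

VACUUM VERBOSE;   -- the last lines report '... 60240 total pages needed'

and then in postgresql.conf allocate more than that and restart the
postmaster (the FSM lives in shared memory, sized at startup):

max_fsm_pages     = 70000    # > 60240 total pages needed
max_fsm_relations = 1000     # > 930 relations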

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]] On Behalf Of Stef
Sent: Friday, September 23, 2005 14:18
To: Bruno Wolff III
CC: Markus Benne; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] VACUUM FULL vs CLUSTER



Bruno Wolff III mentioned:
> If you have a proper FSM setting you shouldn't need to do vacuum fulls
> (unless you have an older version of postgres where index bloat might
> be an issue).

Thanks Alvaro and Bruno

I just want to clarify something that I also couldn't
find a clear cut answer for before.

What is a proper fsm setting?

Someone told me to set max_fsm_relations to the number of
relations in pg_class plus a few more to allow for new relations.
And max_fsm_pages to the number of rows in the biggest table I
want to vacuum, plus a few 1000's for extra room?

Where does this free space map sit? On the disk somewhere,
or in memory, or both.

I once set the max_fsm_pages very high by mistake, and postgres
then started up and used a _lot_ of shared memory, and I had to
increase shmmax. Is there anything to watch out for when bumping this
setting up a lot?

Kind Regards
Stefan

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] join and query planner

2005-07-19 Thread Dario
I'll try that.

I'll let you know as soon as I can take a look.


Thank you-

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]] On Behalf Of Kevin Grittner
Sent: Monday, July 18, 2005 17:48
To: pgsql-performance@postgresql.org; [EMAIL PROTECTED]
Subject: Re: [PERFORM] join and query planner


You might want to set join_collapse_limit high, and use the JOIN
operators rather than the comma-separated lists.  We generate the WHERE
clause on the fly, based on user input, and this has worked well for us.
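
For example (the limit value is arbitrary, just keep it above the number
of tables in the query; the default is 8):

SET join_collapse_limit = 20;  -- explicit JOINs beyond the limit are
                               -- otherwise frozen in their written order
SELECT ...
  FROM a
  JOIN b ON (a.key = b.key)
  JOIN e ON (e.key = a.key)
  LEFT JOIN c ON (c.key = a.key)
 WHERE (b.column = 100) AND (e.field = 'filter');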
 
-Kevin
 
 
>>> Dario <[EMAIL PROTECTED]> 07/18/05 2:24 PM >>>
Hi.

> Just out of curiosity, does it do any better with the following?
>
> SELECT ...

Yes, it does.

But my query could also be
SELECT ...
 FROM a
 JOIN b ON (a.key = b.key)
 LEFT JOIN c ON (c.key = a.key)
 LEFT JOIN d ON (d.key=a.key)
/*new*/ , e
 WHERE (b.column = 100)
/*new*/  and (e.key = a.key) and (e.field = 'filter')

because it's constructed by an application. I needed to know if, somehow,
someway, I could un-force the join order.
The only way to solve it so far is changing the application. It must build
something like

SELECT ...
 FROM b
 JOIN (a JOIN e ON (e.key = a.key)) ON (a.key = b.key)
 LEFT JOIN c ON (c.key = a.key)
 LEFT JOIN d ON (d.key=a.key)
 WHERE (b.column = 100) and (e.field = 'filter')

Suppose that e.field has (or should have) better selectivity. But now this
problem belongs to the programmers' group :-)

The query, in fact, has more tables to join. I wonder if lowering the geqo
threshold could do the trick...

Thank you. Greetings. Long life, little spam and prosperity!


-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]] On Behalf Of Kevin Grittner
Sent: Monday, July 18, 2005 14:58
To: pgsql-performance@postgresql.org; [EMAIL PROTECTED]
Subject: Re: [PERFORM] join and query planner


Just out of curiosity, does it do any better with the following?

SELECT ...
 FROM a
 JOIN b ON (a.key = b.key)
 LEFT JOIN c ON (c.key = a.key)
 LEFT JOIN d ON (d.key=a.key)
 WHERE (b.column = 100)


<snip>


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] join and query planner

2005-07-18 Thread Dario
Hi.

> Just out of curiosity, does it do any better with the following?
>
> SELECT ...

Yes, it does.

But my query could also be
SELECT ...
 FROM a
 JOIN b ON (a.key = b.key)
 LEFT JOIN c ON (c.key = a.key)
 LEFT JOIN d ON (d.key=a.key)
/*new*/ , e
 WHERE (b.column = 100)
/*new*/  and (e.key = a.key) and (e.field = 'filter')

because it's constructed by an application. I needed to know if, somehow,
someway, I could un-force the join order.
The only way to solve it so far is changing the application. It must build
something like

SELECT ...
 FROM b
 JOIN (a JOIN e ON (e.key = a.key)) ON (a.key = b.key)
 LEFT JOIN c ON (c.key = a.key)
 LEFT JOIN d ON (d.key=a.key)
 WHERE (b.column = 100) and (e.field = 'filter')

Suppose that e.field has (or should have) better selectivity. But now this
problem belongs to the programmers' group :-)

The query, in fact, has more tables to join. I wonder if lowering the geqo
threshold could do the trick...

Thank you. Greetings. Long life, little spam and prosperity!


-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]] On Behalf Of Kevin Grittner
Sent: Monday, July 18, 2005 14:58
To: pgsql-performance@postgresql.org; [EMAIL PROTECTED]
Subject: Re: [PERFORM] join and query planner


Just out of curiosity, does it do any better with the following?

SELECT ...
 FROM a
 JOIN b ON (a.key = b.key)
 LEFT JOIN c ON (c.key = a.key)
 LEFT JOIN d ON (d.key=a.key)
 WHERE (b.column = 100)


<snip>


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[PERFORM] join and query planner

2005-07-11 Thread Dario Pudlo
(first of all, sorry for my English)
Hi.
   - Does left join restrict the order in which the planner must join
tables? I've read about join, but I'm not sure about left join...
   - If so: can I avoid this behavior? I mean, make the planner resolve
the query using statistics (uniqueness, data distribution) rather than
join order.

My query looks like:
SELECT ...
  FROM a, b
  LEFT JOIN c ON (c.key = a.key)
  LEFT JOIN d on (d.key=a.key)
  WHERE (a.key = b.key)  AND (b.column = 100)

  b.column has much better selectivity, but the planner insists on
resolving c.key = a.key first.

Of course, I could rewrite something like:
SELECT ...
  FROM
   (SELECT ...
FROM a,b
LEFT JOIN d on (d.key=a.key)
WHERE (b.column = 100)
)
as aa
  LEFT JOIN c ON (c.key = aa.key)

but this query is constructed by an application with a multicolumn
filter. It's dynamic.
  That means a user could choose to look for c.column = 1000, and also
combinations of filters.

So, I need the planner to choose the best plan...

I've already changed the statistics, clustered the tables with CLUSTER,
run vacuum analyze, and changed work_mem and shared_buffers...

Greetings. TIA.


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

