Re: [PERFORM] Is There Any Way ....
I'm sure there will be cases where some human-assisted caching algorithm performs better than a mathematical, statistics-based design, but it will also depend on the "human" — and it will probably make things worse when the workload changes and the human doesn't notice. Consider that, today, hardware cost is not the 90% of the budget it used to be. Throwing hardware at the system can be as expensive as throwing certified "IT stuff" at it (just think of the coffee budget! :-) ).

If you need to improve "user perception", you can do other things. For example, cache a table in your client: with a trigger on any change to table X updating a table called "timestamp_table_change", plus a small SELECT against that table, you can easily know when you must refresh your client. If it is an application server serving HTTP requests, then "user perception" will be tied to bandwidth AND to the application server (some of them have a request cache).

FYI, I don't recall a mechanism in MSSQL to pin a table in the buffers. Oracle has some structures that come close, but it still uses its own buffer cache. Since version 9i, you can set up three different data buffers: one (the recycle cache) for low-usage tables (I mean tables whose blocks have little chance of being queried again, like a very large historical table), one for high-usage tables (the keep cache), and the regular one (the difference is in the algorithm). You must also set a buffer cache size for tablespaces with a different block size. But there is no such thing as "CREATE TABLE x KEEP ENTIRELY IN BUFFER". And above all, the Oracle docs always state: "first tune the design, then tune the queries, then start tuning the engine".

Greetings.

---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
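The timestamp-based client-cache invalidation trick described above can be sketched roughly as follows. This is a hedged illustration only: the table, trigger, and function names are made up, and the original message only describes the idea (assumes plpgsql is installed in the database):

```sql
-- Hypothetical registry: one row per cached table, holding its last change time.
CREATE TABLE timestamp_table_change (
    table_name  text PRIMARY KEY,
    changed_at  timestamptz NOT NULL DEFAULT now()
);

-- Trigger function: stamp the registry row for the table that was modified.
CREATE OR REPLACE FUNCTION note_table_change() RETURNS trigger AS $$
BEGIN
    UPDATE timestamp_table_change
       SET changed_at = now()
     WHERE table_name = TG_RELNAME;
    RETURN NULL;   -- AFTER trigger: return value is ignored
END;
$$ LANGUAGE plpgsql;

-- Fire once per modifying statement on table x.
CREATE TRIGGER x_changed
    AFTER INSERT OR UPDATE OR DELETE ON x
    FOR EACH STATEMENT EXECUTE PROCEDURE note_table_change();
```

The client then only needs a cheap `SELECT changed_at FROM timestamp_table_change WHERE table_name = 'x'` to decide whether its cached copy is stale, instead of re-reading the whole table.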
Re: [PERFORM] Comparative performance
PostgreSQL uses MVCC to ensure data integrity. The server must choose the right version of each tuple according to the transaction ID of the statement. Even for a SELECT (of PostgreSQL's ACID properties, I think C and I apply here), it must do some extra work.

-----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Joe Sent: Tuesday, October 4, 2005 18:11 To: Jim C. Nasby CC: Andreas Pflug; pgsql-performance@postgresql.org Subject: Re: [PERFORM] Comparative performance

Hi Jim, Jim C. Nasby wrote: > Also, just because no one else has mentioned it, remember that it's very > easy to get MySQL into a mode where you have no data integrity. If > that's the case it's going to be faster than PostgreSQL (though I'm not > sure how much that affects the performance of SELECTs).

Yes indeed. When I added the REFERENCES to the schema and reran the conversion scripts, aside from having to reorder the table creation and loading (they used to be in alphabetical order), I also found a few referential integrity errors in the MySQL data. Joe

---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Is There Any Way ....
1) AFAIK, no. Just in case you are thinking "there should be a way, because I know it will be used all the time", you should know that PostgreSQL's philosophy is "I'm smarter than you": if a table is used all the time, it will be in memory; if not, it won't waste memory. 2) Don't know. 3) See number 1). Of course, you could run into a pathological case where the table is queried just before being evicted from memory — but that would mean the table isn't queried all the time... Greetings...

-----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Lane Van Ingen Sent: Thursday, September 29, 2005 20:21 To: pgsql-performance@postgresql.org Subject: [PERFORM] Is There Any Way ...

... to do the following: (1) Make a table memory-resident only? (2) Set up user variables in memory that are persistent across all sessions, for as long as the database is up and running? (3) Assure that a disk-based table is always in memory (outside of keeping it in memory buffers as a result of frequent activity, which would prevent LRU operations from taking it out)?

---(end of broadcast)--- TIP 6: explain analyze is your friend
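If you'd rather verify the "hot tables stay in memory" claim than take it on faith, the contrib pg_buffercache module (which appeared around the 8.1 era) lets you see which relations currently occupy shared buffers. A sketch, assuming the module is installed in the current database:

```sql
-- Count how many 8 kB shared buffers each relation currently occupies.
-- Requires the contrib pg_buffercache module to be installed.
SELECT c.relname,
       count(*) AS buffers
  FROM pg_buffercache b
  JOIN pg_class c ON b.relfilenode = c.relfilenode
 GROUP BY c.relname
 ORDER BY buffers DESC
 LIMIT 10;
```

If a table you care about shows up near the top of this list under normal load, it is effectively "kept in memory" already, with no manual pinning needed.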
Re: [PERFORM] How to Trigger An Automatic Vacuum on Selected Tables
Autovacuum does exactly what I understand you want :-)

-----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Lane Van Ingen Sent: Thursday, September 29, 2005 20:06 To: pgsql-performance@postgresql.org Subject: [PERFORM] How to Trigger An Automatic Vacuum on Selected Tables

I am running version 8.0.1 on Windows 2003. I have an application that subjects PostgreSQL to sudden bursts of activity at times which cannot be predicted. The bursts are significant enough to cause performance degradation, which can be fixed by a 'vacuum analyze'. I am aware of the existence and contents of tables like pg_class. QUESTION: I would like to trigger a vacuum analyze process on a table whenever it gets a large enough burst of activity to warrant it. Using the data in pg_class (like the number of pages the system found the last time it was vacuumed / analyzed), I would like to compare those statistics to the current size, and trigger a vacuum/analyze on the table if needed. Does anyone know of any available tools, or an approach I could use, to determine what the CURRENT SIZE is?

---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
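A hedged sketch of the comparison the poster describes ('my_table' is a placeholder; note that relpages and reltuples are only refreshed by VACUUM or ANALYZE, so they represent the last-known size rather than the current one):

```sql
-- Last-known size as recorded by the most recent VACUUM / ANALYZE:
SELECT relname, relpages, reltuples
  FROM pg_class
 WHERE relname = 'my_table';

-- For the current physical size on 8.0, the contrib dbsize module
-- provides size functions (exact names per that module's documentation),
-- along the lines of:
-- SELECT relation_size('my_table');
```

Comparing the dbsize figure against relpages * 8 kB gives a rough idea of how much the table has grown (or bloated) since its last vacuum.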
Re: [PERFORM] VACUUM FULL vs CLUSTER
You can see that at the end of the vacuum log (sorry for my English):

... INFO: free space map: 930 relations, 48827 pages stored; 60240 total pages needed  <-- NEEDED
DETAIL: Allocated FSM size: 1000 relations + 7 pages = 475 kB shared memory.  <-- ALLOCATED according to max_fsm_pages, etc. (I have already configured this in postgresql.conf; you can see it below)
VACUUM

You probably must also adjust your shared memory, because the database needs it, but that depends on your database... (I could be wrong — I'm learning PostgreSQL; please feel free to correct me.)

-----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Stef Sent: Friday, September 23, 2005 14:18 To: Bruno Wolff III CC: Markus Benne; pgsql-performance@postgresql.org Subject: Re: [PERFORM] VACUUM FULL vs CLUSTER

Bruno Wolff III mentioned: => If you have a proper FSM setting you shouldn't need to do vacuum fulls => (unless you have an older version of postgres where index bloat might => be an issue).

Thanks Alvaro and Bruno. I just want to clarify something that I also couldn't find a clear-cut answer for before. What is a proper FSM setting? Someone told me to set max_fsm_relations to the number of relations in pg_class plus a few more to allow for new relations, and max_fsm_pages to the number of rows in the biggest table I want to vacuum, plus a few thousand for extra room? Where does this free space map sit — on the disk somewhere, in memory, or both? I once set max_fsm_pages very high by mistake, and postgres then started up and used a _lot_ of shared memory, and I had to increase shmmax. Is there anything to watch out for when bumping this setting up a lot? Kind Regards, Stefan

---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
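Concretely, the usual recipe is to run a database-wide VACUUM VERBOSE and set the FSM parameters in postgresql.conf a bit above what its last lines report. Sized from the numbers quoted above (the exact headroom is a judgment call, not an official rule):

```
# postgresql.conf -- illustrative values with headroom for growth:
max_fsm_relations = 1500        # vacuum reported 930 relations in use
max_fsm_pages = 80000           # vacuum reported 60240 total pages needed
```

A change to either setting requires a server restart, and the FSM lives in shared memory, so very large values may also require raising the kernel's shmmax, as Stef discovered.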
Re: [PERFORM] slow database, queries accumulating
I have read that 600 connections are a LOT (somebody please correct me if I'm wrong), since each connection requires a process and your server must serve all of them. Besides the overhead involved, you could end up with 1200 megabytes of sort_mem allocated (and probably idle most of the time). pgpool allows you to reuse processes (similar to Oracle's shared servers). Fact: I haven't needed to use it myself, but AFAICS it's easy to set up. (I'll try to make it work and share my tests, but I don't know when.) Long life, little spam and prosperity!

-----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Anjan Dave Sent: Friday, September 23, 2005 13:02 To: pgsql-performance@postgresql.org Subject: [PERFORM] slow database, queries accumulating

Hi, We are experiencing consistent slowness on the database for one application. This is more of a reporting type of application, heavy on bytea usage (the data gets rendered into PDFs in the app server). A lot of queries, mostly SELECTs and a few random UPDATEs, get accumulated on the server as the volume of users on the application increases. Below is a snapshot of top, with about 80 selects and 3 or 4 updates. Things get better eventually if I cancel (SIGINT) some of the oldest queries. I also see a few instances of shared locks not being granted during this time. I don't even see high iowait or memory starvation during these times, as indicated by top.

-bash-2.05b$ psql -c "select * from pg_locks;" dbname | grep f
 | | 77922136 | 16761 | ShareLock | f

We (development) are looking into query optimization (explain analyze, indexes, etc), and my understanding is that the queries execute fast when run under explain analyze, but during busy times they become quite slow, taking from a few seconds to a few minutes to execute. I do see in the log that almost all queries have either ORDER BY, GROUP BY, or DISTINCT. Does it hurt to up sort_mem to 3MB or 4MB? Should I up effective_cache_size to 5 or 6GB?
The app does not need a lot of connections on the database; I can reduce it down from 600. Based on the description above and the configuration below, does anything appear bad in the config? Is there anything I can try in the configuration to improve performance? The database size is about 4GB. This is PG 7.4.7 on RHAS 3.0 (u5), local 4-spindle RAID10 (15K RPM), with logs on a separate set of RAID10 drives. 6650 server, 4 x XEON, 12GB RAM. Vacuum is done every night, full vacuum once a week. I had increased shared_buffers and sort_mem recently, which didn't help. Thanks, Anjan

10:44:51 up 14 days, 13:38, 2 users, load average: 0.98, 1.14, 1.12
264 processes: 257 sleeping, 7 running, 0 zombie, 0 stopped
CPU states:  user   nice  system   irq  softirq  iowait   idle
total       14.4%  0.0%   7.4%   0.0%   0.0%    0.0%   77.9%
cpu00       15.7%  0.0%   5.7%   0.0%   0.1%    0.0%   78.2%
cpu01       15.1%  0.0%   7.5%   0.0%   0.0%    0.1%   77.0%
cpu02       10.5%  0.0%   5.9%   0.0%   0.0%    0.0%   83.4%
cpu03        9.9%  0.0%   5.9%   0.0%   0.0%    0.0%   84.0%
cpu04        7.9%  0.0%   3.7%   0.0%   0.0%    0.0%   88.2%
cpu05       19.3%  0.0%  12.3%   0.0%   0.0%    0.0%   68.3%
cpu06       20.5%  0.0%   9.5%   0.0%   0.0%    0.1%   69.7%
cpu07       16.1%  0.0%   8.5%   0.0%   0.1%    0.3%   74.7%
Mem: 12081736k av, 7881972k used, 4199764k free, 0k shrd, 82372k buff
     4823496k actv, 2066260k in_d, 2036k in_c
Swap: 4096532k av, 0k used, 4096532k free, 6888900k cached

  PID USER     PRI NI SIZE  RSS  SHARE STAT %CPU %MEM  TIME CPU COMMAND
16773 postgres  15  0 245M 245M  240M S     0.0  2.0  1:16   7 postmaster
16880 postgres  15  0 245M 245M  240M S     0.1  2.0  0:49   6 postmaster
16765 postgres  15  0 245M 245M  240M S     0.0  2.0  1:16   0 postmaster
16825 postgres  15  0 245M 245M  240M S     0.0  2.0  1:02   5 postmaster
16774 postgres  15  0 245M 245M  240M S     0.1  2.0  1:16   0 postmaster
16748 postgres  15  0 245M 245M  240M S     0.0  2.0  1:19   5 postmaster
16881 postgres  15  0 245M 245M  240M S     0.1  2.0  0:50   7 postmaster
16762 postgres  15  0 245M 245M  240M S     0.0  2.0  1:14   4 postmaster
. .
max_connections = 600
shared_buffers = 30000          # = 234MB, up from 21760 = 170MB; min 16, at least max_connections*2, 8KB each
sort_mem = 2048                 # min 64, size in KB
vacuum_mem = 32768              # up from 16384; min 1024, size in KB
# - Free Space Map -
#max_fsm_pages = 20000          # min max_fsm_relations*16, 6 bytes each
#max_fsm_relations = 1000       # min 100, ~50 bytes each
#fsync = true                   # turns forced synchronization on or off
#wal_sync_method = fsync        # the default varies across platforms:
                                # fsync, fdat
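To make the sort_mem arithmetic above explicit: sort_mem is allocated per sort, per backend, so the worst case is roughly max_connections x sort_mem — 600 x 2048 kB is about 1.2 GB with the settings shown, and raising sort_mem to 4 MB would double that. A hedged sketch of the direction these settings could move; the exact values are illustrative, not tuned for this machine:

```
max_connections = 200           # pool connections (e.g. via pgpool) instead of 600
sort_mem = 4096                 # 4 MB; worst case now ~200 x 4 MB = 800 MB
effective_cache_size = 750000   # ~6 GB of the 12 GB RAM, in 8 kB pages
```

effective_cache_size does not allocate anything — it only tells the 7.4 planner how much of the OS cache it may assume, which can tip plans toward index scans on a machine with this much free memory.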
Re: [PERFORM] Left joining against two empty tables makes a query
Sorry for my English. May I ask? (I'm still learning PostgreSQL.) Isn't the outer join forcing the join order? The planner will resolve a, then ac, in order to resolve the left joins first, and will not be able to start from the customer_id filter (which is more selective)... AFAIK (not too far :-)), this will be the join order even if project and department are not empty, no matter how much statistical info you (the engine) have (has).

Workaround: you could try a subquery, to let the planner choose the join order inside it (as long as you can modify the source code :-O). You know project and department are empty now, so...

SELECT aa.accno, aa.description, aa.link, aa.category, aa.project_id, aa.department, p.projectnumber, d.description
FROM (SELECT c.accno, c.description, c.link, c.category, ac.project_id, a.department_id AS department
        FROM chart c
        JOIN acc_trans ac ON (ac.chart_id = c.id)
        JOIN ar a ON (a.id = ac.trans_id)
       WHERE a.customer_id = 11373
         AND a.id IN (SELECT max(id) FROM ar WHERE customer_id = 11373)) aa
LEFT JOIN project p ON (aa.project_id = p.id)
LEFT JOIN department d ON (d.id = aa.department)

Take it with a grain of salt — I rewrote it at first sight. Long life, little spam and prosperity!

-----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Chris Travers Sent: Friday, July 29, 2005 2:23 To: Gnanavel S CC: Chris Travers; pgsql-performance@postgresql.org Subject: Re: [PERFORM] Left joining against two empty tables makes a query

>> Secondly, the project table has *never* had anything in it. So where >> are these numbers coming from? > pg_statistics

I very much doubt that. I was unable to locate any rows in pg_statistic where the pg_class.oid for either table matched any row's starelid. Tom's argument that this is behavior by design makes sense. I assumed that something like that had to be going on; otherwise there would be nowhere for the numbers to come from. I.e. if there never were any rows in the table, then if pg_statistic were showing 1060 rows, we would have bigger problems than a bad query plan. I hope, however, that truly empty tables can eventually be treated intelligently in left joins. Otherwise this limits the usefulness of out-of-the-box solutions which may have functionality that we don't use; such solutions can then kill database performance quite easily. Chris Travers, Metatron Technology Consulting

---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] join and query planner
I'll try that, and let you know as soon as I can take a look. Thank you.

-----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Kevin Grittner Sent: Monday, July 18, 2005 17:48 To: pgsql-performance@postgresql.org; [EMAIL PROTECTED] Subject: Re: [PERFORM] join and query planner

You might want to set join_collapse_limit high, and use the JOIN operators rather than the comma-separated lists. We generate the WHERE clause on the fly, based on user input, and this has worked well for us. -Kevin

>>> "Dario" <[EMAIL PROTECTED]> 07/18/05 2:24 PM >>>
Hi. > Just out of curiosity, does it do any better with the following? > > SELECT ... Yes, it does. But my query could also be

SELECT ... FROM a JOIN b ON (a.key = b.key) LEFT JOIN c ON (c.key = a.key) LEFT JOIN d ON (d.key = a.key) /*new*/ , e WHERE (b.column <= 100) /*new*/ AND (e.key = a.key) AND (e.field = 'filter')

because it's constructed by an application. I needed to know if, somehow, someway, I can "unforce" the join order. The only way to solve it so far is changing the application. It must build something like

SELECT ... FROM b JOIN (a JOIN e ON (e.key = a.key)) ON (a.key = b.key) LEFT JOIN c ON (c.key = a.key) LEFT JOIN d ON (d.key = a.key) WHERE (b.column <= 100) AND (e.field = 'filter')

supposing that e.field has (should have) better selectivity. But now this problem belongs to the programmers' group :-) The query, in fact, has more tables to join. I wonder if lowering the geqo threshold could do the work... Thank you. Greetings. Long life, little spam and prosperity!

-----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Kevin Grittner Sent: Monday, July 18, 2005 14:58 To: pgsql-performance@postgresql.org; [EMAIL PROTECTED] Subject: Re: [PERFORM] join and query planner

Just out of curiosity, does it do any better with the following? SELECT ... FROM a JOIN b ON (a.key = b.key) LEFT JOIN c ON (c.key = a.key) LEFT JOIN d ON (d.key = a.key) WHERE (b.column <= 100) >>> snipp

---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
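Kevin's suggestion above amounts to something like the following. The limit value is arbitrary; it just needs to exceed the number of relations in the query (table and column names are the placeholders from the thread):

```sql
-- Let the planner reorder explicit JOINs freely, instead of treating
-- the written order as fixed once the item count passes the limit.
SET join_collapse_limit = 20;

SELECT ...
  FROM a
  JOIN b ON (a.key = b.key)
  JOIN e ON (e.key = a.key)
  LEFT JOIN c ON (c.key = a.key)
  LEFT JOIN d ON (d.key = a.key)
 WHERE b.column <= 100
   AND e.field = 'filter';
```

Note the caveat: join_collapse_limit lets the planner reorder the inner joins among themselves, but in the versions current at the time the LEFT JOINs themselves still constrain the order around them.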
Re: [PERFORM] join and query planner
Hi. > Just out of curiosity, does it do any better with the following? > > SELECT ... Yes, it does. But my query could also be

SELECT ... FROM a JOIN b ON (a.key = b.key) LEFT JOIN c ON (c.key = a.key) LEFT JOIN d ON (d.key = a.key) /*new*/ , e WHERE (b.column <= 100) /*new*/ AND (e.key = a.key) AND (e.field = 'filter')

because it's constructed by an application. I needed to know if, somehow, someway, I can "unforce" the join order. The only way to solve it so far is changing the application. It must build something like

SELECT ... FROM b JOIN (a JOIN e ON (e.key = a.key)) ON (a.key = b.key) LEFT JOIN c ON (c.key = a.key) LEFT JOIN d ON (d.key = a.key) WHERE (b.column <= 100) AND (e.field = 'filter')

supposing that e.field has (should have) better selectivity. But now this problem belongs to the programmers' group :-) The query, in fact, has more tables to join. I wonder if lowering the geqo threshold could do the work... Thank you. Greetings. Long life, little spam and prosperity!

-----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Kevin Grittner Sent: Monday, July 18, 2005 14:58 To: pgsql-performance@postgresql.org; [EMAIL PROTECTED] Subject: Re: [PERFORM] join and query planner

Just out of curiosity, does it do any better with the following? SELECT ... FROM a JOIN b ON (a.key = b.key) LEFT JOIN c ON (c.key = a.key) LEFT JOIN d ON (d.key = a.key) WHERE (b.column <= 100) >>> snipp

---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
[PERFORM] join and query planner
(First of all, sorry for my English.) Hi.

- Does "left join" restrict the order in which the planner must join tables? I've read about join, but I'm not sure about left join...
- If so: can I avoid this behavior? I mean, make the planner resolve the query using statistics (uniqueness, data distribution) rather than the written join order.

My query looks like:

SELECT ... FROM a, b LEFT JOIN c ON (c.key = a.key) LEFT JOIN d ON (d.key = a.key) WHERE (a.key = b.key) AND (b.column <= 100)

b.column has much better selectivity, but the planner insists on resolving c.key = a.key first. Of course, I could rewrite it as something like:

SELECT ... FROM (SELECT ... FROM a, b LEFT JOIN d ON (d.key = a.key) WHERE (b.column <= 100)) AS aa LEFT JOIN c ON (c.key = aa.key)

but this query is constructed by an application with a "multicolumn" filter. It's dynamic: a user could choose to look for "c.column = 1000", and also combinations of filters. So I need the planner to choose the best plan... I've already changed statistics, clustered the tables with CLUSTER, run VACUUM ANALYZE, changed work_mem, shared_buffers... Greetings. TIA.

---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
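To confirm that b.column really is the more selective filter, one can peek at what the planner knows and compare its estimate with reality. A sketch — 'a', 'b', 'key', and 'column' are the placeholder names from the query above:

```sql
-- What the planner believes about the filter columns. A higher
-- n_distinct (or a negative fraction close to -1) generally means
-- a more selective equality/range filter.
SELECT tablename, attname, n_distinct, null_frac
  FROM pg_stats
 WHERE tablename IN ('a', 'b', 'c', 'd')
   AND attname IN ('key', 'column');

-- Compare the planner's row estimate with the actual row count:
EXPLAIN ANALYZE
SELECT 1 FROM b WHERE b.column <= 100;
```

If the estimated and actual row counts in the EXPLAIN ANALYZE output diverge badly, raising the statistics target on that column and re-running ANALYZE is usually the first thing to try.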
[PERFORM] ALTER TABLE tabla ALTER COLUMN columna SET STATISTICS number
Where is the value set by ALTER TABLE table_name ALTER COLUMN column_name SET STATISTICS [1-1000] stored? I've set this to 1000, and I don't remember on which column (doh!). Is there any system table to look at? (I did 'grep "set stat" $PGDATA/pg_log/*' and found it, but maybe there is a better way.) I couldn't find it in the docs, nor by googling. Greetings -- Long life, little spam and prosperity

---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
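For what it's worth, the per-column statistics target lives in pg_attribute.attstattarget (-1 means "use the server default"), so the columns that were explicitly changed can be found with something like:

```sql
-- Find every user column whose statistics target was explicitly raised.
SELECT c.relname, a.attname, a.attstattarget
  FROM pg_attribute a
  JOIN pg_class c ON c.oid = a.attrelid
 WHERE a.attstattarget > 0   -- -1 = server default; 0 disables stats
   AND a.attnum > 0          -- skip system columns
 ORDER BY c.relname, a.attname;
```

This is much cheaper than grepping the server logs, and it also catches targets set before logging was enabled.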