[PERFORM] Configuration Suggestion
Hi! Here are the specifications of my server. I would really appreciate suggestions for the best postgresql.conf configuration for it. I have tried many values for the parameters, but I can't seem to get the speed I want.

OS: Red Hat Linux
CPU: Dual Xeon
Memory: 6 GB
PostgreSQL version: 8.0

Most of my queries have ORDER BY and GROUP BY clauses, and create temporary tables. The biggest table has about 3-5 million rows, and I query it almost every 5 seconds.

I'm just wondering, is it normal to have this result in my memory usage:

                 total       used       free     shared    buffers     cached
    Mem:       6192460    6172488      19972          0      39904    5890824
    -/+ buffers/cache:     241760    5950700
    Swap:      2096472          0    2096472

What does this mean?
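Since the box has 6 GB of RAM and the free output shows almost all of it sitting in the filesystem cache, a handful of settings usually matter most. The values below are only a hedged sketch of starting points for 8.0 on a dedicated database server (nothing in the message specifies them); they need to be benchmarked against the real workload and adjusted:

    # Hedged starting points only, for PostgreSQL 8.0 on a dedicated 6 GB box.
    # 8.0 does not accept unit suffixes, so values are in each parameter's native unit.
    shared_buffers = 30000            # 8 kB pages (~234 MB); may require raising kernel SHMMAX
    work_mem = 32768                  # kB per sort/hash; helps ORDER BY / GROUP BY, but is
                                      # allocated per operation, so keep it modest
    maintenance_work_mem = 262144     # kB; speeds up VACUUM and index builds
    effective_cache_size = 655360     # 8 kB pages (~5 GB); roughly the "cached" figure from
                                      # free(1), used only to guide the planner
    checkpoint_segments = 16          # smooths out checkpoint spikes under write load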
Re: [PERFORM] blue prints please
2005/10/26, Sidar López Cruz [EMAIL PROTECTED]:
> where can i find best practices for tuning postgresql?

http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html

--
Jean-Max Reymond
CKR Solutions Open Source
Nice, France
http://www.ckr-solutions.com
Re: [PERFORM] Outer join query plans and performance
Tom Lane wrote:
> Rich Doughty [EMAIL PROTECTED] writes:
>> Tom Lane wrote:
>>> The reason these are different is that the second case constrains only the last-to-be-joined table, so the full cartesian product of t and h1 has to be formed. If this wasn't what you had in mind, you might be able to rearrange the order of the LEFT JOINs, but bear in mind that in general, changing outer-join ordering changes the results. (This is why the planner won't fix it for you.)
>>
>> FWIW mysql 4.1 (and i'm no fan at all of mysql) completes both these queries in approximately 3 seconds.
>
> Does mysql get the correct answer, though? It's hard to see how they do this fast unless they (a) are playing fast and loose with the semantics, or (b) have very substantially more analysis logic for OUTER JOIN semantics than we do. Perhaps mysql 5.x is better about this sort of thing, but for 4.x I'd definitely find theory (a) more plausible than (b).

i would assume so. i'll re-run my testcase later and verify the results of the two side-by-side.

> The cases that would be interesting are those where rearranging the outer join order actually does change the correct answer --- it may not in this particular case, I haven't thought hard about it. It seems fairly likely to me that they are rearranging the join order here, and I'm just wondering whether they have the logic needed to verify that such a transformation is correct.
>
> regards, tom lane

--
- Rich Doughty
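Tom's point that reordering LEFT JOINs can change the answer is easy to see with toy data. The following is purely illustrative (invented tables, not from the thread): attaching b before c and chaining the second join condition through the nullable side gives a different row than attaching c before b.

    -- Toy illustration of why the planner can't freely reorder LEFT JOINs.
    CREATE TEMP TABLE a (x int);  INSERT INTO a VALUES (1);
    CREATE TEMP TABLE b (x int);  INSERT INTO b VALUES (1);
    CREATE TEMP TABLE c (x int);  -- deliberately left empty

    SELECT a.x, b.x AS b_x, c.x AS c_x
    FROM a LEFT JOIN b ON a.x = b.x
           LEFT JOIN c ON b.x = c.x;   -- returns (1, 1, NULL)

    SELECT a.x, b.x AS b_x, c.x AS c_x
    FROM a LEFT JOIN c ON a.x = c.x
           LEFT JOIN b ON c.x = b.x;   -- returns (1, NULL, NULL): b no longer matches,
                                       -- because its join condition sees c.x = NULL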
Re: [PERFORM] zero performance on query
On Tue, Oct 25, 2005 at 10:26:43PM -0600, Sidar López Cruz wrote:
> look at this:
>
> select count(*) from fotos where archivo not in (select archivo from archivos)
>
> Aggregate  (cost=4899037992.36..4899037992.37 rows=1 width=0)
>   ->  Seq Scan on fotos  (cost=22598.78..4899037338.07 rows=261716 width=0)
>         Filter: (NOT (subplan))
>         SubPlan
>           ->  Materialize  (cost=22598.78..39304.22 rows=805344 width=58)
>                 ->  Seq Scan on archivos  (cost=0.00..13141.44 rows=805344 width=58)

Now, this is interesting; it seems to trigger exactly the same oddity as my query did (at least one of them; the materialized sequential scan).

/* Steinar */
--
Homepage: http://www.sesse.net/
Re: [PERFORM] zero performance on query
> look at this:
>
> select count(*) from fotos where archivo not in (select archivo from archivos)
>
> Aggregate  (cost=4899037992.36..4899037992.37 rows=1 width=0)
>   ->  Seq Scan on fotos  (cost=22598.78..4899037338.07 rows=261716 width=0)
>         Filter: (NOT (subplan))
>         SubPlan
>           ->  Materialize  (cost=22598.78..39304.22 rows=805344 width=58)
>                 ->  Seq Scan on archivos  (cost=0.00..13141.44 rows=805344 width=58)
>
> I WILL DIE WAITING FOR QUERY RESPONSE !!!

Try:

    select count(*) from fotos f
    where not exists (select archivo from archivos a where a.archivo = f.archivo);

    select count(*) from (
        select archivo from fotos
        except
        select archivo from archivos
    ) x;
Re: [PERFORM] zero performance on query
On Wed, Oct 26, 2005 at 08:05:21AM -0400, Merlin Moncure wrote:
> select count(*) from fotos f
> where not exists (select archivo from archivos a where a.archivo = f.archivo)

This was an optimization before 7.4, but probably isn't anymore.

/* Steinar */
--
Homepage: http://www.sesse.net/
[PERFORM] Perfomance of views
Hi there.

I am currently building a system where it would be nice to use multiple levels of views on top of each other (it is a statistics system, where traceability is important). Is there any significant performance penalty in, say, 10 levels of views instead of one giant nested SQL statement? I am thinking especially of the execution planner.

The data mainly comes from one small-to-medium-sized table (5 million rows) and a handful of small (5000 rows) support tables. The hardware will be adequate for the job, but nothing really fancy (specs are Xeon, 2 GB of memory, 6 SCSI disks in RAID 1+0). The base will be version 8.1, provided it gets out of beta around the end of the year.

Svenne
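For what it's worth, simple stacked views are expanded by the rewriter before planning, so the planner should see one flattened query either way. A quick way to confirm that on a concrete schema is to compare plans; this is only an illustrative sketch with invented names (measurements, v1..v3), not the poster's schema:

    CREATE VIEW v1 AS SELECT id, val FROM measurements WHERE val IS NOT NULL;
    CREATE VIEW v2 AS SELECT id, val FROM v1 WHERE val > 0;
    CREATE VIEW v3 AS SELECT id, sum(val) AS s FROM v2 GROUP BY id;

    -- If the views are plain SELECTs, these two should produce the same plan.
    EXPLAIN ANALYZE SELECT * FROM v3;
    EXPLAIN ANALYZE
        SELECT id, sum(val) AS s
        FROM (SELECT id, val
              FROM (SELECT id, val FROM measurements WHERE val IS NOT NULL) a
              WHERE val > 0) b
        GROUP BY id;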
Re: [PERFORM] blue prints please
On Tue, Oct 25, 2005 at 22:24:06 -0600, Sidar López Cruz [EMAIL PROTECTED] wrote:
> where can i find best practices for tuning postgresql?

You should first read the documentation. For 8.1, that would be here:
http://developer.postgresql.org/docs/postgres/runtime-config.html

There is also good information on techdocs at:
http://techdocs.postgresql.org/#techguides
(Look under the subcategory "optimising".)
[PERFORM] Performance issues with custom functions
I currently have an infrastructure that's based around SQL Server 2000. I'm trying to move some of the data over to Postgres, partly to reduce the load on the SQL Server machine and partly because some queries I'd like to run are too slow to be usable on SQL Server. Most likely, over time more and more data will move to Postgres. To help with this transition, I created a Postgres plugin which queries the contents of SQL Server tables via ODBC and returns a recordset. I then create views around the function, and I can read from the SQL Server tables as if they were local to Postgres.

I have four tables involved in this query. The major one is provider_location, which has about 2 million rows and is stored in Postgres. The other three are stored in SQL Server and accessed via views as mentioned above. They are network, network_state, and xlat_tbl, and contain about 40, 250, and 500 rows. A simple select * from any of the views takes somewhere around 50ms.

The query in question was written for SQL Server. I have no idea why it was written in the form it was, but it ran at a reasonable speed when all the tables were on one machine. Running the original query (after adjusting for syntax differences) on Postgres resulted in a query that would run for hours, continually allocating more RAM. I eventually had to kill the process as it was devouring swap space. My assumption is that Postgres is doing the ODBC query for each row of a join somewhere, even though the function is marked stable (immutable didn't make a difference). Flattening the query made it run in a few minutes.

I think the flattened query is easier to read, and it runs faster, so I'm not complaining that I can't use the original query. But I'd like to know exactly what causes the bottleneck in the original query, and if there are other approaches to solving the issue in case I need them in future queries. Below is the original query, the explain output, the modified query, the explain output, and the explain analyze output.
Ed

Original query:

    select pl.network_id, n.name as network_name, pl.state_cd,
           count(pl.state_cd) as provider_count
    from development.provider_location pl, development.network n
    where pl.network_id in (select ns.network_id
                            from development.network_state ns
                            where ns.from_date < current_time
                              and (ns.thru_date > current_time or ns.thru_date is null)
                              and (ns.state_cd = pl.state_cd or ns.state_cd = ''))
      and pl.network_id = n.network_id
      and pl.state_cd is not null
      and pl.state_cd in (select field_value
                          from development.xlat_tbl
                          where field_name = 'State_CD')
    group by pl.state_cd, n.name, pl.network_id
    order by pl.state_cd, network_name;

Explain:

    GroupAggregate  (cost=80548547.83..80549256.80 rows=47265 width=52)
      ->  Sort  (cost=80548547.83..80548665.99 rows=47265 width=52)
            Sort Key: pl.state_cd, odbc_select.name, pl.network_id
            ->  Hash Join  (cost=30.01..80543806.14 rows=47265 width=52)
                  Hash Cond: (("outer".network_id)::text = ("inner".network_id)::text)
                  ->  Hash IN Join  (cost=15.01..80540931.61 rows=9453 width=20)
                        Hash Cond: (("outer".state_cd)::text = ("inner".field_value)::text)
                        ->  Seq Scan on provider_location pl  (cost=0.00..80535150.29 rows=1890593 width=20)
                              Filter: ((state_cd IS NOT NULL) AND (subplan))
                              SubPlan
                                ->  Function Scan on odbc_select  (cost=0.00..42.50 rows=2 width=32)
                                      Filter: (((from_date)::text < (('now'::text)::time(6) with time zone)::text) AND (((thru_date)::text > (('now'::text)::time(6) with time zone)::text) OR (thru_date IS NULL)) AND (((state_cd)::text = ($0)::text) OR ((state_cd)::text = ''::text)))
                        ->  Hash  (cost=15.00..15.00 rows=5 width=32)
                              ->  Function Scan on odbc_select  (cost=0.00..15.00 rows=5 width=32)
                                    Filter: ((field_name)::text = 'State_CD'::text)
                  ->  Hash  (cost=12.50..12.50 rows=1000 width=64)
                        ->  Function Scan on odbc_select  (cost=0.00..12.50 rows=1000 width=64)

Flattened query:

    select pl.network_id, n.name as network_name, pl.state_cd,
           count(pl.state_cd) as provider_count
    from development.network n,
         development.network_state ns,
         development.xlat_tbl xt,
         development.provider_location pl
    where xt.field_name = 'State_CD'
      and n.network_id = ns.network_id
      and ns.from_date < current_timestamp
      and (ns.thru_date > current_timestamp or ns.thru_date is null)
      and (ns.state_cd = pl.state_cd or ns.state_cd = '')
      and pl.network_id = n.network_id
      and pl.state_cd is not null
      and pl.state_cd = xt.field_value
    group by pl.state_cd, n.name, pl.network_id
    order by pl.state_cd, network_name;
[PERFORM] browsing table with 2 million records
I am running Postgres 7.4 on FreeBSD. The main table has 2 million records (we would like to do at least 10 million or more). It is mainly a FIFO structure, with maybe 200,000 new records coming in each day that displace the older records.

We have a GUI that lets the user browse through the records page by page, about 25 records at a time. (Don't ask me why, but we have to have this GUI.) This translates to something like

    select count(*) from table   -- to give feedback about the DB size
    select * from table order by date limit 25 offset 0

The tables seem properly indexed, and vacuum and analyze are run regularly. Still, these very basic SQL statements take up to a minute to run.

I read in some recent messages that select count(*) needs a table scan in Postgres. That's disappointing, but I can accept an approximation if there is some way to get one. But how can I optimize "select * from table order by date limit x offset y"? One minute response time is not acceptable.

Any help would be appreciated.

Wy
Re: [PERFORM] browsing table with 2 million records
Do you have an index on the date column? Can you post an EXPLAIN ANALYZE for the slow query?

-- Mark Lewis

On Wed, 2005-10-26 at 13:41 -0700, aurora wrote:
> I am running Postgre 7.4 on FreeBSD. The main table have 2 million record (we would like to do at least 10 mil or more). It is mainly a FIFO structure with maybe 200,000 new records coming in each day that displace the older records.
>
> We have a GUI that let user browser through the record page by page at about 25 records a time. (Don't ask me why but we have to have this GUI). This translates to something like
>
>     select count(*) from table -- to give feedback about the DB size
>     select * from table order by date limit 25 offset 0
>
> Tables seems properly indexed, with vacuum and analyze ran regularly. Still this very basic SQLs takes up to a minute run.
>
> I read some recent messages that select count(*) would need a table scan for Postgre. That's disappointing. But I can accept an approximation if there are some way to do so. But how can I optimize select * from table order by date limit x offset y? One minute response time is not acceptable.
>
> Any help would be appriciated.
>
> Wy
Re: [PERFORM] browsing table with 2 million records
On Wed, 2005-10-26 at 15:41, aurora wrote:
> I am running Postgre 7.4 on FreeBSD. The main table have 2 million record (we would like to do at least 10 mil or more). It is mainly a FIFO structure with maybe 200,000 new records coming in each day that displace the older records.
>
> We have a GUI that let user browser through the record page by page at about 25 records a time. (Don't ask me why but we have to have this GUI). This translates to something like
>
>     select count(*) from table -- to give feedback about the DB size
>     select * from table order by date limit 25 offset 0
>
> Tables seems properly indexed, with vacuum and analyze ran regularly. Still this very basic SQLs takes up to a minute run.
>
> I read some recent messages that select count(*) would need a table scan for Postgre. That's disappointing. But I can accept an approximation if there are some way to do so. But how can I optimize select * from table order by date limit x offset y? One minute response time is not acceptable.

Have you run your script without the select count(*) part and timed it?

What does

    explain analyze select * from table order by date limit 25 offset 0

say? Is date indexed?
Re: [PERFORM] browsing table with 2 million records
> We have a GUI that let user browser through the record page by page at about 25 records a time. (Don't ask me why but we have to have this GUI). This translates to something like
>
> select count(*) from table -- to give feedback about the DB size

Do you have an integer field that is an ID that increments? E.g. serial?

> select * from table order by date limit 25 offset 0

You could use a cursor.

Sincerely,

Joshua D. Drake

> Tables seems properly indexed, with vacuum and analyze ran regularly. Still this very basic SQLs takes up to a minute run.
>
> I read some recent messages that select count(*) would need a table scan for Postgre. That's disappointing. But I can accept an approximation if there are some way to do so. But how can I optimize select * from table order by date limit x offset y? One minute response time is not acceptable.
>
> Any help would be appriciated.
>
> Wy

--
The PostgreSQL Company - Command Prompt, Inc. 1.503.667.4564
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/
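A cursor-based sketch of the kind of thing Joshua is suggesting (illustrative only: the cursor name is invented, and "table"/date stand in for the real table and column from the thread). Inside a transaction the result set is computed once and each page is just a FETCH:

    BEGIN;
    DECLARE page_cur SCROLL CURSOR FOR
        SELECT * FROM "table" ORDER BY date;   -- "table" is the thread's placeholder name
    FETCH 25 FROM page_cur;                    -- page 1
    FETCH 25 FROM page_cur;                    -- page 2
    MOVE ABSOLUTE 0 IN page_cur;               -- rewind if needed
    COMMIT;                                    -- the cursor is gone after commit unless
                                               -- it was declared WITH HOLD

The trade-off is that the cursor only lives as long as the transaction (or, with WITH HOLD, the session), so it suits a stateful GUI session better than a stateless web page.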
Re: [PERFORM] browsing table with 2 million records
You could also create your own "index", so to speak: a table that simply contains a list of primary keys and an order-value field that you can use as your offset. This can be kept in sync with the master table using triggers pretty easily. 2 million rows is not very much; if you only have an integer pkey and an integer order value, you can join it against the main table:

    create table my_index_table (
        primary_key_value int,
        order_val int,
        primary key (primary_key_value)
    );
    create index my_index_table_order_val_i on my_index_table (order_val);

    select *
    from main_table a, my_index_table b
    where b.order_val >= 25 and b.order_val < 50
      and a.primary_key_value = b.primary_key_value;

If the data updates a lot then this won't work as well, though, as the index table will require frequent updates to a potentially large number of records (although a small number of pages, so it still won't be horrible).

Alex Turner
NetEconomist

On 10/26/05, Joshua D. Drake [EMAIL PROTECTED] wrote:
> > We have a GUI that let user browser through the record page by page at about 25 records a time. (Don't ask me why but we have to have this GUI). This translates to something like
> >
> > select count(*) from table -- to give feedback about the DB size
>
> Do you have a integer field that is an ID that increments? E.g; serial?
>
> > select * from table order by date limit 25 offset 0
>
> You could use a cursor.
>
> Sincerely,
>
> Joshua D. Drake
>
> > Tables seems properly indexed, with vacuum and analyze ran regularly. Still this very basic SQLs takes up to a minute run.
> >
> > I read some recent messages that select count(*) would need a table scan for Postgre. That's disappointing. But I can accept an approximation if there are some way to do so. But how can I optimize select * from table order by date limit x offset y? One minute response time is not acceptable.
> >
> > Any help would be appriciated.
> >
> > Wy
>
> --
> The PostgreSQL Company - Command Prompt, Inc. 1.503.667.4564
> PostgreSQL Replication, Consulting, Custom Development, 24x7 support
> Managed Services, Shared and Dedicated Hosting
> Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/
Re: [PERFORM] browsing table with 2 million records
aurora [EMAIL PROTECTED] writes:
> It would still be helpful if select count(*) can perform well.

If you can settle for an approximate count, pg_class.reltuples might help you.

regards, tom lane
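A hedged sketch of what that looks like in practice; reltuples is only as accurate as the last VACUUM or ANALYZE, and the table name here is a placeholder:

    -- Approximate row count from the system catalog (placeholder table name).
    SELECT reltuples::bigint AS approximate_rows
    FROM pg_class
    WHERE relname = 'my_table';   -- add a namespace filter if the name is not unique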
Re: [PERFORM] Materializing a sequential scan
Hi,

I finally found what I believe is the root cause of the hopeless performance, after a lot of query rewriting:

    Subquery Scan mdb_effektiv_tilgang  (cost=19821.69..4920621.69 rows=1 width=48)
      Filter: ((NOT (hashed subplan)) AND (NOT (subplan)))

The problem here is simply that 8.1 refuses to hash this part of the plan:

    ->  Materialize  (cost=546.45..742.37 rows=19592 width=38)
          ->  Seq Scan on rita_tilgang  (cost=0.00..526.86 rows=19592 width=38)
    ->  Seq Scan on personer_nylig_slettet  (cost=0.00..31.40 rows=2140 width=4)

probably because of the NOT IN with a function inside; I rewrote it to an EXCEPT (which is not equivalent, but good enough for my use), and it instantly hashed the other subplan, and the query went speedily. Well, at least in four seconds and not several hours...

Any good ideas why 8.1 would refuse to do this, when 7.4 would do it? It does not matter how high I set my work_mem; even at 2,000,000 it refused to hash the subplan.

/* Steinar */
--
Homepage: http://www.sesse.net/
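The shape of the rewrite being described is roughly as follows. This is only an illustrative sketch, not the poster's actual query: "some_key" is an invented column name (the real query wrapped a function around the column), and NOT IN vs EXCEPT are not equivalent, since EXCEPT de-duplicates and treats NULLs differently, which is why the poster calls it "good enough" rather than identical.

    -- Original NOT IN style:
    SELECT count(*)
    FROM rita_tilgang t
    WHERE t.some_key NOT IN (SELECT s.some_key FROM personer_nylig_slettet s);

    -- EXCEPT style that 8.1 handled well here:
    SELECT count(*) FROM (
        SELECT some_key FROM rita_tilgang
        EXCEPT
        SELECT some_key FROM personer_nylig_slettet
    ) x;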
Re: [PERFORM] browsing table with 2 million records
> I am running Postgre 7.4 on FreeBSD. The main table have 2 million record (we would like to do at least 10 mil or more). It is mainly a FIFO structure with maybe 200,000 new records coming in each day that displace the older records.

I'm so sorry, but I have to rant XDDD

People who present a list of 100 items, paginated with 10 items per page so that it fits on half a 800x600 screen, should be shot. I can scroll with my mousewheel and use text search in my browser...

People who present a paginated view with 100,000 pages, where you have to apply bisection search by hand to find that records starting with F are on page 38651, should be forced to use a keyboard with just 1 key and type in Morse code.

The problem with pagination is that the page number is meaningless and rather useless to the user. It is also meaningless to the database, which means you have to use slow kludges like count() and limit/offset. And as people insert stuff in the table while you browse, when you hit "next page" you will see on top half of what was on the previous page, because it was pushed down by new records. Or you might miss records.

So, rather than using a meaningless record offset as a page number, you can use something meaningful, like a date, first letter of a name, region, etc.

Of course MySQL, always eager to encourage sucky-sucky practices, provides a neat CALC_FOUND_ROWS hack which, while not being super SQL standard compliant, allows you to retrieve the number of rows the query would have returned without the LIMIT, so you can compute the number of pages and grab one page with only one query. So people use paginators instead of intelligent solutions, like xmlhttp+javascript enabled autocompletion in forms, etc. And you have to scroll to page 38651 to find the letter F.

So if you need to paginate on your site: CHEAT

Who needs a paginated view with 100,000 pages?

- Select min(date) and max(date) from your table
- Present a nifty date selector to choose the records from any day, hour, minute, second
- Show them, with "next day" and "previous day" buttons
- It's more useful to the user (most likely he wants to know what happened on 01/05/2005 rather than view page 2857)
- It's faster (no more limit/offset! just date BETWEEN a AND b, indexed of course)
- No more new items pushing old ones to the next page while you browse
- You can pretend to your boss it's just like a paginated list
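A minimal sketch of the date-bracketing idea, with invented table and column names and assuming an index on the date column:

    CREATE INDEX records_date_idx ON records (date);   -- invented names

    -- One "page" = one day; the range condition uses the index, no OFFSET needed.
    SELECT *
    FROM records
    WHERE date >= '2005-10-26' AND date < '2005-10-27'
    ORDER BY date;

    -- For the day-picker widget:
    SELECT min(date), max(date) FROM records;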
Re: [PERFORM] tuning seqscan costs
On Oct 19, 2005, at 9:51 AM, Katherine Stoovs wrote:

> I want to correlate two index rows of different tables to find an offset so that
>
>     table1.value = table2.value AND table1.id = table2.id + offset
>
> is true for a maximum number of rows. To achieve this, I have the two tables and a table with possible offset values, and execute a query:
>
>     SELECT value,
>            (SELECT COUNT(*) FROM table1, table2
>             WHERE table1.value = table2.value
>               AND table1.id = table2.id + offset) AS matches
>     FROM offsets
>     ORDER BY matches;
>
> The query is very inefficient, however, because the planner doesn't use my indexes and executes seqscans instead. I can get it to execute fast by setting ENABLE_SEQSCAN to OFF, but I have read this will make the performance bad on other query types, so I want to know how to tweak the planner costs or possibly other stats so the planner will plan the query correctly and use index scans. There must be something wrong in the planning parameters after all if a plan that is slower by a factor of tens or hundreds is estimated as better than the fast variant. I have already issued ANALYZE commands on the tables.
>
> Thanks for your help,
> Katherine Stoovs

Katherine,

If offset is a column in offsets, can you add an index on the expression table2.id + offset?

http://www.postgresql.org/docs/8.0/static/indexes-expressional.html

--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC
Open Source Solutions. Optimized Web Development.
http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-469-5150
615-469-5151 (fax)
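For reference, an expression index looks like the sketch below (invented index name and constant). One caveat worth checking: the indexed expression may only reference columns of the indexed table plus constants, so a per-row offset coming from another table cannot be baked into a single index; per-offset indexes or plain indexes on the join columns are the usual fallback.

    -- Generic expression-index sketch; the "+ 5" is an arbitrary illustrative constant.
    CREATE INDEX table2_id_plus_5_idx ON table2 ((id + 5));

    -- The planner can use it for predicates that match the expression exactly:
    EXPLAIN SELECT * FROM table2 WHERE (id + 5) = 42;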
Re: [PERFORM] Materializing a sequential scan
Steinar H. Gunderson [EMAIL PROTECTED] writes:
> Any good ideas why 8.1 would refuse to do this, when 7.4 would do it? It does not matter how high I set my work_mem; even at 2.000.000 it refused to hash the subplan.

AFAICS, subplan_is_hashable() is testing the same conditions in 7.4 and HEAD, so this isn't clear. Want to step through it and see where it's deciding not to hash?

regards, tom lane
Re: [PERFORM] Performance issues with custom functions
Edward Di Geronimo Jr. [EMAIL PROTECTED] writes:
> ... I'd like to know exactly what causes the bottleneck in the original query, and if there are other approaches to solving the issue in case I need them in future queries.

This is fairly hard to read ... it would help a lot if you had shown the view definitions that the query relies on, so that we could match up the plan elements with the query a bit better. However, I'm thinking the problem is with this IN clause:

    where pl.network_id in (select ns.network_id
                            from development.network_state ns
                            where ns.from_date < current_time
                              and (ns.thru_date > current_time or ns.thru_date is null)
                              and (ns.state_cd = pl.state_cd or ns.state_cd = ''))

Because the sub-SELECT references pl.state_cd (an outer variable reference), there's no chance of optimizing this into a join-style IN. So the sub-SELECT has to be re-executed for each row of the outer query.

BTW, it's not apparent to me that your flattened query gives the same answers as the original. What if a pl row can join to more than one row of the ns output?

regards, tom lane
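Tom's last caveat is easy to see with toy data (purely illustrative, not the poster's schema): an IN counts each outer row at most once, while a flattened join counts it once per matching row.

    CREATE TEMP TABLE pl_demo (state_cd text);
    CREATE TEMP TABLE ns_demo (state_cd text);
    INSERT INTO pl_demo VALUES ('TN');
    INSERT INTO ns_demo VALUES ('TN');
    INSERT INTO ns_demo VALUES ('TN');   -- two matching rows

    SELECT count(*) FROM pl_demo
    WHERE state_cd IN (SELECT state_cd FROM ns_demo);                -- returns 1

    SELECT count(*) FROM pl_demo p JOIN ns_demo n USING (state_cd);  -- returns 2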
Re: [PERFORM] Materializing a sequential scan
On Wed, Oct 26, 2005 at 07:06:15PM -0400, Tom Lane wrote:
> AFAICS, subplan_is_hashable() is testing the same conditions in 7.4 and HEAD, so this isn't clear. Want to step through it and see where it's deciding not to hash?

Line 639, i.e.:

    635         if (!optup->oprcanhash || optup->oprcom != opid ||
    636             !func_strict(optup->oprcode))
    637         {
    638             ReleaseSysCache(tup);
    639             return false;
    640         }

gdb gives

    (gdb) print *optup
    $2 = {oprname = {data = "\220Ü2\b\000\000\000\000\000\000\000\000\005\230-\b", '\0' repeats 16 times,
                     "X\0305\b\020\000\000\000\000\000\000\000ئ\b\020\000\000\000\000\000\000\000ð\213\b\020\000\000",
                     alignmentDummy = 137550992},
          oprnamespace = 137542808, oprowner = 64, oprkind = 8 '\b', oprcanhash = -112 '\220',
          oprleft = 2, oprright = 0, oprresult = 0, oprcom = 0, oprnegate = 0,
          oprlsortop = 0, oprrsortop = 0, oprltcmpop = 0, oprgtcmpop = 0,
          oprcode = 0, oprrest = 0, oprjoin = 0}
    (gdb) print opid
    $3 = 2373

So it's complaining about the optup->oprcom != opid part. This is of course on the third run through the loop, i.e. it's complaining about the argument which is run through the function kortsys2.effektiv_dato(date)... For convenience, I've listed it again here:

    CREATE FUNCTION kortsys2.effektiv_dato(date)
    RETURNS date
    AS 'SELECT CASE WHEN $1 CURRENT_DATE THEN CURRENT_DATE ELSE $1 END'
    LANGUAGE SQL STABLE;

/* Steinar */
--
Homepage: http://www.sesse.net/
[PERFORM] performance on query
I DON'T KNOW WHAT TO DO WITH THESE QUERIES... Comparison with SQL Server: SQL Server wins !!!

Table sizes: archivos: 40MB, fotos: 55MB

    select count(1) from fotos f
    where not exists (select a.archivo from archivos a where a.archivo = f.archivo)

    173713 ms.
    110217 ms.
    83122 ms.

    select count(*) from (
        select archivo from fotos
        except
        select archivo from archivos
    ) x;

    201479 ms.

    SELECT count(*)
    FROM fotos f LEFT JOIN archivos a USING (archivo)
    WHERE a.archivo IS NULL

    199523 ms.
Re: [PERFORM] Materializing a sequential scan
Steinar H. Gunderson [EMAIL PROTECTED] writes:
> On Wed, Oct 26, 2005 at 07:06:15PM -0400, Tom Lane wrote:
>> AFAICS, subplan_is_hashable() is testing the same conditions in 7.4 and HEAD, so this isn't clear. Want to step through it and see where it's deciding not to hash?
> (gdb) print opid
> $3 = 2373

I don't think you're getting a correct reading for optup, but OID 2373 is timestamp = date:

    regression=# select * from pg_operator where oid = 2373;
     oprname | oprnamespace | oprowner | oprkind | oprcanhash | oprleft | oprright | oprresult | oprcom | oprnegate | oprlsortop | oprrsortop | oprltcmpop | oprgtcmpop | oprcode           | oprrest | oprjoin
    ---------+--------------+----------+---------+------------+---------+----------+-----------+--------+-----------+------------+------------+------------+------------+-------------------+---------+-----------
     =       |           11 |       10 | b       | f          |    1114 |     1082 |        16 |   2347 |      2376 |       2062 |       1095 |       2371 |       2375 | timestamp_eq_date | eqsel   | eqjoinsel
    (1 row)

which is marked not hashable, quite correctly since the input datatypes aren't even the same. My recollection is that there was no such operator in 7.4; probably in 7.4 the IN ended up using timestamp = timestamp, which is hashable. What's not clear though is why you're getting that operator --- aren't both sides of the IN of type date?

regards, tom lane
Re: [PERFORM] Materializing a sequential scan
On Wed, Oct 26, 2005 at 07:53:02PM -0400, Tom Lane wrote:
> I don't think you're getting a correct reading for optup, but OID 2373 is timestamp = date:
> [...]
> My recollection is that there was no such operator in 7.4; probably in 7.4 the IN ended up using timestamp = timestamp which is hashable.

You are quite correct, there is no such operator (whether by oid or by description) in my 7.4 installation.

> What's not clear though is why you're getting that operator --- aren't both sides of the IN of type date?

Aha! Figured out the start column wasn't the problem after all. The problem was the stopp column, which was timestamp on one side and date on the other... So, it can be fixed for this instance, but this feels a bit like the pre-8.0 joins on differing data types -- is there any way to fix it? :-)

/* QSteinar */
--
Homepage: http://www.sesse.net/
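The per-query fix alluded to here is simply making both sides of the IN the same type, either by casting in the query or by aligning the column types. A hedged sketch with invented names, since the actual schema isn't shown in the thread:

    -- Option 1: cast in the query so the comparison is date = date (hashable).
    -- Note the cast discards the time-of-day part of the timestamp.
    SELECT count(*)
    FROM some_table t
    WHERE t.stopp IN (SELECT o.stopp::date FROM other_table o);

    -- Option 2: align the column type once and for all (available since 8.0).
    ALTER TABLE other_table ALTER COLUMN stopp TYPE date;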
Re: [PERFORM] Materializing a sequential scan
Steinar H. Gunderson [EMAIL PROTECTED] writes:
> Aha! Figured out the start column wasn't the problem after all. The problem was the stopp column, which was timestamp on one side and date on the other...

Ah-hah.

> So, it can be fixed for this instance, but this feels a bit like the pre-8.0 joins on differing data types -- is there any way to fix it? :-)

I have some ideas in the back of my head about supporting cross-data-type hashing. Essentially this would require that the hash functions for two types be compatible in that they generate the same hash value for two values that would be considered equal. (For instance, the integer hash functions already have the property that 42::int2, 42::int4, and 42::int8 will all generate the same hash code. The date and timestamp hash functions don't have such a property ATM, but probably could be made to.) For types that share a hash coding convention, cross-type equality functions could be marked hashable.

This is all pretty handwavy at the moment though, and I don't know how soon it will get done.

regards, tom lane
Re: [PERFORM] Materializing a sequential scan
On Wed, Oct 26, 2005 at 08:51:03PM -0400, Tom Lane wrote:
> I have some ideas in the back of my head about supporting cross-data-type hashing. Essentially this would require that the hash functions for two types be compatible in that they generate the same hash value for two values that would be considered equal.

OK, another entry for the TODO then. Anyhow, my query is now on about the same performance level with 8.1 as it was with 7.4 (or rather, a bit faster), so it's no longer an 8.1 blocker for us. Thanks. :-)

/* Steinar */
--
Homepage: http://www.sesse.net/
Re: [PERFORM] browsing table with 2 million records
> We have a GUI that let user browser through the record page by page at about 25 records a time. (Don't ask me why but we have to have this GUI). This translates to something like
>
> select count(*) from table -- to give feedback about the DB size
> select * from table order by date limit 25 offset 0

Heh, sounds like phpPgAdmin... I really should do something about that.

> Tables seems properly indexed, with vacuum and analyze ran regularly. Still this very basic SQLs takes up to a minute run.

Yes, COUNT(*) on a large table is always slow in PostgreSQL. Search the mailing lists for countless discussions about it.

Chris
Re: [PERFORM] browsing table with 2 million records
> Who needs a paginated view with 100.000 pages ?
>
> - Select min(date) and max(date) from your table
> - Present a nifty date selector to choose the records from any day, hour, minute, second
> - show them, with next day and previous day buttons
> - It's more useful to the user (most likely he wants to know what happened on 01/05/2005 rather than view page 2857)
> - It's faster (no more limit/offset ! just date BETWEEN a AND b, indexed of course)
> - no more new items pushing old ones to the next page while you browse
> - you can pretend to your boss it's just like a paginated list

All very well and good, but now do it generically...
Re: [PERFORM] performance on query
So the issue is that instead of taking 174 seconds the query now takes 201? I'm guessing that SQL Server might be using index covering, but that's just a guess. Posting query plans (preferably with actual timing info: EXPLAIN ANALYZE on PostgreSQL and whatever the equivalent would be for MSSQL) might give us some idea.

On Wed, Oct 26, 2005 at 05:47:31PM -0600, Sidar López Cruz wrote:
> I DON'T KNOW WHAT TO DO WITH THIS QUERYS... Comparation with sql server, sql server wins !!!
>
> Table sizes: archivos: 40MB, fotos: 55MB
>
> select count(1) from fotos f where not exists (select a.archivo from archivos a where a.archivo=f.archivo)
> 173713 ms. 110217 ms. 83122 ms.
>
> select count(*) from ( select archivo from fotos except select archivo from archivos ) x;
> 201479 ms.
>
> SELECT count(*) FROM fotos f LEFT JOIN archivos a USING(archivo) WHERE a.archivo IS NULL
> 199523 ms.

--
Jim C. Nasby, Sr. Engineering Consultant      [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com      work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf         cell: 512-569-9461