Re: [PERFORM] Left Outer Join much faster than non-outer Join?
Ron Mayer wrote:
> Tom Lane wrote:
>> [EMAIL PROTECTED] writes:
>>> select * from streetname_lookup as sl
>>> join city_lookup as cl on (true)
>>> left outer join tlid_smaller as ts
>>>   on (sl.geo_streetname_id = ts.geo_streetname_id
>>>       and cl.geo_city_id = ts.geo_city_id)
>>> where str_name='alamo' and city='san antonio' and state='TX';
>>
>> That's a fairly odd query;
>
> I think it's a very common type of query in data warehousing. It's reasonably typical of a traditional star schema where streetname_lookup and city_lookup are dimension tables and tlid_smaller is the central fact table.

Although looking again, I must admit the query was written unconventionally. Perhaps those queries are remnants dating back to a version when you could force join orders this way? Perhaps a more common way of writing it would have been:

  select * from tlid_smaller
   where geo_streetname_id in (select geo_streetname_id from streetname_lookup where str_name='$str_name')
     and geo_city_id in (select geo_city_id from city_lookup where city='$city' and state='$state');

However this query also fails to use the multi-column index on (geo_streetname_id, geo_city_id). Explain analyze shown below.

In cases where I can be sure only one result will come from each of the lookup queries, I guess I can do this:

  select * from tlid_smaller
   where geo_streetname_id = (select geo_streetname_id from streetname_lookup where str_name='$str_name')
     and geo_city_id = (select geo_city_id from city_lookup where city='$city' and state='$state');

which has the nicest plan of them all (explain analyze also shown below).

With the tables I have (shown below), how else could one efficiently fetch the data for Main St, San Francisco? I guess I just answered that question myself. Where possible, I'll write my queries this way.
Thanks,
Ron

fli=# explain analyze select * from tlid_smaller
fli-#  where geo_streetname_id in (select geo_streetname_id from streetname_lookup where str_name='alamo')
fli-#    and geo_city_id in (select geo_city_id from city_lookup where city='san antonio' and state='TX');
                                            QUERY PLAN
------------------------------------------------------------------------------------------------------
 Hash IN Join  (cost=9.03..29209.16 rows=1 width=32) (actual time=76.576..96.605 rows=78 loops=1)
   Hash Cond: ("outer".geo_city_id = "inner".geo_city_id)
   ->  Nested Loop  (cost=3.02..29202.88 rows=52 width=32) (actual time=65.877..91.789 rows=4151 loops=1)
         ->  HashAggregate  (cost=3.02..3.02 rows=1 width=4) (actual time=0.039..0.042 rows=1 loops=1)
               ->  Index Scan using streetname_lookup__str_name on streetname_lookup  (cost=0.00..3.01 rows=1 width=4) (actual time=0.025..0.028 rows=1 loops=1)
                     Index Cond: (str_name = 'alamo'::text)
         ->  Index Scan using tlid_smaller__street_zipint on tlid_smaller  (cost=0.00..28994.70 rows=16413 width=32) (actual time=65.820..81.309 rows=4151 loops=1)
               Index Cond: (tlid_smaller.geo_streetname_id = "outer".geo_streetname_id)
   ->  Hash  (cost=6.01..6.01 rows=1 width=4) (actual time=0.054..0.054 rows=0 loops=1)
         ->  Index Scan using city_lookup__name on city_lookup  (cost=0.00..6.01 rows=1 width=4) (actual time=0.039..0.041 rows=1 loops=1)
               Index Cond: ((city = 'san antonio'::text) AND (state = 'TX'::text))
 Total runtime: 97.577 ms
(12 rows)

fli=# explain analyze select * from tlid_smaller
fli-#  where geo_streetname_id = (select geo_streetname_id from streetname_lookup where str_name='alamo')
fli-#    and geo_city_id = (select geo_city_id from city_lookup where city='san antonio' and state='TX');
                                            QUERY PLAN
------------------------------------------------------------------------------------------------------
 Index Scan using tlid_smaller__street_city on tlid_smaller  (cost=9.02..16.88 rows=3 width=32) (actual time=0.115..0.255 rows=78 loops=1)
   Index Cond: ((geo_streetname_id = $0) AND (geo_city_id = $1))
   InitPlan
     ->  Index Scan using streetname_lookup__str_name on streetname_lookup  (cost=0.00..3.01 rows=1 width=4) (actual time=0.044..0.047 rows=1 loops=1)
           Index Cond: (str_name = 'alamo'::text)
     ->  Index Scan using city_lookup__name on city_lookup  (cost=0.00..6.01 rows=1 width=4) (actual time=0.028..0.030 rows=1 loops=1)
           Index Cond: ((city = 'san antonio'::text) AND (state = 'TX'::text))
 Total runtime: 0.474 ms
(8 rows)

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
       http://www.postgresql.org/docs/faq
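One caveat on the "=" form above (an aside, not from the original thread): PostgreSQL raises a runtime error if a scalar subquery returns more than one row, so the fast plan is only safe when each lookup is provably unique. A hedged sketch, reusing the thread's table names and assuming unique constraints are acceptable there:

```sql
-- Sketch only: the constraint names are made up, and the thread does not
-- say whether these columns are in fact unique.
ALTER TABLE streetname_lookup ADD CONSTRAINT streetname_lookup_name_key UNIQUE (str_name);
ALTER TABLE city_lookup ADD CONSTRAINT city_lookup_city_state_key UNIQUE (city, state);

-- With uniqueness guaranteed, the scalar-subquery form cannot fail at runtime
-- with "more than one row returned by a subquery used as an expression":
SELECT *
  FROM tlid_smaller
 WHERE geo_streetname_id = (SELECT geo_streetname_id FROM streetname_lookup
                             WHERE str_name = 'alamo')
   AND geo_city_id       = (SELECT geo_city_id FROM city_lookup
                             WHERE city = 'san antonio' AND state = 'TX');
```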
Re: [PERFORM] Left Outer Join much faster than non-outer Join?
[EMAIL PROTECTED] writes:
> streetname_lookup (for every street name used in the country)
>  streetid | name | type
> ----------+------+------
>         1 | Main | St
>         2 | 1st  | St

As far as I'm concerned, I would add the column city_id, since 2 different streets in 2 different cities may have the same name.

Best regards,
Patrick
Re: [PERFORM] fine tuning for logging server
The reason: if the power cord is yanked, the OS _must_ boot back up in good condition. If the DB is corrupted, whatever, nuke it then re-initialize it. But the OS must survive act-of-god events.

Well, in that case:
- Use reiserfs3 for your disks
- Use MySQL with MyISAM tables
Re: [PERFORM] Reading recommendations
Steve Wampler [EMAIL PROTECTED] wrote on 03/30/2005 03:58:12 PM: [EMAIL PROTECTED] wrote: Mohan, Ross wrote: VOIP over BitTorrent? Now *that* I want to see. Ought to be at least as interesting as the TCP/IP over carrier pigeon experiment - and more challenging to boot! It was very challenging. I worked on the credit window sizing and retransmission timer estimation algorithms. We took into account weather patterns, size and age of the bird, feeding times, and the average number of times a bird circles before determining magnetic north. Interestingly, packet size had little effect in the final algorithms. I would love to share them with all of you, but they're classified. Ah, but VOIPOBT requires many people all saying the same thing at the same time. The synchronization alone (since you need to distribute these people adequately to avoid overloading a trunk line...) is probably sufficiently hard to make it interesting. Then there are the problems of different accents, dialects, and languages ;) Interestingly, we had a follow-on contract to investigate routing optimization using flooding techniques. Oddly, it was commissioned by a consortium of local car washes. Work stopped when the park service sued us for the cost of cleaning all the statuary, and the company went out of business. We were serving cornish game hens at our frequent dinner parties for months. -- Steve Wampler -- [EMAIL PROTECTED] The gods that smiled on your birth are now laughing out loud.
Re: [PERFORM] Reading recommendations
On 2005-03-31 15:19, [EMAIL PROTECTED] wrote: Now *that* I want to see. Aught to be at least as interesting as the TCP/IP over carrier pigeon experiment - and more challenging to boot! .. Interestingly, we had a follow on contract to investigate routing optimization using flooding techniques. Oddly, it was commissioned by a consortium of local car washes. Work stopped when the park service sued us for the cost of cleaning all the statuary, and the company went out of business. We were serving cornish game hens at our frequent dinner parties for months. This method might have been safer (and it works great with Apaches): http://eagle.auc.ca/~dreid/ cheers stefan
Re: [PERFORM] fine tuning for logging server
On Wed, Mar 30, 2005 at 08:41:43PM -0600, John Arbash Meinel wrote:
> If all you are doing is append-only logging, the fastest thing is probably just a flat file. You could have something that comes along later to move it into the database. It doesn't really sound like you are using any features a database provides (normalization, foreign keys, indexes, etc.).

Here are two ideas that I don't think have been mentioned yet: use COPY to bulk-load the data instead of individual INSERTs. And if you get desperate, you can run pg with fsync=false, since you don't seem to care about re-initializing your whole database in the case of unexpected interruption.

-Mike
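Mike's COPY suggestion can be sketched concretely (the table name and file path below are illustrative, not from the thread):

```sql
-- Hypothetical append-only log table:
CREATE TABLE logs (
    ts      timestamptz,
    level   text,
    message text
);

-- One server-side COPY replaces thousands of single-row INSERTs,
-- avoiding per-statement parse/plan and per-row commit overhead:
COPY logs FROM '/var/tmp/logs.csv' WITH CSV;
```

(psql's client-side \copy variant works the same way when the file lives on the client rather than the server.)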
Re: [PERFORM] Reading recommendations
Stefan Weiss wrote: On 2005-03-31 15:19, [EMAIL PROTECTED] wrote: Now *that* I want to see. Aught to be at least as interesting as the TCP/IP over carrier pigeon experiment - and more challenging to boot! .. Interestingly, we had a follow on contract to investigate routing optimization using flooding techniques. Oddly, it was commissioned by a consortium of local car washes. Work stopped when the park service sued us for the cost of cleaning all the statuary, and the company went out of business. We were serving cornish game hens at our frequent dinner parties for months. This method might have been safer (and it works great with Apaches): http://eagle.auc.ca/~dreid/ Aha - VOIPOBD as well as VOIPOBT! What more can one want? VOIPOCP, I suppose... -- Steve Wampler -- [EMAIL PROTECTED] The gods that smiled on your birth are now laughing out loud.
Re: [PERFORM] Left Outer Join much faster than non-outer Join?
On Thu, 2005-03-31 at 00:15 -0800, Ron Mayer wrote:
> Ron Mayer wrote:
>> Tom Lane wrote:
>>> That's a fairly odd query;
>> I think it's a very common type of query in data warehousing. It's reasonably typical of a traditional star schema where streetname_lookup and city_lookup are dimension tables and tlid_smaller is the central fact table.

Yes, agreed.

> Although looking again I must admit the query was written unconventionally. Perhaps those queries are remnants dating back to a version when you could force join orders this way? Perhaps a more common way of writing it would have been:
>
>   select * from tlid_smaller
>    where geo_streetname_id in (select geo_streetname_id from streetname_lookup where str_name='$str_name')
>      and geo_city_id in (select geo_city_id from city_lookup where city='$city' and state='$state');
>
> However this query also fails to use the multi-column index on (geo_streetname_id, geo_city_id). Explain analyze shown below.

...which is my understanding too.

> In cases where I can be sure only one result will come from each of the lookup queries I guess I can do this:
>
>   select * from tlid_smaller
>    where geo_streetname_id = (select geo_streetname_id from streetname_lookup where str_name='$str_name')
>      and geo_city_id = (select geo_city_id from city_lookup where city='$city' and state='$state');
>
> which has the nicest plan of them all (explain analyze also shown below).

Which is not the case for the generalised star join. The general case query here is:

  SELECT (whatever)
    FROM FACT, DIMENSION1 D1, DIMENSION2 D2, DIMENSION3 D3, etc.
   WHERE FACT.dimension1_pk = D1.dimension1_pk
     AND FACT.dimension2_pk = D2.dimension2_pk
     AND FACT.dimension3_pk = D3.dimension3_pk
     AND D1.dimdescription = 'X'
     AND D2.dimdescription = 'Y'
     AND D3.dimdescription = 'Z'

with FACT PK = (dimension1_pk, dimension2_pk, dimension3_pk), with a more specific example of

  SELECT sum(item_price)
    FROM Sales, Store, Item, TTime
   WHERE Sales.store_pk = Store.store_pk
     AND Store.region = 'UK'
     AND Sales.item_pk = Item.item_pk
     AND Item.category = 'Cameras'
     AND Sales.time_pk = TTime.time_pk
     AND TTime.month = 3
     AND TTime.year = 2005

A very good plan for solving this, under specific conditions, is

  CartesianProduct(Store, Item, TTime) -> Sales.PK

which accesses the largest table only once. As Tom says, the current optimizer won't go near that plan, for good reason, without specifically tweaking collapse limits. I know full well that any changes in that direction will need to be robust, because that execution plan is very sensitive to even minor changes in data distribution. The plan requires some fairly extensive checking to be put into place. The selectivity of requests against the smaller tables needs to be very well known, so that the upper bound estimate of the cardinality of the cartesian product is feasible AND still low enough to use the index on Sales. This is probably going to need information to be captured on multi-column index selectivity, to ensure that last part. It is likely that the statistics targets on the dimension tables would need to be high enough to identify MFVs, or at least to reduce the upper bound of selectivity. It also requires the table sizes to be examined, to ensure this type of plan is not considered when it would be pointless. Some other systems that support this join type turn off checking for it by default. We could do the same with enable_starjoin = off. Anyway, seems like a fair amount of work there... yes?
Best Regards, Simon Riggs
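The "tweaking collapse limits" mentioned above can be sketched as follows. from_collapse_limit and join_collapse_limit are real PostgreSQL settings (the values here are only illustrative); note that raising them widens the planner's join-order search rather than directly producing the cartesian-product star plan described in the mail:

```sql
SET from_collapse_limit = 12;   -- fold subqueries into the parent search space
SET join_collapse_limit = 12;   -- let the planner reorder explicit JOINs freely

EXPLAIN ANALYZE
SELECT sum(item_price)
  FROM Sales, Store, Item, TTime
 WHERE Sales.store_pk = Store.store_pk AND Store.region  = 'UK'
   AND Sales.item_pk  = Item.item_pk   AND Item.category = 'Cameras'
   AND Sales.time_pk  = TTime.time_pk  AND TTime.month = 3 AND TTime.year = 2005;
```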
Re: [PERFORM] Reading recommendations
[EMAIL PROTECTED] wrote on 03/31/2005 10:48:09 AM: Stefan Weiss wrote: On 2005-03-31 15:19, [EMAIL PROTECTED] wrote: Now *that* I want to see. Aught to be at least as interesting as the TCP/IP over carrier pigeon experiment - and more challenging to boot! .. Interestingly, we had a follow on contract to investigate routing optimization using flooding techniques. Oddly, it was commissioned by a consortium of local car washes. Work stopped when the park service sued us for the cost of cleaning all the statuary, and the company went out of business. We were serving cornish game hens at our frequent dinner parties for months. This method might have been safer (and it works great with Apaches): http://eagle.auc.ca/~dreid/ Aha - VOIPOBD as well as VOIPOBT! What more can one want? VOIPOCP, I suppose... Start collecting recipes for small game birds now. We ran out pretty quickly. Finally came up with Pigeon Helper and sold it to homeless shelters in New York. Sales were slow until we added a wine sauce. -- Steve Wampler -- [EMAIL PROTECTED] The gods that smiled on your birth are now laughing out loud.
[PERFORM] How to speed up word count with tsearch2?
I've tested several keyword counts from a 2-million-record book description table indexed with tsearch2. The result is always slow for the first query attempt. This is my sample query:

-- test one phrase --
SELECT count(*) from table1 WHERE searchvector @@ to_tsquery('default', 'david') limit 100
:: returns 16824 records matched.
:: takes 49618.341 ms (1st attempt)
:: takes 504.229 ms (2nd attempt)

-- test two phrase --
SELECT count(*) from table1 WHERE searchvector @@ to_tsquery('default', 'marthastewart') limit 100
:: returns 155 records matched.
:: takes 686.669 ms (1st attempt)
:: takes 40.282 ms (2nd attempt)

I use the ordinary aggregate function count(*). Is there another way to count faster?
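An aside on the query as written (not from the original post): the LIMIT 100 is a no-op, because count(*) returns a single row regardless. If a capped count is acceptable, counting over a limited subquery bounds the work instead; a sketch using the same table and query terms:

```sql
-- Equivalent to the original query; LIMIT on an aggregate changes nothing:
SELECT count(*)
  FROM table1
 WHERE searchvector @@ to_tsquery('default', 'david');

-- Capped variant: stops scanning after 100 matches, so it answers
-- "are there at least N matches?" cheaply:
SELECT count(*)
  FROM (SELECT 1
          FROM table1
         WHERE searchvector @@ to_tsquery('default', 'david')
         LIMIT 100) AS capped;
```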
[PERFORM] Postgresql.conf setting recommendations for 8.0.1
Hi,

We are migrating to a new server with more memory, and also from postgres 7.4 to postgres 8.0.1. Here are my settings on the current 7.4 version:

OS: RedHat 9
CPUs: 2 hyperthreaded
Memory: 4 gig
shared_buffers: 65536
sort_mem: 16384
vacuum_mem: 32768
wal_buffers: 64
effective_cache_size: 393216
checkpoint_segments: 3
checkpoint_timeout: 300
checkpoint_warning: 30

These are the settings I am planning for the new machine with the 8.0.1 version:

OS: Fedora Core 2
CPUs: 2 hyperthreaded
Memory: 8 gig
shared_buffers: 131072
work_mem: 32768
maintenance_work_mem: 65536
wal_buffers: 64
effective_cache_size: 786432
checkpoint_segments: 8
checkpoint_timeout: 600
checkpoint_warning: 30

The current settings on my 7.4 version give me very good performance, so I basically doubled the settings since I will have double the memory in the new machine. My concern is about the effective_cache_size setting: the docs recommend setting it to at most about 2/3 of total memory, and I went a little over that. Is that ok? I went a little over on my current 7.4 system too, and it's giving me very good performance, so I used the same calculation for my new system.

Also, can anyone guide me on ideal settings for vacuum_cost_delay, vacuum_cost_page_hit, vacuum_cost_page_miss, vacuum_cost_page_dirty, vacuum_cost_limit, bgwriter_delay, bgwriter_percent, and bgwriter_maxpages? I am not sure what values to use for these parameters. Are there any ideal settings for them in an OLTP environment?

Thanks!
Pallav
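For reference, the 8.0 parameters Pallav asks about look like this in postgresql.conf. The values below are only illustrative starting points for discussion, not tested recommendations:

```
vacuum_cost_delay = 10        # ms to sleep once the cost limit is reached (0 disables)
vacuum_cost_page_hit = 1      # cost charged for a page found in shared_buffers
vacuum_cost_page_miss = 10    # cost charged for a page read from disk
vacuum_cost_page_dirty = 20   # cost charged for dirtying a page
vacuum_cost_limit = 200       # accumulated cost that triggers the sleep
bgwriter_delay = 200          # ms between background writer rounds
bgwriter_percent = 1          # percent of dirty buffers written per round
bgwriter_maxpages = 100       # hard cap on pages written per round
```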
[PERFORM] Very long time to execute and Update, suggestions?
Hi all,

I have a table with a little over 200,000 rows in it that I need to update using a regular expression. I admit that though I am not a beginner at postgres, I am also far from an expert. :p

I tried to create an index that would optimize the UPDATE, but I may have made an error in how I built it. Here is the table structure, the index I tried to create, and an 'EXPLAIN ANALYZE' of the UPDATE (though I am still just now learning how to use 'EXPLAIN').

tle-bu=> \d file_info_3
              Table "public.file_info_3"
     Column      |         Type         |                Modifiers
-----------------+----------------------+------------------------------------------
 file_group_name | text                 | not null
 file_group_uid  | bigint               | not null
 file_mod_time   | bigint               | not null
 file_name       | text                 | not null
 file_parent_dir | text                 | not null
 file_perm       | text                 | not null
 file_size       | bigint               | not null
 file_type       | character varying(2) | not null default 'f'::character varying
 file_user_name  | text                 | not null
 file_user_uid   | bigint               | not null
 file_backup     | boolean              | not null default true
 file_display    | boolean              | not null default false
 file_restore    | boolean              | not null default false
Indexes:
    "file_info_3_display_idx" btree (file_type, file_parent_dir, file_name)

Here is the EXPLAIN:

tle-bu=> EXPLAIN ANALYZE UPDATE file_info_3 SET file_backup='f'
         WHERE file_parent_dir~'^/home'
            OR (file_parent_dir='/' AND file_name='home');
                                  QUERY PLAN
------------------------------------------------------------------------------
 Seq Scan on file_info_3  (cost=0.00..7770.00 rows=1006 width=206) (actual time=1050.813..5648.462 rows=67835 loops=1)
   Filter: ((file_parent_dir ~ '^/home'::text) OR ((file_parent_dir = '/'::text) AND (file_name = 'home'::text)))
 Total runtime: 68498.898 ms
(3 rows)

I thought that it would have used the index because 'file_parent_dir' and 'file_name' are in the index, but if I am reading the EXPLAIN output right it isn't, and is instead doing a sequential scan. If that is the case, how would I best build the index? Should I have just used 'file_parent_dir' and 'file_name'?

Thanks all!!
Madison

-- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Madison Kelly (Digimer) TLE-BU, The Linux Experience; Back Up http://tle-bu.thelinuxexperience.com -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
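One avenue worth checking for the anchored regex (a suggestion, not from the original post): a plain btree index can serve left-anchored patterns like ~ '^/home' only when the database uses the C locale; in other locales, an index built with the text_pattern_ops operator class can. A sketch against the table above, with a made-up index name:

```sql
-- text_pattern_ops makes the btree usable for left-anchored
-- LIKE/regex matches in non-C locales:
CREATE INDEX file_info_3_parent_dir_ptn
    ON file_info_3 (file_parent_dir text_pattern_ops);

EXPLAIN ANALYZE
UPDATE file_info_3
   SET file_backup = 'f'
 WHERE file_parent_dir ~ '^/home'
    OR (file_parent_dir = '/' AND file_name = 'home');
```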
Re: [PERFORM] Very long time to execute and Update, suggestions?
Philip Hallstrom wrote:
> I'm not sure about this, which is why I'm replying off list, but your index is on file_type, file_parent_dir, and file_name, and your query is on file_parent_dir and file_name. I seem to remember reading that the index will only get used if the columns in the where clause match up in order. That is, um... if you have an index on columns a and b, and a where clause of "b = 1", it won't use the index, since the index looks like a,b a,b a,b etc... Does that make any sense? Not sure if that's right or not, but it's easy enough to remove file_type from your index and try it. Post back to the list if that's it. -philip

Thanks for the reply! I have played around a little more and have created a few different test indexes, and it looks like it is the regex that is causing the sequential scan. If I remove the regex and create a ('file_parent_dir', 'file_name') index, it will use it. If I create an index just for 'file_parent_dir' and change my UPDATE to just look for the regex '... WHERE file_parent_dir~'^/dir' ...', it will still do the sequential scan anyway.

So I need to either find an index that will work with regexes, or re-write my code to update each subdirectory separately and use a simpler UPDATE statement for each.

Thanks again!

Madison

PS - I cc'ed the list to follow up on what I found out so far. (Hi list!)

-- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Madison Kelly (Digimer) TLE-BU, The Linux Experience; Back Up http://tle-bu.thelinuxexperience.com -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
[PERFORM] How to speed up word count in tsearch2?
(This is the 2nd posting; maybe the 1st one didn't go through.)

I've tested several keyword counts from a 2-million-record book description table indexed with tsearch2. The result is always slow for the first query attempt. This is my sample query:

-- test one phrase --
SELECT count(*) from table1 WHERE searchvector @@ to_tsquery('default', 'david') limit 100
:: returns 16824 records matched.
:: takes 49618.341 ms (1st attempt)
:: takes 504.229 ms (2nd attempt)

-- test two phrase --
SELECT count(*) from table1 WHERE searchvector @@ to_tsquery('default', 'marthastewart') limit 100
:: returns 155 records matched.
:: takes 686.669 ms (1st attempt)
:: takes 40.282 ms (2nd attempt)

I use the ordinary aggregate function count(*). Is there another way to count faster?
Re: [PERFORM] How to speed up word count in tsearch2?
Yudie,

> (It is the 2nd posting; maybe the 1st one didn't go through.) I've tested several keyword counts from a 2-million-record book description table indexed with tsearch2. The result is always slow for the first query attempt.

Yes, this is because your tsearch2 index is getting pushed out of RAM. When the index is cached it's very, very fast, but it takes a long time to get loaded from disk. You need to look at what else is using RAM on that machine. And maybe buy more.

--Josh

Josh Berkus
Aglio Database Solutions
San Francisco
Re: [PERFORM] Very long time to execute and Update, suggestions?
> So I need to either find an index that will work with regexes or re-write my code to update each subdirectory separately and use a simpler UPDATE statement for each.

Why don't you use an LTREE type to model your directory tree? It's been designed specifically for this purpose and has indexed regular expression search.

http://www.sai.msu.su/~megera/postgres/gist/ltree/
http://www.sai.msu.su/~megera/postgres/gist/
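A minimal sketch of the LTREE suggestion (contrib/ltree must be installed; the table and column names here are illustrative, not from the thread):

```sql
-- Paths are stored as dot-separated labels, e.g. 'home.user.docs'
-- instead of '/home/user/docs':
CREATE TABLE file_tree (
    path ltree
);
CREATE INDEX file_tree_path_gist ON file_tree USING gist (path);

-- lquery pattern match: 'home' and everything beneath it, at any depth:
SELECT * FROM file_tree WHERE path ~ 'home.*'::lquery;
```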
Re: [PERFORM] pg_autovacuum not having enough suction ?
On Fri, Mar 25, 2005 at 06:21:24PM -0500, Bruce Momjian wrote:
> Can we issue a LOCK TABLE with a statement_timeout, and only do the VACUUM FULL if we can get a lock quickly?

That seems like a plan. I think someone else's remark in this thread is important, though: autovacuum shouldn't ever block other transactions, and this approach will definitely run that risk.

A

-- Andrew Sullivan | [EMAIL PROTECTED]
A certain description of men are for getting out of debt, yet are against all taxes for raising money to pay it off. --Alexander Hamilton
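Bruce's idea can be sketched as a two-step probe (an illustration with a made-up table name; note that VACUUM FULL cannot run inside a transaction block, so a small window remains between the probe and the vacuum):

```sql
SET statement_timeout = 2000;  -- abandon the lock attempt after 2 seconds

BEGIN;
LOCK TABLE some_table IN ACCESS EXCLUSIVE MODE;  -- fails fast if contended
ROLLBACK;                                        -- release the probe lock

RESET statement_timeout;
VACUUM FULL some_table;        -- run only if the probe succeeded
```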
[PERFORM] Follow-Up: How to improve db performance with $7K?
Thanks for everyone's feedback on how best to improve our Postgresql database for the animal hospital. I re-read the PostgreSQL 8.0 Performance Checklist just to keep focused.

We purchased (2) 4 x 146GB 10,000rpm SCSI U320 SCA drive arrays ($2600) and (1) Sun W2100z dual AMD64 workstation with 4GB RAM ($2500). We did not need a rack-mount server, so I thought Sun's workstation would do fine. I'll double the RAM. Hopefully, this should out-perform our dual 2.8 Xeon with 4GB of RAM.

Now we need to purchase a good U320 RAID card. Any suggestions for those which run well under Linux?

These two drive arrays' main purpose is for our database. For those who have messed with drive arrays before, how would you slice up the drive array? Will database performance be affected by how our RAID10 is configured? Any suggestions?

Thanks.

Steve Poe
Re: [PERFORM] How to speed up word count in tsearch2?
> You need to look at what else is using RAM on that machine. And maybe buy more.

Ouch... I had that feeling also. Then how can I know how much memory is needed for a certain number of words? And why is counting uncommon words faster than counting common ones?