Re: [PERFORM] Rewriting DISTINCT and losing performance
On Monday 21 May 2007 03:14, Josh Berkus wrote:
> Chuck, Can we see the plan? --Josh

Sorry Josh, I guess I could have just used EXPLAIN instead of EXPLAIN ANALYZE.

# explain SELECT country_id, country_name
  FROM geo.country
  WHERE country_id IN (SELECT country_id FROM geo.city);

                              QUERY PLAN
------------------------------------------------------------------------
 Nested Loop IN Join  (cost=0.00..1252.60 rows=155 width=15)
   Join Filter: (country.country_id = city.country_id)
   ->  Seq Scan on country  (cost=0.00..6.44 rows=244 width=15)
   ->  Seq Scan on city  (cost=0.00..43409.12 rows=2122712 width=2)
(4 rows)

Versus the same query using the older, larger world_city table:

# explain SELECT country_id, country_name
  FROM geo.country
  WHERE country_id IN (SELECT country_id FROM geo.world_city);

                              QUERY PLAN
------------------------------------------------------------------------
 Nested Loop IN Join  (cost=0.00..23.16 rows=155 width=15)
   ->  Seq Scan on country  (cost=0.00..6.44 rows=244 width=15)
   ->  Index Scan using idx_world_city_country_id on world_city  (cost=0.00..706.24 rows=12602 width=2)
         Index Cond: (country.country_id = world_city.country_id)
(4 rows)

---(end of broadcast)---
TIP 4: Have you searched our list archives? http://archives.postgresql.org
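[A standard diagnostic for plans like the first one above is to disable sequential scans for the session and compare; enable_seqscan is a stock PostgreSQL planner setting, and this is a test knob, not something to leave on in production:]

```sql
SET enable_seqscan = off;

EXPLAIN ANALYZE
SELECT country_id, country_name
FROM geo.country
WHERE country_id IN (SELECT country_id FROM geo.city);

RESET enable_seqscan;
```

If the forced index plan turns out faster than the seq-scan plan, the cost estimates (random_page_cost, stale statistics) are worth a look.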
Re: [PERFORM] Rewriting DISTINCT and losing performance
On Monday 21 May 2007 11:34, Richard Huxton wrote:
> Chuck D. wrote:
> The only thing I can think of is that the CLUSTERing on city.country_id makes the system think it'll be cheaper to seq-scan the whole table. I take it you have got 2 million rows in city?

Well, here is where it gets strange. The CLUSTER was just one of the things I tried to improve performance; I had the same result before clustering. However, after changing the country_id column to NOT NULL and eliminating the NULL values, it will use the country_id index and perform quickly.

Oddly enough, the original table, world_city, still has NULL values in its country_id column, and it has always used the country_id index. Doesn't that seem a bit strange? Could it have to do with the smaller size of the new table?
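For anyone following along, the change described above amounts to something like this (a sketch against the geo.city table from this thread; whether NULLs in the subquery's output are what pushed the planner toward the seq scan is exactly the open question here, so treat the NULL cleanup as an experiment, not a guarantee):

```sql
-- Drop the rows with no country, then forbid NULLs going forward:
DELETE FROM geo.city WHERE country_id IS NULL;
ALTER TABLE geo.city ALTER COLUMN country_id SET NOT NULL;

-- Refresh statistics so the planner sees the change:
ANALYZE geo.city;
```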
[PERFORM] Rewriting DISTINCT and losing performance
Hi all,

I know we've covered this before, but I'm having trouble with it today.

I have some geographic data in tables that I'm working with: a country, state, and city table. I was selecting country_name out of the country table, but discovered that some countries (like Antarctica) didn't have cities in the city table. So I resolved to query the country table only for country_names whose country_ids appear in the city table, meaning the country has cities listed.

The complication was that I had a couple of different sources (in separate tables) with some extraneous column data, so I chose to consolidate the city tables from the different sources and drop the columns I don't need, because I don't have the hardware to support them. That was the end of my query time. Here's the original table and query:

# \d geo.world_city
          Table "geo.world_city"
   Column   |          Type          | Modifiers
------------+------------------------+-----------
 city_id    | integer                | not null
 state_id   | smallint               |
 country_id | smallint               |
 rc         | smallint               |
 latitude   | numeric(9,7)           |
 longitude  | numeric(10,7)          |
 dsg        | character(5)           |
 cc1        | character(2)           |
 adm1       | character(2)           |
 city_name  | character varying(200) |
Indexes:
    "world_city_pk" PRIMARY KEY, btree (city_id)
    "idx_world_city_cc1" btree (cc1)
    "idx_world_city_cc1_adm1" btree (cc1, adm1)
    "idx_world_city_country_id" btree (country_id)
    "idx_world_city_name_first_letter" btree (state_id, substring(lower(city_name::text), 1, 1))
    "idx_world_city_state_id" btree (state_id)

explain analyze
SELECT country_id, country_name
FROM geo.country
WHERE country_id IN (SELECT country_id FROM geo.world_city);

                              QUERY PLAN
------------------------------------------------------------------------
 Nested Loop IN Join  (cost=0.00..167.97 rows=155 width=15) (actual time=85.502..3479.449 rows=231 loops=1)
   ->  Seq Scan on country  (cost=0.00..6.44 rows=244 width=15) (actual time=0.089..0.658 rows=244 loops=1)
   ->  Index Scan using idx_world_city_country_id on world_city  (cost=0.00..8185.05 rows=12602 width=2) (actual time=14.250..14.250 rows=1 loops=244)
         Index Cond: (country.country_id = world_city.country_id)
 Total runtime: 3479.921 ms
Odd that it took 3 seconds, because every previous run has been much quicker. The next run was:

                              QUERY PLAN
------------------------------------------------------------------------
 Nested Loop IN Join  (cost=0.00..167.97 rows=155 width=15) (actual time=0.087..6.967 rows=231 loops=1)
   ->  Seq Scan on country  (cost=0.00..6.44 rows=244 width=15) (actual time=0.028..0.158 rows=244 loops=1)
   ->  Index Scan using idx_world_city_country_id on world_city  (cost=0.00..8185.05 rows=12602 width=2) (actual time=0.026..0.026 rows=1 loops=244)
         Index Cond: (country.country_id = world_city.country_id)
 Total runtime: 7.132 ms
(5 rows)

But that was beside the point. I created a new table, eliminated the data I didn't need, and it looks like this:

# \d geo.city
          Table "geo.city"
   Column   |          Type          | Modifiers
------------+------------------------+-----------
 city_id    | integer                | not null
 state_id   | smallint               |
 country_id | smallint               |
 latitude   | numeric(9,7)           |
 longitude  | numeric(10,7)          |
 city_name  | character varying(100) |
Indexes:
    "city_pk" PRIMARY KEY, btree (city_id)
    "idx_city_country_id" btree (country_id) CLUSTER
Foreign-key constraints:
    "city_state_id_fk" FOREIGN KEY (state_id) REFERENCES geo.state(state_id) ON UPDATE CASCADE ON DELETE CASCADE

explain analyze
SELECT country_id, country_name
FROM geo.country
WHERE country_id IN (SELECT country_id FROM geo.city);
-- won't complete in a reasonable amount of time

This one won't use the country_id index. The two tables have almost the same number of rows:

cmi=# select count(*) from geo.world_city;
  count
---------
 1953314
(1 row)

cmi=# select count(*) from geo.city;
  count
---------
 2122712
(1 row)

I tried to force the index and didn't see any improvement. I've vacuumed, analyzed, and clustered. Can someone help me get only the countries that have cities in the city table, in a reasonable amount of time?
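[A rewrite often suggested for this kind of semi-join is EXISTS — a sketch only, not tested against this schema; EXISTS stops probing a country as soon as one matching city is found, and its planning is not complicated by NULL country_ids the way IN (subquery) can be:]

```sql
SELECT co.country_id, co.country_name
FROM geo.country co
WHERE EXISTS (SELECT 1
              FROM geo.city ci
              WHERE ci.country_id = co.country_id);
```

With only ~244 countries, this needs at most one idx_city_country_id probe per country.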
Re: [PERFORM] JOIN to a VIEW makes a real slow query
On Tuesday 13 February 2007 14:51, Tom Lane wrote:
> Chuck D. <[EMAIL PROTECTED]> writes:
> > It is still using that sequence scan on the view after the APPEND for the us_city and world_city table. Any reason why the view won't use the indexes when it is JOINed to another table but it will when the view is queried without a JOIN? I should have mentioned this is v8.1.4.
>
> 8.1 isn't bright enough for that. Should work in 8.2 though.
>
> regards, tom lane

Upgraded to 8.2.3 in my spare time here -- went from the packaged binary that came with Ubuntu to compiling from source. Haven't tuned it yet, but what do you think about this join on the view?

cmi=# explain analyze
cmi-# select user_id, username, city_name
cmi-# FROM m_user AS mu
cmi-# LEFT JOIN geo.city_vw AS ci ON (mu.city_id = ci.city_id)
cmi-# ;

                              QUERY PLAN
------------------------------------------------------------------------
 Nested Loop Left Join  (cost=0.00..17.76 rows=10614 width=486) (actual time=0.109..0.113 rows=1 loops=1)
   Join Filter: (mu.city_id = ci.city_id)
   ->  Seq Scan on m_user mu  (cost=0.00..1.01 rows=1 width=72) (actual time=0.015..0.017 rows=1 loops=1)
   ->  Append  (cost=0.00..16.72 rows=2 width=422) (actual time=0.073..0.075 rows=1 loops=1)
         ->  Index Scan using pk_us_city on us_city  (cost=0.00..8.28 rows=1 width=222) (actual time=0.032..0.032 rows=0 loops=1)
               Index Cond: (mu.city_id = us_city.city_id)
         ->  Index Scan using world_city_pk on world_city  (cost=0.00..8.44 rows=1 width=422) (actual time=0.040..0.042 rows=1 loops=1)
               Index Cond: (mu.city_id = world_city.city_id)
 Total runtime: 0.359 ms
(9 rows)

From 65 seconds down to less than 1 ms. Pretty good, huh? Nice call, Tom. Now I'll have to find some time to do the production server before this app goes up.
[PERFORM] JOIN to a VIEW makes a real slow query
Hi folks,

I don't know if this is an SQL or PERFORMANCE list problem, but I wanted to check here first. I've seen this discussed on the list before, but I'm still not sure of the solution; maybe my query is just structured wrong.

I recently revisited an old project of mine that has 'city', 'state', and 'country' tables. The city data comes from multiple sources and totals about 3 million rows. I decided to split the city table up based on the source (world_city, us_city). This makes updating easier, because the assigned feature ids from the two sources overlap in some cases, making it impossible to update them as a single merged table.

However, I decided to create a view to behave like the old 'city' table. The view is just a simple:

SELECT [columns] FROM world_city
UNION
SELECT [columns] FROM us_city;

Selecting from the view is very quick, but JOINing to the view is slow -- about 65 seconds to select a city. It doesn't matter whether it is joined to one table or six like it is in my user_detail query; it is still slow. Each table in the view has indexes on city_id, state_id, and country_id, too, and everything has been VACUUM ANALYZEd. When using explain analyze on the view I get this:

cmi=# explain analyze
cmi-# select user_id, username, city_name
cmi-# FROM m_user AS mu
cmi-# left JOIN geo.city_vw AS ci ON (mu.city_id = ci.city_id)
cmi-# ;

                              QUERY PLAN
------------------------------------------------------------------------
 Nested Loop Left Join  (cost=650146.58..751018.45 rows=10618 width=55) (actual time=53078.261..61269.190 rows=1 loops=1)
   Join Filter: (outer.city_id = inner.?column1?)
   ->  Seq Scan on m_user mu  (cost=0.00..1.01 rows=1 width=27) (actual time=0.010..0.022 rows=1 loops=1)
   ->  Unique  (cost=650146.58..703236.51 rows=2123597 width=62) (actual time=49458.007..59635.140 rows=2122712 loops=1)
         ->  Sort  (cost=650146.58..655455.58 rows=2123597 width=62) (actual time=49458.003..55405.965 rows=2122712 loops=1)
               Sort Key: city_id, state_id, country_id, cc1, rc, adm1, lat, lon, city_name
               ->  Append  (cost=0.00..73741.94 rows=2123597 width=62) (actual time=18.835..13706.395 rows=2122712 loops=1)
                     ->  Seq Scan on us_city  (cost=0.00..4873.09 rows=169409 width=62) (actual time=18.832..620.553 rows=169398 loops=1)
                     ->  Seq Scan on world_city  (cost=0.00..47632.88 rows=1954188 width=61) (actual time=23.513..11193.341 rows=1953314 loops=1)
 Total runtime: 61455.471 ms
(10 rows)

Time: 61512.377 ms

So, a sequence scan on the tables in the view; it won't use the indexes. Now the same query, replacing the view with the real table:

cmi=# explain analyze
cmi-# select user_id, username, city_name
cmi-# FROM m_user AS mu
cmi-# left JOIN geo.world_city AS ci ON (mu.city_id = ci.city_id)
cmi-# ;

                              QUERY PLAN
------------------------------------------------------------------------
 Nested Loop Left Join  (cost=0.00..4.04 rows=1 width=36) (actual time=53.854..53.871 rows=1 loops=1)
   ->  Seq Scan on m_user mu  (cost=0.00..1.01 rows=1 width=27) (actual time=0.010..0.016 rows=1 loops=1)
   ->  Index Scan using world_city_pk on world_city ci  (cost=0.00..3.01 rows=1 width=17) (actual time=53.825..53.833 rows=1 loops=1)
         Index Cond: (outer.city_id = ci.city_id)
 Total runtime: 53.989 ms
(5 rows)

Time: 56.234 ms

I'm not sure that a view on a UNION is the best idea, but I don't know how else to keep the tables from the separate data sources while presenting them through one view (other than merging them with a source_id column). Any ideas on what is causing the performance lag?
Re: [PERFORM] JOIN to a VIEW makes a real slow query
On Tuesday 13 February 2007 13:16, Merlin Moncure wrote:
> use 'union all' instead of union. union without all has an implied sort and duplicate removal step that has to be resolved, materializing the view, before you can join to it.

Thanks for that, Merlin -- I forgot about using ALL. That does eliminate the Unique and Sort steps from the EXPLAIN output, and it brings the query time down from a whopping 65 seconds to 11 seconds. The two tables already contain unique rows, so ALL is safe here.

It is still using that sequence scan on the view after the Append for the us_city and world_city tables. Any reason why the view won't use the indexes when it is JOINed to another table, but will when the view is queried without a JOIN? I should have mentioned this is v8.1.4.

Also, does anyone know why this line:

  Join Filter: (outer.city_id = inner.?column1?)

... contains ?column1? instead of the actual column name?

This is the result after UNION ALL in the view:

cmi=# explain analyze
cmi-# select user_id, username, city_name
cmi-# FROM m_user AS mu
cmi-# LEFT JOIN geo.city_vw AS ci ON (mu.city_id = ci.city_id)
cmi-# ;

                              QUERY PLAN
------------------------------------------------------------------------
 Nested Loop Left Join  (cost=0.00..121523.88 rows=10618 width=55) (actual time=2392.376..11061.117 rows=1 loops=1)
   Join Filter: (outer.city_id = inner.?column1?)
   ->  Seq Scan on m_user mu  (cost=0.00..1.01 rows=1 width=27) (actual time=0.025..0.028 rows=1 loops=1)
   ->  Append  (cost=0.00..73741.94 rows=2123597 width=62) (actual time=16.120..9644.315 rows=2122712 loops=1)
         ->  Seq Scan on us_city  (cost=0.00..4873.09 rows=169409 width=62) (actual time=16.119..899.802 rows=169398 loops=1)
         ->  Seq Scan on world_city  (cost=0.00..47632.88 rows=1954188 width=61) (actual time=10.585..6949.946 rows=1953314 loops=1)
 Total runtime: 11061.441 ms
(7 rows)
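[Since the two source tables already hold disjoint rows, the fix belongs in the view definition itself, so every query through it benefits -- a sketch only; the column list is abbreviated here because the original post elides it:]

```sql
CREATE OR REPLACE VIEW geo.city_vw AS
SELECT city_id, state_id, country_id, city_name FROM geo.us_city
UNION ALL
SELECT city_id, state_id, country_id, city_name FROM geo.world_city;
```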