Re: [PERFORM] Rewriting DISTINCT and losing performance

2007-05-21 Thread Chuck D.
On Monday 21 May 2007 03:14, Josh Berkus wrote:
 Chuck,

 Can we see the plan?

 --Josh


Sorry Josh,  I guess I could have just used EXPLAIN instead of EXPLAIN 
ANALYZE.

# explain
SELECT   country_id, country_name
FROM geo.country
WHERE country_id IN
 (select country_id FROM geo.city)
;
 QUERY PLAN

 Nested Loop IN Join  (cost=0.00..1252.60 rows=155 width=15)
   Join Filter: (country.country_id = city.country_id)
   -  Seq Scan on country  (cost=0.00..6.44 rows=244 width=15)
   -  Seq Scan on city  (cost=0.00..43409.12 rows=2122712 width=2)
(4 rows)


Versus the same query using the older, larger world_city table:

# explain
SELECT   country_id, country_name
FROM geo.country
WHERE country_id IN
 (select country_id FROM geo.world_city)
;
   QUERY PLAN

 Nested Loop IN Join  (cost=0.00..23.16 rows=155 width=15)
   -  Seq Scan on country  (cost=0.00..6.44 rows=244 width=15)
   -  Index Scan using idx_world_city_country_id on world_city  
(cost=0.00..706.24 rows=12602 width=2)
 Index Cond: (country.country_id = world_city.country_id)
(4 rows)



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Rewriting DISTINCT and losing performance

2007-05-21 Thread Chuck D.
On Monday 21 May 2007 11:34, Richard Huxton wrote:
 Chuck D. wrote:

 The only thing I can think of is that the CLUSTERing on city.country_id
 makes the system think it'll be cheaper to seq-scan the whole table.

 I take it you have got 2 million rows in city?

Well here is where it gets strange.  The CLUSTER was just one thing I tried to 
do to enhance the performance.  I had the same result prior to cluster.

However, after updating that country_id column to NOT NULL and eliminating 
NULL values it will use the country_id index and perform quickly.  Oddly 
enough, the original table, world_city still has NULL values in the 
country_id column and it has always used the country_id index.

Doesn't that seem a bit strange?  Does it have to do with the smaller size of 
the new table maybe?

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


[PERFORM] Rewriting DISTINCT and losing performance

2007-05-20 Thread Chuck D.
Hi all,

I know we've covered this before but I'm having trouble with it today.

I have some geographic data in tables that I'm working with.  I have a
country, state and city table.  I was selecting the country_name out of the
country table but discovered that some countries (like Antarctica) didn't
have cities in the city table.

I resolved to query the country table for only country_name's which had
country_id's in the city table - meaning the country had cities listed.

The problem was I had a couple different sources (in separate tables) with
some extraneous column  data so I chose to consolidate the city tables from
the different sources and column data that I don't need because I don't have
the hardware to support it.

That was the end of my query time.

Here's the original table and query:

# \d geo.world_city
 Table geo.world_city
   Column   |  Type  | Modifiers
++---
 city_id| integer| not null
 state_id   | smallint   |
 country_id | smallint   |
 rc | smallint   |
 latitude   | numeric(9,7)   |
 longitude  | numeric(10,7)  |
 dsg| character(5)   |
 cc1| character(2)   |
 adm1   | character(2)   |
 city_name  | character varying(200) |
Indexes:
world_city_pk PRIMARY KEY, btree (city_id)
idx_world_city_cc1 btree (cc1)
idx_world_city_cc1_adm1 btree (cc1, adm1)
idx_world_city_country_id btree (country_id)
idx_world_city_name_first_letter btree
(state_id, substring(lower(city_name::text), 1, 1))
idx_world_city_state_id btree (state_id)

explain analyze
SELECT   country_id, country_name
FROM geo.country
WHERE country_id IN
 (select country_id FROM geo.world_city)
;

  QUERY
PLAN
-
--
 Nested Loop IN Join  (cost=0.00..167.97 rows=155 width=15) (actual
time=85.502..3479.449 rows=231 loops=1)
   -  Seq Scan on country  (cost=0.00..6.44 rows=244 width=15) (actual
time=0.089..0.658 rows=244 loops=1)
   -  Index Scan using idx_world_city_country_id on world_city
(cost=0.00..8185.05 rows=12602 width=2) (actual time=14.250..14.250 rows=1
loops=244)
 Index Cond: (country.country_id = world_city.country_id)
 Total runtime: 3479.921 ms

Odd that it took 3 seconds because every previous run has been much quicker.
The next run was:

 QUERY
PLAN
-

 Nested Loop IN Join  (cost=0.00..167.97 rows=155 width=15) (actual
time=0.087..6.967 rows=231 loops=1)
   -  Seq Scan on country  (cost=0.00..6.44 rows=244 width=15) (actual
time=0.028..0.158 rows=244 loops=1)
   -  Index Scan using idx_world_city_country_id on world_city
(cost=0.00..8185.05 rows=12602 width=2) (actual time=0.026..0.026 rows=1
loops=244)
 Index Cond: (country.country_id = world_city.country_id)
 Total runtime: 7.132 ms
(5 rows)


But that was irrelevant.  I created a new table and eliminated the data and
 it looks like this:

# \d geo.city
Table geo.city
   Column   |  Type  | Modifiers
++---
 city_id| integer| not null
 state_id   | smallint   |
 country_id | smallint   |
 latitude   | numeric(9,7)   |
 longitude  | numeric(10,7)  |
 city_name  | character varying(100) |
Indexes:
city_pk PRIMARY KEY, btree (city_id)
idx_city_country_id btree (country_id) CLUSTER
Foreign-key constraints:
city_state_id_fk FOREIGN KEY (state_id) REFERENCES geo.state(state_id)
ON UPDATE CASCADE ON DELETE CASCADE

explain analyze
SELECT   country_id, country_name
FROM geo.country
WHERE country_id IN
 (select country_id FROM geo.city)
;

-- won't complete in a reasonable amount of time.

This one won't use the country_id index. The two tables have almost the same
number of rows:

cmi=# select count(*) from geo.world_city;
  count
-
 1953314
(1 row)

cmi=# select count(*) from geo.city;
  count
-
 2122712
(1 row)


 I tried to force it and didn't see any improvement.  I've vacuummed,
analyzed, clustered.  Can someone help me to get only the countries who have
cities in the city table in a reasonable amount of time?

---

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] JOIN to a VIEW makes a real slow query

2007-02-14 Thread Chuck D.

On Tuesday 13 February 2007 14:51, Tom Lane wrote:
 Chuck D. [EMAIL PROTECTED] writes:
  It is still using that sequence scan on the view after the APPEND for the
  us_city and world_city table.  Any reason why the view won't use the
  indexes when it is JOINed to another table but it will when the view is
  queried without a JOIN?  I should have mentioned this is v8.1.4.

 8.1 isn't bright enough for that.  Should work in 8.2 though.


   regards, tom lane

Upgraded to 8.2.3 in my spare time here - went from the packaged binary that 
came with Ubuntu to compiling from source.  Haven't tuned it yet, but what do 
you think about this join on the view?


cmi=# explain analyze
cmi-# select user_id, username, city_name
cmi-# FROM   m_user AS mu
cmi-# LEFT JOIN geo.city_vw AS ci ON (mu.city_id = ci.city_id)
cmi-# ;
   QUERY PLAN

 Nested Loop Left Join  (cost=0.00..17.76 rows=10614 width=486) (actual 
time=0.109..0.113 rows=1 loops=1)
   Join Filter: (mu.city_id = ci.city_id)
   -  Seq Scan on m_user mu  (cost=0.00..1.01 rows=1 width=72) (actual 
time=0.015..0.017 rows=1 loops=1)
   -  Append  (cost=0.00..16.72 rows=2 width=422) (actual time=0.073..0.075 
rows=1 loops=1)
 -  Index Scan using pk_us_city on us_city  (cost=0.00..8.28 rows=1 
width=222) (actual time=0.032..0.032 rows=0 loops=1)
   Index Cond: (mu.city_id = us_city.city_id)
 -  Index Scan using world_city_pk on world_city  (cost=0.00..8.44 
rows=1 width=422) (actual time=0.040..0.042 rows=1 loops=1)
   Index Cond: (mu.city_id = world_city.city_id)
 Total runtime: 0.359 ms
(9 rows)


From 65 seconds down to less than 1 ms.  Pretty good huh?  Nice call Tom.  

Now I'll have to find some time to do the production server before this app 
goes up.



---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


[PERFORM] JOIN to a VIEW makes a real slow query

2007-02-13 Thread Chuck D.
Hi folks,

I don't know if this is an SQL or PERFORMANCE list problem but I wanted to 
check here first.  I've seen this discussed on the list before but I'm still 
not sure of the solution.  Maybe my query is just structured wrong.

I recently visited an old project of mine that has a 'city', 'state,' 
and 'country' tables.  The city data comes from multiple sources and totals 
about 3 million rows.  I decided to split the city table up based on the 
source (world_city, us_city).  This makes easier updating because the 
assigned feature id's from the two sources overlap in some cases making it 
impossible to update as a single merged table.

However, I decided to create a view to behave like the old 'city' table.  The 
view is just a simple:

SELECT [columns]
FROM world_city
UNION
SELECT [columns]
FROM us_city
;

Selecting from the view is very quick, but JOINing to the view is slow.  About 
65 seconds to select a city.  It doesn't matter wether it is joined to one 
table or 6 like it is in my user_detail query - it is still slow.  It has 
indexes on the city_id, state_id, country_id of each table in the view too.  
Everything has been 'VACUUM ANALYZE' ed.

When using explain analyze from the view I get this:

cmi=# explain analyze
cmi-# select user_id, username, city_name
cmi-# FROM   m_user AS mu
cmi-# left JOIN geo.city_vw AS ci ON (mu.city_id = ci.city_id)
cmi-# ;
QUERY PLAN
--
 Nested Loop Left Join  (cost=650146.58..751018.45 rows=10618 width=55) 
(actual time=53078.261..61269.190 rows=1 loops=1)
   Join Filter: (outer.city_id = inner.?column1?)
   -  Seq Scan on m_user mu  (cost=0.00..1.01 rows=1 width=27) (actual 
time=0.010..0.022 rows=1 loops=1)
   -  Unique  (cost=650146.58..703236.51 rows=2123597 width=62) (actual 
time=49458.007..59635.140 rows=2122712 loops=1)
 -  Sort  (cost=650146.58..655455.58 rows=2123597 width=62) (actual 
time=49458.003..55405.965 rows=2122712 loops=1)
   Sort Key: city_id, state_id, country_id, cc1, rc, adm1, lat, 
lon, city_name
   -  Append  (cost=0.00..73741.94 rows=2123597 width=62) (actual 
time=18.835..13706.395 rows=2122712 loops=1)
 -  Seq Scan on us_city  (cost=0.00..4873.09 rows=169409 
width=62) (actual time=18.832..620.553 rows=169398 loops=1)
 -  Seq Scan on world_city  (cost=0.00..47632.88 
rows=1954188 width=61) (actual time=23.513..11193.341 rows=1953314 loops=1)
 Total runtime: 61455.471 ms
(10 rows)

Time: 61512.377 ms

So, a sequence scan on the tables in the view, won't use the index.

Then do the same query by replacing the view with the real table:

cmi=# explain analyze
cmi-# select user_id, username, city_name
cmi-# FROM   m_user AS mu
cmi-# left JOIN geo.world_city AS ci ON (mu.city_id = ci.city_id)
cmi-# ;
  QUERY PLAN
--
 Nested Loop Left Join  (cost=0.00..4.04 rows=1 width=36) (actual 
time=53.854..53.871 rows=1 loops=1)
   -  Seq Scan on m_user mu  (cost=0.00..1.01 rows=1 width=27) (actual 
time=0.010..0.016 rows=1 loops=1)
   -  Index Scan using world_city_pk on world_city ci  (cost=0.00..3.01 
rows=1 width=17) (actual time=53.825..53.833 rows=1 loops=1)
 Index Cond: (outer.city_id = ci.city_id)
 Total runtime: 53.989 ms
(5 rows)

Time: 56.234 ms


I'm not sure that a view on a UNION is the best idea but I don't know how to 
go about keeping the tables from the data sources with the view (other than 
modifying them with a source_id column).  Any ideas on what is causing the 
performance lag?



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] JOIN to a VIEW makes a real slow query

2007-02-13 Thread Chuck D.
On Tuesday 13 February 2007 13:16, Merlin Moncure wrote:

 use 'union all' instead of union.  union without all has an implied
 sort and duplicate removal step that has to be resolved, materializing
 the view, before you can join to it.


Thanks for that Merlin, I forgot about using ALL.  That does eliminate the 
UNIQUE, SORT and SORT lines from the EXPLAIN query.  It also brings the query 
time down from a whopping 65 seconds to 11 seconds.  The two tables contain 
unique rows already so ALL would be required.

It is still using that sequence scan on the view after the APPEND for the 
us_city and world_city table.  Any reason why the view won't use the indexes 
when it is JOINed to another table but it will when the view is queried 
without a JOIN?  I should have mentioned this is v8.1.4.

Also, does anyone know why this line:
Join Filter: (outer.city_id = inner.?column1?)
... contains ?column1? instead of the actual column name?

This is the result after UNION ALL on the view

cmi=# explain analyze
cmi-# select user_id, username, city_name
cmi-# FROM   m_user AS mu
cmi-# LEFT JOIN geo.city_vw AS ci ON (mu.city_id = ci.city_id)
cmi-# ;
 QUERY PLAN
-
 Nested Loop Left Join  (cost=0.00..121523.88 rows=10618 width=55) (actual 
time=2392.376..11061.117 rows=1 loops=1)
   Join Filter: (outer.city_id = inner.?column1?)
   -  Seq Scan on m_user mu  (cost=0.00..1.01 rows=1 width=27) (actual 
time=0.025..0.028 rows=1 loops=1)
   -  Append  (cost=0.00..73741.94 rows=2123597 width=62) (actual 
time=16.120..9644.315 rows=2122712 loops=1)
 -  Seq Scan on us_city  (cost=0.00..4873.09 rows=169409 width=62) 
(actual time=16.119..899.802 rows=169398 loops=1)
 -  Seq Scan on world_city  (cost=0.00..47632.88 rows=1954188 
width=61) (actual time=10.585..6949.946 rows=1953314 loops=1)
 Total runtime: 11061.441 ms
(7 rows)


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings