Re: [PERFORM] Looks like merge join planning time is too big, 55 seconds

2013-08-02 Thread Jeff Janes
On Fri, Aug 2, 2013 at 2:58 AM, Sergey Burladyan eshkin...@gmail.com wrote:

 PS: I think my main problem is here:
 select min(user_id) from items;
  min
 -
1
 (1 row)

 Time: 504.520 ms

That is a long time, but still 100 fold less than the planner is taking.

What about max(user_id)?


 also, I cannot reindex it concurrently now, because autovacuum is running: VACUUM
 ANALYZE public.items (to prevent wraparound)

That is going to take a long time if you have the cost settings at
their defaults.

Cheers,

Jeff


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Looks like merge join planning time is too big, 55 seconds

2013-08-02 Thread Tom Lane
Jeff Janes jeff.ja...@gmail.com writes:
 On Thu, Aug 1, 2013 at 5:16 PM, Sergey Burladyan eshkin...@gmail.com wrote:
 If I'm not mistaken, there may be two code paths like this here:
 (1) mergejoinscansel -> scalarineqsel -> ineq_histogram_selectivity ->
 get_actual_variable_range -> index_getnext
 (2) scalargtsel -> scalarineqsel -> ineq_histogram_selectivity ->
 get_actual_variable_range -> index_getnext

 Yeah, I think you are correct.

mergejoinscansel does *not* call scalarineqsel, nor get_actual_variable_range.
It calls get_variable_range, which only looks at the pg_statistic entries.

I think we need to see the actual stack traces, not incomplete versions.
It's possible that the situation here involves bloat in pg_statistic, but
we're just leaping to conclusions if we assume that that's where the index
fetches are occurring.

regards, tom lane




Re: [PERFORM] Looks like merge join planning time is too big, 55 seconds

2013-08-02 Thread Jeff Janes
On Thu, Aug 1, 2013 at 5:16 PM, Sergey Burladyan eshkin...@gmail.com wrote:
 I also found this trace for another query:
 explain select * from xview.user_items_v v where ( v.item_id = 132358330 );


 If I'm not mistaken, there may be two code paths like this here:
 (1) mergejoinscansel -> scalarineqsel -> ineq_histogram_selectivity ->
 get_actual_variable_range -> index_getnext
 (2) scalargtsel -> scalarineqsel -> ineq_histogram_selectivity ->
 get_actual_variable_range -> index_getnext

Yeah, I think you are correct.

 And maybe the get_actual_variable_range() function is too expensive to
 call with my bloated table items and its bloated index items_user_id_idx?

But why is it bloated in this way?  It must be visiting many thousands
of dead/invisible rows before finding the first visible one.  But
btree indexes have a mechanism to remove dead tuples from indexes, so
that they aren't followed over and over again (see kill_prior_tuple).  So
is that mechanism not working, or are the tuples not dead but just
invisible (i.e. inserted by a still-open transaction)?

Cheers,

Jeff




Re: [PERFORM] Looks like merge join planning time is too big, 55 seconds

2013-08-02 Thread Sergey Burladyan
Tom Lane t...@sss.pgh.pa.us writes:

 Jeff Janes jeff.ja...@gmail.com writes:
  On Thu, Aug 1, 2013 at 5:16 PM, Sergey Burladyan eshkin...@gmail.com 
  wrote:
  If I'm not mistaken, there may be two code paths like this here:
  (1) mergejoinscansel -> scalarineqsel -> ineq_histogram_selectivity ->
  get_actual_variable_range -> index_getnext
  (2) scalargtsel -> scalarineqsel -> ineq_histogram_selectivity ->
  get_actual_variable_range -> index_getnext

  Yeah, I think you are correct.

 mergejoinscansel does *not* call scalarineqsel, nor get_actual_variable_range.
 It calls get_variable_range, which only looks at the pg_statistic
 entries.

Hmm, I was speaking about 9.2.2, but in current HEAD this call still exists,
please see: http://doxygen.postgresql.org/selfuncs_8c_source.html#l02976

 I think we need to see the actual stack traces, not incomplete versions.
 It's possible that the situation here involves bloat in pg_statistic, but
 we're just leaping to conclusions if we assume that that's where the index
 fetches are occurring.

I found debug symbols and sent the stack trace to the mailing list, but it was
blocked because of its size; trying again with zip



gdb.log.gz
Description: GNU Zip compressed data



Re: [PERFORM] Looks like merge join planning time is too big, 55 seconds

2013-08-02 Thread Sergey Burladyan
Jeff Janes jeff.ja...@gmail.com writes:

 On Fri, Aug 2, 2013 at 2:58 AM, Sergey Burladyan eshkin...@gmail.com wrote:
 
  PS: I think my main problem is here:
  select min(user_id) from items;
   min
  -
 1
  (1 row)
 
  Time: 504.520 ms

 That is a long time, but still 100 fold less than the planner is taking.

 What about max(user_id)?

max is good; only rows with user_id = 0 were updated:

select max(user_id) from items;
Time: 59.646 ms

  also, I cannot reindex it concurrently now, because autovacuum is running:
  VACUUM ANALYZE public.items (to prevent wraparound)

 That is going to take a long time if you have the cost settings at
 their defaults.

Yes, I have custom settings, slower ones; it will take about a week.

 But why is it bloated in this way? 

I don't know. Many items were updated last week, ~10% of the table.

 It must be visiting many thousands of dead/invisible rows before
 finding the first visible one.  But btree indexes have a mechanism to
 remove dead tuples from indexes, so that they aren't followed over and
 over again (see kill_prior_tuple).  So is that mechanism not
 working, or are the tuples not dead but just invisible (i.e. inserted
 by a still-open transaction)?

They are deleted, but VACUUM has still not completed.

BTW, this is a standby server, and its query plan (blocks read) is very
different from the master's:

Hot standby:

explain (analyze,verbose,buffers) select min(user_id) from items;

'Result  (cost=0.12..0.13 rows=1 width=0) (actual time=56064.514..56064.514 
rows=1 loops=1)'
'  Output: $0'
'  Buffers: shared hit=3694164 read=6591224 written=121652'
'  InitPlan 1 (returns $0)'
'->  Limit  (cost=0.00..0.12 rows=1 width=8) (actual 
time=56064.502..56064.503 rows=1 loops=1)'
'  Output: public.items.user_id'
'  Buffers: shared hit=3694164 read=6591224 written=121652'
'  ->  Index Only Scan using items_user_id_idx on public.items  
(cost=0.00..24165743.48 rows=200673143 width=8) (actual 
time=56064.499..56064.499 rows=1 loops=1)'
'Output: public.items.user_id'
'Index Cond: (public.items.user_id IS NOT NULL)'
'Heap Fetches: 8256426'
'Buffers: shared hit=3694164 read=6591224 written=121652'
'Total runtime: 56064.571 ms'

Master:

'Result  (cost=0.12..0.13 rows=1 width=0) (actual time=202.759..202.759 rows=1 
loops=1)'
'  Output: $0'
'  Buffers: shared hit=153577 read=1'
'  InitPlan 1 (returns $0)'
'->  Limit  (cost=0.00..0.12 rows=1 width=8) (actual time=202.756..202.757 
rows=1 loops=1)'
'  Output: public.items.user_id'
'  Buffers: shared hit=153577 read=1'
'  ->  Index Only Scan using items_user_id_idx on public.items  
(cost=0.00..24166856.02 rows=200680528 width=8) (actual time=202.756..202.756 
rows=1 loops=1)'
'Output: public.items.user_id'
'Index Cond: (public.items.user_id IS NOT NULL)'
'Heap Fetches: 0'
'Buffers: shared hit=153577 read=1'
'Total runtime: 202.786 ms'

And from a backup, before the index|heap got bloated :)

 Result  (cost=0.87..0.88 rows=1 width=0) (actual time=16.002..16.003 rows=1 
loops=1)
   Output: $0
   Buffers: shared hit=3 read=4
   InitPlan 1 (returns $0)
 ->  Limit  (cost=0.00..0.87 rows=1 width=8) (actual time=15.993..15.995 
rows=1 loops=1)
   Output: public.items.user_id
   Buffers: shared hit=3 read=4
   ->  Index Only Scan using items_user_id_idx on public.items  
(cost=0.00..169143085.72 rows=193309210 width=8) (actual time=15.987..15.987 
rows=1 loops=1)
 Output: public.items.user_id
 Index Cond: (public.items.user_id IS NOT NULL)
 Heap Fetches: 1
 Buffers: shared hit=3 read=4
 Total runtime: 16.057 ms
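
The Buffers lines quantify the gap between the two servers; at 8192 bytes per
buffer (the default PostgreSQL block size), the standby's min() probe touched
roughly 78 GiB of data versus about 1.2 GiB on the master. A quick check of
that arithmetic, with the figures taken from the plans above:

```shell
# I/O volume implied by the Buffers lines of the two plans above
# (hit + read pages, 8192 bytes each).
standby=$(( (3694164 + 6591224) * 8192 ))
master=$(( (153577 + 1) * 8192 ))
echo "standby: $standby bytes"        # 84257898496 (~78 GiB)
echo "master:  $master bytes"         # 1258110976  (~1.2 GiB)
echo "ratio:   $(( standby / master ))"   # 66 -- standby touched ~67x more data
```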




Re: [PERFORM] Looks like merge join planning time is too big, 55 seconds

2013-08-02 Thread Alvaro Herrera
Tom Lane wrote:
 Jeff Janes jeff.ja...@gmail.com writes:
  On Thu, Aug 1, 2013 at 5:16 PM, Sergey Burladyan eshkin...@gmail.com 
  wrote:
  If I'm not mistaken, there may be two code paths like this here:
  (1) mergejoinscansel -> scalarineqsel -> ineq_histogram_selectivity ->
  get_actual_variable_range -> index_getnext
  (2) scalargtsel -> scalarineqsel -> ineq_histogram_selectivity ->
  get_actual_variable_range -> index_getnext
 
  Yeah, I think you are correct.
 
 mergejoinscansel does *not* call scalarineqsel, nor get_actual_variable_range.
 It calls get_variable_range, which only looks at the pg_statistic entries.

Uh?  It's right there in line 2976 in HEAD.


-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [PERFORM] Looks like merge join planning time is too big, 55 seconds

2013-08-02 Thread Tom Lane
Alvaro Herrera alvhe...@2ndquadrant.com writes:
 Tom Lane wrote:
 It calls get_variable_range, which only looks at the pg_statistic entries.

 Uh?  It's right there in line 2976 in HEAD.

Meh.  You're right, I was thinking of this bit in get_variable_range()

/*
 * XXX It's very tempting to try to use the actual column min and max, if
 * we can get them relatively-cheaply with an index probe.  However, since
 * this function is called many times during join planning, that could
 * have unpleasant effects on planning speed.  Need more investigation
 * before enabling this.
 */
#ifdef NOT_USED
if (get_actual_variable_range(root, vardata, sortop, &min, &max))
return true;
#endif

I think when that was written, we didn't have the code in scalarineqsel
that tries to go out and get the actual endpoints from an index.  Now
that we do, the planning cost impact that I was afraid of here can
actually bite us, and it seems that at least for Sergey's case it's pretty
bad.  Another problem is that we'll end up comparing endpoints gotten from
pg_statistic to endpoints gotten from the index, making the resulting
numbers at least self-inconsistent and very possibly meaningless.

The planner already caches the results of mergejoinscansel in hopes of
alleviating its cost, but I wonder if we need another lower-level cache
for the min/max values of each variable that participates in a
mergejoinable clause.

Having said that, it's still not clear why these probes are so expensive
in Sergey's case.  I favor your idea about lots of dead rows, but we don't
have actual proof of it.  Maybe pgstattuple could help here?

regards, tom lane




Re: [PERFORM] Looks like merge join planning time is too big, 55 seconds

2013-08-02 Thread Sergey Burladyan
Sergey Burladyan eshkin...@gmail.com writes:

 Hot standby:
...
 '  ->  Index Only Scan using items_user_id_idx on public.items  
 (cost=0.00..24165743.48 rows=200673143 width=8) (actual 
 time=56064.499..56064.499 rows=1 loops=1)'
 'Output: public.items.user_id'
 'Index Cond: (public.items.user_id IS NOT NULL)'
 'Heap Fetches: 8256426'
 'Buffers: shared hit=3694164 read=6591224 written=121652'
 'Total runtime: 56064.571 ms'

 Master:

...
 '  ->  Index Only Scan using items_user_id_idx on public.items  
 (cost=0.00..24166856.02 rows=200680528 width=8) (actual time=202.756..202.756 
 rows=1 loops=1)'
 'Output: public.items.user_id'
 'Index Cond: (public.items.user_id IS NOT NULL)'
 'Heap Fetches: 0'
 'Buffers: shared hit=153577 read=1'
 'Total runtime: 202.786 ms'

Looks like the visibility map is somehow not replicated to the slave?

If it matters, Master was restarted yesterday, Standby was not.





[PERFORM] Looks like merge join planning time is too big, 55 seconds

2013-08-01 Thread Sergey Burladyan
Hello, I have a problem with planning time; I do not understand why this can
happen.

PostgreSQL 9.2.2 on x86_64-unknown-linux-gnu, compiled by gcc-4.4.real
(Debian 4.4.5-8) 4.4.5, 64-bit

# explain
# select i.item_id, u.user_id from items i
# left join users u on u.user_id = i.user_id
# where item_id = 169946840;
  QUERY PLAN

--
 Nested Loop Left Join  (cost=0.00..397.14 rows=1 width=16)
   ->  Index Scan using items_item_ux on items i  (cost=0.00..358.84 rows=1
width=16)
 Index Cond: (item_id = 169946840)
   ->  Index Only Scan using users_user_id_pkey on users u
 (cost=0.00..38.30 rows=1 width=8)
 Index Cond: (user_id = i.user_id)

time: 55919.910 ms

# set enable_mergejoin to off;

# explain
select i.item_id, u.user_id from items i
left join users u on u.user_id = i.user_id
where item_id = 169946840;
  QUERY PLAN

--
 Nested Loop Left Join  (cost=0.00..397.14 rows=1 width=16)
   ->  Index Scan using items_item_ux on items i  (cost=0.00..358.84 rows=1
width=16)
 Index Cond: (item_id = 169946840)
   ->  Index Only Scan using users_user_id_pkey on users u
 (cost=0.00..38.30 rows=1 width=8)
 Index Cond: (user_id = i.user_id)

time: 28.874 ms

-- 
Sergey Burladyan


Re: [PERFORM] Looks like merge join planning time is too big, 55 seconds

2013-08-01 Thread Thomas Reiss
On 01/08/2013 11:55, Sergey Burladyan wrote:
 Hello, I have a problem with planning time; I do not understand why this
 can happen.
 
 PostgreSQL 9.2.2 on x86_64-unknown-linux-gnu, compiled by gcc-4.4.real
 (Debian 4.4.5-8) 4.4.5, 64-bit
 
 # explain
 # select i.item_id, u.user_id from items i
 # left join users u on u.user_id = i.user_id
 # where item_id = 169946840;
   QUERY PLAN
  
 --
  Nested Loop Left Join  (cost=0.00..397.14 rows=1 width=16)
    ->  Index Scan using items_item_ux on items i  (cost=0.00..358.84
 rows=1 width=16)
  Index Cond: (item_id = 169946840)
    ->  Index Only Scan using users_user_id_pkey on users u
  (cost=0.00..38.30 rows=1 width=8)
  Index Cond: (user_id = i.user_id)
 
 time: 55919.910 ms
 
 # set enable_mergejoin to off;
 
 # explain
 select i.item_id, u.user_id from items i
 left join users u on u.user_id = i.user_id
 where item_id = 169946840;
   QUERY PLAN
  
 --
  Nested Loop Left Join  (cost=0.00..397.14 rows=1 width=16)
    ->  Index Scan using items_item_ux on items i  (cost=0.00..358.84
 rows=1 width=16)
  Index Cond: (item_id = 169946840)
    ->  Index Only Scan using users_user_id_pkey on users u
  (cost=0.00..38.30 rows=1 width=8)
  Index Cond: (user_id = i.user_id)
 
 time: 28.874 ms
 
 -- 
 Sergey Burladyan

Hello,

If you leave enable_mergejoin on, what happens if you run the explain
two times in a row? Do you get the same planning time?

At first glance, this reminds me of a catalog bloat issue. Can you provide
the results of these queries:
SELECT pg_size_pretty(pg_table_size('pg_class')) AS size_pg_class;
SELECT pg_size_pretty(pg_table_size('pg_attribute')) AS size_pg_attribute;

Thanks
-- 
Thomas Reiss
Consultant Dalibo
http://dalibo.com - http://dalibo.org




Re: [PERFORM] Looks like merge join planning time is too big, 55 seconds

2013-08-01 Thread Sergey Burladyan
On 01.08.2013 14:05, Thomas Reiss thomas.re...@dalibo.com
wrote:

 If you leave enable_mergejoin on, what happens if you run the explain
 two times in a row? Do you get the same planning time?

Yes, I get the same planning time.


Re: [PERFORM] Looks like merge join planning time is too big, 55 seconds

2013-08-01 Thread Sergey Burladyan
On Thu, Aug 1, 2013 at 2:04 PM, Thomas Reiss thomas.re...@dalibo.com wrote:

 Le 01/08/2013 11:55, Sergey Burladyan a écrit :
 At first look, this reminds me some catalog bloat issue. Can you provide
 the result of these queries :
 SELECT pg_size_pretty(pg_table_size('pg_class')) AS size_pg_class;
 SELECT pg_size_pretty(pg_table_size('pg_attribute')) AS size_pg_attribute;


SELECT pg_size_pretty(pg_table_size('pg_class')) AS size_pg_class; --- '16
MB'
SELECT pg_size_pretty(pg_table_size('pg_attribute')) AS size_pg_attribute;
--- '63 MB'

-- 
Sergey Burladyan


Re: [PERFORM] Looks like merge join planning time is too big, 55 seconds

2013-08-01 Thread Sergey Burladyan
I found another query with a big planning time:
explain select * from xview.user_items_v v where ( v.item_id = 132358330 );
QUERY PLAN

---
 Nested Loop Left Join  (cost=0.00..363.28 rows=1 width=44)
   Join Filter: (ief.item_id = ix.item_id)
   ->  Index Scan using items_item_ux on items ix  (cost=0.00..359.20
rows=1 width=36)
 Index Cond: (item_id = 132358330)
 Filter: ((xa_txtime IS NULL) AND (user_id  0) AND (status_id 
20))
   ->  Index Scan using item_enabled_flags_item_id_idx on
item_enabled_flags ief  (cost=0.00..4.06 rows=1 width=8)
 Index Cond: (item_id = 132358330)
(7 rows)

Time: 44037.758 ms

Looks like the planning algorithm hangs on the 'items' table statistics.
Setting enable_mergejoin to off does not help with this query.

-- 
Sergey Burladyan


Re: [PERFORM] Looks like merge join planning time is too big, 55 seconds

2013-08-01 Thread Sergey Burladyan
Sergey Burladyan eshkin...@gmail.com writes:

 # explain
 # select i.item_id, u.user_id from items i
 # left join users u on u.user_id = i.user_id
 # where item_id = 169946840;
 QUERY PLAN 
 --
 Nested Loop Left Join (cost=0.00..397.14 rows=1 width=16)
 -> Index Scan using items_item_ux on items i (cost=0.00..358.84 rows=1 
 width=16)
 Index Cond: (item_id = 169946840)
 -> Index Only Scan using users_user_id_pkey on users u (cost=0.00..38.30 
 rows=1 width=8)
 Index Cond: (user_id = i.user_id)

 time: 55919.910 ms

While running this EXPLAIN, the backend used the disk for a long time:
 TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND

 
21638 be/4 postgres   2.10 M/s   9.45 M/s  0.00 % 69.04 % postgres: postgres 
x xxx.xxx.xxx.xxx(50987) EXPLAIN

Why does it read and write to disk at 10 megabytes per second for an EXPLAIN
query? I cannot understand what is going on here :(





Re: [PERFORM] Looks like merge join planning time is too big, 55 seconds

2013-08-01 Thread David Kerr
On Thu, Aug 01, 2013 at 07:17:27PM +0400, Sergey Burladyan wrote:
- Sergey Burladyan eshkin...@gmail.com writes:
- 
-  # explain
-  # select i.item_id, u.user_id from items i
-  # left join users u on u.user_id = i.user_id
-  # where item_id = 169946840;
-  QUERY PLAN 
-  
--
-  Nested Loop Left Join (cost=0.00..397.14 rows=1 width=16)
-  -> Index Scan using items_item_ux on items i (cost=0.00..358.84 rows=1 
width=16)
-  Index Cond: (item_id = 169946840)
-  -> Index Only Scan using users_user_id_pkey on users u (cost=0.00..38.30 
rows=1 width=8)
-  Index Cond: (user_id = i.user_id)
- 
-  time: 55919.910 ms
- 
- While running this EXPLAIN backend use disk for a long time:
-  TID  PRIO  USER DISK READ  DISK WRITE  SWAPIN IOCOMMAND 

   
- 21638 be/4 postgres2.10 M/s9.45 M/s  0.00 % 69.04 % postgres: 
postgres x xxx.xxx.xxx.xxx(50987) EXPLAIN
- 
- Why it read and write to disk 10 megabytes per second for EXPLAIN query? 
Cannot understand what is going on here :(


That sounds familiar - is it possible you're running into this?
http://www.postgresql.org/message-id/20120713065122.ga45...@mr-paradox.net




Re: [PERFORM] Looks like merge join planning time is too big, 55 seconds

2013-08-01 Thread Jeff Janes
On Thu, Aug 1, 2013 at 8:17 AM, Sergey Burladyan eshkin...@gmail.com wrote:
 Sergey Burladyan eshkin...@gmail.com writes:

 # explain
 # select i.item_id, u.user_id from items i
 # left join users u on u.user_id = i.user_id
 # where item_id = 169946840;
 QUERY PLAN
 --
 Nested Loop Left Join (cost=0.00..397.14 rows=1 width=16)
 -> Index Scan using items_item_ux on items i (cost=0.00..358.84 rows=1 
 width=16)
 Index Cond: (item_id = 169946840)
 -> Index Only Scan using users_user_id_pkey on users u (cost=0.00..38.30 
 rows=1 width=8)
 Index Cond: (user_id = i.user_id)

 time: 55919.910 ms

 While running this EXPLAIN backend use disk for a long time:
  TID  PRIO  USER DISK READ  DISK WRITE  SWAPIN IOCOMMAND
 21638 be/4 postgres2.10 M/s9.45 M/s  0.00 % 69.04 % postgres: 
 postgres x xxx.xxx.xxx.xxx(50987) EXPLAIN

 Why it read and write to disk 10 megabytes per second for EXPLAIN query? 
 Cannot understand what is going on here :(

I'd use strace to find what file handle is being read and written, and
lsof to figure out what file that is.

It looks like it is more write than read, which does seem strange.

Any chance you can create a self-contained test case?

Cheers,

Jeff




Re: [PERFORM] Looks like merge join planning time is too big, 55 seconds

2013-08-01 Thread Sergey Burladyan
Jeff Janes jeff.ja...@gmail.com writes:

 I'd use strace to find what file handle is being read and written, and
 lsof to figure out what file that is.

I used strace; it is more read than write:
$ cut -d '(' -f 1 /var/tmp/pg.trace | sort | uniq -c | sort -n
 49 select
708 close
   1021 open
   7356 write
 212744 read
 219650 lseek
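
For readers following along, the pipeline above just counts syscalls by name:
`cut -d '(' -f 1` keeps everything before the first parenthesis of each strace
line, and `sort | uniq -c` tallies the names. Demonstrated on a tiny made-up
trace (the sample lines are hypothetical, but in strace's format):

```shell
# Build a four-line sample trace and summarize it the same way.
cat > /tmp/sample.trace <<'EOF'
read(153, ""..., 8192) = 8192
read(153, ""..., 8192) = 8192
lseek(153, 0, SEEK_SET) = 0
write(4, ""..., 120) = 120
EOF
cut -d '(' -f 1 /tmp/sample.trace | sort | uniq -c | sort -n
```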

top reads:
7859 read(150 open("base/16444/17685.129", O_RDWR|O_CREAT, 0600) = 150
9513 read(149 open("base/16444/17685.128", O_RDWR|O_CREAT, 0600) = 149
10529 read(151 open("base/16444/17685.130", O_RDWR|O_CREAT, 0600) = 151
12155 read(152 open("base/16444/17685.131", O_RDWR|O_CREAT, 0600) = 152
12768 read(154 open("base/16444/17685.133", O_RDWR|O_CREAT, 0600) = 154
16210 read(153 open("base/16444/17685.132", O_RDWR|O_CREAT, 0600) = 153

it is the 'items' table:
select relname from pg_class where relfilenode = 17685;
 relname 
-
 items

each read is 8192 bytes, so for an EXPLAIN query with two simple index scans,
*without* ANALYZE, postgres
read (7859 + 9513 + 10529 + 12155 + 12768 + 16210) * 8192 = 565,526,528 bytes
from it.

 It looks like it is more write than read, which does seem strange.

Why does it read anything for a simple EXPLAIN, without really executing the
query? :-)

 Any chance you can create a self-contained test case?

I don't think I can do this; it is a ~1 TB heavily loaded database. This is on
the standby server.

PS: two straces, for the quick and the slow explain:

explain
select i.item_id from items i
where item_id = 169946840

$ cut -d '(' -f 1 /var/tmp/pg-all-normal.trace | sort | uniq -c
313 lseek
308 open
  2 read
 13 recvfrom
  6 sendto

explain
select i.item_id, u.user_id from items i
left join users u on u.user_id = i.user_id
where item_id = 169946840

$ cut -d '(' -f 1 /var/tmp/pg-all-slow.trace | sort | uniq -c
963 close
  1 fsync
5093393 lseek
925 open
6004995 read
 14 recvfrom
  1 rt_sigreturn
  9 select
   4361 semop
  7 sendto
  1 --- SIGUSR1 
 685605 write





Re: [PERFORM] Looks like merge join planning time is too big, 55 seconds

2013-08-01 Thread Jeff Janes
On Thu, Aug 1, 2013 at 12:13 PM, Sergey Burladyan eshkin...@gmail.com wrote:
 Jeff Janes jeff.ja...@gmail.com writes:

 I'd use strace to find what file handle is being read and written, and
 lsof to figure out what file that is.

 I use strace, it is more read then write:
 $ cut -d '(' -f 1 /var/tmp/pg.trace | sort | uniq -c | sort -n
  49 select
 708 close
1021 open
7356 write
  212744 read
  219650 lseek

Based on your iotop (or whatever that was that you posted previously)
most of the reads must be coming from the file system cache.


 top reads:
 7859 read(150 open(base/16444/17685.129, O_RDWR|O_CREAT, 0600) = 150
 9513 read(149 open(base/16444/17685.128, O_RDWR|O_CREAT, 0600) = 149
 10529 read(151 open(base/16444/17685.130, O_RDWR|O_CREAT, 0600) = 151
 12155 read(152 open(base/16444/17685.131, O_RDWR|O_CREAT, 0600) = 152
 12768 read(154 open(base/16444/17685.133, O_RDWR|O_CREAT, 0600) = 154
 16210 read(153 open(base/16444/17685.132, O_RDWR|O_CREAT, 0600) = 153

 it is 'items' table:
 select relname from pg_class where relfilenode = 17685;
  relname
 -
  items

 each read is 8192 bytes, so for EXPLAIN query with two simple index scan, 
 *without* ANALYZE postgres
 read (7859 + 9513 + 10529 + 12155 + 12768 + 16210) * 8192 = 565 526 528 bytes 
 from it.

 It looks like it is more write than read, which does seem strange.

 Why it read something for simple EXPLAIN, without real executing query? :-)

I figured it was reading some system catalogs or something.  I don't
know why it would be reading the table files. Or writing much of
anything, either.

I think the next step would be to run gdb -p pid (but don't start
gdb until backend is in the middle of a slow explain), then:

break read
c
bt

Then repeat the c and bt combination a few more times, to build up a
dataset on what the call stack is which is causing the reads to
happen.
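
The break/continue/backtrace loop can also be scripted so gdb collects the
stacks unattended; a sketch (the backend PID and the file paths are
placeholders, adjust as needed):

```shell
# Write a gdb command file: break on read(), and on each hit print a
# backtrace and continue, accumulating a sample of the call stacks.
cat > /tmp/trace-reads.gdb <<'EOF'
break read
commands
  bt
  continue
end
continue
EOF

# Attach once the backend is already inside the slow EXPLAIN
# (replace $BACKEND_PID with the real process id):
#   gdb -batch -x /tmp/trace-reads.gdb -p $BACKEND_PID > /tmp/read-stacks.log

grep -c 'break read' /tmp/trace-reads.gdb   # sanity check: prints 1
```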

Cheers,

Jeff




Re: [PERFORM] Looks like merge join planning time is too big, 55 seconds

2013-08-01 Thread Sergey Burladyan
Jeff Janes jeff.ja...@gmail.com writes:

 I think the next step would be to run gdb -p pid (but don't start
 gdb until backend is in the middle of a slow explain), then:

Sorry, I lack debug symbols, so the call trace is incomplete:

explain select i.item_id, u.user_id from items i left join users u on u.user_id 
= i.user_id where item_id = 169946840

#0  0x7ff766967620 in read () from /lib/libc.so.6
#1  0x7ff7689cfc25 in FileRead ()
#2  0x7ff7689ea2f6 in mdread ()
#3  0x7ff7689cc473 in ?? ()
#4  0x7ff7689ccf54 in ReadBufferExtended ()
#5  0x7ff7688050ca in index_fetch_heap ()
#6  0x7ff76880523e in index_getnext ()
#7  0x7ff768a63306 in ?? ()
#8  0x7ff768a67624 in ?? ()
#9  0x7ff768a67d9c in ?? ()
#10 0x7ff768a68376 in mergejoinscansel ()
#11 0x7ff76896faa6 in initial_cost_mergejoin ()
#12 0x7ff768977695 in ?? ()
#13 0x7ff76897816c in add_paths_to_joinrel ()
#14 0x7ff76897981b in make_join_rel ()
#15 0x7ff768979ac9 in join_search_one_level ()
#16 0x7ff76896a3ab in standard_join_search ()
#17 0x7ff7689837c1 in query_planner ()
#18 0x7ff768985260 in ?? ()
#19 0x7ff7689870a9 in subquery_planner ()
#20 0x7ff76898736e in standard_planner ()
#21 0x7ff7689ef3ce in pg_plan_query ()
#22 0x7ff7688c94a3 in ?? ()
#23 0x7ff7688c9809 in ExplainQuery ()
#24 0x7ff7648095e2 in ?? () from 
/usr/lib/postgresql/9.2/lib/pg_stat_statements.so
#25 0x7ff7689f1f27 in ?? ()
#26 0x7ff7689f3295 in ?? ()
#27 0x7ff7689f388f in PortalRun ()
#28 0x7ff7689ef96d in ?? ()
#29 0x7ff7689f0950 in PostgresMain ()
#30 0x7ff7689aa7a3 in ?? ()
#31 0x7ff7689ad73c in PostmasterMain ()
#32 0x7ff768948e4b in main ()

#0  0x7ff766973950 in lseek64 () from /lib/libc.so.6
#1  0x7ff7689cf88d in FileSeek ()
#2  0x7ff7689ea09c in mdwrite ()
#3  0x7ff7689cb12f in ?? ()
#4  0x7ff7689cca43 in ?? ()
#5  0x7ff7689ccf54 in ReadBufferExtended ()
#6  0x7ff7688050ca in index_fetch_heap ()
#7  0x7ff76880523e in index_getnext ()
#8  0x7ff768a63306 in ?? ()
#9  0x7ff768a67624 in ?? ()
#10 0x7ff768a67d9c in ?? ()
#11 0x7ff768a68376 in mergejoinscansel ()
#12 0x7ff76896faa6 in initial_cost_mergejoin ()
#13 0x7ff768977695 in ?? ()
#14 0x7ff76897816c in add_paths_to_joinrel ()
#15 0x7ff76897981b in make_join_rel ()
#16 0x7ff768979ac9 in join_search_one_level ()
#17 0x7ff76896a3ab in standard_join_search ()
#18 0x7ff7689837c1 in query_planner ()
#19 0x7ff768985260 in ?? ()
#20 0x7ff7689870a9 in subquery_planner ()
#21 0x7ff76898736e in standard_planner ()
#22 0x7ff7689ef3ce in pg_plan_query ()
#23 0x7ff7688c94a3 in ?? ()
#24 0x7ff7688c9809 in ExplainQuery ()
#25 0x7ff7648095e2 in ?? () from 
/usr/lib/postgresql/9.2/lib/pg_stat_statements.so
#26 0x7ff7689f1f27 in ?? ()
#27 0x7ff7689f3295 in ?? ()
#28 0x7ff7689f388f in PortalRun ()
#29 0x7ff7689ef96d in ?? ()
#30 0x7ff7689f0950 in PostgresMain ()
#31 0x7ff7689aa7a3 in ?? ()
#32 0x7ff7689ad73c in PostmasterMain ()
#33 0x7ff768948e4b in main ()

#0  0x7ff766973950 in lseek64 () from /lib/libc.so.6
#1  0x7ff7689cf88d in FileSeek ()
#2  0x7ff7689ea2b9 in mdread ()
#3  0x7ff7689cc473 in ?? ()
#4  0x7ff7689ccf54 in ReadBufferExtended ()
#5  0x7ff7688050ca in index_fetch_heap ()
#6  0x7ff76880523e in index_getnext ()
#7  0x7ff768a63306 in ?? ()
#8  0x7ff768a67624 in ?? ()
#9  0x7ff768a67d9c in ?? ()
#10 0x7ff768a68376 in mergejoinscansel ()
#11 0x7ff76896faa6 in initial_cost_mergejoin ()
#12 0x7ff768977695 in ?? ()
#13 0x7ff76897816c in add_paths_to_joinrel ()
#14 0x7ff76897981b in make_join_rel ()
#15 0x7ff768979ac9 in join_search_one_level ()
#16 0x7ff76896a3ab in standard_join_search ()
#17 0x7ff7689837c1 in query_planner ()
#18 0x7ff768985260 in ?? ()
#19 0x7ff7689870a9 in subquery_planner ()
#20 0x7ff76898736e in standard_planner ()
#21 0x7ff7689ef3ce in pg_plan_query ()
#22 0x7ff7688c94a3 in ?? ()
#23 0x7ff7688c9809 in ExplainQuery ()
#24 0x7ff7648095e2 in ?? () from 
/usr/lib/postgresql/9.2/lib/pg_stat_statements.so
#25 0x7ff7689f1f27 in ?? ()
#26 0x7ff7689f3295 in ?? ()
#27 0x7ff7689f388f in PortalRun ()
#28 0x7ff7689ef96d in ?? ()
#29 0x7ff7689f0950 in PostgresMain ()
#30 0x7ff7689aa7a3 in ?? ()
#31 0x7ff7689ad73c in PostmasterMain ()
#32 0x7ff768948e4b in main ()




Re: [PERFORM] Looks like merge join planning time is too big, 55 seconds

2013-08-01 Thread Sergey Burladyan
I also found this trace for another query:
explain select * from xview.user_items_v v where ( v.item_id = 132358330 );

#0  0x7ff766967620 in read () from /lib/libc.so.6
#1  0x7ff7689cfc25 in FileRead ()
#2  0x7ff7689ea2f6 in mdread ()
#3  0x7ff7689cc473 in ?? ()
#4  0x7ff7689ccf54 in ReadBufferExtended ()
#5  0x7ff7688050ca in index_fetch_heap ()
#6  0x7ff76880523e in index_getnext ()
#7  0x7ff768a63306 in ?? ()
#8  0x7ff768a67624 in ?? ()
#9  0x7ff768a67d9c in ?? ()
#10 0x7ff768a688fc in scalargtsel ()
#11 0x7ff768ac5211 in OidFunctionCall4Coll ()
#12 0x7ff768998ce5 in restriction_selectivity ()
#13 0x7ff76896c71e in clause_selectivity ()
#14 0x7ff76896bf60 in clauselist_selectivity ()
#15 0x7ff76896ddfd in set_baserel_size_estimates ()
#16 0x7ff76896abf2 in ?? ()
#17 0x7ff76896bc97 in make_one_rel ()
#18 0x7ff7689837c1 in query_planner ()
#19 0x7ff768985260 in ?? ()
#20 0x7ff7689870a9 in subquery_planner ()
#21 0x7ff76898736e in standard_planner ()
#22 0x7ff7689ef3ce in pg_plan_query ()
#23 0x7ff7688c94a3 in ?? ()
#24 0x7ff7688c9809 in ExplainQuery ()
#25 0x7ff7648095e2 in ?? () from 
/usr/lib/postgresql/9.2/lib/pg_stat_statements.so
#26 0x7ff7689f1f27 in ?? ()
#27 0x7ff7689f3295 in ?? ()
#28 0x7ff7689f388f in PortalRun ()
#29 0x7ff7689ef96d in ?? ()
#30 0x7ff7689f0950 in PostgresMain ()
#31 0x7ff7689aa7a3 in ?? ()
#32 0x7ff7689ad73c in PostmasterMain ()
#33 0x7ff768948e4b in main ()

I see two code paths:
#6  0x7ff76880523e in index_getnext ()
...
#9  0x7ff768a67d9c in ?? ()
#10 0x7ff768a688fc in scalargtsel ()
...

and 

#6  0x7ff76880523e in index_getnext ()
...
#9  0x7ff768a67d9c in ?? ()
#10 0x7ff768a68376 in mergejoinscansel ()
...

If I'm not mistaken, there may be two code paths like this here:
(1) mergejoinscansel -> scalarineqsel -> ineq_histogram_selectivity ->
get_actual_variable_range -> index_getnext
(2) scalargtsel -> scalarineqsel -> ineq_histogram_selectivity ->
get_actual_variable_range -> index_getnext

And maybe the get_actual_variable_range() function is too expensive to
call with my bloated table items and its bloated index items_user_id_idx?





Re: [PERFORM] Looks like merge join planning time is too big, 55 seconds

2013-08-01 Thread Alvaro Herrera
Sergey Burladyan wrote:
 I also find this trace for other query:
 explain select * from xview.user_items_v v where ( v.item_id = 132358330 );
 
 #0  0x7ff766967620 in read () from /lib/libc.so.6
 #1  0x7ff7689cfc25 in FileRead ()
 #2  0x7ff7689ea2f6 in mdread ()
 #3  0x7ff7689cc473 in ?? ()
 #4  0x7ff7689ccf54 in ReadBufferExtended ()
 #5  0x7ff7688050ca in index_fetch_heap ()
 #6  0x7ff76880523e in index_getnext ()
 #7  0x7ff768a63306 in ?? ()
 #8  0x7ff768a67624 in ?? ()
 #9  0x7ff768a67d9c in ?? ()
 #10 0x7ff768a688fc in scalargtsel ()

It'd be useful to see what's in frames 7-9, but this might be related to
get_actual_variable_range().  I don't see anything else nearby that
would try to read portions of the table.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

