Re: [PERFORM] index scan forward vs backward = speed difference of 357X slower!

2012-02-08 Thread Kevin Traster
> This is not a problem with dead rows, but the index is not really
> satisfying your query and the database has to look through an
> indeterminate amount of rows until the 'limit 15' is satisfied.  Yeah,
> backwards scans are slower, especially for disk bound scans but you
> also have to consider how many filter misses your have.  The smoking
> gun is here:
>
> "Index Scan Backward using changes_shareschange on changes
> (cost=0.00..925150.26 rows=181997 width=98) (actual time=3.161..15.843
> rows=15 loops=1)
> Filter: ((activity = ANY ('{4,5}'::integer[])) AND (mfiled >= $1))"
>
> When you see Filter: xyz, xyz is what each record has to be compared
> against after the index pointed you to an area(s) in the heap.  It's
> pure luck going forwards or backwards that determines how many records
> you have to look through to get 15 good ones as defined by satisfying
> the filter.  To prove that one way or the other you can convert your
> where to a boolean returning (and bump the limit appropriately)
> expression to see how many records get filtered out.
>
> merlin

I have indexes also on activity and mfiled (both btree) - wouldn't the
database use them? - Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] index scan forward vs backward = speed difference of 357X slower!

2012-02-08 Thread Kevin Traster
Typo: Work_mem = 32 MB

The definition for both column and index:
 shareschange  | numeric |
"changes_shareschange" btree (shareschange)

Index created using: CREATE INDEX changes_shareschange ON changes(shareschange);

The entire table is created nightly (and analyzed afterwords), and
used only for reporting - there no updates/deletes, so there shouldn't
be any dead rows in the table.
Likewise, there is no nulls in the column.

Please elaborate on:

>You haven't shown us the index definition, but I gather from
> the fact that the scan condition is just a Filter (not an Index Cond)
> that the index itself doesn't offer any clue as to whether a given row
> meets those conditions

Are you saying it is the retrieval of the physically random located 15
rows to meet the ascending condition that causes the 5 sec difference?
The table is not-clustered, so it is "random" for descending also.

The condition is shareschange ascending, I have an index for that
condition and the planner is using it.

What else can I look at?



On Wed, Feb 8, 2012 at 11:31 AM, Tom Lane  wrote:
> Kevin Traster  writes:
>> The query plan and estimates are exactly the same, except desc has index
>> scan backwards instead of index scan for changes_shareschange.
>> Yet, actual runtime performance is different by 357x slower for the
>> ascending version instead of descending.
>
> Apparently, there are some rows passing the filter condition that are
> close to the end of the index, but none that are close to the start.
> So it takes a lot longer to find the first 15 matches in one case than
> the other.  You haven't shown us the index definition, but I gather from
> the fact that the scan condition is just a Filter (not an Index Cond)
> that the index itself doesn't offer any clue as to whether a given row
> meets those conditions.  So this plan is going to be doing a lot of
> random-access heap probes until it finds a match.
>
>> Why and how do I fix it?
>
> Probably, you need an index better suited to the query condition.
> If you have one and the problem is that the planner's not choosing it,
> then this is going to take more information to resolve.
>
>                        regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] index scan forward vs backward = speed difference of 357X slower!

2012-02-07 Thread Kevin Traster
PostgreSQL 9.1.2 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.1.2
20080704 (Red Hat 4.1.2-51), 64-bit

Dedicated DB server

4GB ram

Shared_Buffers = 1 GB

Effective_cache_size = 3GB

Work_mem = 32GB

Analyze done

Queries ran multiple times, same differences/results

Default Statistics = 1000


Query (5366ms) :

explain analyze select initcap (fullname), initcap(issuer),upper(rsymbol),
initcap(industry),
activity,to_char(shareschange,'FM9,999,999,999,999,999'),sharespchange ||+
E'\%' from changes where activity in (4,5) and mfiled >= (select
max(mfiled) from changes) order by shareschange asc limit 15


Slow Ascending explain Analyze:

http://explain.depesz.com/s/zFz


Query (15ms) :

explain analyze select initcap (fullname), initcap(issuer),upper(rsymbol),
initcap(industry),
activity,to_char(shareschange,'FM9,999,999,999,999,999'),sharespchange ||+
E'\%' from changes where activity in (4,5) and mfiled >= (select
max(mfiled) from changes) order by shareschange desc limit 15


Fast descending explain analyze:

http://explain.depesz.com/s/OP7



The index: changes_shareschange is a btree index created with default
ascending order


The query plan and estimates are exactly the same, except desc has index
scan backwards instead of index scan for changes_shareschange.


Yet, actual runtime performance is different by 357x slower for the
ascending version instead of descending.


Why and how do I fix it?


Re: [PERFORM] NOT IN >2hrs vs EXCEPT < 2 sec.

2009-01-29 Thread Kevin Traster
On Wed, Jan 28, 2009 at 11:37 PM, Scott Marlowe wrote:

> On Thu, Jan 29, 2009 at 12:01 AM, Kevin Traster  wrote:
> > 2 questions:
> >
> > 1) Different costs for same actions. Doing an explain on 2 nearly
> identical
> > queries both involving the same Index scan on same table has 2 widely
> > different costs for same Index scan  303375872.86 vs. 12576.70
>
> Pretty sure this is a FAQ by now.
>
> not in and except treat nulls differently.  If you table has nullable
> fields and nulls would break your query, then not in () is a bad
> choice.  Therefore, effort to optimize had been placed into except,
> which is distinctly, symantically different from not in ().
>
> It seems like some shift in the pg community has happened where we're
> suddenly getting a lot of folks who came from a database where not in
> and except are treated the same, even though they most definitely do
> not mean the same thing.
>


Umm... No. The top of the post you quoted regards the difference between the
query "get ciknum from cik" versus get ciknum from cik where NOT IN The
only differene between the two queries is the qualification of "where ciknum
not in ".  It does not involve the difference between NOT IN versus
Except

Both queries do an Index Scan using cik_ciknum_idx and those numbers show
the different costs doing the same task.

In this case, neither table  allowes nulls in the columns, both tables have
single indexes on the columns used.

Regarding the previous posts about the same issues of PERFORMENCE between
NOT IN versus EXCEPT. There has not been any answer to explain it - just
talk about the differenences between the two results.

Yes, I can still get the results using EXCEPT but it would be nice to no why
I can't get NOT IN to complete the simple query.


Re: [PERFORM] NOT IN >2hrs vs EXCEPT < 2 sec.

2009-01-29 Thread Kevin Traster
On Wed, Jan 28, 2009 at 11:37 PM, Scott Marlowe wrote:

> On Thu, Jan 29, 2009 at 12:01 AM, Kevin Traster  wrote:
> > 2 questions:
> >
> > 1) Different costs for same actions. Doing an explain on 2 nearly
> identical
> > queries both involving the same Index scan on same table has 2 widely
> > different costs for same Index scan  303375872.86 vs. 12576.70
>
> Pretty sure this is a FAQ by now.
>
> not in and except treat nulls differently.  If you table has nullable
> fields and nulls would break your query, then not in () is a bad
> choice.  Therefore, effort to optimize had been placed into except,
> which is distinctly, symantically different from not in ().
>
> It seems like some shift in the pg community has happened where we're
> suddenly getting a lot of folks who came from a database where not in
> and except are treated the same, even though they most definitely do
> not mean the same thing.
>


[PERFORM] NOT IN >2hrs vs EXCEPT < 2 sec.

2009-01-28 Thread Kevin Traster
2 questions:

1) Different costs for same actions. Doing an explain on 2 nearly identical
queries both involving the same Index scan on same table has 2 widely
different costs for same Index scan  303375872.86 vs. 12576.70

2) Simple query using NOT IN (subquery)was killed after 2 hrs, using the
same query (query) except (query) ran in < 2 sec.

Summary:

On devel box (Unix PG version 8.3.5) with no other database activity or
system activity after immediately completing a vacuum analyze.

The original query (below) was running for over 2 hrs and was killed.:

select distinct ciknum into tmpnocikinowner from cik where ciknum not in
(select cik from owner_cik_master);

est total cost: 303375872.86, for Index Scan: 303375616.75

Simple query broken down: explain select distinct ciknum into
tmpnocikinowner from cik ;

est total cost: 12576.70, for Index Scan: 12064.49

and

select cik from owner_cik_master

est total cost: 2587.36, for Index Scan: N/A

Actual time, the query was killed after 2hrs,

However, we ran:

select distinct ciknum into tmpnocikinowner from cik ; - actual time 861.487
ms
 (select ciknum from tmpnocikinowner)  except (select cik from
owner_cik_master);  - actual time 1328.094 ms



# Console log below with details ##

devel=# explain select distinct ciknum into tmpnocikinowner from cik where
ciknum not in (select cik from owner_cik_master);
   QUERY
PLAN

 Unique  (cost=3506.21..303375872.86 rows=71946 width=8)
   ->  Index Scan using cik_ciknum_idx on cik  (cost=3506.21..303375616.75
rows=102444 width=8)
 Filter: (NOT (subplan))
 SubPlan
   ->  Materialize  (cost=3506.21..6002.40 rows=186019 width=4)
 ->  Seq Scan on owner_cik_master  (cost=0.00..2684.19
rows=186019 width=4)
(6 rows)

Time: 0.723 ms
devel=# explain select ciknum into tmpnocikinowner from cik where ciknum not
in (select cik from owner_cik_master);
  QUERY
PLAN
--
 Seq Scan on cik  (cost=3506.21..303367660.13 rows=102444 width=8)
   Filter: (NOT (subplan))
   SubPlan
 ->  Materialize  (cost=3506.21..6002.40 rows=186019 width=4)
   ->  Seq Scan on owner_cik_master  (cost=0.00..2684.19 rows=186019
width=4)
(5 rows)

Time: 0.588 ms
devel=# explain select ciknum::int into tmpnocikinowner from cik where
ciknum::int not in (select cik::int from owner_cik_master);
  QUERY
PLAN
--
 Seq Scan on cik  (cost=3506.21..303368428.46 rows=102444 width=8)
   Filter: (NOT (subplan))
   SubPlan
 ->  Materialize  (cost=3506.21..6002.40 rows=186019 width=4)
   ->  Seq Scan on owner_cik_master  (cost=0.00..2684.19 rows=186019
width=4)
(5 rows)

Time: 0.918 ms
devel=# explain select ciknum into tmpnocikinowner from cik
;
QUERY PLAN
---
 Seq Scan on cik  (cost=0.00..4107.87 rows=204887 width=8)
(1 row)

Time: 0.438 ms
devel=# explain select distinct ciknum into tmpnocikinowner from cik ;
   QUERY
PLAN
-
 Unique  (cost=0.00..12576.70 rows=143891 width=8)
   ->  Index Scan using cik_ciknum_idx on cik  (cost=0.00..12064.49
rows=204887 width=8)
(2 rows)

Time: 0.468 ms
devel=#  select distinct ciknum into tmpnocikinowner from cik ;
SELECT
Time: 861.487 ms

devel=# explain select ciknum from tmpnocikinowner  where ciknum not in
(select cik from owner_cik_master);
  QUERY
PLAN
--
 Seq Scan on tmpnocikinowner  (cost=3506.21..261092922.31 rows=88168
width=8)
   Filter: (NOT (subplan))
   SubPlan
 ->  Materialize  (cost=3506.21..6002.40 rows=186019 width=4)
   ->  Seq Scan on owner_cik_master  (cost=0.00..2684.19 rows=186019
width=4)
(5 rows)

Time: 0.629 ms

devel=# explain select cik from owner_cik_master;
   QUERY PLAN

 Seq Scan on owner_cik_master  (cost=0.00..2684.19 rows=186019 width=4)
(1 row)

Time: 0.415 ms
devel=# explain select ciknum from tmpnocikinowner;;
  QUERY PLAN
---
 Seq Scan on tmpnocikinowner  (cost=0.00..2587.36 rows=176336 width=8)
(1 row)

Time: 0.413 ms
devel=# explain (select ciknum from tmpnocikinowner)  except (select cik
from owner_cik_master);
   QUERY
PLAN
---