subject:"\[PERFORM\] bitmask index"

Re: [PERFORM] bitmask index

2011-07-05 Thread Marcus Engene


On 6/22/11 11:42 , Greg Smith wrote:

On 06/22/2011 05:27 PM, Marcus Engene wrote:
I have some tables with bitmask integers. Set bits are the 
interesting ones. Usually they are sparse.


If it's sparse, create a partial index that just includes rows where 
the bit is set:  
http://www.postgresql.org/docs/current/static/indexes-partial.html


You need to be careful the query uses the exact syntax as the one that 
created the index for it to be used.  But if you do that, it should be 
able to pull the rows that match out quickly.



I ended up having a separate table with an index on.

Though partial index solved another problem. Usually I'm a little bit 
annoyed with the optimizer and the developers religious fix the planner 
instead of index hints. I must say that I'm willing to reconsider my 
usual stance to that.


We have a large table of products where status=20 is a rare intermediate 
status. I added a...


CREATE INDEX pond_item_common_x8 ON pond_item_common(pond_user, status)
WHERE status = 20;

...and a slow 5s select with users who had existing status=20 items 
became very fast. Planner, I guess, saw the 1 status 20 clips (out 
of millions of items) instead of like 5 different values of status and 
thus ignoring the index. Super!


To my great amazement, the planner also managed to use the index when 
counting how many status=20 items there are in total:


pond90= explain analyze select
pond90- coalesce(sum(tt.antal),0) as nbr_in_queue
pond90- from
pond90- (
pond90( select
pond90( pu.username
pond90(,t.antal
pond90( from
pond90( (
pond90( select
pond90( sum(1) as antal
pond90(,pond_user
pond90( from
pond90( pond_item_common
pond90( where
pond90( status = 20
pond90( group by pond_user
pond90( ) as t
pond90(,pond_user pu
pond90( where
pond90( pu.objectid = t.pond_user
pond90( order by t.antal desc
pond90( ) as tt;
  QUERY 
PLAN

--
 Aggregate  (cost=38079.45..38079.46 rows=1 width=8) (actual 
time=166.439..166.440 rows=1 loops=1)
   -  Sort  (cost=38079.13..38079.18 rows=21 width=18) (actual 
time=166.009..166.085 rows=648 loops=1)

 Sort Key: (sum(1))
 Sort Method:  quicksort  Memory: 67kB
 -  Nested Loop  (cost=37903.66..38078.67 rows=21 width=18) 
(actual time=157.545..165.561 rows=648 loops=1)
   -  HashAggregate  (cost=37903.66..37903.92 rows=21 
width=4) (actual time=157.493..157.720 rows=648 loops=1)
 -  Bitmap Heap Scan on pond_item_common  
(cost=451.43..37853.37 rows=10057 width=4) (actual time=9.061..151.511 
rows=12352 loops=1)

   Recheck Cond: (status = 20)
   -  Bitmap Index Scan on 
pond_item_common_x8  (cost=0.00..448.91 rows=10057 width=0) (actual 
time=5.654..5.654 rows=20051 loops=1)

 Index Cond: (status = 20)
   -  Index Scan using pond_user_pkey on pond_user pu  
(cost=0.00..8.30 rows=1 width=14) (actual time=0.011..0.012 rows=1 
loops=648)

 Index Cond: (pu.objectid = pond_item_common.pond_user)
 Total runtime: 166.709 ms
(13 rows)

My hat's off to the dev gang. Impressive!

Best,
Marcus


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] bitmask index

2011-07-05 Thread Greg Smith


On 07/05/2011 06:15 AM, Marcus Engene wrote:
Though partial index solved another problem. Usually I'm a little bit 
annoyed with the optimizer and the developers religious fix the 
planner instead of index hints. I must say that I'm willing to 
reconsider my usual stance to that.


We have a large table of products where status=20 is a rare 
intermediate status. I added a...


CREATE INDEX pond_item_common_x8 ON pond_item_common(pond_user, status)
WHERE status = 20;

...and a slow 5s select with users who had existing status=20 items 
became very fast. Planner, I guess, saw the 1 status 20 clips (out 
of millions of items) instead of like 5 different values of status and 
thus ignoring the index. Super!


To my great amazement, the planner also managed to use the index when 
counting how many status=20 items there are in total:


I'm glad we got you to make a jump toward common ground with the 
database's intended use.  There are many neat advanced ways to solve the 
sorts of problems people try to hammer with hints available in 
PostgreSQL, some of which don't even exist in other databases.  It's 
kind of interesting to me how similarly one transition tends to happen 
to people who learn a lot about those options, enough that they can talk 
fully informed about things like how hints would have to work in 
PostgreSQL--for example:  they'd have to consider all all these partial 
index possibilities.  Once you go through all that, suddenly a lot of 
the people who do it realize that maybe hints aren't as important as 
good design and indexing--when you take advantages of all the features 
available to you--after all.


To help explain what happened to you here a little better, the planner 
tracks Most Common Values in the database, and it uses those statistics 
to make good decisions about the ones it finds.  But when a value is 
really rare, it's never going to make it to that list, and therefore the 
planner is going to make a guess about how likely it is--likely a wrong 
one.  By creating a partial index on that item, it's essentially adding 
that information--just how many rows are going to match a query looking 
for that value--so that it can be utilized the same way MCVs are.  
Adding partial indexes on sparse columns that are critical to a common 
report allow what I'm going to coin a new acronym for:  those are part 
of the Most Important Values in that column.  The MIV set is the MCV 
information plus information about the rare but critical columns.  And 
the easiest way to expose that data to the planner is with a partial index.


I smell a blog post coming on this topic.

--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
Comprehensive and Customized PostgreSQL Training Classes:
http://www.2ndquadrant.us/postgresql-training/


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] bitmask index

2011-06-23 Thread Robert Klemme


On 06/22/2011 11:42 PM, Greg Smith wrote:

On 06/22/2011 05:27 PM, Marcus Engene wrote:

I have some tables with bitmask integers. Set bits are the interesting
ones. Usually they are sparse.


If it's sparse, create a partial index that just includes rows where the
bit is set:
http://www.postgresql.org/docs/current/static/indexes-partial.html


That would mean that if different bits are queried there would need to 
be several of those indexes.


Maybe it's an alternative to index all rows where misc_bits  0 and 
include that criterion in the query.


Kind regards

robert


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

[PERFORM] bitmask index

2011-06-22 Thread Marcus Engene


Hi list,

I use Postgres 9.0.4.

I have some tables with bitmask integers. Set bits are the interesting 
ones. Usually they are sparse.


-- Many rows  columns
CREATE TABLE a_table
(
 objectid   INTEGER PRIMARY KEY NOT NULL
,misc_bits  INTEGER DEFAULT 0 NOT NULL
...
)
WITHOUT OIDS;

...and when I use it I...

select
...
from
a_table
where
0  (misc_bits  (1  13))

Now the dear tables have swollen and these scans aren't as nice anymore.

What indexing strategies would you use here?

External table?:

create table a_table_feature_x
(
 objectid   INTEGER PRIMARY KEY NOT NULL -- fk to 
a_table.objectid

)
WITHOUT OIDS;


Internal in the big mama table?:

CREATE TABLE a_table
(
 objectid   INTEGER PRIMARY KEY NOT NULL
,misc_bits  INTEGER DEFAULT 0 NOT NULL
,feature_x  VARCHAR(1) -- 'y' or null
...
)
WITHOUT OIDS;

CREATE INDEX a_table_x1 ON a_table(feature_x); -- I assume nulls are not 
here



Some other trick?


Thanks,
Marcus

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] bitmask index

2011-06-22 Thread Greg Smith


On 06/22/2011 05:27 PM, Marcus Engene wrote:
I have some tables with bitmask integers. Set bits are the interesting 
ones. Usually they are sparse.


If it's sparse, create a partial index that just includes rows where the 
bit is set:  
http://www.postgresql.org/docs/current/static/indexes-partial.html


You need to be careful the query uses the exact syntax as the one that 
created the index for it to be used.  But if you do that, it should be 
able to pull the rows that match out quickly.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] bitmask index

Re: [PERFORM] bitmask index

Re: [PERFORM] bitmask index

[PERFORM] bitmask index

Re: [PERFORM] bitmask index

5 matches

Site Navigation

Mail list logo

Footer information