Re: [PERFORM] Planner question - bit data types

2009-09-05 Thread Merlin Moncure
On Fri, Sep 4, 2009 at 6:29 PM, Josh Berkusj...@agliodbs.com wrote:
 Karl,

 For reference, I was having SEVERE performance problems with the
 following comparison in an SQL statement where mask was an integer:

 select ... from  where .. and (permission  mask = permission)

 AFAIK, the only way to use an index on these queries is through
 expression indexes.  That's why a lot of folks use INTARRAY instead; it
 comes with a GIN index type.

 It would probably be possible to create a new index type using GiST or
 GIN which indexed bitstrings automatically, but I don't know that anyone
 has done it yet.

 Changing your integer to a bitstring will not, to my knowledge, improve
 this.

agreed.   also, gist/gin is no free lunch, maintaining these type of
indexes is fairly expensive.

If you are only interested in one or a very small number of cases of
'permission', you can use an expression index to target constant
values:

select ... from  where .. and (permission  mask = permission)

create index foo_permission_xyz_idx on foo((64  mask = 64));
select * from foo where 64  mask = 64; --indexed!

this optimizes a _particular_ case of permission into a boolean based
index.  this can be a big win if the # of matching cases is very small
or you want to use this in a multi-column index.

merlin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Planner question - bit data types

2009-09-05 Thread Tom Lane
Merlin Moncure mmonc...@gmail.com writes:
 If you are only interested in one or a very small number of cases of
 'permission', you can use an expression index to target constant
 values:

 select ... from  where .. and (permission  mask = permission)

 create index foo_permission_xyz_idx on foo((64  mask = 64));
 select * from foo where 64  mask = 64; --indexed!

A possibly more useful variant is to treat the permission condition
as a partial index's WHERE condition.  The advantage of that is that
the index's actual content can be some other column, so that you can
combine the permission check with a second indexable test.  The index
is still available for queries that don't use the other column, but
it's more useful for those that do.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Planner question - bit data types

2009-09-05 Thread Karl Denninger
Tom Lane wrote:
 Merlin Moncure mmonc...@gmail.com writes:
   
 If you are only interested in one or a very small number of cases of
 'permission', you can use an expression index to target constant
 values:
 

   
 select ... from  where .. and (permission  mask = permission)
 

   
 create index foo_permission_xyz_idx on foo((64  mask = 64));
 select * from foo where 64  mask = 64; --indexed!
 

 A possibly more useful variant is to treat the permission condition
 as a partial index's WHERE condition.  The advantage of that is that
 the index's actual content can be some other column, so that you can
 combine the permission check with a second indexable test.  The index
 is still available for queries that don't use the other column, but
 it's more useful for those that do.

   regards, tom lane

   
That doesn't help in this case as the returned set will typically be
quite large, with the condition typically being valid on anywhere from
10-80% of the returned tuples.

What I am trying to avoid is creating a boolean column for EACH
potential bit (and an index on each), as that makes the schema
non-portable for others and quite messy as well - while there are a
handful of known masks the system also has a number of user defined
bit positions that vary from installation to installation.


-- Karl
attachment: karl.vcf
-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Planner question - bit data types

2009-09-05 Thread Tom Lane
Karl Denninger k...@denninger.net writes:
 That doesn't help in this case as the returned set will typically be
 quite large, with the condition typically being valid on anywhere from
 10-80% of the returned tuples.

In that case you'd be wasting your time to get it to use an index
for the condition anyway.  Maybe you need to take a step back and
look at the query as a whole rather than focus on this particular
condition.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Planner question - bit data types

2009-09-05 Thread Karl Denninger
Tom Lane wrote:
 Karl Denninger k...@denninger.net writes:
   
 That doesn't help in this case as the returned set will typically be
 quite large, with the condition typically being valid on anywhere from
 10-80% of the returned tuples.
 

 In that case you'd be wasting your time to get it to use an index
 for the condition anyway.  Maybe you need to take a step back and
 look at the query as a whole rather than focus on this particular
 condition.

   regards, tom lane

   
The query, sans this condition, is extremely fast and contains a LOT of
other conditions (none of which cause trouble.)

It is only attempting to filter the returned tuples on the permission
bit(s) involved that cause trouble.

-- Karl
attachment: karl.vcf
-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Planner question - bit data types

2009-09-05 Thread Tom Lane
Karl Denninger k...@denninger.net writes:
 Tom Lane wrote:
 In that case you'd be wasting your time to get it to use an index
 for the condition anyway.  Maybe you need to take a step back and
 look at the query as a whole rather than focus on this particular
 condition.

 The query, sans this condition, is extremely fast and contains a LOT of
 other conditions (none of which cause trouble.)
 It is only attempting to filter the returned tuples on the permission
 bit(s) involved that cause trouble.

My comment stands: asking about how to use an index for this is the
wrong question.

You never showed us any EXPLAIN results, but I suspect what is happening
is that the planner thinks the permission  mask = permission
condition is fairly selective (offhand I think it'd default to
DEFAULT_EQ_SEL or 0.005) whereas the true selectivity per your prior
comment is only 0.1 to 0.8.  This is causing it to change to a plan that
would be good for a small number of rows, when it should stick to a plan
that is good for a large number of rows.

So the right question is how do I fix the bad selectivity estimate?.
Unfortunately there's no non-kluge answer.  What I think I'd try is
wrapping the condition into a function, say

create function permission_match(perms int, mask int) returns bool
as $$begin return perms  mask = mask; end$$ language plpgsql
strict immutable;

The planner won't know what to make of where permission_match(perms, 64)
either, but the default selectivity estimate for a boolean function
is 0.333, much closer to what you need.

Or plan B, which I'd recommend, is to forget the mask business and go
over to a boolean column per permission flag.  Then the planner would
actually have decent statistics about the flag selectivities, and the
queries would be a lot more readable too.  Your objection that you'd
need an index per flag column is misguided --- at these selectivities
an index is really pointless.  And I entirely fail to understand the
complaint about it being unportable; you think  is more portable than
boolean?  Only one of those things is in the SQL standard.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Planner question - bit data types

2009-09-05 Thread Karl Denninger
Tom Lane wrote:
 Karl Denninger k...@denninger.net writes:
   
 Tom Lane wrote:
 
 In that case you'd be wasting your time to get it to use an index
 for the condition anyway.  Maybe you need to take a step back and
 look at the query as a whole rather than focus on this particular
 condition.
   
 The query, sans this condition, is extremely fast and contains a LOT of
 other conditions (none of which cause trouble.)
 It is only attempting to filter the returned tuples on the permission
 bit(s) involved that cause trouble.
 

 My comment stands: asking about how to use an index for this is the
 wrong question.

 You never showed us any EXPLAIN results,
Yes I did.  Go back and look at the archives.  I provided full EXPLAIN
and EXPLAIN ANALYZE results for the original query.  Sheesh.
 Or plan B, which I'd recommend, is to forget the mask business and go
 over to a boolean column per permission flag.  Then the planner would
 actually have decent statistics about the flag selectivities, and the
 queries would be a lot more readable too.  Your objection that you'd
 need an index per flag column is misguided --- at these selectivities
 an index is really pointless.  And I entirely fail to understand the
 complaint about it being unportable; you think  is more portable than
 boolean?  Only one of those things is in the SQL standard.

   regards, tom lane
   
The point isn't portability to other SQL engines - it is to other
people's installations.  The bitmask is (since it requires only changing
the mask constants in the container file that makes the SQL calls by
reference) where explicit columns is not by a long shot.

In any event it looks like that's the only reasonable way to do this, so
thanks (I think)

-- Karl
attachment: karl.vcf
-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Planner question - bit data types

2009-09-05 Thread Alvaro Herrera
Karl Denninger escribió:
 Tom Lane wrote:

  You never showed us any EXPLAIN results,
 Yes I did.  Go back and look at the archives.  I provided full EXPLAIN
 and EXPLAIN ANALYZE results for the original query.  Sheesh.

You did?  Where?  This is your first message in this thread:
http://archives.postgresql.org/pgsql-performance/2009-09/msg00059.php
No EXPLAINs anywhere to be seen.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Planner question - bit data types

2009-09-05 Thread Karl Denninger
There was a previous thread and I referenced it. I don't have the other
one in my email system any more to follow up to it.

I give up; the attack-dog crowd has successfully driven me off.  Ciao.

Alvaro Herrera wrote:
 Karl Denninger escribió:
   
 Tom Lane wrote:
 

   
 You never showed us any EXPLAIN results,
   
 Yes I did.  Go back and look at the archives.  I provided full EXPLAIN
 and EXPLAIN ANALYZE results for the original query.  Sheesh.
 

 You did?  Where?  This is your first message in this thread:
 http://archives.postgresql.org/pgsql-performance/2009-09/msg00059.php
 No EXPLAINs anywhere to be seen.

   
attachment: karl.vcf
-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Planner question - bit data types

2009-09-05 Thread Merlin Moncure
On Sat, Sep 5, 2009 at 8:19 PM, Karl Denningerk...@denninger.net wrote:
 There was a previous thread and I referenced it. I don't have the other one
 in my email system any more to follow up to it.

 I give up; the attack-dog crowd has successfully driven me off.  Ciao.

Another more standard sql approach is to push the flags out to a
subordinate table.  This is less efficient of course but now you get
to use standard join tactics to match conditions...


merlin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance