Re: [HACKERS] Questions about indexes with text_pattern_ops

2008-02-25 Thread Gregory Stark
Kaare Rasmussen [EMAIL PROTECTED] writes:

 Hi 

 The database is initialized with utf8, so in order for LIKE to use the index 
 on
 a text field, I used text_pattern_ops when I created it. So far so good. 

 It's in the documentation, but there's no explanation of why this index will
 only work for LIKE searches. How come that I have to have two different 
 indexes
 if I want to give Postgres the ability to choose index scan over seq scan on
 LIKE and non-LIKE searches? 

Because in non-C locales (which you're almost certainly using if you're using
UTF8) the ordering which the normal text operations use can be quite complex.
Just as an example most locales have spaces being entirely insignificant. So
no range can reliably match a prefix LIKE pattern. The text_pattern_ops use
simple character-by-character ordering which are useful for LIKE but not for
regular  and  comparisons. They're just two different orderings.

 Also, when I tried to create the index as a partial one (avoiding the 95%
 entries with empty strings), Postgresql chooses to use seq scan. This sounds
 counter intuitive to me. 

 CREATE INDEX new_index ON a (b text_pattern_ops) WHERE b  '';
 This is 8.2.6.

Hm, for a simple = or  I think it doesn't matter which operator class you
use. For  or  it would produce different answers. Postgres isn't clever enough
to notice that this is equivalent though so I think you would have to do
something like (untested):

CREATE INDEX new_index ON a (b text_pattern_ops) WHERE b ~~ '';

That uses the same operator that the LIKE clause will use for the index range.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's On-Demand Production Tuning

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Questions about indexes with text_pattern_ops

2008-02-25 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes:

 Gregory Stark [EMAIL PROTECTED] writes:
 Hm, for a simple = or  I think it doesn't matter which operator class you
 use. For  or  it would produce different answers. Postgres isn't clever 
 enough
 to notice that this is equivalent though so I think you would have to do
 something like (untested):

 CREATE INDEX new_index ON a (b text_pattern_ops) WHERE b ~~ '';

 That uses the same operator that the LIKE clause will use for the index 
 range.

 I'm intending to get rid of ~=~ and ~~ for 8.4; there's no longer any
 reason why those slots in the pattern_ops classes can't be filled by the
 plain = and  operators.  (There *was* a reason when they were first
 invented --- but now that texteq will only return true for exact bitwise
 match, I think it's OK to assume these are equivalent.)

The only question is whether we'll keep that forever. I thought it was a good
idea at the time but I'm starting to wonder about the implications for
multi-key indexes.

 In the meantime, though, I think the only way that Kaare's query can use
 that index is if he writes
   WHERE b LIKE 'whatever' AND b  '';
 (with whatever spelling of  the index predicate has).  There is not
 anything in the predicate proving machinery that knows enough about LIKE
 to be able to show that b LIKE 'whatever' implies b  ''.

I was thinking that the inequalities that the LIKE index scan introduces would
imply the inequality. I take it we generate those inequalities too late in the
planning process to use them for other planning? 


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's 24x7 Postgres support!

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Questions about indexes with text_pattern_ops

2008-02-25 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 Hm, for a simple = or  I think it doesn't matter which operator class you
 use. For  or  it would produce different answers. Postgres isn't clever 
 enough
 to notice that this is equivalent though so I think you would have to do
 something like (untested):

 CREATE INDEX new_index ON a (b text_pattern_ops) WHERE b ~~ '';

 That uses the same operator that the LIKE clause will use for the index range.

I'm intending to get rid of ~=~ and ~~ for 8.4; there's no longer any
reason why those slots in the pattern_ops classes can't be filled by the
plain = and  operators.  (There *was* a reason when they were first
invented --- but now that texteq will only return true for exact bitwise
match, I think it's OK to assume these are equivalent.)

In the meantime, though, I think the only way that Kaare's query can use
that index is if he writes
WHERE b LIKE 'whatever' AND b  '';
(with whatever spelling of  the index predicate has).  There is not
anything in the predicate proving machinery that knows enough about LIKE
to be able to show that b LIKE 'whatever' implies b  ''.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Questions about indexes with text_pattern_ops

2008-02-25 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 Tom Lane [EMAIL PROTECTED] writes:
 I'm intending to get rid of ~=~ and ~~ for 8.4; there's no longer any
 reason why those slots in the pattern_ops classes can't be filled by the
 plain = and  operators.  (There *was* a reason when they were first
 invented --- but now that texteq will only return true for exact bitwise
 match, I think it's OK to assume these are equivalent.)

 The only question is whether we'll keep that forever. I thought it was a good
 idea at the time but I'm starting to wonder about the implications for
 multi-key indexes.

How so?  If you think this change is a bad idea you'd better speak up
PDQ.

 In the meantime, though, I think the only way that Kaare's query can use
 that index is if he writes
 WHERE b LIKE 'whatever' AND b  '';
 (with whatever spelling of  the index predicate has).  There is not
 anything in the predicate proving machinery that knows enough about LIKE
 to be able to show that b LIKE 'whatever' implies b  ''.

 I was thinking that the inequalities that the LIKE index scan introduces would
 imply the inequality. I take it we generate those inequalities too late in the
 planning process to use them for other planning? 

Hmm, good point [ experiments... ]  Yeah, it seems we don't reconsider
partial indexes after those clauses have been generated.  Not sure how
expensive it'd be to change that.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Questions about indexes with text_pattern_ops

2008-02-25 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes:

 Gregory Stark [EMAIL PROTECTED] writes:
 Tom Lane [EMAIL PROTECTED] writes:
 I'm intending to get rid of ~=~ and ~~ for 8.4; there's no longer any
 reason why those slots in the pattern_ops classes can't be filled by the
 plain = and  operators.  (There *was* a reason when they were first
 invented --- but now that texteq will only return true for exact bitwise
 match, I think it's OK to assume these are equivalent.)

 The only question is whether we'll keep that forever. I thought it was a good
 idea at the time but I'm starting to wonder about the implications for
 multi-key indexes.

 How so?  If you think this change is a bad idea you'd better speak up
 PDQ.

Well I think it's fine for 'foo ' != 'foo' even if they sort similarly. 

But I'm not sure it makes sense for 'foo ','a' to sort after 'foo','b' if
the locale says that 'foo ' should be compare equal to 'foo' and 'a' before
'b'.

I'm starting to think equality and sorts interchangeably are not the same
operator at all. On the other hand it may not be worth the complexity that
might bring.

 I was thinking that the inequalities that the LIKE index scan introduces 
 would
 imply the inequality. I take it we generate those inequalities too late in 
 the
 planning process to use them for other planning? 

 Hmm, good point [ experiments... ]  Yeah, it seems we don't reconsider
 partial indexes after those clauses have been generated.  Not sure how
 expensive it'd be to change that.

Perhaps we should always generate those inequalities even if there's no index
that can use them. Then calculate the regexp selectivity based only on the
regexp after the prefix.

That might also help constraint exclusion. Maybe merge joins?

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Questions about indexes with text_pattern_ops

2008-02-25 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 Tom Lane [EMAIL PROTECTED] writes:
 How so?  If you think this change is a bad idea you'd better speak up
 PDQ.

 Well I think it's fine for 'foo ' != 'foo' even if they sort similarly. 

 But I'm not sure it makes sense for 'foo ','a' to sort after 'foo','b' if
 the locale says that 'foo ' should be compare equal to 'foo' and 'a' before
 'b'.

I don't think we can concern ourselves with that; it would require
allowing different columns of an index to interact, which would be
impossibly messy.  What's more, it'd destroy the property that a btree
index is sorted by its leading column(s) as well as by all its columns.

 Perhaps we should always generate those inequalities even if there's no index
 that can use them.

Hmmm ... we intentionally don't do that, but the constraint exclusion
code might be a sufficient reason to reconsider.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Questions about indexes with text_pattern_ops

2008-02-25 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes:

 Gregory Stark [EMAIL PROTECTED] writes:
 Tom Lane [EMAIL PROTECTED] writes:
 How so?  If you think this change is a bad idea you'd better speak up
 PDQ.

 Well I think it's fine for 'foo ' != 'foo' even if they sort similarly. 

 But I'm not sure it makes sense for 'foo ','a' to sort after 'foo','b' if
 the locale says that 'foo ' should be compare equal to 'foo' and 'a' before
 'b'.

 I don't think we can concern ourselves with that; it would require
 allowing different columns of an index to interact, which would be
 impossibly messy.  What's more, it'd destroy the property that a btree
 index is sorted by its leading column(s) as well as by all its columns.

Well, I was thinking we might have to separate the equal operators from the
btree opclass. Equals would be a stricter property than sorts as same.

It would be mighty strange to have values which compare unequal but are
neither  or  though. Or which compare equal but also compare  or .

It might be a little less surprising if we invent a new operator === for
actually the same and have == report whether two objects sort as equals. But
I'm not sure our experience with Turkish doesn't show that that will still
surprise people.

It may be more right in an abstract ideal world -- the reality is that text
collation is annoyingly complex. But this may be a case where we can get away
with just eliding this hassle.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL 
training!

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Questions about indexes with text_pattern_ops

2008-02-25 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 It may be more right in an abstract ideal world -- the reality is that text
 collation is annoyingly complex. But this may be a case where we can get away
 with just eliding this hassle.

If anyone actually complains about it, I think we can point to the SQL
spec, which unambiguously says that a multicolumn sort key is considered
one column at a time.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster