Re: [HACKERS] Questions about indexes with text_pattern_ops
Kaare Rasmussen [EMAIL PROTECTED] writes: Hi The database is initialized with utf8, so in order for LIKE to use the index on a text field, I used text_pattern_ops when I created it. So far so good. It's in the documentation, but there's no explanation of why this index will only work for LIKE searches. How come that I have to have two different indexes if I want to give Postgres the ability to choose index scan over seq scan on LIKE and non-LIKE searches? Because in non-C locales (which you're almost certainly using if you're using UTF8) the ordering which the normal text operations use can be quite complex. Just as an example most locales have spaces being entirely insignificant. So no range can reliably match a prefix LIKE pattern. The text_pattern_ops use simple character-by-character ordering which are useful for LIKE but not for regular and comparisons. They're just two different orderings. Also, when I tried to create the index as a partial one (avoiding the 95% entries with empty strings), Postgresql chooses to use seq scan. This sounds counter intuitive to me. CREATE INDEX new_index ON a (b text_pattern_ops) WHERE b ''; This is 8.2.6. Hm, for a simple = or I think it doesn't matter which operator class you use. For or it would produce different answers. Postgres isn't clever enough to notice that this is equivalent though so I think you would have to do something like (untested): CREATE INDEX new_index ON a (b text_pattern_ops) WHERE b ~~ ''; That uses the same operator that the LIKE clause will use for the index range. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's On-Demand Production Tuning ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Questions about indexes with text_pattern_ops
Tom Lane [EMAIL PROTECTED] writes: Gregory Stark [EMAIL PROTECTED] writes: Hm, for a simple = or I think it doesn't matter which operator class you use. For or it would produce different answers. Postgres isn't clever enough to notice that this is equivalent though so I think you would have to do something like (untested): CREATE INDEX new_index ON a (b text_pattern_ops) WHERE b ~~ ''; That uses the same operator that the LIKE clause will use for the index range. I'm intending to get rid of ~=~ and ~~ for 8.4; there's no longer any reason why those slots in the pattern_ops classes can't be filled by the plain = and operators. (There *was* a reason when they were first invented --- but now that texteq will only return true for exact bitwise match, I think it's OK to assume these are equivalent.) The only question is whether we'll keep that forever. I thought it was a good idea at the time but I'm starting to wonder about the implications for multi-key indexes. In the meantime, though, I think the only way that Kaare's query can use that index is if he writes WHERE b LIKE 'whatever' AND b ''; (with whatever spelling of the index predicate has). There is not anything in the predicate proving machinery that knows enough about LIKE to be able to show that b LIKE 'whatever' implies b ''. I was thinking that the inequalities that the LIKE index scan introduces would imply the inequality. I take it we generate those inequalities too late in the planning process to use them for other planning? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support! ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Questions about indexes with text_pattern_ops
Gregory Stark [EMAIL PROTECTED] writes: Hm, for a simple = or I think it doesn't matter which operator class you use. For or it would produce different answers. Postgres isn't clever enough to notice that this is equivalent though so I think you would have to do something like (untested): CREATE INDEX new_index ON a (b text_pattern_ops) WHERE b ~~ ''; That uses the same operator that the LIKE clause will use for the index range. I'm intending to get rid of ~=~ and ~~ for 8.4; there's no longer any reason why those slots in the pattern_ops classes can't be filled by the plain = and operators. (There *was* a reason when they were first invented --- but now that texteq will only return true for exact bitwise match, I think it's OK to assume these are equivalent.) In the meantime, though, I think the only way that Kaare's query can use that index is if he writes WHERE b LIKE 'whatever' AND b ''; (with whatever spelling of the index predicate has). There is not anything in the predicate proving machinery that knows enough about LIKE to be able to show that b LIKE 'whatever' implies b ''. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Questions about indexes with text_pattern_ops
Gregory Stark [EMAIL PROTECTED] writes: Tom Lane [EMAIL PROTECTED] writes: I'm intending to get rid of ~=~ and ~~ for 8.4; there's no longer any reason why those slots in the pattern_ops classes can't be filled by the plain = and operators. (There *was* a reason when they were first invented --- but now that texteq will only return true for exact bitwise match, I think it's OK to assume these are equivalent.) The only question is whether we'll keep that forever. I thought it was a good idea at the time but I'm starting to wonder about the implications for multi-key indexes. How so? If you think this change is a bad idea you'd better speak up PDQ. In the meantime, though, I think the only way that Kaare's query can use that index is if he writes WHERE b LIKE 'whatever' AND b ''; (with whatever spelling of the index predicate has). There is not anything in the predicate proving machinery that knows enough about LIKE to be able to show that b LIKE 'whatever' implies b ''. I was thinking that the inequalities that the LIKE index scan introduces would imply the inequality. I take it we generate those inequalities too late in the planning process to use them for other planning? Hmm, good point [ experiments... ] Yeah, it seems we don't reconsider partial indexes after those clauses have been generated. Not sure how expensive it'd be to change that. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Questions about indexes with text_pattern_ops
Tom Lane [EMAIL PROTECTED] writes: Gregory Stark [EMAIL PROTECTED] writes: Tom Lane [EMAIL PROTECTED] writes: I'm intending to get rid of ~=~ and ~~ for 8.4; there's no longer any reason why those slots in the pattern_ops classes can't be filled by the plain = and operators. (There *was* a reason when they were first invented --- but now that texteq will only return true for exact bitwise match, I think it's OK to assume these are equivalent.) The only question is whether we'll keep that forever. I thought it was a good idea at the time but I'm starting to wonder about the implications for multi-key indexes. How so? If you think this change is a bad idea you'd better speak up PDQ. Well I think it's fine for 'foo ' != 'foo' even if they sort similarly. But I'm not sure it makes sense for 'foo ','a' to sort after 'foo','b' if the locale says that 'foo ' should be compare equal to 'foo' and 'a' before 'b'. I'm starting to think equality and sorts interchangeably are not the same operator at all. On the other hand it may not be worth the complexity that might bring. I was thinking that the inequalities that the LIKE index scan introduces would imply the inequality. I take it we generate those inequalities too late in the planning process to use them for other planning? Hmm, good point [ experiments... ] Yeah, it seems we don't reconsider partial indexes after those clauses have been generated. Not sure how expensive it'd be to change that. Perhaps we should always generate those inequalities even if there's no index that can use them. Then calculate the regexp selectivity based only on the regexp after the prefix. That might also help constraint exclusion. Maybe merge joins? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's Slony Replication support! ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Questions about indexes with text_pattern_ops
Gregory Stark [EMAIL PROTECTED] writes: Tom Lane [EMAIL PROTECTED] writes: How so? If you think this change is a bad idea you'd better speak up PDQ. Well I think it's fine for 'foo ' != 'foo' even if they sort similarly. But I'm not sure it makes sense for 'foo ','a' to sort after 'foo','b' if the locale says that 'foo ' should be compare equal to 'foo' and 'a' before 'b'. I don't think we can concern ourselves with that; it would require allowing different columns of an index to interact, which would be impossibly messy. What's more, it'd destroy the property that a btree index is sorted by its leading column(s) as well as by all its columns. Perhaps we should always generate those inequalities even if there's no index that can use them. Hmmm ... we intentionally don't do that, but the constraint exclusion code might be a sufficient reason to reconsider. regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Questions about indexes with text_pattern_ops
Tom Lane [EMAIL PROTECTED] writes: Gregory Stark [EMAIL PROTECTED] writes: Tom Lane [EMAIL PROTECTED] writes: How so? If you think this change is a bad idea you'd better speak up PDQ. Well I think it's fine for 'foo ' != 'foo' even if they sort similarly. But I'm not sure it makes sense for 'foo ','a' to sort after 'foo','b' if the locale says that 'foo ' should be compare equal to 'foo' and 'a' before 'b'. I don't think we can concern ourselves with that; it would require allowing different columns of an index to interact, which would be impossibly messy. What's more, it'd destroy the property that a btree index is sorted by its leading column(s) as well as by all its columns. Well, I was thinking we might have to separate the equal operators from the btree opclass. Equals would be a stricter property than sorts as same. It would be mighty strange to have values which compare unequal but are neither or though. Or which compare equal but also compare or . It might be a little less surprising if we invent a new operator === for actually the same and have == report whether two objects sort as equals. But I'm not sure our experience with Turkish doesn't show that that will still surprise people. It may be more right in an abstract ideal world -- the reality is that text collation is annoyingly complex. But this may be a case where we can get away with just eliding this hassle. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training! ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Questions about indexes with text_pattern_ops
Gregory Stark [EMAIL PROTECTED] writes: It may be more right in an abstract ideal world -- the reality is that text collation is annoyingly complex. But this may be a case where we can get away with just eliding this hassle. If anyone actually complains about it, I think we can point to the SQL spec, which unambiguously says that a multicolumn sort key is considered one column at a time. regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster