Re: [GENERAL] text column indexing in UTF-8 database

2009-03-13 Thread Tom Lane
Jeff Davis pg...@j-davis.com writes: It looks like an index using text_pattern_ops can be used for equality (see my test case below). This is true as of 8.4; prior versions make a distinction between = and ~=~. This works apparently because texteq() is defined as bitwise-equality. Is that

Re: [GENERAL] text column indexing in UTF-8 database

2009-03-13 Thread Jeff Davis
On Fri, 2009-03-13 at 13:13 -0400, Tom Lane wrote: There is actually some history here; the former distinction in the equality operators arose from exactly your concern. But after we put in the second-pass check to insist on bitwise equality, we realized that the equality operators really

Re: [GENERAL] text column indexing in UTF-8 database

2009-03-13 Thread Tom Lane
Jeff Davis pg...@j-davis.com writes: One thing that still doesn't make sense to me is that texteq() is bitwise-equality even in 8.3. Historical artifact ... we made the semantics change some time ago, but the ensuing change to remove ~=~ didn't happen until 8.4. It sounds like Reece Hart can

[GENERAL] text column indexing in UTF-8 database

2009-03-12 Thread Reece Hart
Do I really need 4 indexes per column to handle the 4 combinations of {equality,like/regexp} x {unfolded,casefolded} in a UTF-8 encoded database? I have a column that I'd like to be able to search with equality and regexp (or like), optionally casefolded. The database is UTF-8 encoded. The table

Re: [GENERAL] text column indexing in UTF-8 database

2009-03-12 Thread Steve Atkins
On Mar 12, 2009, at 5:15 PM, Reece Hart wrote: Do I really need 4 indexes per column to handle the 4 combinations of {equality,like/regexp} x {unfolded,casefolded} in a UTF-8 encoded database? I have a column that I'd like to be able to search with equality and regexp (or like), optionally

Re: [GENERAL] text column indexing in UTF-8 database

2009-03-12 Thread Jeff Davis
On Thu, 2009-03-12 at 17:15 -0700, Reece Hart wrote: Jeff Davis gave me a tip to use text_pattern_ops on indexes to speed up regexp and like; that worked beautiful. But I discovered a caveat that t_p_o apparently doesn't handle equality. Thus, I think I need distinct indexes for the 4 cases

Re: [GENERAL] text column indexing in UTF-8 database

2009-03-12 Thread Reece Hart
On Thu, 2009-03-12 at 18:02 -0700, Jeff Davis wrote: It looks like an index using text_pattern_ops can be used for equality (see my test case below). Odd. I can't reproduce your test case. I noticed that I edited out the version and platform from my OP. (A: 8.3.6, x86_64 linux). You're on

Re: [GENERAL] text column indexing in UTF-8 database

2009-03-12 Thread Reece Hart
On Thu, 2009-03-12 at 17:32 -0700, Steve Atkins wrote: If A=B then lower(A) = lower(B), and if A like B then lower(A) like lower(B). So, if nothing else, you could rewrite where alias = 'Foo' as where lower(alias) = lower('Foo') and alias='Foo' and take advantage of the lower()