Re: [HACKERS] PATCH: CITEXT 2.0 v2
On Jul 7, 2008, at 12:06, David E. Wheeler wrote: I understand it but there is parallel project which should solve this problem completely I guess in "close" future (2-3years). Afterward this module will be obsolete and it will takes times to remove it from contrib. It seems to me that have citext in contrib only for two releases is not optimal solution. I guess that'd be the reason to keep it on pgFoundry, but I have two comments: * 2-3 years is a *long* time in Internet time. * There is on guarantee that it will be finished in that time or, indeed ever (no disrespect to Radek Strnad, it's just there are always unforeseen issues). One other thing that occurred to me yesterday: Given that the feature will ultimately support column-level collations, I suspect that it will be much easier to migrate CITEXT to a case-insensitive collation (perhaps using an updated CITEXT contrib module that just does so transparently) than to migrate application code from using LOWER() all over the place to not using. One transition requires a change to the schema, the other requires a change to application code, of which there is generally a lot more. Best, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: CITEXT 2.0 v2
Martijn van Oosterhout napsal(a): On Mon, Jul 07, 2008 at 12:06:08PM -0700, David E. Wheeler wrote: I guess that'd be the reason to keep it on pgFoundry, but I have two comments: * 2-3 years is a *long* time in Internet time. There have been patches over the years, but they tend not to get looked at. Recently someone pulled up the COLLATE patch from a couple of years ago but it didn't get much feedback either. (I can't find the link right now). I know about it. I have printed your proposal on my desk. I think It is linked from TODO list. It's disappointing that the discussions get hung up on the ICU library when it's not required or even needed for COLLATE support. My original patch never even mentioned it. I note that Firebird added COLLATE using ICU a few years back now. I think PostgreSQL is the only large DBMS to not support it. Complete agree. Collation missing support is big problem for many users. * There is on guarantee that it will be finished in that time or, indeed ever (no disrespect to Radek Strnad, it's just there are always unforeseen issues). I think that with concerted coding effort it could be done in 2-3 months (judging by how long it took to write the first version). The problem is it needs some planner kung-fu which not many people have. I agree that 2-3 months on fulltime is good estimation, problem is that you need kung-fu master which has time to do it :(. What we currently have is student which works on it in free time. Zdenek -- Zdenek Kotala Sun Microsystems Prague, Czech Republic http://sun.com/postgresql -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: CITEXT 2.0 v2
On Mon, Jul 07, 2008 at 12:06:08PM -0700, David E. Wheeler wrote: > I guess that'd be the reason to keep it on pgFoundry, but I have two > comments: > > * 2-3 years is a *long* time in Internet time. There have been patches over the years, but they tend not to get looked at. Recently someone pulled up the COLLATE patch from a couple of years ago but it didn't get much feedback either. (I can't find the link right now). It's disappointing that the discussions get hung up on the ICU library when it's not required or even needed for COLLATE support. My original patch never even mentioned it. I note that Firebird added COLLATE using ICU a few years back now. I think PostgreSQL is the only large DBMS to not support it. > * There is on guarantee that it will be finished in that time or, > indeed ever (no disrespect to Radek Strnad, it's just there are always > unforeseen issues). I think that with concerted coding effort it could be done in 2-3 months (judging by how long it took to write the first version). The problem is it needs some planner kung-fu which not many people have. Have a nice day, -- Martijn van Oosterhout <[EMAIL PROTECTED]> http://svana.org/kleptog/ > Please line up in a tree and maintain the heap invariant while > boarding. Thank you for flying nlogn airlines. signature.asc Description: Digital signature
Re: [HACKERS] PATCH: CITEXT 2.0 v2
Thanks to help from RhodiumToad on IRC, I got some things improved here: On Jul 7, 2008, at 16:24, David E. Wheeler wrote: So for some reason, after adding the indexes, the queries against the CITEXT column aren't using them. Furthermore, the `lower(text) LIKE lower(?)` query isn't using *its* index. Huh? I never knew what one needed to use the text_pattern_ops operator class to index a column for use with LIKE! I had no clue. Would that work for a citext column, too, since it's essentially the same as TEXT? So this leaves me with two questions: 1. For what reason would the query against the citext column *not* use the index? It turns out that it did use the index if I put `SET enable_seqscan = false;` into my script. So with RhodiumToad's direction, I added some `RESTRICT` and `JOIN` clauses to my comparison operators (copying them from ip4r). So now I have: CREATE OPERATOR = ( LEFTARG= CITEXT, RIGHTARG = CITEXT, COMMUTATOR = =, NEGATOR= <>, PROCEDURE = citext_eq, RESTRICT = eqsel, JOIN = eqjoinsel, HASHES, MERGES ); CREATE OPERATOR <> ( LEFTARG= CITEXT, RIGHTARG = CITEXT, NEGATOR= =, COMMUTATOR = <>, PROCEDURE = citext_ne, RESTRICT = neqsel, JOIN = neqjoinsel ); CREATE OPERATOR < ( LEFTARG= CITEXT, RIGHTARG = CITEXT, NEGATOR= >=, COMMUTATOR = >, PROCEDURE = citext_lt, RESTRICT = scalarltsel, JOIN = scalarltjoinsel ); CREATE OPERATOR <= ( LEFTARG= CITEXT, RIGHTARG = CITEXT, NEGATOR= >, COMMUTATOR = <=, PROCEDURE = citext_le, RESTRICT = scalarltsel, JOIN = scalarltjoinsel ); CREATE OPERATOR >= ( LEFTARG= CITEXT, RIGHTARG = CITEXT, NEGATOR= <, COMMUTATOR = <=, PROCEDURE = citext_ge, RESTRICT = scalargtsel, JOIN = scalargtjoinsel ); CREATE OPERATOR > ( LEFTARG= CITEXT, RIGHTARG = CITEXT, NEGATOR= <=, COMMUTATOR = <, PROCEDURE = citext_gt, RESTRICT = scalargtsel, JOIN = scalargtjoinsel ); With this change, the index was used: Loading words from dictionary. Inserting into the table. Test =. SELECT * FROM try WHERE LOWER(text) = LOWER('food'); Time: 261.295 ms SELECT * FROM try WHERE citext = 'food'; Time: 289.304 ms Time: 1228.961 ms Adding indexes... Test =. SELECT * FROM try WHERE LOWER(text) = LOWER('food'); Time: 2.018 ms SELECT * FROM try WHERE citext = 'food'; Time: 0.788 ms Seems to be faster than the LOWER() version, too, which makes me happy. The output from EXPLAIN ANALYZE: try=# EXPLAIN ANALYZE SELECT * FROM try WHERE citext = 'food'; QUERY PLAN -- Index Scan using idx_try_citext on try (cost=0.00..8.31 rows=1 width=119) (actual time=0.324..0.324 rows=0 loops=1) Index Cond: (citext = 'food'::citext) Total runtime: 0.377 ms (3 rows) try=# EXPLAIN ANALYZE SELECT * FROM try WHERE LOWER(text) = LOWER('food'); QUERY PLAN Bitmap Heap Scan on try (cost=28.17..1336.10 rows=500 width=119) (actual time=0.170..0.170 rows=0 loops=1) Recheck Cond: (lower(text) = 'food'::text) -> Bitmap Index Scan on idx_try_text (cost=0.00..28.04 rows=500 width=0) (actual time=0.168..0.168 rows=0 loops=1) Index Cond: (lower(text) = 'food'::text) Total runtime: 0.211 ms (5 rows) So my only other question related to this is: * Are the above RESTRICT and JOIN functions the ones to use, or is there some way to make use of those used by the TEXT type that would be more appropriate? 2. Is there some way to get the CITEXT index to behave like a LOWER() index, that is, so that its value is stored using the result of the str_tolower() function, thus removing some of the overhead of converting the values for each row fetched from the index? (Does this question make any sense?) Given the performance with an index, I think that this is moot, yes? There is, of course, much more overhead for a table scan. Best, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: CITEXT 2.0 v2
No, *really* Sheesh, sorry. David try.sql Description: Binary data On Jul 7, 2008, at 16:26, David E. Wheeler wrote: And here is the script. D'oh! Thanks, David On Jul 7, 2008, at 16:24, David E. Wheeler wrote: On Jul 7, 2008, at 08:01, Andrew Dunstan wrote: What does still bother me is its performance. I'd like to know if any measurement has been done of using citext vs. a functional index on lower(foo). Okay, here's a start. The attached script inserts random strings of 1-10 space-delimited words into text and citext columns, and then compares the performance of queries with and without indexes. The output for me is as follows: Loading words from dictionary. Inserting into the table. Test =. SELECT * FROM try WHERE LOWER(text) = LOWER('food'); Time: 254.254 ms SELECT * FROM try WHERE citext = 'food'; Time: 288.535 ms Test LIKE and ILIKE SELECT * FROM try WHERE LOWER(text) LIKE LOWER('C%'); Time: 209.385 ms SELECT * FROM try WHERE citext ILIKE 'C%'; Time: 236.186 ms SELECT * FROM try WHERE citext LIKE 'C%'; Time: 235.818 ms Adding indexes... Test =. SELECT * FROM try WHERE LOWER(text) = LOWER('food'); Time: 1.260 ms SELECT * FROM try WHERE citext = 'food'; Time: 277.755 ms Test LIKE and ILIKE SELECT * FROM try WHERE LOWER(text) LIKE LOWER('C%'); Time: 209.073 ms SELECT * FROM try WHERE citext ILIKE 'C%'; Time: 238.430 ms SELECT * FROM try WHERE citext LIKE 'C%'; Time: 238.685 ms benedict% So for some reason, after adding the indexes, the queries against the CITEXT column aren't using them. Furthermore, the `lower(text) LIKE lower(?)` query isn't using *its* index. Huh? So this leaves me with two questions: 1. For what reason would the query against the citext column *not* use the index? 2. Is there some way to get the CITEXT index to behave like a LOWER() index, that is, so that its value is stored using the result of the str_tolower() function, thus removing some of the overhead of converting the values for each row fetched from the index? (Does this question make any sense?) Thanks, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: CITEXT 2.0 v2
And here is the script. D'oh! Thanks, David try.sql Description: Binary data On Jul 7, 2008, at 16:24, David E. Wheeler wrote: On Jul 7, 2008, at 08:01, Andrew Dunstan wrote: What does still bother me is its performance. I'd like to know if any measurement has been done of using citext vs. a functional index on lower(foo). Okay, here's a start. The attached script inserts random strings of 1-10 space-delimited words into text and citext columns, and then compares the performance of queries with and without indexes. The output for me is as follows: Loading words from dictionary. Inserting into the table. Test =. SELECT * FROM try WHERE LOWER(text) = LOWER('food'); Time: 254.254 ms SELECT * FROM try WHERE citext = 'food'; Time: 288.535 ms Test LIKE and ILIKE SELECT * FROM try WHERE LOWER(text) LIKE LOWER('C%'); Time: 209.385 ms SELECT * FROM try WHERE citext ILIKE 'C%'; Time: 236.186 ms SELECT * FROM try WHERE citext LIKE 'C%'; Time: 235.818 ms Adding indexes... Test =. SELECT * FROM try WHERE LOWER(text) = LOWER('food'); Time: 1.260 ms SELECT * FROM try WHERE citext = 'food'; Time: 277.755 ms Test LIKE and ILIKE SELECT * FROM try WHERE LOWER(text) LIKE LOWER('C%'); Time: 209.073 ms SELECT * FROM try WHERE citext ILIKE 'C%'; Time: 238.430 ms SELECT * FROM try WHERE citext LIKE 'C%'; Time: 238.685 ms benedict% So for some reason, after adding the indexes, the queries against the CITEXT column aren't using them. Furthermore, the `lower(text) LIKE lower(?)` query isn't using *its* index. Huh? So this leaves me with two questions: 1. For what reason would the query against the citext column *not* use the index? 2. Is there some way to get the CITEXT index to behave like a LOWER() index, that is, so that its value is stored using the result of the str_tolower() function, thus removing some of the overhead of converting the values for each row fetched from the index? (Does this question make any sense?) Thanks, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: CITEXT 2.0 v2
On Jul 7, 2008, at 08:01, Andrew Dunstan wrote: What does still bother me is its performance. I'd like to know if any measurement has been done of using citext vs. a functional index on lower(foo). Okay, here's a start. The attached script inserts random strings of 1-10 space-delimited words into text and citext columns, and then compares the performance of queries with and without indexes. The output for me is as follows: Loading words from dictionary. Inserting into the table. Test =. SELECT * FROM try WHERE LOWER(text) = LOWER('food'); Time: 254.254 ms SELECT * FROM try WHERE citext = 'food'; Time: 288.535 ms Test LIKE and ILIKE SELECT * FROM try WHERE LOWER(text) LIKE LOWER('C%'); Time: 209.385 ms SELECT * FROM try WHERE citext ILIKE 'C%'; Time: 236.186 ms SELECT * FROM try WHERE citext LIKE 'C%'; Time: 235.818 ms Adding indexes... Test =. SELECT * FROM try WHERE LOWER(text) = LOWER('food'); Time: 1.260 ms SELECT * FROM try WHERE citext = 'food'; Time: 277.755 ms Test LIKE and ILIKE SELECT * FROM try WHERE LOWER(text) LIKE LOWER('C%'); Time: 209.073 ms SELECT * FROM try WHERE citext ILIKE 'C%'; Time: 238.430 ms SELECT * FROM try WHERE citext LIKE 'C%'; Time: 238.685 ms benedict% So for some reason, after adding the indexes, the queries against the CITEXT column aren't using them. Furthermore, the `lower(text) LIKE lower(?)` query isn't using *its* index. Huh? So this leaves me with two questions: 1. For what reason would the query against the citext column *not* use the index? 2. Is there some way to get the CITEXT index to behave like a LOWER() index, that is, so that its value is stored using the result of the str_tolower() function, thus removing some of the overhead of converting the values for each row fetched from the index? (Does this question make any sense?) Thanks, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: CITEXT 2.0 v2
On Jul 7, 2008, at 12:10, Pavel Stehule wrote: Maybe we can have some "locale" test outside our regress tests - I think that would be useful. Best, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: CITEXT 2.0 v2
2008/7/7 Zdenek Kotala <[EMAIL PROTECTED]>: > David E. Wheeler napsal(a): >> >> On Jul 7, 2008, at 07:41, Zdenek Kotala wrote: >> >>> However, It seems to me that code is ok now (exclude citex_eq). I think >>> there two open problems/questions: >>> >>> 1) regression test - >>> >>> a) I think that regresion is not correct. It depends on en_US locale, >>> but it uses characters which is not in related character repertoire. It >>> means comparing is not defined and I guess it could generate different >>> result on different OS - depends on locale implementation. >> >> That I don't know about. The test requires en_US.UTF-8, at least at this >> point. How are tests run on the build farm? And how else could I ensure that >> comparisons are case-insensitive for non-ASCII characters other than >> requiring a Unicode locale? Or is it just an issue for the sort order tests? >> For those, I could potentially remove accented characters, just as long as >> I'm verifying in other tests that comparisons are case-insensitive (without >> worrying about collation). > > Hmm, it is complex area and I'm not sure if anybody on planet know correct > answer :-). I think that best approach is to keep it as is and when a > problem occur then it will be fixed. > Maybe we can have some "locale" test outside our regress tests - >>> b) pgTap is something new. Need make a decision if this framework is >>> acceptable or not. >> >> Well, from the point of view of `make installcheck`, it's invisible. I've >> submitted a talk proposal for pgDay.US on ptTAP. I'm happy to discuss it >> further here though, if folks are interested. > > Yeah, it is invisible, but question is why don't put it as a framework to > common place. But it is for another discussion. > >>> 2) contrib vs. pgFoundry >>> >>> There is unresolved answer if we want to have this in contrib or not. >>> Good to mention that citext type will be obsoleted with full collation >>> implementation in a future. I personally prefer to keep it on pgFoundry >>> because it is temporally workaround (by my opinion), but I can live with >>> contrib module as well. >> >> I second what Andrew has said in reply to this issue. I'll also say that, >> since people *so* often end up using `WHERE LOWER(col) = LOWER(?)`, that >> it'd be very valuable to have citext in contrib, especially since so few >> folks seem to even know about pgFoundry, let alone be able to find it. I >> mean, look at these search results: > > I understand it but there is parallel project which should solve this > problem completely I guess in "close" future (2-3years). Afterward this > module will be obsolete and it will takes times to remove it from contrib. > It seems to me that have citext in contrib only for two releases is not > optimal solution. > >Zdenek > > Regards Pavel > > -- > Zdenek Kotala Sun Microsystems > Prague, Czech Republic http://sun.com/postgresql > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers > -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: CITEXT 2.0 v2
On Jul 7, 2008, at 11:54, Zdenek Kotala wrote: Hmm, it is complex area and I'm not sure if anybody on planet know correct answer :-). I think that best approach is to keep it as is and when a problem occur then it will be fixed. Regression tests are really important, though. b) pgTap is something new. Need make a decision if this framework is acceptable or not. Well, from the point of view of `make installcheck`, it's invisible. I've submitted a talk proposal for pgDay.US on ptTAP. I'm happy to discuss it further here though, if folks are interested. Yeah, it is invisible, but question is why don't put it as a framework to common place. But it is for another discussion. It is in a common place as a project, on pgFoundry. Whether the core team wants to use it for testing other parts of PostgreSQL (core or contrib) and therefore put it in a central location is, as you say, a separate conversation. It'd be easy to move it in such a case, of course. I understand it but there is parallel project which should solve this problem completely I guess in "close" future (2-3years). Afterward this module will be obsolete and it will takes times to remove it from contrib. It seems to me that have citext in contrib only for two releases is not optimal solution. I guess that'd be the reason to keep it on pgFoundry, but I have two comments: * 2-3 years is a *long* time in Internet time. * There is on guarantee that it will be finished in that time or, indeed ever (no disrespect to Radek Strnad, it's just there are always unforeseen issues). Thanks, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: CITEXT 2.0 v2
Andrew Dunstan napsal(a): Zdenek Kotala wrote: 2) contrib vs. pgFoundry There is unresolved answer if we want to have this in contrib or not. Good to mention that citext type will be obsoleted with full collation implementation in a future. I personally prefer to keep it on pgFoundry because it is temporally workaround (by my opinion), but I can live with contrib module as well. Is there a concrete plan to get to full collation support (i.e. per column) in any known time frame? If not, then I think a citext module would be acceptable. Collation per database should be part of 8.4. Radek Strnad works on it as a SoC project. He plans to continue on this work and implement full support as his Master of Thesis in 2-3 years time frame. Zdenek -- Zdenek Kotala Sun Microsystems Prague, Czech Republic http://sun.com/postgresql -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: CITEXT 2.0 v2
David E. Wheeler napsal(a): On Jul 7, 2008, at 07:41, Zdenek Kotala wrote: However, It seems to me that code is ok now (exclude citex_eq). I think there two open problems/questions: 1) regression test - a) I think that regresion is not correct. It depends on en_US locale, but it uses characters which is not in related character repertoire. It means comparing is not defined and I guess it could generate different result on different OS - depends on locale implementation. That I don't know about. The test requires en_US.UTF-8, at least at this point. How are tests run on the build farm? And how else could I ensure that comparisons are case-insensitive for non-ASCII characters other than requiring a Unicode locale? Or is it just an issue for the sort order tests? For those, I could potentially remove accented characters, just as long as I'm verifying in other tests that comparisons are case-insensitive (without worrying about collation). Hmm, it is complex area and I'm not sure if anybody on planet know correct answer :-). I think that best approach is to keep it as is and when a problem occur then it will be fixed. b) pgTap is something new. Need make a decision if this framework is acceptable or not. Well, from the point of view of `make installcheck`, it's invisible. I've submitted a talk proposal for pgDay.US on ptTAP. I'm happy to discuss it further here though, if folks are interested. Yeah, it is invisible, but question is why don't put it as a framework to common place. But it is for another discussion. 2) contrib vs. pgFoundry There is unresolved answer if we want to have this in contrib or not. Good to mention that citext type will be obsoleted with full collation implementation in a future. I personally prefer to keep it on pgFoundry because it is temporally workaround (by my opinion), but I can live with contrib module as well. I second what Andrew has said in reply to this issue. I'll also say that, since people *so* often end up using `WHERE LOWER(col) = LOWER(?)`, that it'd be very valuable to have citext in contrib, especially since so few folks seem to even know about pgFoundry, let alone be able to find it. I mean, look at these search results: I understand it but there is parallel project which should solve this problem completely I guess in "close" future (2-3years). Afterward this module will be obsolete and it will takes times to remove it from contrib. It seems to me that have citext in contrib only for two releases is not optimal solution. Zdenek -- Zdenek Kotala Sun Microsystems Prague, Czech Republic http://sun.com/postgresql -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: CITEXT 2.0 v2
On Jul 7, 2008, at 08:01, Andrew Dunstan wrote: What does still bother me is its performance. I'd like to know if any measurement has been done of using citext vs. a functional index on lower(foo). That's a good question. I can whip up a quick test by populating a column full of data and trying both, but I'm no performance testing expert. Anyone else know more about such performance testing who can help out? Many Thanks, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: CITEXT 2.0 v2
On Jul 7, 2008, at 07:41, Zdenek Kotala wrote: However, It seems to me that code is ok now (exclude citex_eq). I think there two open problems/questions: 1) regression test - a) I think that regresion is not correct. It depends on en_US locale, but it uses characters which is not in related character repertoire. It means comparing is not defined and I guess it could generate different result on different OS - depends on locale implementation. That I don't know about. The test requires en_US.UTF-8, at least at this point. How are tests run on the build farm? And how else could I ensure that comparisons are case-insensitive for non-ASCII characters other than requiring a Unicode locale? Or is it just an issue for the sort order tests? For those, I could potentially remove accented characters, just as long as I'm verifying in other tests that comparisons are case-insensitive (without worrying about collation). b) pgTap is something new. Need make a decision if this framework is acceptable or not. Well, from the point of view of `make installcheck`, it's invisible. I've submitted a talk proposal for pgDay.US on ptTAP. I'm happy to discuss it further here though, if folks are interested. 2) contrib vs. pgFoundry There is unresolved answer if we want to have this in contrib or not. Good to mention that citext type will be obsoleted with full collation implementation in a future. I personally prefer to keep it on pgFoundry because it is temporally workaround (by my opinion), but I can live with contrib module as well. I second what Andrew has said in reply to this issue. I'll also say that, since people *so* often end up using `WHERE LOWER(col) = LOWER(?)`, that it'd be very valuable to have citext in contrib, especially since so few folks seem to even know about pgFoundry, let alone be able to find it. I mean, look at these search results: http://www.google.com/search?q=PostgreSQL%20case-insensitive%20text My blog entry about this patch is hit #3. pgFoundry (and CITEXT 1) is #7. Last time I did a query like this, it didn't turn up at all. Belive me, I'd love for pgFoundry (or something like it) to become the CPAN for PostgreSQL. But without some love and SEO, I don't think that's gonna happen. Besides, CITEXT 2 would be a PITA to maintain for both 8.3 and 8.4, given the changes in the string comparison API in CVS. Thanks, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: CITEXT 2.0 v2
Zdenek Kotala wrote: 2) contrib vs. pgFoundry There is unresolved answer if we want to have this in contrib or not. Good to mention that citext type will be obsoleted with full collation implementation in a future. I personally prefer to keep it on pgFoundry because it is temporally workaround (by my opinion), but I can live with contrib module as well. Is there a concrete plan to get to full collation support (i.e. per column) in any known time frame? If not, then I think a citext module would be acceptable. What does still bother me is its performance. I'd like to know if any measurement has been done of using citext vs. a functional index on lower(foo). cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: CITEXT 2.0 v2
David E. Wheeler napsal(a): On Jun 27, 2008, at 18:22, David E. Wheeler wrote: Please find attached a patch adding a locale-aware, case-insensitive text type, called citext, as a contrib module. Here is a new version of the patch, with the following changes: * Fixed formatting to be more like core. * Added appropriate NEGATORs to operators. * Removed NEGATOR from the || operator. * Added hash index function and operator class. * The = operator now supports HASHES and MERGES. * citext_cmp and citextcmp both return int32. * Changed // comments to /* comments */. * Added test confirming láska'::citext <> 'laská'::citext. * A few other organizational, formatting, and pasto fixes. * Updated the FAQ entry on case-insensitive queries to recommend citext (it would, of course, need to be translated). Stuff I was asked about but didn't change: * citext_cmp() still uses varstr_cmp() instead of strncmp(). When I tried the latter, everything seemed to be equivalent. citext_eq (operator =) should use strncmp and citext_cmp() should use varstr_cmp(). * citext_smaller() and citext_larger() don't have memory leaks, says Tom, so I added no calls to PG_FREE_IF_COPY(). Yeah, it is correct. FGMR does clean job for you. However, It seems to me that code is ok now (exclude citex_eq). I think there two open problems/questions: 1) regression test - a) I think that regresion is not correct. It depends on en_US locale, but it uses characters which is not in related character repertoire. It means comparing is not defined and I guess it could generate different result on different OS - depends on locale implementation. b) pgTap is something new. Need make a decision if this framework is acceptable or not. 2) contrib vs. pgFoundry There is unresolved answer if we want to have this in contrib or not. Good to mention that citext type will be obsoleted with full collation implementation in a future. I personally prefer to keep it on pgFoundry because it is temporally workaround (by my opinion), but I can live with contrib module as well. Zdenek -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers