Re: [HACKERS] PATCH: CITEXT 2.0 v4
On Jul 18, 2008, at 01:39, Michael Paesold wrote: Calling regex functions with the case-insensitivity option would be great. It should also be possible to rewrite replace() into regexp_replace() by first escaping the regex meta characters. Actually re-implementing those functions in a case insensitive way would still be an option, but of course some amount of work. The question is, how much use case there is. I've figured out how to make all the functions work using SQL function workarounds, converting things and re-dispatching to the text versions as appropriate. They work quite well, and can be converted to C later if that becomes a requirement. Meanwhile, on the question of whether or not regular expression and LIKE comparisons *should* match case-insensitively, I have a couple more observations: * Thinking about how a true case-insensitive collation would work, I'm quite certain that it would match case-insensitively. Anything else would just be unexpected, because in a case-insensitive collation, lowercase characters are, in practice, identical to uppercase characters. As far as matching is concerned, there is no difference between them. So the matching operators and functions against CITEXT should follow that assumption. * I tried a few matches on MySQL, where the collation is case- insensitive by default, and it confirms my impression: mysql select 'Foo' regexp 'o$'; +---+ | 'Foo' regexp 'o$' | +---+ | 1 | +---+ 1 row in set (0.00 sec) mysql select 'Foo' regexp 'O$'; +---+ | 'Foo' regexp 'O$' | +---+ | 1 | +---+ 1 row in set (0.00 sec) mysql select 'Foo' like '%o'; +-+ | 'Foo' like '%o' | +-+ | 1 | +-+ 1 row in set (0.00 sec) mysql select 'Foo' like '%O'; +-+ | 'Foo' like '%O' | +-+ | 1 | +-+ 1 row in set (0.00 sec) I'll grant that MySQL may not be the best model for how things should work, but it's something, at least. Anyone else got access to another database with case-insensitive collations to see how LIKE and regular expressions work? Thanks, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: CITEXT 2.0 v4
On Jul 18, 2008, at 09:53, David E. Wheeler wrote: However, if someone with a lot more C and Pg core knowledge wanted to sit down with me for a couple hours next week and help me bang out these functions, that would be great. I'd love to have the implementation be that much more complete. I've implemented fixes for the regexp_* functions and strpos() in pure SQL, like so: CREATE OR REPLACE FUNCTION regexp_matches( citext, citext ) RETURNS TEXT[] AS ' SELECT regexp_matches( $1::text, $2::text, ''i'' ); ' LANGUAGE SQL IMMUTABLE STRICT; CREATE OR REPLACE FUNCTION regexp_matches( citext, citext, text ) RETURNS TEXT[] AS ' SELECT regexp_matches( $1::text, $2::text, CASE WHEN strpos($3, ''c'') = 0 THEN $3 || ''i'' ELSE $3 END ); ' LANGUAGE SQL IMMUTABLE STRICT; CREATE OR REPLACE FUNCTION regexp_replace( citext, citext, text ) returns TEXT AS ' SELECT regexp_replace( $1::text, $2::text, $3, ''i''); ' LANGUAGE SQL IMMUTABLE STRICT; CREATE OR REPLACE FUNCTION regexp_replace( citext, citext, text, text ) returns TEXT AS ' SELECT regexp_replace( $1::text, $2::text, $3, CASE WHEN strpos($4, ''c'') = 0 THEN $4 || ''i'' ELSE $4 END); ' LANGUAGE SQL IMMUTABLE STRICT; CREATE OR REPLACE FUNCTION regexp_split_to_array( citext, citext ) RETURNS TEXT[] AS ' SELECT regexp_split_to_array( $1::text, $2::text, ''i'' ); ' LANGUAGE SQL IMMUTABLE STRICT; CREATE OR REPLACE FUNCTION regexp_split_to_array( citext, citext, text ) RETURNS TEXT[] AS ' SELECT regexp_split_to_array( $1::text, $2::text, CASE WHEN strpos($3, ''c'') = 0 THEN $3 || ''i'' ELSE $3 END ); ' LANGUAGE SQL IMMUTABLE STRICT; CREATE OR REPLACE FUNCTION regexp_split_to_table( citext, citext ) RETURNS SETOF TEXT AS ' SELECT regexp_split_to_table( $1::text, $2::text, ''i'' ); ' LANGUAGE SQL IMMUTABLE STRICT; CREATE OR REPLACE FUNCTION regexp_split_to_table( citext, citext, text ) RETURNS SETOF TEXT AS ' SELECT regexp_split_to_table( $1::text, $2::text, CASE WHEN strpos($3, ''c'') = 0 THEN $3 || ''i'' ELSE $3 END ); ' LANGUAGE SQL IMMUTABLE STRICT; CREATE OR REPLACE FUNCTION strpos( citext, citext ) RETURNS INT AS ' SELECT strpos( LOWER( $1::text ), LOWER( $2::text ) ); ' LANGUAGE SQL IMMUTABLE STRICT; Not so bad, though it'd be nice to have C functions that just did these things. Still not case-insensitive are: -- replace() -- split_part() -- translate() So, anyone at OSCON this week want to help me with these? Or to convert the above functions to C? Greg? Bruce? Thanks, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: CITEXT 2.0 v4
David E. Wheeler writes: On Jul 17, 2008, at 03:45, Michael Paesold wrote: Wouldn't it be possible to create a variant of regexp_replace, i.e. regexp_replace(citext,citext,text), which would again lower-case the first two arguments before passing the input to regexp_replace(text,text,text)? Sure, but then you end up with this: template1=# select regexp_replace( 'Fxx'::citext, 'X'::citext, 'o'); regexp_replace foo (1 row) Yeah, you are right, I see. :-) Which is just wrong. I'm going to look at the regex C functions today and see if there's an easy way to just always pass them the 'i' flag, which would do the trick. That still won't help replace(), split_part(), or translate(), however. Calling regex functions with the case-insensitivity option would be great. It should also be possible to rewrite replace() into regexp_replace() by first escaping the regex meta characters. Actually re-implementing those functions in a case insensitive way would still be an option, but of course some amount of work. The question is, how much use case there is. Best Regards Michael Paesold -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: CITEXT 2.0 v4
On Jul 18, 2008, at 01:39, Michael Paesold wrote: Calling regex functions with the case-insensitivity option would be great. It should also be possible to rewrite replace() into regexp_replace() by first escaping the regex meta characters. Actually re-implementing those functions in a case insensitive way would still be an option, but of course some amount of work. The question is, how much use case there is. Not much for me. I might use the regex functions, but would be happy to manually pass the i flag. However, if someone with a lot more C and Pg core knowledge wanted to sit down with me for a couple hours next week and help me bang out these functions, that would be great. I'd love to have the implementation be that much more complete. I do believe that, as it stands now in the v4 patch, citext is pretty close to ready, and certainly commit-able. Thanks, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: CITEXT 2.0 v4
Am 16.07.2008 um 20:38 schrieb David E. Wheeler: The trouble is that, right now: template1=# select regexp_replace( 'fxx'::citext, 'X'::citext, 'o'); regexp_replace fxx (1 row) So there's an inconsistency there. I don't know how to make that work case-insensitively. Wouldn't it be possible to create a variant of regexp_replace, i.e. regexp_replace(citext,citext,text), which would again lower-case the first two arguments before passing the input to regexp_replace(text,text,text)? Best Regards Michael Paesold -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: CITEXT 2.0 v4
On Jul 17, 2008, at 03:45, Michael Paesold wrote: Wouldn't it be possible to create a variant of regexp_replace, i.e. regexp_replace(citext,citext,text), which would again lower-case the first two arguments before passing the input to regexp_replace(text,text,text)? Sure, but then you end up with this: template1=# select regexp_replace( 'Fxx'::citext, 'X'::citext, 'o'); regexp_replace foo (1 row) Which is just wrong. I'm going to look at the regex C functions today and see if there's an easy way to just always pass them the 'i' flag, which would do the trick. That still won't help replace(), split_part(), or translate(), however. Best, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] PATCH: CITEXT 2.0 v4
Howdy, I've attached a new patch with the latest revisions of for the citext contrib module patch. The changes include: * Using strlen() to pass string lengths to the comparison function, since lowercasing the value can change the length. Per Tom Lane. * Made citextcmp consistently return int32, per Tom Lane. * Made the hash index function return the proper value, per Tom Lane. * Removed the COMMENTs and GRANTs from citext.sql.in. * Added a cast function from bpchar to citext, as suggested by Tom Lane. * Set the storage type for CITEXT to extended, to ensure that it will be toastable. Per Tom Lane. * Fixed the COMMUTATOR of =. * Changed the cast from citext to bpchar from implicit to assignment. This eliminates ambiguous function resolutions. * Eliminated superflous functions, per Tom Lane. * Removed unnecessary `OPERATOR()` calls in NEGATORs and the like. * Added binary in/out functions. Per Tom Lane * Added an explicit shell type to make the output a bit quieter. * Converted tests to pure SQL and omitted multibyte tests (though a few remain commented-out). * Reorganized and expanded the documentation a bit. This version is far better than I started with, and I'm very grateful for the feedback. Now, I have a few remaining questions to ask, mostly just to get your opinions: * The README for citext 1.0 on pgFoundry says: I had to make a decision on casting between types for regular expressions and decided that if any parameter is of citext type then case insensitive applies. For example applying regular expressions with a varchar and a citext will produce a case-insensitive result. Having thought about this afterwards I realised that since we have the option to use case-insensitive results with regular expressions I should have left the behaviour exactly as text and then you have the best of both worlds... oh well not hard to change for any of you perfectionists! I followed the original and made all the regex and LIKE comparisons case-insensitive. But maybe I should not have? Especially since the regular expression functions (e.g., regexp_replace()) and a few non- regex functions (e.g., replace()) still don't behave case-insensitively? * If the answer is no, how can I make those functions behave case- insensitively? (See the TODO tests.) * Should there be any other casts? To and from name, perhaps? Thanks! David citext4.patch.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: CITEXT 2.0 v4
On Jul 15, 2008, at 22:23, David E. Wheeler wrote: * The README for citext 1.0 on pgFoundry says: I had to make a decision on casting between types for regular expressions and decided that if any parameter is of citext type then case insensitive applies. For example applying regular expressions with a varchar and a citext will produce a case-insensitive result. Having thought about this afterwards I realised that since we have the option to use case-insensitive results with regular expressions I should have left the behaviour exactly as text and then you have the best of both worlds... oh well not hard to change for any of you perfectionists! I followed the original and made all the regex and LIKE comparisons case-insensitive. But maybe I should not have? Especially since the regular expression functions (e.g., regexp_replace()) and a few non- regex functions (e.g., replace()) still don't behave case- insensitively? I was thinking about this a bit last night and wanted to fill things out a bit. As a programmer, I find Donald Fraser's hindsight to be more appealing, because at least that way I have the option to do matching against CITEXT strings case-sensitively when I want to. OTOH, if what we want is to have CITEXT work more like a case- insensitive collation, then the expectation from the matching operators and functions might be different. Does anyone have any idea whether regex and LIKE matching against a string in a case-insensitive collation would match case-insensitively or not? If so, then maybe the regex and LIKE ops and funcs *should* match case-insensitively? If not, or if only for some collations, then I would think not. Either way, I know of no way, currently, to allow functions like replace(), split_part(), strpos(), and translate() to match case- insensitiely, even if we wanted to. Anyone have any ideas? * If the answer is no, how can I make those functions behave case- insensitively? (See the TODO tests.) * Should there be any other casts? To and from name, perhaps? Thanks, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: CITEXT 2.0 v4
On Wednesday 16 July 2008 13:54:25 David E. Wheeler wrote: On Jul 15, 2008, at 22:23, David E. Wheeler wrote: * The README for citext 1.0 on pgFoundry says: I had to make a decision on casting between types for regular expressions and decided that if any parameter is of citext type then case insensitive applies. For example applying regular expressions with a varchar and a citext will produce a case-insensitive result. Having thought about this afterwards I realised that since we have the option to use case-insensitive results with regular expressions I should have left the behaviour exactly as text and then you have the best of both worlds... oh well not hard to change for any of you perfectionists! I followed the original and made all the regex and LIKE comparisons case-insensitive. But maybe I should not have? Especially since the regular expression functions (e.g., regexp_replace()) and a few non- regex functions (e.g., replace()) still don't behave case- insensitively? I was thinking about this a bit last night and wanted to fill things out a bit. As a programmer, I find Donald Fraser's hindsight to be more appealing, because at least that way I have the option to do matching against CITEXT strings case-sensitively when I want to. OTOH, if what we want is to have CITEXT work more like a case- insensitive collation, then the expectation from the matching operators and functions might be different. Does anyone have any idea whether regex and LIKE matching against a string in a case-insensitive collation would match case-insensitively or not? If so, then maybe the regex and LIKE ops and funcs *should* match case-insensitively? If not, or if only for some collations, then I would think not. Either way, I know of no way, currently, to allow functions like replace(), split_part(), strpos(), and translate() to match case- insensitiely, even if we wanted to. Anyone have any ideas? * If the answer is no, how can I make those functions behave case- insensitively? (See the TODO tests.) * Should there be any other casts? To and from name, perhaps? AIUI, your propsing the following: select 'x'::citext = 'X'::citext; ?column? -- t (1 row) select 'x'::citext ~ 'X'::citext; ?column? -- f (1 row) I understand the desire for flexibility, but the above seems wierd to me. -- Robert Treat Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: CITEXT 2.0 v4
On Jul 16, 2008, at 11:20, Robert Treat wrote: I was thinking about this a bit last night and wanted to fill things out a bit. As a programmer, I find Donald Fraser's hindsight to be more appealing, because at least that way I have the option to do matching against CITEXT strings case-sensitively when I want to. OTOH, if what we want is to have CITEXT work more like a case- insensitive collation, then the expectation from the matching operators and functions might be different. Does anyone have any idea whether regex and LIKE matching against a string in a case- insensitive collation would match case-insensitively or not? If so, then maybe the regex and LIKE ops and funcs *should* match case-insensitively? If not, or if only for some collations, then I would think not. Either way, I know of no way, currently, to allow functions like replace(), split_part(), strpos(), and translate() to match case- insensitiely, even if we wanted to. Anyone have any ideas? * If the answer is no, how can I make those functions behave case- insensitively? (See the TODO tests.) * Should there be any other casts? To and from name, perhaps? AIUI, your propsing the following: select 'x'::citext = 'X'::citext; ?column? -- t (1 row) select 'x'::citext ~ 'X'::citext; ?column? -- f (1 row) I understand the desire for flexibility, but the above seems wierd to me. That's what Donald Fraser suggested, and I see some value in that, but wanted to get some other opinions. And you're right, that does seem a bit weird. The trouble is that, right now: template1=# select regexp_replace( 'fxx'::citext, 'X'::citext, 'o'); regexp_replace fxx (1 row) So there's an inconsistency there. I don't know how to make that work case-insensitively. Best, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers