On Thu, 15 Feb 2007, Peter Eisentraut wrote:

> Neil Conway wrote:
> > On Wed, 2007-02-14 at 16:49 -0800, Jeremy Drake wrote:
> > > What was the status of this? Was there anything else I needed to
> > > do with this patch, or is it ready to be applied? Let me know if
> > > there is anything else I need to do on this...
> >
> > Will do -- I'm planning to apply this as soon as I have the free
> > cycles to do so, likely tomorrow or Friday.
>
> I don't know which patch is actually being proposed now. It would be
> good to make this more explicit and maybe include a synopsis of the
> functions in the email, so we know what's going on.

Sorry, my intent was just to check to see if I had gotten the patch
sufficiently fixed for Neil to apply and he just hadn't gotten to it yet
(which seems to be the case), or if there was something else he still
expected me to fix that I had missed in the prior discussions. I suppose
I should have emailed him privately.

The patch in question can be seen in the archives here:
http://archives.postgresql.org/pgsql-patches/2007-02/msg00214.php

The functions added are:

* regexp_split(str text, pattern text) RETURNS SETOF text
  regexp_split(str text, pattern text, flags text) RETURNS SETOF text
    returns each section of the string delimited by the pattern.
* regexp_matches(str text, pattern text) RETURNS text[]
    returns all capture groups when matching pattern against string in
    an array
* regexp_matches(str text, pattern text, flags text) RETURNS SETOF
    (prematch text, fullmatch text, matches text[], postmatch text)
    returns all capture groups when matching pattern against string in
    an array. also returns the entire match in fullmatch. if the 'g'
    option is given, returns all matches in the string. if the 'r'
    option is given, also return the text before and after the match in
    prematch and postmatch respectively.

> What confuses me about some of the functions I've seen in earlier
> patches in this thread is that they return setof something. But in my
> mind, regular expression matches or string splits are inherently
> ordered, so an array would be the correct return type.

They do return SETOF. Addressing them separately:

regexp_matches uses a text[] for the match groups. If you specify the
global flag, it could return multiple matches. Couple this with the
requested feature of pre- and postmatch returns (with its own flag) and
the return would turn into some sort of nasty array of tuples, or
multiple arrays.
It seems much cleaner to me to return a set of the matches found, and I
find which order the matches occur in to be much less important than the
fact that they did occur and their contents.

regexp_split returns setof text. This has, in my opinion, a much greater
case to return an array. However, there are several issues with this
approach:

1. My experience with the array code leads me to believe that building
   up an array is an expensive proposition. I know I could code it
   smarter so that the array is only constructed in the end.
2. With a set-returning function, it is possible to add a LIMIT clause,
   to prevent splitting up more of the string than is necessary. It is
   also immediately possible to insert them into a table, or do grouping
   on them, or call a function on each value. Most of the time when I do
   a split, I intend to do something like this with each split value.
3. You can still get an array if you really want it:

   SELECT ARRAY(SELECT * FROM regexp_split('first, second, third', E',\\s*'))

-- 
No problem is so formidable that you can't just walk away from it.
		-- C. Schulz
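
For illustration, the uses of the set-returning forms mentioned above
might look roughly like the following. This is only a sketch: it assumes
the functions from the patch linked above are installed with the
signatures given in the synopsis, and the table name split_values is
made up for the example.

```sql
-- Assumes regexp_split / regexp_matches as proposed in the patch
-- (SETOF-returning versions); split_values is a hypothetical table.

-- Take only the first two pieces of the string.
SELECT * FROM regexp_split('first, second, third', E',\\s*') LIMIT 2;

-- Insert each piece into a table.
CREATE TEMP TABLE split_values (val text);
INSERT INTO split_values
    SELECT * FROM regexp_split('first, second, third', E',\\s*');

-- Group on the pieces.
SELECT s AS val, count(*)
  FROM regexp_split('a, b, a', E',\\s*') AS s
 GROUP BY s;

-- Call a function on each piece.
SELECT upper(s)
  FROM regexp_split('first, second, third', E',\\s*') AS s;

-- With the 'g' flag, regexp_matches yields one row per match, using the
-- column names from the synopsis.
SELECT fullmatch, matches
  FROM regexp_matches('foo bar baz', '(b)(a.)', 'g');
```

None of these needs an intermediate array; wrapping a query in ARRAY(...)
as shown earlier is still available when an array is what you want.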