Alvaro Herrera-9 wrote
> Björn Harrtell wrote:
>> I've written a variant of regexp_matches called regexp_matches_positions
>> which instead of returning matching substrings will return matching
>> positions. I found use of this when processing OCR scanned text and
>> wanted to prioritize matches based on their position.
>
> Interesting. I didn't read the patch but I wonder if it would be of
> more general applicability to return more info in a fell swoop a
> function returning a set (position, length, text of match), rather than
> an array. So instead of first calling one function to get the match and
> then their positions, do it all in one pass.
>
> (See pg_event_trigger_dropped_objects for a simple example of a function
> that returns in that fashion. There are several others but AFAIR that's
> the simplest one.)

I'm confused as to your thinking. Like regexp_matches, this returns "SETOF type[]": integer[] in this case, versus text[] for the matches.

I could see adding a generic function that returns a SETOF named composite (match varchar[], position int[], length int[]), along with the corresponding type. I can't imagine a situation where you'd want the position but not the text, so having to evaluate the regexp twice seems wasteful. The length is probably a waste, though, since it can readily be derived from the text and is less often needed. But if it's pre-calculated anyway...

My question is: what position is returned in a multiple-match situation? The supplied test only covers the simple, non-global situation. It needs to exercise empty sub-matches and global searches.

One theory is that the first array slot should hold the global position of match zero (i.e., the entire pattern) within the larger document, while sub-matches would be offsets relative to that single match. This conflicts, though, with the fact that, whenever the pattern contains parenthesized subexpressions, _matches only returns array elements for the () items and never for the full match - the goal in this function being parallel un-nesting. But since nesting is allowed, a group can still span the entire match. How does the patch resolve this case?

    SELECT regexp_matches('abcabc','((a)(b)(c))','g');

David J.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/Patch-regexp-matches-variant-returning-an-array-of-matching-positions-tp5789321p5789414.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
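
To make the two candidate conventions concrete, here is a rough sketch using Python's re module as a stand-in. It is not the patch and not PostgreSQL's regex engine; the helper name match_positions and the 1-based numbering (chosen to mirror SQL string positions) are assumptions made for illustration. For each global match it reports the sub-match texts, their positions within the whole document, and their positions relative to the start of that match.

```python
import re

def match_positions(pattern, text):
    """Hypothetical helper: for every match, return one tuple of
    (sub-match texts, absolute positions, positions relative to the match).
    Positions are 1-based; a group that did not participate yields None."""
    for m in re.finditer(pattern, text):
        texts, absolute, relative = [], [], []
        for g in range(1, m.re.groups + 1):
            start = m.start(g)          # -1 if the group did not participate
            texts.append(m.group(g))    # None if the group did not participate
            if start == -1:
                absolute.append(None)
                relative.append(None)
            else:
                absolute.append(start + 1)               # offset in the document
                relative.append(start - m.start(0) + 1)  # offset in this match
        yield texts, absolute, relative

for row in match_positions(r'((a)(b)(c))', 'abcabc'):
    print(row)
# (['abc', 'a', 'b', 'c'], [1, 1, 2, 3], [1, 1, 2, 3])
# (['abc', 'a', 'b', 'c'], [4, 4, 5, 6], [1, 1, 2, 3])
```

With absolute positions the second row differs from the first; with relative offsets the two rows are identical, so the location of each match within the document would have to be reported some other way.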