[HACKERS] Re: Patch: regexp_matches variant returning an array of matching positions
Alvaro Herrera-9 wrote
> Björn Harrtell wrote:
>> I've written a variant of regexp_matches called regexp_matches_positions which, instead of returning matching substrings, will return matching positions. I found use of this when processing OCR scanned text and wanted to prioritize matches based on their position.
>
> Interesting. I didn't read the patch, but I wonder if it would be of more general applicability to return more info in one fell swoop: a function returning a set of (position, length, text of match), rather than an array. So instead of first calling one function to get the matches and then another for their positions, do it all in one pass.
>
> (See pg_event_trigger_dropped_objects for a simple example of a function that returns in that fashion. There are several others, but AFAIR that's the simplest one.)

Confused as to your thinking. Like regexp_matches, this returns SETOF type[]: integer[] in this case, where regexp_matches returns text[] for the matches. I could see adding a generic function that returns a SETOF named composite (match varchar[], position int[], length int[]) and the corresponding type. I can't imagine a situation where you'd want the position but not the text, so having to evaluate the regexp twice seems wasteful. The length is probably a waste, though, since it can readily be derived from the text and is less often needed. But if it's pre-calculated anyway...

My question is: what position is returned in a multiple-match situation? The supplied test only covers the simple, non-global situation. It needs to exercise empty sub-matches and global searches.

One theory is that the first array slot should cover the global position of match zero (i.e., the entire pattern) within the larger document, while sub-matches would be relative offsets within that single match. This conflicts, though, with the fact that regexp_matches only returns array elements for () items and never for the full match, the goal of this function being parallel un-nesting with regexp_matches. But since () groups can be nested, a group spanning the entire match can still occur.
How does this resolve in the patch?

SELECT regexp_matches('abcabc','((a)(b)(c))','g');

David J.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/Patch-regexp-matches-variant-returning-an-array-of-matching-positions-tp5789321p5789414.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
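To make the one-pass design discussed above concrete, here is a minimal sketch in Python. It is not PostgreSQL code, and the names regexp_match_records and RegexpMatch are hypothetical. It evaluates the pattern once and returns one record per match holding parallel arrays of sub-match text, 1-based position and length, in the spirit of the (match, position, length) composite suggested in the thread. Like regexp_matches, it reports only parenthesized groups (or the whole match when there are none) and uses NULL (None) for groups that did not take part in a match. Positions here are absolute offsets into the input, which is one of the two options raised above. Python's re is a backtracking engine rather than PostgreSQL's ARE engine, and the two can differ on alternation and empty global matches, so treat this only as an illustration of the result shape.

```python
import re
from typing import List, NamedTuple, Optional


class RegexpMatch(NamedTuple):
    """One row of the hypothetical (match text[], position int[], length int[]) composite."""
    match: List[Optional[str]]
    position: List[Optional[int]]
    length: List[Optional[int]]


def regexp_match_records(text: str, pattern: str,
                         global_match: bool = False) -> List[RegexpMatch]:
    """Evaluate the regex once and return text, position and length together.

    Hypothetical illustration, not a PostgreSQL API: one record per match
    (only the first match unless global_match is True), with parallel arrays.
    """
    compiled = re.compile(pattern)
    # Mirror regexp_matches: report capture groups only, or the whole
    # match (group 0) when the pattern has no capture groups.
    groups = list(range(1, compiled.groups + 1)) if compiled.groups else [0]
    rows: List[RegexpMatch] = []
    for m in compiled.finditer(text):
        matches: List[Optional[str]] = []
        positions: List[Optional[int]] = []
        lengths: List[Optional[int]] = []
        for g in groups:
            start, end = m.span(g)
            if start == -1:
                # Group did not take part in this match: NULL in every array.
                matches.append(None)
                positions.append(None)
                lengths.append(None)
            else:
                matches.append(m.group(g))
                positions.append(start + 1)  # SQL string positions are 1-based
                lengths.append(end - start)
        rows.append(RegexpMatch(matches, positions, lengths))
        if not global_match:
            break  # without the 'g' flag only the first match is reported
    return rows
```

For 'abcabc' and '((a)(b)(c))' with global matching, this yields positions {1,1,2,3} and {4,4,5,6}, each with lengths {3,1,1,1}. An empty sub-match, such as the (z*) in 'x(z*)y' matched against 'xy', gets position 2 and length 0, which is the kind of case the thread asks the patch's tests to cover.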
Re: [HACKERS] Re: Patch: regexp_matches variant returning an array of matching positions
On Wed, January 29, 2014 05:16, David Johnston wrote:
> How does this resolve in the patch?
>
> SELECT regexp_matches('abcabc','((a)(b)(c))','g');

With the patch:

testdb=# SELECT regexp_matches('abcabc','((a)(b)(c))','g'), regexp_matches_positions('abcabc','((a)(b)(c))');
 regexp_matches | regexp_matches_positions
----------------+--------------------------
 {abc,a,b,c}    | {1,1,2,3}
 {abc,a,b,c}    | {1,1,2,3}
(2 rows)

testdb=# SELECT regexp_matches('abcabc','((a)(b)(c))','g'), regexp_matches_positions('abcabc','((a)(b)(c))', 'g');
 regexp_matches | regexp_matches_positions
----------------+--------------------------
 {abc,a,b,c}    | {1,1,2,3}
 {abc,a,b,c}    | {4,4,5,6}
(2 rows)

(in HEAD:

testdb=# SELECT regexp_matches('abcabc','((a)(b)(c))','g');
 regexp_matches
----------------
 {abc,a,b,c}
 {abc,a,b,c}
(2 rows)
)
[HACKERS] Re: Patch: regexp_matches variant returning an array of matching positions
Erik Rijkers wrote
> On Wed, January 29, 2014 05:16, David Johnston wrote:
>> How does this resolve in the patch?
>>
>> SELECT regexp_matches('abcabc','((a)(b)(c))','g');
>
> With the patch:
>
> testdb=# SELECT regexp_matches('abcabc','((a)(b)(c))','g'), regexp_matches_positions('abcabc','((a)(b)(c))');
>  regexp_matches | regexp_matches_positions
> ----------------+--------------------------
>  {abc,a,b,c}    | {1,1,2,3}
>  {abc,a,b,c}    | {1,1,2,3}
> (2 rows)

The {1,1,2,3} in the second row is an artifact (a copy) produced by the repetition of set-returning functions in the select list, and has nothing to do with the second match.

> testdb=# SELECT regexp_matches('abcabc','((a)(b)(c))','g'), regexp_matches_positions('abcabc','((a)(b)(c))', 'g');
>  regexp_matches | regexp_matches_positions
> ----------------+--------------------------
>  {abc,a,b,c}    | {1,1,2,3}
>  {abc,a,b,c}    | {4,4,5,6}
> (2 rows)

As expected.

David J.
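The repeated {1,1,2,3} row can be illustrated with a small Python model; the helper names below are hypothetical and this is not PostgreSQL code. Before PostgreSQL 10, several set-returning functions in one select list produced as many rows as the least common multiple of their row counts, with each function's output cycled. The non-global regexp_matches_positions call yields one row and the global regexp_matches call yields two, so LCM(1, 2) = 2 rows and the single positions row is repeated. Since PostgreSQL 10, select-list SRFs instead run in lockstep, as with LATERAL ROWS FROM(...), and shorter results are padded with NULLs. The pre-10 model assumes every function returns at least one row; the zero-row case is deliberately not modeled.

```python
from math import gcd
from typing import List, Sequence, Tuple


def pre10_select_list_rows(*srf_outputs: Sequence[object]) -> List[Tuple[object, ...]]:
    """Model of pre-PostgreSQL-10 expansion of several set-returning
    functions in one SELECT list: LCM(row counts) output rows, with each
    function's rows cycled. Assumes every function returns at least one row."""
    counts = [len(rows) for rows in srf_outputs]
    if not counts or min(counts) == 0:
        raise ValueError("model only covers SRFs that return at least one row")
    total = 1
    for n in counts:
        total = total * n // gcd(total, n)  # running least common multiple
    return [tuple(rows[i % len(rows)] for rows in srf_outputs)
            for i in range(total)]


def lockstep_select_list_rows(*srf_outputs: Sequence[object]) -> List[Tuple[object, ...]]:
    """Model of PostgreSQL 10+ behaviour: run the functions in lockstep and
    pad the shorter ones with NULL (None) up to the longest row count."""
    total = max((len(rows) for rows in srf_outputs), default=0)
    return [tuple(rows[i] if i < len(rows) else None for rows in srf_outputs)
            for i in range(total)]


# Rows from the first query in the thread: global regexp_matches (2 rows)
# next to the non-global regexp_matches_positions call (1 row).
matches = [["abc", "a", "b", "c"], ["abc", "a", "b", "c"]]
positions = [[1, 1, 2, 3]]
```

With these inputs, pre10_select_list_rows(matches, positions) reproduces the two rows shown in the thread, both carrying [1, 1, 2, 3], while lockstep_select_list_rows(matches, positions) puts None in the second row's positions column.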