It would have been really nice to define fn:analyze-string() so that it would capture multiple matches of the capturing groups, but we made a policy decision that as far as possible it should be possible to implement the XPath/XQuery regex facilities using existing regex libraries, and sadly they generally do not have this capability.
But in any case I think a multi-pass approach is probably appropriate here. In fact generally, I think the approach of trying to do everything in one great complex regular expression is usually misguided. Splitting it up into smaller steps not only makes the logic easier to understand (and therefore to debug and maintain), it can also benefit performance.

So:

(a) use analyze-string to mark any substring comprising "(" followed by digits, spaces, and commas, followed by ")"

(b) use tokenize to split out the individual numbers.

Michael Kay
Saxonica

> On 23 Apr 2018, at 17:22, Joe Wicentowski <joe...@gmail.com> wrote:
>
> Hi all,
>
> I have encountered an unexpected challenge constructing a regex for a pattern
> I am looking for. I am looking for numbers in parentheses. For example, in
> the following string:
>
> "On February 13, 1968, Secretary of State Dean Rusk sent a
> message to Israeli Foreign Minister Abba Eban calling upon Israel to
> endorse openly Resolution 242, and on May 13 President Johnson sent a
> letter to United Arab Republic (UAR) President Gamal Abdel Nasser,
> urging him to seize the unique opportunity offered by the Jarring
> mission to achieve peace. (79, 171)"
>
> ... I would like to match "79" and "171" (but not "UAR" or "13" or "1968").
> I have been trying to construct a regex for use with analyze-string to
> capture this pattern, but I have not been successful. I have tried the
> following:
>
> analyze-string($string, "(?:\()(?:(\d+)(?:, )?)+(?:\))")
>
> In other words, there are these 3 components:
>
> 1. (?:\() a non-capturing group consisting of an open parens, followed by
> 2. (?:(\d+)(?:, )?)+ one or more non-capturing groups consisting of (a
> number followed by an optional, non-matching comma-and-space), followed by
> 3. (?:\)) a non-capturing group consisting of a close parens
>
> I was expecting to get the following output:
>
> <fn:analyze-string-result xmlns:fn="http://www.w3.org/2005/xpath-functions">
> <fn:non-match>On February 13, 1968, Secretary of State Dean Rusk sent a
> message to Israeli Foreign Minister Abba Eban calling upon Israel to
> endorse openly Resolution 242, and on May 13 President Johnson sent a
> letter to United Arab Republic (UAR) President Gamal Abdel Nasser,
> urging him to seize the unique opportunity offered by the Jarring
> mission to achieve peace. </fn:non-match>
> <fn:match>(<fn:group nr="1">79</fn:group>,
> <fn:group nr="1">171</fn:group>)</fn:match>
> </fn:analyze-string-result>
>
> However, the actual result is that the first number ("79") is skipped, and
> only the 2nd number ("171") is captured:
>
> <fn:analyze-string-result xmlns:fn="http://www.w3.org/2005/xpath-functions">
> <fn:non-match>On February 13, 1968, Secretary of State Dean Rusk sent a
> message to Israeli Foreign Minister Abba Eban calling upon Israel to
> endorse openly Resolution 242, and on May 13 President Johnson sent a
> letter to United Arab Republic (UAR) President Gamal Abdel Nasser,
> urging him to seize the unique opportunity offered by the Jarring
> mission to achieve peace. </fn:non-match>
> <fn:match>(79,
> <fn:group nr="1">171</fn:group>)</fn:match>
> </fn:analyze-string-result>
>
> What am I missing? Can anyone suggest a regex that is able to capture both
> numbers inside the parentheses? Or do I need to make a two-pass run through
> this, finding parenthetical text with a first analyze-string like "\(.+\)"
> and then looking inside its matches with a second analyze-string like
> "(\d+)(?:, )?"?
>
> Thanks,
> Joe
> _______________________________________________
> talk@x-query.com
> http://x-query.com/mailman/listinfo/talk
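
For readers who want to try the two steps (a) and (b) suggested in the reply, here is a minimal sketch. It is one possible reading of that suggestion, not code from the thread: the regex "\(([\d ,]+)\)", the variable names $string and $lists, the abridged sample string, and the use of normalize-space() are all assumptions made for illustration.

```xquery
xquery version "3.1";

(: Abridged version of the sample string from the question :)
let $string := "a letter to United Arab Republic (UAR) President Gamal Abdel Nasser, urging him to seize the unique opportunity offered by the Jarring mission to achieve peace. (79, 171)"

(: Step (a): find "(" followed by digits, spaces, and commas, followed by ")".
   Capturing group 1 holds the text between the parentheses. :)
let $lists := analyze-string($string, "\(([\d ,]+)\)")//fn:group[@nr = 1]

(: Step (b): split each captured list on commas; drop any empty tokens :)
for $list in $lists
return tokenize(normalize-space($list), "\s*,\s*")[. ne ""]
```

For this sample string the query should return the two strings "79" and "171". "(UAR)" is never matched, because the character class in step (a) admits only digits, spaces, and commas.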