On 07/13/2012 12:33 PM, Philip Hazel wrote:
On Thu, 12 Jul 2012, Ahmad Amireh wrote:

I'm having trouble figuring out how to capture duplicate named subpatterns (as
allowed by the PCRE_DUPNAMES option). My initial understanding from reading
the manual was that I could refer to a subpattern's capture by name
instead of by capture number. While that is true, it seems the named
subpattern capture always refers to the _last_ branch in which it was
_defined_ and not necessarily matched. The example in the manual is almost
exactly like what I'm trying to do so I'll just use it to explain:

(?J)(?<DN>Mon|Fri|Sun)(?:day)?|(?<DN>Tue)(?:sday)?|(?<DN>Wed)(?:nesday)?|(?<DN>Thu)(?:rsday)?|(?<DN>Sat)(?:urday)?

This example works for me when I test it using the pcretest program:

PCRE version 8.31 2012-07-06

/(?J)(?<DN>Mon|Fri|Sun)(?:day)?|(?<DN>Tue)(?:sday)?|(?<DN>Wed)(?:nesday)?|(?<DN>Thu)(?:rsday)?|(?<DN>Sat)(?:urday)?/
   Monday\CDN
  0: Monday
  1: Mon
   C Mon (3) DN
   Tuesday\CDN
  0: Tuesday
  1: <unset>
  2: Tue
   C Tue (3) DN
   Wednesday\CDN
  0: Wednesday
  1: <unset>
  2: <unset>
  3: Wed
   C Wed (3) DN
   Thursday\CDN
  0: Thursday
  1: <unset>
  2: <unset>
  3: <unset>
  4: Thu
   C Thu (3) DN
   Friday\CDN
  0: Friday
  1: Fri
   C Fri (3) DN
   Saturday\CDN
  0: Saturday
  1: <unset>
  2: <unset>
  3: <unset>
  4: <unset>
  5: Sat
   C Sat (3) DN

The \CDN option on the data lines means "use pcre_copy_named_substring
to collect the value of substring DN after the match". The same test
works with \GDN (using pcre_get_named_substring).

How are you trying to extract the named substring? The two functions
mentioned above return the first substring with the given name that is
actually set.

Philip
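
The lookup rule described above ("the first substring with the given name
that is actually set") can be sketched in plain C. This is a simplified
model, not PCRE's source and not its API: name_entry, dn_table,
first_set_group and copy_named are hypothetical names invented for the
illustration, and the real name table is a packed byte array obtained with
pcre_fullinfo rather than an array of structs. The only things taken from
PCRE are the ovector layout (start/end offset pairs, -1 for unset groups)
and the meaning of pcre_exec's return value.

```c
#include <string.h>

/* Hypothetical stand-in for one row of a compiled pattern's name table:
   with duplicate names allowed, one name maps to several group numbers. */
typedef struct {
    const char *name;
    int group;
} name_entry;

/* Name table for the day-name pattern: DN is defined in groups 1..5. */
static const name_entry dn_table[] = {
    {"DN", 1}, {"DN", 2}, {"DN", 3}, {"DN", 4}, {"DN", 5}
};
static const int dn_table_len = 5;

/* Return the first group (in table order) called `name` that took part in
   the match, or -1 if none did. `ovector` holds start/end offset pairs as
   pcre_exec fills them in (unset groups are -1), and `rc` is pcre_exec's
   return value: one more than the highest-numbered group that was set. */
static int first_set_group(const name_entry *table, int len, const char *name,
                           const int *ovector, int rc)
{
    for (int i = 0; i < len; i++) {
        int n = table[i].group;
        if (strcmp(table[i].name, name) == 0 && n < rc && ovector[2 * n] >= 0)
            return n;
    }
    return -1;
}

/* Copy the named capture into buf and return its length, or -1 if the
   name is not set or buf is too small (assumed error convention). */
static int copy_named(const name_entry *table, int len, const char *name,
                      const char *subject, const int *ovector, int rc,
                      char *buf, int bufsize)
{
    int n = first_set_group(table, len, name, ovector, rc);
    if (n < 0)
        return -1;
    int sublen = ovector[2 * n + 1] - ovector[2 * n];
    if (sublen >= bufsize)
        return -1;
    memcpy(buf, subject + ovector[2 * n], (size_t)sublen);
    buf[sublen] = '\0';
    return sublen;
}
```

For the subject "Tuesday", where the transcript shows group 1 unset and
group 2 holding "Tue", the ovector would be {0, 7, -1, -1, 0, 3} with a
return value of 3. The sketch skips group 1, picks group 2 and copies
"Tue" (length 3), in line with the "C Tue (3) DN" line in the pcretest
output.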

Confirmed. It works as expected in pcretest -- PCRE version 8.30 2012-02-04.

I did not know about those functions; that's my mistake. I'm using the Lua library lrexlib-pcre <http://rrthomas.github.com/lrexlib/manual.htm>, but unfortunately I don't see the named-substring extraction API exposed to Lua.

Thank you for your help, I appreciate the pointers (I didn't even know about pcretest actually) and I will attempt to expose this functionality to the Lua library and contact its author accordingly.

For those interested, here's a small test that shows the current behaviour of lrexlib-pcre:

    #!/usr/bin/env lua

    local rex_pcre = require 'rex_pcre' -- available as a rock named lrexlib-pcre

    local ptrn = [[(?J)(?<DN>Mon|Fri|Sun)(?:day)?|(?<DN>Tue)(?:sday)?|(?<DN>Wed)(?:nesday)?|(?<DN>Thu)(?:rsday)?|(?<DN>Sat)(?:urday)?]]
    local regex = rex_pcre.new(ptrn)

    if not regex then
      return print("Invalid PCRE regex '" .. ptrn .. "'")
    end

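    function test(subject)
      -- exec returns the match start, the match end, and a table of captures
      local _,__,captures = regex:exec(subject)
      print("Captures from '" .. subject .. "':")
      for k,v in pairs(captures or {}) do
        -- print only the named (non-numeric-key) entries that are set
        if type(k) ~= "number" and v then print("  " .. k .. " => " .. v) end
      end

      -- return the function itself so that calls can be chained
      return test
    end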

    test("Sunday")("Saturday")

Ahmad

--
## List details at https://lists.exim.org/mailman/listinfo/pcre-dev
