[Jprogramming] Regexp with backreferences in J

Skip Cave Mon, 21 Apr 2008 06:46:17 -0700

Copied from comp.lang.apl
Posted by: [EMAIL PROTECTED]


Brian posted this in the comp.lang.apl newsgroup, and I thought it
should also be posted in the J programming forum.

Skip

The rxrplc function in the standard J library script regex.ijs doesn't
seem to handle backreferences in the replacement text.  This function,
derived from rxrplc, seems to work:

NB. ----------------------------------------------------------------------------
NB. rxsubst - Regular expression substitution
NB.
NB. str =. (pattern[;index];newtext) rxsubst str
NB.
NB. Brian B. McGuinness     J     April, 2008
NB. ----------------------------------------------------------------------------
rxsubst =: 4 : 0
  pat=. >{.x
  new=. >{:x
  if. L. pat do. 'pat ndx'=. pat else. ndx=. ,0 end.
  if. 1 ~: #$ ndx do. 13!:8[3 end.

  mat=. pat rxmatches y

  if. 0 = # mat do.
    y
    return.
  end.

  NB. --- Find any back references in the replacement text
  escaped =. 0
  newtxt  =. 0 2 $ 0
  last    =. 0

  for_i. i. # new do.
    c =. i { new
    if. c = '\' do.
      escaped =. -. escaped
    else.
      if. escaped do.
        c =. '123456789' i. c
        if. c < 9 do.
          newtxt =. newtxt, (last }. (i - 1) {. new); c
          last   =. i + 1
        end.
      end.
      escaped =. 0
    end.
  end.

  if. last < #new do.
    newtxt =. newtxt, (last }. new); 9
  end.

  if. *./ 9 = ; {:"1 newtxt do.
    NB. --- No back references in the replacement text; act like rxrplc
    (<new) (({.ndx) {"2 mat) rxmerge y
  else.
    NB. --- Expand the back references
    repl =. ''

    for_i. mat do.
      a =. 0 $ a:    NB. empty boxed list, so 10 {. a pads with empty boxes
      for_j. }. i do.
        a =. a, < (1 { j) {. (0 { j) }. y
      end.
      a =. ;((1 {::"1 newtxt) { 10 {. a) (< a: ; 1) } newtxt

      repl =. repl, <a
    end.

    NB. --- Now perform the replacements
    repl (({.ndx) {"2 mat) rxmerge y
  end.
)

For example:

   ('@([biu]){(.*?)}';'<\1>\2</\1>') rxsubst 'We can have @b{bold}, @i{italic}, and @u{underlined} text.'

We can have <b>bold</b>, <i>italic</i>, and <u>underlined</u> text.
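
To make the expansion step easier to follow, here is a small sketch of
what happens for a single match.  It does not call the regex library:
the names tbl, sub, groups, filled and result are made up for this
illustration, the table is written out by hand in the shape the parsing
loop builds for '<\1>\2</\1>', and the (start,length) pairs are assumed
values of the kind rxmatches would report for the two groups.

```j
NB. Hand-written table for the replacement text '<\1>\2</\1>':
NB. column 0 is literal text, column 1 the group index (0 for \1,
NB. 1 for \2, ...), with 9 meaning "no back reference follows".
tbl =: 4 2 $ '<' ; 0 ; '>' ; 1 ; '</' ; 0 ; '>' ; 9

NB. Sample subject text and an illustrative helper that cuts a substring
NB. out of y given x = start,length (same expression as in rxsubst).
str =: 'We can have @b{bold}'
sub =: 4 : '(1 { x) {. (0 { x) }. y'

NB. Assumed offsets for group 1 and group 2 of the match @b{bold}.
groups =: (13 1 sub str) ; (15 4 sub str)

NB. Pad the captures to 10 boxes so that index 9 picks an empty box,
NB. overwrite column 1 with the selected captures, then raze the table.
filled =: ((1 {::"1 tbl) { 10 {. groups) (< a: ; 1) } tbl
result =: ; filled
```

Here result should be the string <b>bold</b>, which is the text that
rxmerge would then splice in for that one match.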


I have experimented a bit with the J state machine, for example:

NB. ----------------------------------------------------------------------------
NB. Find escaped digits
NB.
NB. State 0: normal
NB. State 1: escaped
NB. State 2: escaped digit found (used so we break *after* the digit)
NB. State 3: initial state (needed since we're not allowed to initialize j to 0)
NB. ----------------------------------------------------------------------------

test4 =: 3 : 0
  conv   =. (a. = '\') + +: a. e. '123456789'
  trans  =. 4 3 2 $  0 0  1 0  0 0   0 0  0 0  2 0   0 2  0 2  0 2   0 1  1 1  0 1
  newtxt =. (0;trans;conv; 0 _1 3 _1) ;: y
  field  =. (<"0 '123456789' i. {: & > }: newtxt), <9
  newtxt =. ((_2 }. &. > }: newtxt), {: newtxt) ,. field
  NB. *** for each match, replace 2nd col of newtxt with corresponding substr, then raze
)

But I don't see any great advantage to using it, and it makes the code
less readable.  The state machine also has some odd restrictions, such
as requiring that the "j" value be initialized to _1, so a special
"initial" state is then required to set j to 0.

--- Brian


----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
