Copied from comp.lang.apl
Posted by: [EMAIL PROTECTED]
Brian posted this in the comp.lang.apl newsgroup, and I thought it should
also be posted in the J programming forum.
Skip
The rxrplc function in the standard J library script regex.ijs doesn't
seem to handle backreferences (\1 through \9) in the replacement text.
This function, derived from rxrplc, seems to work:
NB. ----------------------------------------------------------------------------
NB. rxsubst - Regular expression substitution
NB.
NB. str =. (pattern[;index];newtext) rxsubst str
NB.
NB. Brian B. McGuinness J April, 2008
NB. ----------------------------------------------------------------------------
rxsubst =: 4 : 0
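pat=. >{.x                        NB. pattern, or pattern;index
new=. >{:x                        NB. replacement text
if. L. pat do. 'pat ndx'=. pat else. ndx=. ,0 end.   NB. default: whole match
if. 1 ~: #$ ndx do. 13!:8[3 end.  NB. index must be a list
mat=. pat rxmatches y             NB. one (start,length) table per match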
if. 0 = # mat do.
y
return.
end.
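NB. --- Find any back references in the replacement text.
NB. newtxt gets one row (literal text ; group) per \1-\9, with group 0-8;
NB. any trailing literal text gets group 9.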
escaped =. 0
newtxt =. 0 2 $ 0
last =. 0
for_i. i. # new do.
c =. i { new
if. c = '\' do.
escaped =. -. escaped
else.
if. escaped do.
c =. '123456789' i. c
if. c < 9 do.
newtxt =. newtxt, (last }. (i - 1) {. new); c
last =. i + 1
end.
end.
escaped =. 0
end.
end.
if. last < #new do.
newtxt =. newtxt, (last }. new); 9
end.
if. *./ 9 = 1 {::"1 newtxt do.
NB. --- No back references in the replacement text; act like rxrplc
(<new) (({.ndx) {"2 mat) rxmerge y
else.
NB. --- Expand the back references
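repl =. ''
for_i. mat do.
NB. Text of subexpressions 1, 2, ... for this match
a =. 0 $ a:
for_j. }. i do.
a =. a, < (1 { j) {. (0 { j) }. y
end.
NB. Slots 0-8 hold \1-\9 (empty if absent); slot 9 is empty, for trailing text
a =. ;((1 {::"1 newtxt) { (9 {. a), a:) (< a: ; 1) } newtxt
repl =. repl, <a
end.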
NB. --- Now perform the replacements
repl (({.ndx) {"2 mat) rxmerge y
end.
)
For example:
('@([biu]){(.*?)}';'<\1>\2</\1>') rxsubst 'We can have @b{bold}, @i{italic}, and @u{underlined} text.'
We can have <b>bold</b>, <i>italic</i>, and <u>underlined</u> text.
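As a rough illustration outside J (not part of the original post), the two phases of rxsubst can be sketched in Python: first split the replacement text into (literal, group) pairs the way the newtxt loop does, then rebuild the text for each match. Python's standard re module stands in for regex.ijs here, and the names parse_replacement and rxsubst are only illustrative. Python's own re.sub already expands \1 in a plain replacement string, so this sketch exists only to mirror the J logic.

```python
import re

def parse_replacement(new):
    """Split replacement text into (literal, group) pairs, like newtxt in rxsubst.

    group is 1-9 for a back reference, or None for trailing literal text.
    A doubled backslash cancels the escape, as in the J loop.
    """
    segments = []
    escaped = False
    last = 0
    for i, c in enumerate(new):
        if c == '\\':
            escaped = not escaped
            continue
        if escaped and c in '123456789':
            # Text since the previous reference, minus the backslash at i - 1
            segments.append((new[last:i - 1], int(c)))
            last = i + 1
        escaped = False
    if last < len(new):
        segments.append((new[last:], None))
    return segments

def rxsubst(pattern, new, text):
    """Replace every match of pattern in text, expanding back references in new."""
    segments = parse_replacement(new)

    def expand(m):
        parts = []
        for literal, group in segments:
            parts.append(literal)
            # Missing or unmatched groups expand to '', like the padded boxes in J
            if group is not None and group <= m.re.groups:
                parts.append(m.group(group) or '')
        return ''.join(parts)

    return re.sub(pattern, expand, text)

# Braces are escaped here for clarity
print(rxsubst(r'@([biu])\{(.*?)\}', r'<\1>\2</\1>',
              'We can have @b{bold}, @i{italic}, and @u{underlined} text.'))
```

Passing a function to re.sub means each match is rebuilt from the pre-parsed segments, which corresponds to building one boxed replacement per match before calling rxmerge.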
I have also experimented a bit with J's sequential machine (dyadic ;:), for example:
NB. ----------------------------------------------------------------------------
NB. Find escaped digits
NB.
NB. State 0: normal
NB. State 1: escaped
NB. State 2: escaped digit found (used so we break *after* the digit)
NB. State 3: initial state (needed since we're not allowed to initialize j to 0)
NB. ----------------------------------------------------------------------------
test4 =: 3 : 0
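NB. Character classes: 0 = other, 1 = '\', 2 = digit 1-9
conv =. (a. = '\') + +: a. e. '123456789'
NB. trans: 4 states x 3 classes x (new state, action); action 1 sets j, 2 emits a word
NB. (state 2 on '\' goes to state 1 so that adjacent references like \1\2 are split)
trans =. 4 3 2 $ 0 0 1 0 0 0 0 0 0 0 2 0 0 2 1 2 0 2 0 1 1 1 0 1
newtxt =. (0;trans;conv; 0 _1 3 _1) ;: y
field =. (<"0 '123456789' i. {: & > }: newtxt), <9
newtxt =. ((_2 }. &. > }: newtxt), {: newtxt) ,. field
NB. *** for each match, replace 2nd col of newtxt with corresponding substr, then raze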
)
But I don't see any great advantage to using it, and it makes the code
less readable. The state machine also has some odd restrictions, such as
requiring that the "j" value be initialized to _1, so a special "initial"
state is then required to set j to 0.
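For comparison, here is a Python sketch (not J, and only an approximation of how dyadic ;: steps through its input) of a table-driven machine with the same four states and three character classes as test4. Action 1 starts a word at the current position and action 2 emits the word so far and starts a new one there; as with ;:, a word still open at the end of the input is emitted. In this table a backslash seen in state 2 leads to state 1, so adjacent references such as \1\2 are split. The names char_class, TRANS and split_escaped_digits are made up for the sketch.

```python
# Character classes, as in conv: 0 = other, 1 = backslash, 2 = digit 1-9
def char_class(c):
    if c == '\\':
        return 1
    if c in '123456789':
        return 2
    return 0

# TRANS[state][class] = (next_state, action)
# Actions: 0 = nothing, 1 = start a word here, 2 = emit current word, start a new one here
TRANS = [
    [(0, 0), (1, 0), (0, 0)],  # state 0: normal
    [(0, 0), (0, 0), (2, 0)],  # state 1: escaped
    [(0, 2), (1, 2), (0, 2)],  # state 2: escaped digit found
    [(0, 1), (1, 1), (0, 1)],  # state 3: initial
]

def split_escaped_digits(s):
    """Break s just after each escaped digit \\1-\\9."""
    words = []
    state, start = 3, -1  # start plays the role of j; -1 means no word yet
    for i, c in enumerate(s):
        state, action = TRANS[state][char_class(c)]
        if action == 1:
            start = i
        elif action == 2:
            words.append(s[start:i])
            start = i
    if start != -1:
        words.append(s[start:])  # emit the final word at end of input
    return words

print(split_escaped_digits(r'<\1>\2</\1>'))  # ['<\\1', '>\\2', '</\\1', '>']
```

The explicit table makes the states easy to inspect, but as Brian notes, the plain loop in rxsubst is shorter and easier to read.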
--- Brian
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm