A similar question about tokenizing characters with an escape character came up in the #jsoftware IRC channel recently.
I extended that solution to solve the rest of it. I'm not sure if it's possible to use a single sequential machine for it.

   charTokens =: (0;(3 2 2$(2 1 1 1 2 2 1 2 1 0 1 0));<<'^')&;:
   splitTokens =: ((<,'|')&= <;._1 ])@:((<,'|'),])
   removeExtra =: (}.^:(1<#)) L:0
   tokenize=: ; each @: (removeExtra @: splitTokens @: charTokens)

   t=: 'one^|uno||three^^^^|four^^^|^cuatro|'
   tokenize t
+-------++-------+------------++
|one|uno||three^^|four^|cuatro||
+-------++-------+------------++
   $ tokenize t
5

On Mon, Dec 29, 2014 at 10:28 PM, David Lambert <[email protected]> wrote:
> ^ escapes to the next character,
> | separates tokens.
>
> Can tokenize be written as an application of sequential machine?
>
>    tokenize 'one^|uno||three^^^^|four^^^|^cuatro|'
> +-------++-------+------------++
> |one|uno||three^^|four^|cuatro||
> +-------++-------+------------++
>
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
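
For cross-checking results, the two rules in the quoted question (^ escapes the next character, | separates tokens) can also be spelled out as a plain explicit loop. This is only a reference sketch: it is not a sequential-machine solution, and the name tokenizeRef is made up for illustration. It assumes that a trailing ^ with nothing after it is simply dropped, which the question does not specify.

```j
NB. Reference tokenizer: explicit loop, no sequential machine.
NB. tokenizeRef is a hypothetical name used only for this sketch.
tokenizeRef =: 3 : 0
  toks =. 0$a:            NB. boxed tokens collected so far
  cur  =. ''              NB. token currently being built
  esc  =. 0               NB. 1 when the previous character was an unescaped ^
  for_c. y do.
    if. esc do.           NB. escaped character: take it literally
      cur =. cur , c
      esc =. 0
    elseif. c = '^' do.   NB. start of an escape
      esc =. 1
    elseif. c = '|' do.   NB. unescaped separator closes the current token
      toks =. toks , < cur
      cur =. ''
    elseif. 1 do.         NB. ordinary character
      cur =. cur , c
    end.
  end.
  toks , < cur            NB. text after the last separator is a token too
)
```

Traced by hand, tokenizeRef 'one^|uno||three^^^^|four^^^|^cuatro|' should give the same five boxes as the example above: one|uno, an empty token, three^^, four^|cuatro, and a final empty token. The closing elseif. 1 do. is used instead of else. on the assumption that some older J releases do not accept else. after elseif.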
