A similar question on how to tokenize characters with an escape
character came up in the #jsoftware IRC channel recently.

I extended that solution to handle the rest of the problem. I'm not sure
whether it's possible to do all of it with a single sequential machine.

charTokens =: (0;(3 2 2$(2 1 1 1 2 2 1 2 1 0 1 0));<<'^')&;:
splitTokens =: ((<,'|')&= <;._1 ])@:((<,'|'),])
removeExtra =: (}.^:(1<#)) L:0
tokenize=: ; each @: (removeExtra @: splitTokens @: charTokens)

t=: 'one^|uno||three^^^^|four^^^|^cuatro|'

   tokenize t
+-------++-------+------------++
|one|uno||three^^|four^|cuatro||
+-------++-------+------------++

   $ tokenize t
5
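
As a cross-check of the expected result, the rules from the original
question (^ escapes the next character, | separates tokens) can be
sketched as a small two-state machine. This is a Python sketch, not a
translation of the J verbs above; the function name, its parameters, and
the choice to drop a trailing unpaired escape are my own assumptions.

```python
def tokenize(text, escape="^", separator="|"):
    """Split text on separator, treating escape + char as a literal char.

    Hypothetical reference sketch: two states (normal / escaped), one pass.
    Assumption: an escape at the very end of the input is silently dropped.
    """
    tokens = []
    current = []
    escaped = False  # True when the previous character was an unconsumed escape
    for ch in text:
        if escaped:
            # Escaped state: take the character literally, whatever it is.
            current.append(ch)
            escaped = False
        elif ch == escape:
            # Enter the escaped state; the escape itself is not emitted.
            escaped = True
        elif ch == separator:
            # An unescaped separator closes the current token (possibly empty).
            tokens.append("".join(current))
            current = []
        else:
            current.append(ch)
    # The text after the last separator is also a token (possibly empty).
    tokens.append("".join(current))
    return tokens


print(tokenize("one^|uno||three^^^^|four^^^|^cuatro|"))
```

On the test string this gives five tokens, two of them empty, which
agrees with the boxed result and the shape shown above.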



On Mon, Dec 29, 2014 at 10:28 PM, David Lambert <[email protected]> wrote:
> ^ escapes to the next character,
> | separates tokens.
>
> Can tokenize be written as an application of sequential machine?
>
>    tokenize 'one^|uno||three^^^^|four^^^|^cuatro|'
> +-------++-------+------------++
> |one|uno||three^^|four^|cuatro||
> +-------++-------+------------++
>
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm