For what it's worth, here was my implementation:
tokenize=:4 :0
'ESC SEP'=. x
E=. 18 b./\.&.|.ESC=y NB. escape positions
S=. (SEP=y)>_1}.0,E NB. separator positions
K=. -.E+.S NB. keep positions
T=. (#y){. 1,}.S NB. token beginnings
(T<;.1 K)#&.>T<;.1 y
)
   '^|' tokenize 'one^|uno||three^^^^|four^^^|^cuatro|'
+-------++-------+------------++
|one|uno||three^^|four^|cuatro||
+-------++-------+------------++
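
To make the masks concrete, here is a hand-worked trace on a shorter input. The string 'a^|b^^|c' is made up for illustration, and the values in the comments were worked out by hand rather than pasted from a session, so treat this as a sketch. It uses global assignment so each line can be run on its own.

```j
NB. Hypothetical short input: one escaped separator, one escaped escape,
NB. and one real separator.
y =: 'a^|b^^|c'
E =: 18 b./\.&.|. '^'=y      NB. 0 1 0 0 1 0 0 0  carets acting as escapes
S =: ('|'=y) > _1}.0,E       NB. 0 0 0 0 0 0 1 0  separators not preceded by an active escape
K =: -. E +. S               NB. 1 0 1 1 0 1 0 1  characters kept in the result
T =: (#y) {. 1,}.S           NB. 1 0 0 0 0 0 1 0  position 0 and each real separator start a token
T <;.1 y                     NB. boxes 'a^|b^^' and '|c'
(T <;.1 K) #&.> T <;.1 y     NB. boxes 'a|b^' and 'c'
```

The escaped | stays inside the first token, the doubled ^ collapses to a single ^, and only the unescaped | splits the string.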
Thanks,
--
Raul
On Tue, Dec 30, 2014 at 10:17 AM, Joe Bogner <[email protected]> wrote:
> A similar question on how to tokenize characters with an escape
> character came up in the #jsoftware IRC channel recently.
>
> I extended that solution to solve the rest of it. I'm not sure if it's
> possible to use a single sequential machine for it.
>
> charTokens =: (0;(3 2 2$(2 1 1 1 2 2 1 2 1 0 1 0));<<'^')&;:
> splitTokens =: ((<,'|')&= <;._1 ])@:((<,'|'),])
> removeExtra =: (}.^:(1<#)) L:0
> tokenize=: ; each @: (removeExtra @: splitTokens @: charTokens)
>
> t=: 'one^|uno||three^^^^|four^^^|^cuatro|'
>
> tokenize t
> +-------++-------+------------++
> |one|uno||three^^|four^|cuatro||
> +-------++-------+------------++
>
> $ tokenize t
> 5
>
>
>
> On Mon, Dec 29, 2014 at 10:28 PM, David Lambert <[email protected]> wrote:
>> ^ escapes the next character,
>> | separates tokens.
>>
>> Can tokenize be written as an application of sequential machine?
>>
>> tokenize 'one^|uno||three^^^^|four^^^|^cuatro|'
>> +-------++-------+------------++
>> |one|uno||three^^|four^|cuatro||
>> +-------++-------+------------++
>>
>> ----------------------------------------------------------------------
>> For information about J forums see http://www.jsoftware.com/forums.htm