I don't think action codes 4 and 5 are what you want in this context.
They let you tentatively emit a token and then return to the same
state, so that the token can potentially be extended. In other words,
action codes 4 and 5 simply give you a longer word which contains all
of the intervening text.
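
To make the difference concrete, here is a minimal sketch (not part of
the original exchange). It takes the state table from the question and
changes only the emit code in the digit row from 3 to 5. The name S5 and
the sample string are made up for illustration. If emit-vector behaves
as described above, the two digit runs should come back merged into one
word together with the text between them, rather than as separate words.

```j
NB. Character classes as used in this thread: 0 other, 1 LF, 2 digit.
M=: (a.=LF)+2*a.e.'0123456789'

NB. Hypothetical variant of the question's table: digit row emits with
NB. code 5 (emit vector) instead of code 3 (emit word).
S5=: +.".>cutLF {{)n
  0j0 1j0 2j1  NB. start here (generic text)
  0j0 0j0 2j1  NB. linefeed
  0j5 1j5 2j0  NB. digit
}}

NB. Run the machine on a short made-up sample with two digit runs.
(0;S5;M;0 _1 0 _1) ;: '10 ab 20',LF
```

Swapping the 5s back to 3s gives the behaviour from the question, which
makes it easy to compare the two results side by side.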

Meanwhile, a single use of ;: will give you only one level of boxing.

That said, we can first split the text wherever two line feeds are
adjacent, and then tokenize each resulting piece the way you already do.

For example:

   (<0;S1;M;0 _1 0 _1) ;:&.> (0;S0;M;0 _1 0 _1) ;: test1
┌────────────────┬───────┬───────┐
│┌────┬────┬────┐│┌─────┐│┌──┬──┐│
││1000│2000│3000│││11111│││11│22││
│└────┴────┴────┘│└─────┘│└──┴──┘│
└────────────────┴───────┴───────┘

Or, if you prefer:

   (0;S1;M;0 _1 0 _1) ;:L:1 0 (0;S0;M;0 _1 0 _1) ;: test1
┌────────────────┬───────┬───────┐
│┌────┬────┬────┐│┌─────┐│┌──┬──┐│
││1000│2000│3000│││11111│││11│22││
│└────┴────┴────┘│└─────┘│└──┴──┘│
└────────────────┴───────┴───────┘

Given:

test1=: {{)n
1000ddd
  2000
ab3000

11111xxx

   11
 22
}}

M=: (a.=LF)+2*a.e.'0123456789'

S0=: +.".>cutLF {{)n
  1j1 2j1 1j1  NB. start here
  1j0 2j0 1j0  NB. non-newline
  1j0 1j2 1j0  NB. newline
}}

S1=: +.".>cutLF {{)n
  0j0 1j0 2j1  NB. start here (generic text)
  0j0 0j0 2j1  NB. linefeed
  0j3 1j3 2j0  NB. digit
}}

Hypothetically, a similar approach could also ignore whitespace between
the two linefeeds. But I don't see a need for action code 4 (or 5)
there either. M would need an additional character class for
whitespace, but there's never any ambiguity about where a "token"
ends.
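
As a rough, untested sketch of that whitespace-tolerant first stage: the
names M2 and S2 are hypothetical, and the table simply mirrors S0 with
one extra column, so that blanks and tabs seen right after a linefeed
keep the machine in the "newline" state. The sample string is made up.

```j
NB. Hypothetical character map with a fourth class.
NB. Classes: 0 other, 1 LF, 2 digit, 3 space or tab.
M2=: (a.=LF) + (2*a.e.'0123456789') + 3*a.e.' ',TAB

NB. Hypothetical first-stage table. Columns: other, LF, digit, blank.
S2=: +.".>cutLF {{)n
  1j1 2j1 1j1 1j1  NB. start here
  1j0 2j0 1j0 1j0  NB. non-newline
  1j0 1j2 1j0 2j0  NB. newline, possibly followed by blanks
}}

NB. First stage only: split where two linefeeds are separated by blanks.
(0;S2;M2;0 _1 0 _1) ;: '1',LF,'  ',LF,'2',LF
```

The second stage would still use the three-class M together with S1,
since S1 has only three columns.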

I hope this makes sense,

-- 
Raul

On Thu, Jan 12, 2023 at 9:45 AM Pawel Jakubas <jakubas.pa...@gmail.com> wrote:
>
> Dear J enthusiasts,
>
> I try to learn state machine parsing and try to use 4 or 5 action code in
> practice.
> Here is an example to be concrete.
> 1. So we have the file:
> $ cat test1.txt
> 1000ddd
>   2000
> ab3000
>
> 11111xxx
>
>    11
>  22
>
> 2. now I want to ideally have three boxes, one containing 1000, 2000, 3000,
> the second containing 11111 and the last one containing 11, 22.
>
> 3. So I create m where 1 is LF, 2 are digits, and 0 is rest
>    m =. a. e. LF
>    m=: m + 2* a. e. '0123456789'
>    m
> 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 2 2 2 2 2 2 2 2 2 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>
> 4. Now I start with defining s (firstly with undesired result but still)
>    s =. 1 3 2 $ 0 0  1 0  2 1
>    s =. s , 3 2 $ 0 0  0 0  2 1
>    s =. s , 3 2 $ 0 3  1 3  2 0
>
> 5. Running the machine gives:
> d=: fread jpath './test1.txt'
>    (0;s;m;0 _1 0 _1) ;: d
> ┌────┬────┬────┬─────┬──┬──┐
> │1000│2000│3000│11111│11│22│
> └────┴────┴────┴─────┴──┴──┘
>
> which is as expected basing on s.
>
> Now I would like to have parser that produces something like that:
> ┌────────────────┬───────┬───────┐
> │┌────┬────┬────┐│┌─────┐│┌──┬──┐│
> ││1000│2000│3000│││11111│││11│22││
> │└────┴────┴────┘│└─────┘│└──┴──┘│
> └────────────────┴───────┴───────┘
> Can I use 4/5 action code in s to achieve that in one state machine
> definition? Basically the moment that separates numbers is (row) LF ->
> (column) LF.
>
> Thanks !
> Pawel
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm