I don't think action codes 4 and 5 are what you want in this context. What they let you do is tentatively emit a token and then return to the same state, potentially extending the length of that token. In other words, action codes 4 and 5 simply give you a longer word which contains all of the intervening text.
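
To make that concrete, here is a minimal sketch. The names cls, sEw, sEv and txt are made up for this illustration. Both tables are the digit-run machine from your message written as a single reshape; the only difference is that the second uses action code 5 (emit vector) where the first uses 3 (emit word):

```j
NB. character classes: 0 = anything else, 1 = LF, 2 = digit
cls=: (a.=LF) + 2 * a. e. '0123456789'

NB. rows are states (0 text, 1 linefeed, 2 digit), columns are classes,
NB. and each cell is a (next state, action code) pair
sEw=: 3 3 2 $ 0 0 1 0 2 1   0 0 0 0 2 1   0 3 1 3 2 0   NB. digit row emits with code 3
sEv=: 3 3 2 $ 0 0 1 0 2 1   0 0 0 0 2 1   0 5 1 5 2 0   NB. same table, but with code 5

txt=: '12ab34',LF

(0;sEw;cls;0 _1 0 _1) ;: txt   NB. two boxes: 12 and 34
(0;sEv;cls;0 _1 0 _1) ;: txt   NB. emit vector: successive emits from the same state are joined
```

Going by the description above, the second call should hand back the two runs merged into one longer word that also covers the ab in between, which is not the grouping you are after.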

Meanwhile, ;: will give you only one level of boxing.

That said, we can implement "separate on two adjacent line feeds" and then tokenize as you have implemented within the resulting sequences.

For example:

   (<0;S1;M;0 _1 0 _1) ;:&.> (0;S0;M;0 _1 0 _1) ;: test1
┌────────────────┬───────┬───────┐
│┌────┬────┬────┐│┌─────┐│┌──┬──┐│
││1000│2000│3000│││11111│││11│22││
│└────┴────┴────┘│└─────┘│└──┴──┘│
└────────────────┴───────┴───────┘

Or, if you prefer

   (0;S1;M;0 _1 0 _1) ;:L:1 0 (0;S0;M;0 _1 0 _1) ;: test1
┌────────────────┬───────┬───────┐
│┌────┬────┬────┐│┌─────┐│┌──┬──┐│
││1000│2000│3000│││11111│││11│22││
│└────┴────┴────┘│└─────┘│└──┴──┘│
└────────────────┴───────┴───────┘

Given:

test1=: {{)n
1000ddd
2000
ab3000

11111xxx

11
22
}}

M=: (a.=LF)+2*a.e.'0123456789'

S0=: +.".>cutLF {{)n
  1j1 2j1 1j1 NB. start here
  1j0 2j0 1j0 NB. non-newline
  1j0 1j2 1j0 NB. newline
}}

S1=: +.".>cutLF {{)n
  0j0 1j0 2j1 NB. start here (generic text)
  0j0 0j0 2j1 NB. linefeed
  0j3 1j3 2j0 NB. digit
}}

Hypothetically, we could also implement some similar approach, perhaps ignoring whitespace between linefeeds. But I don't see a need for action code 4 (or 5) there. M would need an additional character class for whitespace, but there's never any ambiguity here about where to end a "token".

I hope this makes sense,

-- 
Raul

On Thu, Jan 12, 2023 at 9:45 AM Pawel Jakubas <jakubas.pa...@gmail.com> wrote:
>
> Dear J enthusiasts,
>
> I try to learn state machine parsing and try to use 4 or 5 action code in
> practice.
> Here is an example to be concrete.
> 1. So we have the file:
> $ cat test1.txt
> 1000ddd
> 2000
> ab3000
>
> 11111xxx
>
> 11
> 22
>
> 2. now I want to ideally have three boxes, one containing 1000, 2000, 3000,
> the second containing 11111 and the last one containing 11, 22.
>
> 3. So I create m where 1 is LF, 2 are digits, and 0 is rest
> m =. a. e. LF
> m=: m + 2* a. e. '0123456789'
> m
> 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 2 2 2 2 2 2 2 2 2 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>
> 4. Now I start with defining s (firstly with undesired result but still)
> s =. 1 3 2 $ 0 0 1 0 2 1
> s =. s , 3 2 $ 0 0 0 0 2 1
> s =. s , 3 2 $ 0 3 1 3 2 0
>
> 5. Running the machine gives:
> d=: fread jpath './test1.txt'
> (0;s;m;0 _1 0 _1) ;: d
> ┌────┬────┬────┬─────┬──┬──┐
> │1000│2000│3000│11111│11│22│
> └────┴────┴────┴─────┴──┴──┘
>
> which is as expected basing on s.
>
> Now I would like to have parser that produces something like that:
> ┌────────────────┬───────┬───────┐
> │┌────┬────┬────┐│┌─────┐│┌──┬──┐│
> ││1000│2000│3000│││11111│││11│22││
> │└────┴────┴────┘│└─────┘│└──┴──┘│
> └────────────────┴───────┴───────┘
> Can I use 4/5 action code in s to achieve that in one state machine
> definition? Basically the moment that separates numbers is (row) LF ->
> (column) LF.
>
> Thanks !
> Pawel
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm