Key (/.) gives you per-letter tabulations and prefix (\) gets you the prefixes:

   </.~ 'Hello Jello'
┌─┬──┬────┬──┬─┬─┐
│H│ee│llll│oo│ │J│
└─┴──┴────┴──┴─┴─┘
   (#;{.)/.~ 'Hello Jello'
┌─┬─┐
│1│H│
├─┼─┤
│2│e│
├─┼─┤
│4│l│
├─┼─┤
│2│o│
├─┼─┤
│1│ │
├─┼─┤
│1│J│
└─┴─┘
   0 _1 { <@((#;[),(#;{.)/.~)\ 'Hello Jello'
┌─────┬────────────────┐
│┌─┬─┐│┌──┬───────────┐│
││1│H│││11│Hello Jello││
│├─┼─┤│├──┼───────────┤│
││1│H│││1 │H          ││
│└─┴─┘│├──┼───────────┤│
│     ││2 │e          ││
│     │├──┼───────────┤│
│     ││4 │l          ││
│     │├──┼───────────┤│
│     ││2 │o          ││
│     │├──┼───────────┤│
│     ││1 │           ││
│     │├──┼───────────┤│
│     ││1 │J          ││
│     │└──┴───────────┘│
└─────┴────────────────┘
   $<@((#;[),(#;{.)/.~)\ 'Hello Jello'
11
   ,./(#~ 9 < 0 0&{::"2) ((#;[),(#;{.)/.~)\ 'Hello Jello'
┌──┬──────────┬──┬───────────┐
│10│Hello Jell│11│Hello Jello│
├──┼──────────┼──┼───────────┤
│1 │H         │1 │H          │
├──┼──────────┼──┼───────────┤
│2 │e         │2 │e          │
├──┼──────────┼──┼───────────┤
│4 │l         │4 │l          │
├──┼──────────┼──┼───────────┤
│1 │o         │2 │o          │
├──┼──────────┼──┼───────────┤
│1 │          │1 │           │
├──┼──────────┼──┼───────────┤
│1 │J         │1 │J          │
└──┴──────────┴──┴───────────┘

So I'd start with those, and maybe fold case first and keep a fixed
26$0 vector of letter counts per prefix, or whatever layout is
convenient for the task.
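
A minimal sketch of that layout, assuming only a-z matter once case is
folded. The names a26, s, hits, run, k, big and sp are made up for
illustration; tolower comes from the standard library:

   a26  =: 'abcdefghijklmnopqrstuvwxyz'
   s    =: 'Hello Jello'
   hits =: (tolower s) =/ a26   NB. 11 by 26 boolean table, row i marks the letter at position i
   run  =: +/\ hits             NB. running sums down the rows: row i counts the prefix of length i+1
   k    =: 3
   big  =: (<: k) }. run        NB. drop the first k-1 rows, keeping prefixes of length >: k
   $ big                        NB. 9 26
   sp   =: $. big               NB. sparse copy, if that turns out easier to slice later

Row j of big then belongs to the prefix of length k+j. The space is
simply ignored here because it is not in a26.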

Performance:

   timex '#/.~ 1e7#''h'''
0.0227
   timex '#[\ 1e4#''h'''
0.034662
   timex '#<\ 1e4#''h'''
0.013861
   timex '##\ 1e7#''h'''
0.017643

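So you should take care with the u of u\. Note that the [\ and <\
timings above are for only 1e4 items, while the /. and #\ timings are
for 1e7.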
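
To check this on your own data, here is a rough comparison of two ways
to get per-prefix letter counts. It is only a sketch: a26, t and cnt
are hypothetical names, and I give no timings because they depend on
the machine and the J version.

   a26 =: 'abcdefghijklmnopqrstuvwxyz'
   t   =: 1000 # 'hello'          NB. 5000 lowercase characters
   cnt =: [: +/ =/&a26            NB. 26 letter counts of one string
   (cnt\ t) -: +/\ t =/ a26       NB. 1 if both give the same 5000 by 26 table
   timex 'cnt\ t'                 NB. recounts every prefix from scratch
   timex '+/\ t =/ a26'           NB. one table, then a running sum

The first form does work proportional to the length of each prefix,
so it should fall behind quickly as the string grows.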

On 2021-04-09 08:48, Emir U wrote:
s=: 'Hello Jello'

Given a string like the above, I need to tabulate the number of
occurrences of every letter for every prefix of length >=k. I also
need to know the length of the prefix. So:

<prefix length> <prefix> <letter> <count>

As an example, prefix length=3, prefix=ell, letter=o, count=2

In my real use case k may be quite large (say 20) and the string may
be very long. The final form needs to be something I can slice and
dice thereafter (like perhaps a sparse array). I'd be grateful for any
advice as to how to tackle this.

Emir
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm