Re: [Jprogramming] Word Count

Raul Miller Thu, 18 Jan 2018 08:09:31 -0800

Ah, yes, I missed the bit about wrapping. I was in a hurry to get out
the door and glossed over that part.


Still, that's simple to add:

words=: ;: NB. might change this later because of punctuation handling

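pc=: 4 :0
  w=. x <@(;:inv)\ ($~ _1+x+#) words y  NB. wrap via reshape, box each x-word window
  n=. #/.~ w                            NB. count of each distinct phrase
  o=. \: n                              NB. grade down: descending-count order
  (<"0 o{n),:o{~.w                      NB. boxed counts on top of their phrases
)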

Not quite the same implementation as your xwrap, but I think I prefer
using reshape for something like this.
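
To illustrate the reshape idea on its own, here is a minimal sketch. The
names ws and k are made up for this example and are not part of pc.
Asking reshape for more items than the list holds makes it cycle back
to the start, so a length of (#ws)+k-1 repeats the first k-1 words at
the end, and every word then begins exactly one k-word window:

```j
NB. Illustration only: ws and k are hypothetical names, not part of pc.
ws=: ;: 'the cat in the hat'     NB. 5 boxed words
k=: 3                            NB. phrase length

NB. Reshape to 5+3-1 = 7 items; items are reused cyclically,
NB. so the result is the 5 words followed by 'the' and 'cat' again.
wrapped=: (_1+k+#ws) $ ws

NB. Infix: apply <@(;:inv) to each window of k consecutive words.
NB. 7 items with window 3 gives 5 phrases, the last two wrapping around:
NB. 'the cat in', 'cat in the', 'in the hat', 'the hat the', 'hat the cat'
k <@(;:inv)\ wrapped
```

Inside pc the same thing is written as the hook ($~ _1+x+#), which
takes the length from the word list itself.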

Thanks,

-- 
Raul


On Thu, Jan 18, 2018 at 9:56 AM, 'Mike Day' via Programming
<[email protected]> wrote:
> Raul's explicit verb is more readable than the following, but I think he's
> overlooked your requirement for word-wrapping.
>
> As I understand that little extra, one only needs to wrap one fewer word
> than the group-size. I chose to wrap them at the end rather than at the
> front, which your examples portrayed.
>
> I've assumed ;: is sufficient for the time being.
>
> xwrap =: ([ , ({.~ <:))~    NB. tack on (x-1) words at end of phrase
>
> xgroup=: [ <@:(;:^:_1)\ ]   NB. form x-sized groups
>
> gwc   =: ({"1~ \:@{.)@(~. ,:~ <@#/.~) NB. dec sort numbers and nub of groups
>
> wordcount =: gwc@([ xgroup [ xwrap ;:@]) NB. combine the verbs
>
>
>    5{."1 ]3 wordcount b  NB. 5{. to avoid email word-wrapping!?
> +----------+----------+----------+-----------+---------+
> |2         |1         |1         |1          |1        |
> +----------+----------+----------+-----------+---------+
> |in the hat|the cat in|cat in the|the hat ate|hat ate a|
> +----------+----------+----------+-----------+---------+
>
> Might help a bit further,
>
>
> Mike
>
>
>
>
>
> On 18/01/2018 08:23, Skip Cave wrote:
>>
>> I'm working on some Natural Language Processing algorithms.
>>
>> I built a basic set of word count verbs:
>>
>>
>>
>> NB. Test phrase:
>>
>>
>>
>> b =. 'the cat in the hat ate a hat and saw another cat in a hat in the hat'
>>
>>
>>
>> NB. Word count
>>
>>
>>   wc =.3 :'#/.~;:y'
>>
>>
>>
>> NB. Labeled word count
>>
>>
>>   lwc =.3 :'|:(;/#/.~;:y),.~.;:y'
>>
>>
>> NB. Sorted & labeled word count
>>
>> slwc =.3 :' (\:wc y){"1 lwc y'
>>
>> slwc b
>> ┌───┬───┬──┬───┬─┬───┬───┬───┬───────┐
>> │4  │3  │3 │2  │2│1  │1  │1  │1      │
>> ├───┼───┼──┼───┼─┼───┼───┼───┼───────┤
>> │hat│the│in│cat│a│ate│and│saw│another│
>> └───┴───┴──┴───┴─┴───┴───┴───┴───────┘
>>
>> Now I want to do the same thing for 2-word sequences (phrases) with a
>> sliding window:
>> |the cat|cat in|in the|the hat| .... etc.
>> with wrap around the end:
>> |the hat|hat the|the cat| .... etc.
>>
>> And 3-word sequences:
>> |the cat in|cat in the|in the hat|.... etc.
>> with wrap around the end:
>> |in the hat|the hat the|hat the cat| ... etc
>>
>> And 4-word sequences, ... etc.
>>
>> Ideally, I would like a generalized phrase-count verb with the format:
>>
>> NB. Phrase count verb format:
>> NB.  x pc y
>> NB.  x= number of words in the phrase to be counted
>> NB.  y= the text to be processed
>>
>> The output layout should be the same for all n-sequence counts - a 2-row
>> sorted list of the boxed counts, on top of the associated boxed word
>> sequence.
>>
>> Skip
>>
>> Skip Cave
>> Cave Consulting LLC
>> ----------------------------------------------------------------------
>> For information about J forums see http://www.jsoftware.com/forums.htm
>
>
>
>