More later...

Or:  lwc2=: 13 :'(<"0 wc y),.~.;:y'

-----Original Message-----
From: Programming [mailto:[email protected]] On Behalf Of Skip Cave
Sent: Thursday, January 18, 2018 3:24 AM
To: [email protected]
Subject: [Jprogramming] Word Count

I'm working on some Natural Language Processing algorithms.

I built a basic set of word count verbs:

NB. Test phrase:

b =. 'the cat in the hat ate a hat and saw another cat in a hat in the hat'

NB. Word count

 wc =.3 :'#/.~;:y'

NB. Labeled word count

 lwc =.3 :'|:(;/#/.~;:y),.~.;:y'

NB. Sorted & labeled word count

slwc =.3 :' (\:wc y){"1 lwc y'

slwc b
┌───┬───┬──┬───┬─┬───┬───┬───┬───────┐
│4  │3  │3 │2  │2│1  │1  │1  │1      │
├───┼───┼──┼───┼─┼───┼───┼───┼───────┤
│hat│the│in│cat│a│ate│and│saw│another│
└───┴───┴──┴───┴─┴───┴───┴───┴───────┘
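For comparison, here is a sketch of the same sorted & labeled count computed in one pass. slwc calls wc and lwc, which split y with ;: several times, whereas this version splits and counts only once. The name slwc1 is hypothetical. b is re-assigned with =: so the snippet stands alone. The sketch assumes J's grade down (\:) keeps tied counts in first-occurrence order, which is consistent with the slwc b display above (the before in, cat before a).

```j
NB. Hypothetical one-pass variant of slwc (name slwc1 is illustrative only)
b =: 'the cat in the hat ate a hat and saw another cat in a hat in the hat'

slwc1 =: 3 : 0
 w =. ;: y                 NB. boxed words
 c =. #/.~ w               NB. count of each distinct word, first-occurrence order
 s =. \: c                 NB. descending-count order; ties stay in first-occurrence order
 (<"0 s { c) ,: s { ~. w   NB. row 0: boxed counts; row 1: boxed words
)

slwc1 b
```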

Now I want to do the same thing for 2-word sequences (phrases) with a sliding window:
|the cat|cat in|in the|the hat| .... etc.
with wraparound at the end:
|the hat|hat the|the cat| .... etc.

And 3-word sequences:
|the cat in|cat in the|in the hat|.... etc.
with wraparound at the end:
|in the hat|the hat the|hat the cat| ... etc.

And 4-word sequences, ... etc.
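One way to build these wrapped windows is to rotate the word list by each start index with |."0 1, then keep the first x words of each rotation. This gives one row per window, with the last rows wrapping back to the start of the text. This is a sketch under the assumption that ;: splits the text into words well enough, as in the verbs above. The helper name win is hypothetical.

```j
NB. Hypothetical helper: all x-word windows of y, wrapping around the end.
b =: 'the cat in the hat ate a hat and saw another cat in a hat in the hat'

win =: 4 : 0
 w =. ;: y                     NB. boxed words
 x {."1 (i. # w) |."0 1 w      NB. row i = words rotated left by i, truncated to x words
)

2 win b    NB. 18 rows: the cat / cat in / ... / the hat / hat the
3 win b    NB. last rows: in the hat / the hat the / hat the cat
```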

Ideally, I would like a generalized phrase-count verb with the format:

NB. Phrase count verb format:
NB.  x pc y
NB.  x= number of words in the phrase to be counted
NB.  y= the text to be processed

The output layout should be the same for all n-word sequence counts: a 2-row table, sorted by count, with the boxed counts on top of the associated boxed word sequences.
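A minimal sketch of x pc y along these lines, assuming ;: is an acceptable word splitter and that tied counts may stay in first-occurrence order. It takes the wrapped x-word windows, counts each distinct window with key (#/.~ works on rows), sorts descending, and joins each window's words with single spaces for the label row. With x=1 it should reproduce the slwc b layout. This is only one possible implementation, not a definitive one.

```j
NB. Sketch of  x pc y : sorted counts of x-word phrases in y, with wraparound
b =: 'the cat in the hat ate a hat and saw another cat in a hat in the hat'

pc =: 4 : 0
 w =. ;: y                        NB. boxed words
 g =. x {."1 (i. # w) |."0 1 w    NB. one row per x-word window, wrapping at the end
 c =. #/.~ g                      NB. count of each distinct window (first-occurrence order)
 u =. ~. g                        NB. the distinct windows, in the same order
 s =. \: c                        NB. indices that sort the counts in descending order
 p =. <@(}:@;@:(,&' '&.>))"1 u    NB. join each window's words into one boxed phrase
 (<"0 s { c) ,: s { p             NB. row 0: boxed counts; row 1: boxed phrases
)

2 pc b    NB. cat in, in the, the hat, a hat (2 each), then the 1-count phrases
```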

Skip

Skip Cave
Cave Consulting LLC
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm