[Jprogramming] Word Count

Skip Cave Thu, 18 Jan 2018 00:25:05 -0800

I'm working on some Natural Language Processing algorithms.

I built a basic set of word count verbs:

    NB. Test phrase:
    b =. 'the cat in the hat ate a hat and saw another cat in a hat in the hat'

    NB. Word count
    wc =.3 :'#/.~;:y'

    NB. Labeled word count
    lwc =.3 :'|:(;/#/.~;:y),.~.;:y'

    NB. Sorted & labeled word count
    slwc =.3 :' (\:wc y){"1 lwc y'

    slwc b
┌───┬───┬──┬───┬─┬───┬───┬───┬───────┐
│4  │3  │3 │2  │2│1  │1  │1  │1      │
├───┼───┼──┼───┼─┼───┼───┼───┼───────┤
│hat│the│in│cat│a│ate│and│saw│another│
└───┴───┴──┴───┴─┴───┴───┴───┴───────┘

Now I want to do the same thing for 2-word sequences (phrases) with a
sliding window:
|the cat|cat in|in the|the hat| .... etc.
with wrap around the end:
|the hat|hat the|the cat| .... etc.

And 3-word sequences:
|the cat in|cat in the|in the hat|.... etc.
with wrap around the end:
|in the hat|the hat the|hat the cat| ... etc

And 4-word sequences, ... etc.
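
To make the sliding window with wrap-around concrete, here is a minimal sketch of just the windowing step. The name win is hypothetical (it is not one of the verbs above), and the sketch assumes the window size is at least 1 and that x-1 does not exceed the number of words in the text. It appends the first x-1 words to the end of the word list and then takes every x-word infix, so an n-word text always yields n windows.

```j
NB. Sample text from the post
b =: 'the cat in the hat ate a hat and saw another cat in a hat in the hat'

NB. win is a hypothetical helper name used only for this sketch.
NB. x win y : table of boxed words, one x-word window per row,
NB. with the last windows wrapping around to the start of the text.
win =: 4 : 0
  w =. ;: y                 NB. boxed words of the text
  x ]\ w , (x-1) {. w       NB. append first x-1 words, then take x-word infixes
)

$ 2 win b                   NB. 18 2 : one 2-word window per word of the text
{: 2 win b                  NB. last row is the wrapped pair: hat, the
```

Each row of the result still has to be joined into a single phrase and counted, which this sketch leaves out.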

Ideally, I would like a generalized phrase-count verb with the format:

NB. Phrase count verb format:
NB.  x pc y
NB.  x= number of words in the phrase to be counted
NB.  y= the text to be processed

The output layout should be the same for all n-sequence counts - a 2-row
sorted list of the boxed counts, on top of the associated boxed word
sequence.
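
One possible shape for such a verb, as a rough sketch rather than a finished solution: the helper name join is hypothetical, and the sketch assumes x is at least 1 and that x-1 does not exceed the number of words in y. It also relies on ;: for tokenizing, as the verbs above do, so text is split by J's word-formation rules rather than by natural-language rules.

```j
NB. Sample text from the post
b =: 'the cat in the hat ate a hat and saw another cat in a hat in the hat'

NB. join is a hypothetical helper: boxed words -> one space-separated string
join =: [: }. [: ; ' ' ,&.> ]

NB. x pc y : x = number of words per phrase, y = text to be processed
NB. Assumes 1 <= x and x-1 <= number of words in y.
pc =: 4 : 0
  w =. ;: y                 NB. boxed words
  w =. w , (x-1) {. w       NB. append first x-1 words so windows wrap around
  p =. x <@join\ w          NB. one boxed phrase per x-word window
  c =. #/.~ p               NB. count of each distinct phrase
  o =. \: c                 NB. ordering by descending count
  (<"0 o { c) ,: o { ~. p   NB. boxed counts on top, boxed phrases below
)

$ 2 pc b                    NB. 2 14 : two rows, 14 distinct 2-word phrases
```

With the sample text, 2 pc b should put the four phrases that occur twice (cat in, in the, the hat, a hat) in the leftmost columns, followed by the ten phrases that occur once. Because grade-down is stable, ties keep their order of first appearance, which matches the behaviour of slwc.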

Skip

Skip Cave
Cave Consulting LLC
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
