More later...

Or:

   lwc2=: 13 :'(<"0 wc y),.~.;:y'

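For the x pc y verb asked for in the quoted message below, here is a minimal sketch rather than a definitive answer. The name pc comes from the request; the locals w, p, c and o are just illustrative. It assumes that "wrap around" means appending the first x-1 words to the end of the word list, so that a text of n words yields exactly n windows, and that phrases are joined back into strings with single spaces.

```j
NB. Test phrase from the original message
b =: 'the cat in the hat ate a hat and saw another cat in a hat in the hat'

NB. x pc y : counts of x-word phrases in text y, with wrap-around (assumed semantics)
pc =: 4 : 0
 w =. ;: y                                NB. boxed words
 p =. <@(;:^:_1)"1 x ]\ w , (x-1) {. w    NB. sliding windows of x words, each joined into one boxed string
 c =. #/.~ p                              NB. count of each distinct phrase
 o =. \: c                                NB. order that sorts the counts descending
 (<"0 o { c) ,: o { ~. p                  NB. row 0: boxed counts, row 1: boxed phrases
)

2 pc b
3 pc b
```

With a left argument of 1 this should give the same table as slwc b, since the wrap-around then appends nothing.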
-----Original Message-----
From: Programming [mailto:[email protected]] On Behalf Of Skip Cave
Sent: Thursday, January 18, 2018 3:24 AM
To: [email protected]
Subject: [Jprogramming] Word Count

I'm working on some Natural Language Processing algorithms. I built a basic set of word count verbs:

   NB. Test phrase:
   b =. 'the cat in the hat ate a hat and saw another cat in a hat in the hat'

   NB. Word count
   wc =.3 :'#/.~;:y'

   NB. Labeled word count
   lwc =.3 :'|:(;/#/.~;:y),.~.;:y'

   NB. Sorted & labeled word count
   slwc =.3 :' (\:wc y){"1 lwc y'

   slwc b
┌───┬───┬──┬───┬─┬───┬───┬───┬───────┐
│4  │3  │3 │2  │2│1  │1  │1  │1      │
├───┼───┼──┼───┼─┼───┼───┼───┼───────┤
│hat│the│in│cat│a│ate│and│saw│another│
└───┴───┴──┴───┴─┴───┴───┴───┴───────┘

Now I want to do the same thing for 2-word sequences (phrases) with a sliding window:

|the cat|cat in|in the|the hat| .... etc.

with wrap around the end:

|the hat|hat the|the cat| .... etc.

And 3-word sequences:

|the cat in|cat in the|in the hat|.... etc.

with wrap around the end:

|in the hat|the hat the|hat the cat| ... etc

And 4-word sequences, ... etc.

Ideally, I would like a generalized phrase-count verb with the format:

   NB. Phrase count verb format:
   NB. x pc y
   NB. x= number of words in the phrase to be counted
   NB. y= the text to be processed

The output layout should be the same for all n-sequence counts - a 2-row sorted list of the boxed counts, on top of the associated boxed word sequence.

Skip

Skip Cave
Cave Consulting LLC
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
