Re: Finding common words and phrases in a large block of text?

2018-10-25 Thread Tom Glod via use-livecode
Hi Terry, I see, thanks for sharing your handler.  I'm going to run it on
some text and see the output.  LC is sooo good with chunks. I find it
really fast as well.

All the best, Tom



___
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode

Re: Finding common words and phrases in a large block of text?

2018-10-25 Thread Terry Judd via use-livecode
On 26/10/2018 4:27 am, "use-livecode on behalf of Tom Glod via use-livecode" wrote:

> Hi Terry, glad you found a solution.
>
> I have a similar challenge.
>
> I did a word count, but would love to recognize the same phrases.  Did you
> just compare chunks? ... hash them? (probably redundant?)
>
> Are there any more hints you can drop about this?
>
> Thanks,
>
> Tom

Hi Tom - I've just done something like the code below, which accepts a block of 
text and the maximum 'phrase' length as input and provides an array with sorted 
counts of word runs (so not necessarily sensible phrases) of different lengths 
as output. I think it will be good enough for my purposes.

function getWordAndPhraseCounts pText, pMaxPhraseLength
   -- tA1[n][phrase] holds the count of each n-word run
   put empty into tA1
   set the itemDel to tab
   repeat for each sentence tSentence in pText
      put the number of words in tSentence into tMax
      repeat with i = 1 to pMaxPhraseLength
         -- slide an i-word window along the sentence
         repeat with j = 1 to (tMax-i+1)
            put word j to j+i-1 of tSentence into tPhrase
            add 1 to tA1[i][tPhrase]
         end repeat
      end repeat
   end repeat
   -- build one list per run length, one "phrase <tab> count" entry per line
   put empty into tA2
   repeat for each line tLength in the keys of tA1
      put empty into tList
      repeat for each line tPhrase in the keys of tA1[tLength]
         put tPhrase & tab & tA1[tLength][tPhrase] & cr after tList
      end repeat
      delete last char of tList -- drop the trailing return
      sort lines of tList descending numeric by item 2 of each
      put tList into tA2[tLength]
   end repeat
   return tA2
end getWordAndPhraseCounts
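
For anyone following along, here is one possible way to call the handler. It is only a sketch: the field names "Abstracts" and "Results" and the cut-off of ten lines are made up for illustration, and it assumes each line of the returned lists holds a phrase, a tab, and a count.

```livecode
-- Hypothetical caller: the fields "Abstracts" and "Results" are assumptions
on mouseUp
   local tCounts, tLengths, tLength, tReport
   -- count runs of 1, 2 and 3 words
   put getWordAndPhraseCounts(field "Abstracts", 3) into tCounts
   -- the keys are the run lengths; sort them so the report reads in order
   put the keys of tCounts into tLengths
   sort lines of tLengths ascending numeric
   repeat for each line tLength in tLengths
      put "Runs of" && tLength && "word(s):" & cr after tReport
      -- each list is already sorted by count, so the first lines are the most frequent
      put (line 1 to 10 of tCounts[tLength]) & cr & cr after tReport
   end repeat
   put tReport into field "Results"
end mouseUp
```

Because the result is keyed by run length, single words and longer runs can be inspected separately rather than mixed in one list.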


On Thu, Oct 25, 2018 at 4:27 AM Terry Judd via use-livecode <
use-livecode@lists.runrev.com> wrote:

> OK - was easier than I thought. I have something that works fast enough by
> iterating through runs of words in each sentence in a block of text,
> incrementing counts into an array and then sorting the contents of that
> array by phrase length and frequency.
>
> Terry...
>
> On 25/10/2018 4:56 pm, "use-livecode on behalf of Terry Judd via
> use-livecode"  use-livecode@lists.runrev.com> wrote:
>
> Hi – I’m looking to analyse some large block of text (journal
> abstracts from key educational technology journals over a several year
> period) to find common words and phrases. Finding common words should be
> easy enough but I’m not clear on what approach to take for finding common
> phrases (iterating through the text capturing overlapping word runs of
> various lengths?). Any ideas on how best to proceed?
>
> TIA,
>
> Terry...
> ___
> use-livecode mailing list
> use-livecode@lists.runrev.com
> Please visit this url to subscribe, unsubscribe and manage your
> subscription preferences:
> http://lists.runrev.com/mailman/listinfo/use-livecode
>
>
> ___
> use-livecode mailing list
> use-livecode@lists.runrev.com
> Please visit this url to subscribe, unsubscribe and manage your
> subscription preferences:
> http://lists.runrev.com/mailman/listinfo/use-livecode
___
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your 
subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode


___
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode

Re: Finding common words and phrases in a large block of text?

2018-10-25 Thread Tom Glod via use-livecode
Hi Terry, glad you found a solution.

I have a similar challenge.

I did a word count, but would love to recognize the same phrases.  Did you
just compare chunks? ... hash them? (probably redundant?)

Are there any more hints you can drop about this?

Thanks,

Tom


Re: Finding common words and phrases in a large block of text?

2018-10-25 Thread Terry Judd via use-livecode
OK - was easier than I thought. I have something that works fast enough by 
iterating through runs of words in each sentence in a block of text, 
incrementing counts into an array and then sorting the contents of that array 
by phrase length and frequency.

Terry...


Finding common words and phrases in a large block of text?

2018-10-24 Thread Terry Judd via use-livecode
Hi – I’m looking to analyse a large block of text (journal abstracts from
key educational technology journals over a period of several years) to find
common words and phrases. Finding common words should be easy enough, but I’m
not clear on what approach to take for finding common phrases (iterating
through the text, capturing overlapping word runs of various lengths?). Any
ideas on how best to proceed?

TIA,

Terry...
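
The "common words" half of the question could be sketched roughly as below. This is an illustration only, not code from the thread: getWordCounts is a hypothetical name, and it uses LiveCode's plain word chunk, so punctuation stays attached to words and a quoted string counts as a single word.

```livecode
-- A sketch only: getWordCounts is a hypothetical name, not from the thread
function getWordCounts pText
   local tCountsA, tWord, tList
   -- tally every whitespace-delimited word in an array keyed by the word
   repeat for each word tWord in pText
      add 1 to tCountsA[tWord]
   end repeat
   -- flatten the array into "word <tab> count" lines
   repeat for each key tWord in tCountsA
      put tWord & tab & tCountsA[tWord] & cr after tList
   end repeat
   delete last char of tList -- drop the trailing return
   -- most frequent words first
   set the itemDelimiter to tab
   sort lines of tList descending numeric by item 2 of each
   return tList
end getWordCounts
```

The array does the matching by key, so no separate comparison or hashing step is needed for single words.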