Re: How to get word offset all instances of a string in a chunk of text?

2018-08-31 Thread Richard Gaskin via use-livecode
Mike Kerner wrote:
> Since the topic of processes came up a few weeks ago I've been thinking about what it would take to build a process/threading framework. I wonder if a text processing subprocessor, written and compiled...

I haven't yet come across good use cases for the desktop, but

Re: How to get word offset all instances of a string in a chunk of text?

2018-08-31 Thread Mike Kerner via use-livecode
Since the topic of processes came up a few weeks ago I've been thinking about what it would take to build a process/threading framework. I wonder if a text processing subprocessor, written and compiled in 6 would be worth everyone's time. The main app would hand off the data and the command to

Re: How to get word offset all instances of a string in a chunk of text?

2018-08-31 Thread Keith Clarke via use-livecode
Thanks Alex, HH & Jim for all the help & ideas. Just to close out the thread with a solution for future reference, the code below now extracts from a text source a list of unique words, cleaned up against a noise-word list, with word frequency, the word, and a comma-delimited string of the word
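
The code from this message is cut off in the archive listing. As a rough illustration only, a single-pass index of this kind might look like the sketch below. It is not Keith's actual script: the handler name wordOffsetIndex, the parameter pNoiseWords (assumed to be a return-delimited list of words to skip) and the array layout are all hypothetical. The loop itself follows the repeat for each trueWord pattern Alex posted earlier in the thread.

```livecode
-- Hypothetical sketch, not the code posted to the list.
-- pText: the source text.
-- pNoiseWords: assumed to be a return-delimited list of words to ignore.
-- Returns an array keyed by word, holding a count and a
-- comma-delimited list of word offsets for each word.
function wordOffsetIndex pText, pNoiseWords
   local tOffset, tIndex
   put 0 into tOffset
   repeat for each trueWord tWord in pText
      -- count every word, including noise words, so offsets stay true
      add 1 to tOffset
      if tWord is among the lines of pNoiseWords then next repeat
      add 1 to tIndex[tWord]["count"]
      if tIndex[tWord]["offsets"] is empty then
         put tOffset into tIndex[tWord]["offsets"]
      else
         put comma & tOffset after tIndex[tWord]["offsets"]
      end if
   end repeat
   return tIndex
end function
```

Called with "the quick brown fox jumped over the lazy dog" as pText and "over" as pNoiseWords, the entry for "the" would have a count of 2 and the offsets 1,7. Noise words are skipped after the counter is incremented, so the reported offsets still match wordOffset positions in the original text.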

Re: How to get word offset all instances of a string in a chunk of text?

2018-08-30 Thread Curry Kenworthy via use-livecode
Jim:
> This just doesn’t work in all cases

That's the key though, don't repeat when it's not necessary! A day with no repeats is an efficient day. ;)

Best wishes,
Curry Kenworthy
Custom Software Development
LiveCode Training and Consulting
http://livecodeconsulting.com/

Re: How to get word offset all instances of a string in a chunk of text?

2018-08-30 Thread Jim Lambert via use-livecode
> I wrote:
>
> Then there is also this repeat-less approach using arrays and filter:
> function findWordOffsets pText, pSearchTerm
>    put replaceText(pText,"\W+"," ") into pText
>    split pText by space
>    combine pText with cr and tab
>    filter pText with "*" & tab &

Re: How to get word offset all instances of a string in a chunk of text?

2018-08-30 Thread Jim Lambert via use-livecode
> On 30/08/2018 10:24, Keith Clarke via use-livecode wrote:
>> Folks,
>> Is there a single-pass mechanism or more efficient way of returning the wordOffset of each instance of ‘the’ in ‘the quick brown fox jumped over the lazy dog’ than to use two passes through the text?

Then there is

Re: How to get word offset all instances of a string in a chunk of text?

2018-08-30 Thread Curry Kenworthy via use-livecode
hh:
> Sadly LC 9 is about 10 times slower than LC 6 with such fast scripts.

Yes, I've been doing some benchmarks and LC 9 usually takes anywhere from 2x to 8x as long to perform a job, with or without text being involved. It is a serious problem that should not be neglected across

Re: How to get word offset all instances of a string in a chunk of text?

2018-08-30 Thread hh via use-livecode
> Alex T. wrote:
>
> put 0 into tOffset
> repeat for each trueWord W in tSource
>    add 1 to tOffset
>    if W = myWord then
>       put tOffset & comma after tOffsetList
>    end if
> end repeat

This is (whether trueWord or word chunks are used) probably the fastest method for an offset counting

Re: How to get word offset all instances of a string in a chunk of text?

2018-08-30 Thread hh via use-livecode
For a more general context, see http://www.runrev.com/pipermail/use-livecode//2004-February/032280.html

Sadly, LC 9 is about 10 times slower than LC 6 with such fast scripts. For example, LC 6.7.11 needs about 500 ms to evaluate a 1 MByte string; LC 9.0.0 needs about 5 seconds.

Re: How to get word offset all instances of a string in a chunk of text?

2018-08-30 Thread Alex Tweedly via use-livecode
OK, this time I'm just typing into email - haven't tested these suggestions :-)

On 30/08/2018 10:24, Keith Clarke via use-livecode wrote:
Folks,
Is there a single-pass mechanism or more efficient way of returning the wordOffset of each instance of ‘the’ in ‘the quick brown fox jumped over the