Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)

2018-11-05 Thread Geoff Canyon via use-livecode
I've updated my GitHub to the following, which adopts Brian's "starts with"
(I can't count how many times I've had to re-remember that "starts with" is
faster than comparing against a leading char range) and added minor
optimizations to the wrapping-up code.

gc

function allOffsets D,S,pCase,pNoOverlaps
   -- returns a comma-delimited list of the offsets of D in S
   set the caseSensitive to pCase is true
   put length(D) into dLength
   put pNoOverlaps and dLength > 1 into pNoOverlaps
   put numtochar(chartonum(char -1 of D) mod 2 + 1) after S
   if not pNoOverlaps then
      repeat with i = 1 to dLength - 1
         if not (char i + 1 to -1 of D is char 1 to dLength - i of D) then next repeat
         put char -i to -1 of D into OV[i]
         put i & cr after kList
      end repeat
   end if
   set the itemDel to D
   put 1 - dLength into C
   if pNoOverlaps or kList is empty then
      repeat for each item i in S
         add length(i) + dLength to C
         put C,"" after R
      end repeat
   else
      repeat for each item i in S
         repeat for each line K in kList
            if i & D begins with OV[K] then put (C + K),"" after R
         end repeat
         add length(i) + dLength to C
         put C,"" after R
      end repeat
   end if
   set the itemDel to comma
   repeat with i = 1 to 999
      if item i of R > 0 then exit repeat
   end repeat
   delete item 1 to i - 1 of R
   if R begins with C then return 0
   return char 1 to -3 - length(C) of R
end allOffsets
___
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode


Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)

2018-11-05 Thread Geoff Canyon via use-livecode
On Sun, Nov 4, 2018 at 7:11 PM Mark Wieder via use-livecode <
use-livecode@lists.runrev.com> wrote:

>
> If you're looking for 'romeo' in pText, would you set pOverlaps to true
> or to false?


I'd set it to false, there's no way for "romeo" to overlap. But even if I
were looking for "radar", which could overlap, I'd set it to false if I
were searching an English text document, because there's no word
"radaradar". But as I said, I've switched it to default to finding overlaps.


Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)

2018-11-05 Thread Geoff Canyon via use-livecode
On Sun, Nov 4, 2018 at 7:42 PM Bob Sneidar via use-livecode <
use-livecode@lists.runrev.com> wrote:

> Simply add 1 to the last offset pointer. If after the first iteration you
> return 1, then set the charsToSkip to 2 instead of offset +
> len(searchString) if you take my meaning.
>
> Bob S
>

The method we're using avoids charsToSkip because it suffers mightily with
multi-byte characters. But the latest updates handle overlapping results,
see other posts in this thread.
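
For comparison, the charsToSkip approach Bob describes can be sketched roughly as below. This is a minimal sketch and not code from the thread: the handler name allOffsetsBySkip and its parameter names are made up for illustration. It relies only on the built-in offset(needle, haystack, charsToSkip), which reports a position counted from the first character after the skipped ones. This is the style of loop reported in this thread to slow down badly on text with multi-byte characters.

```livecode
function allOffsetsBySkip pNeedle, pHaystack
   -- hypothetical helper, not from the thread: collects every offset of
   -- pNeedle in pHaystack, overlapping matches included
   local tSkip, tFound, tResult
   if pNeedle is empty then return 0
   put 0 into tSkip
   repeat forever
      -- offset() counts from the character after the skipped ones
      put offset(pNeedle, pHaystack, tSkip) into tFound
      if tFound is 0 then exit repeat
      -- convert the relative result into an absolute position
      add tFound to tSkip
      put tSkip & comma after tResult
      -- the next search skips tSkip chars, so it resumes one char
      -- past the start of this match and can find overlaps
   end repeat
   if tResult is empty then return 0
   return char 1 to -2 of tResult
end allOffsetsBySkip
```

With the example used elsewhere in this thread, allOffsetsBySkip("aba","abababa") should give 1,3,5. To ignore overlapping matches instead, add length(pNeedle) - 1 to tSkip after recording each match, which yields 1,5 for the same input.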


Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)

2018-11-04 Thread Brian Milby via use-livecode
Here's an image of the stack in my fork of the repo:
https://github.com/bwmilby/alloffsets/blob/bwm/bwm/stack_allOffsets_card_id_1018.png



Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)

2018-11-04 Thread Brian Milby via use-livecode
I’m working on an update to the stack now. Moving buttons to the left side to 
make it easier to add more.

Thanks,
Brian

Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)

2018-11-04 Thread Mark Wieder via use-livecode

On 11/4/18 4:45 PM, Brian Milby via use-livecode wrote:

> My updated solution always looks for overlap but if none are found it uses
> optimized versions of the search (private functions instead of inside the main
> function). I special case for no overlap and a single overlap in the delimiter.
> It is about the same speed as Geoff’s.

Nice. I tried to get tricky and replace that 'repeat with' loop with a
'repeat for each' loop, but ended up about 20% slower. Not at all what I
expected.


--
 Mark Wieder
 ahsoftw...@gmail.com


Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)

2018-11-04 Thread Bob Sneidar via use-livecode
Simply add 1 to the last offset pointer. If after the first iteration you 
return 1, then set the charsToSkip to 2 instead of offset + len(searchString) 
if you take my meaning. 

Bob S


> On Nov 2, 2018, at 17:43 , Geoff Canyon via use-livecode 
>  wrote:
> 
> I like that, changing it. Now available at
> https://github.com/gcanyon/alloffsets
> 
> One thing I don't see how to do without significantly impacting performance
> is to return all offsets if there are overlapping strings. For example:
> 
> allOffsets("aba","abababa")
> 
> would return 1,5, when it might be reasonable to expect it to return 1,3,5.
> Using the offset function with numToSkip would make that easy; adapting
> allOffsets to do so would be harder to do cleanly I think.
> 
> gc
> 
> On Fri, Nov 2, 2018 at 12:17 PM Bob Sneidar via use-livecode <
> use-livecode@lists.runrev.com> wrote:
> 
>> how about allOffsets?
>> 
>> Bob S
>> 
>> 
>>> On Nov 2, 2018, at 09:16 , Geoff Canyon via use-livecode <
>> use-livecode@lists.runrev.com> wrote:
>>> 
>>> All of those return a single value; I wanted to convey the concept of
>>> returning multiple values. To me listOffset implies it does the same
>> thing
>>> as itemOffset, since items come in a list. How about:
>>> 
>>> offsets -- not my favorite because it's almost indistinguishable from
>> offset
>>> offsetsOf -- seems a tad clumsy
>>> 
>>> On Fri, Nov 2, 2018 at 7:41 AM Bob Sneidar via use-livecode <
>>> use-livecode@lists.runrev.com> wrote:
>>> 
>>>> It probably should be named listOffset, like itemOffset or lineOffset.
>>>>
>>>> Bob S
>>>>
>>>>
>>>>> On Nov 1, 2018, at 17:04 , Geoff Canyon via use-livecode <use-livecode@lists.runrev.com> wrote:
>>>>>
>>>>> Nice! I *just* finished creating a github repository for it, and adding
>>>>> support for multi-char search strings, much as you did. I was coming to the
>>>>> list to post the update when I saw your post.
>>>>>
>>>>> Here's the GitHub link: https://github.com/gcanyon/offsetlist
>>>>>
>>>>> Here's my updated version:
>>>>>
>>>>> function offsetList D,S,pCase
>>>>>    -- returns a comma-delimited list of the offsets of D in S
>>>>>    set the caseSensitive to pCase is true
>>>>>    set the itemDel to D
>>>>>    put length(D) into dLength
>>>>>    put 1 - dLength into C
>>>>>    repeat for each item i in S
>>>>>       add length(i) + dLength to C
>>>>>       put C,"" after R
>>>>>    end repeat
>>>>>    set the itemDel to comma
>>>>>    if char -dLength to -1 of S is D then return char 1 to -2 of R
>>>>>    put length(C) + 1 into lenC
>>>>>    put length(R) into lenR
>>>>>    if lenC = lenR then return 0
>>>>>    return char 1 to lenR - lenC - 1 of R
>>>>> end offsetList
>>>>>
>>>>> On Thu, Nov 1, 2018 at 8:28 AM Niggemann, Bernd via use-livecode <
>>>>> use-livecode@lists.runrev.com> wrote:
>>>>>
>>>>>> Hi Geoff,
>>>>>>
>>>>>> thank you for this beautiful script.
>>>>>>
>>>>>> I modified it a bit to accept multi-character search string and also for
>>>>>> case sensitivity.
>>>>>>
>>>>>> It definitely is a lot faster for unicode text than anything I have seen.
>>>>>>
>>>>>> -
>>>>>> function offsetList D,S, pCase
>>>>>>    -- returns a comma-delimited list of the offsets of D in S
>>>>>>    -- pCase is a boolean for caseSensitive
>>>>>>    set the caseSensitive to pCase
>>>>>>    set the itemDel to D
>>>>>>    put the length of D into tDelimLength
>>>>>>    repeat for each item i in S
>>>>>>       add length(i) + tDelimLength to C
>>>>>>       put C - (tDelimLength - 1),"" after R
>>>>>>    end repeat
>>>>>>    set the itemDel to comma
>>>>>>    if char -1 of S is D then return char 1 to -2 of R
>>>>>>    put length(C) + 1 into lenC
>>>>>>    put length(R) into lenR
>>>>>>    if lenC = lenR then return 0
>>>>>>    return char 1 to lenR - lenC - 1 of R
>>>>>> end offsetList
>>>>>> --
>>>>>>
>>>>>> Kind regards
>>>>>> Bernd
>>>>>>
>>>>>>>
>>>>>>> Date: Thu, 1 Nov 2018 00:15:37 -0700
>>>>>>> From: Geoff Canyon
>>>>>>> To: How to use LiveCode
>>>>>>> Subject: Re: How to find the offset of the last instance of a
>>>>>>> repeating character in a string?
>>>>>>>
>>>>>>> I was curious if using the itemDelimiter might work for this, so I wrote
>>>>>>> the below code out of curiosity; but in my quick testing with single-byte
>>>>>>> characters it was only about 30% faster than the above methods, so I didn't
>>>>>>> bother to post it.
>>>>>>>
>>>>>>> But Ben Rubinstein just posted about a terrible slow-down doing pretty much
>>>>>>> this same thing for text with unicode characters. So I ran a simple test
>>>>>>> with 8000 character long strings that start with a single unicode
>>>>>>> character, this is about 15x faster than offset() with skip. For
>>>>>>> 100,000-character lines it's about 300x faster, so it seems to be immune to
>>>>>>> the line-painter issues skip is subject to. So for what it's worth:
>>>>>>>
>>>>>>> function offsetList D,S
>>>>>>> -- returns a c

Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)

2018-11-04 Thread Mark Wieder via use-livecode

On 11/4/18 6:49 PM, Geoff Canyon via use-livecode wrote:


> I'm not sure I agree that it would be so unlikely to know that overlaps
> won't occur (or that it's unreasonable to not want them). If I'm looking
> for every instance of "romeo" in romeo and juliet, then obviously I'm not
> expecting, nor do I want, overlaps.

Sure, but in that case you'd be better off using the faster 'offset'
function. Or do you mean every instance of 'romeo' in the play itself?
There I can see why you'd want to set it to false for speed.


My point isn't really whether pOverlaps should default to true or false, 
but that you need detailed knowledge of the corpus of data before 
calling the function.


If you're looking for 'romeo' in pText, would you set pOverlaps to true 
or to false?


--
 Mark Wieder
 ahsoftw...@gmail.com



Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)

2018-11-04 Thread Geoff Canyon via use-livecode
On Sun, Nov 4, 2018 at 4:34 PM Mark Wieder via use-livecode <
use-livecode@lists.runrev.com> wrote:

> On 11/4/18 10:40 AM, Geoff Canyon via use-livecode wrote:
> > I also added a "with overlaps" option.
>
> My problem with the pWithOverlaps parameter is that it requires a priori
> knowledge of the data being consumed. If you already know there are
> overlaps then you'd set the parameter to true. If you don't know whether
> or not there are overlaps, then you'd need to set it to true so you
> don't miss anything (aside, of course, for the trivial case where you
> don't care whether or not there are overlaps - is there a use case for
> this?).
>
> The only time you would set it to false is after you've already
> determined that there are no overlaps, and the time spent on that would
> probably more than offset the extra processing in the function.


I'm not sure I agree that it would be so unlikely to know that overlaps
won't occur (or that it's unreasonable to not want them). If I'm looking
for every instance of "romeo" in romeo and juliet, then obviously I'm not
expecting, nor do I want, overlaps. Likewise, overlaps can only occur if
the search string allows for them, so "romeo" makes it impossible from the
get-go.

That said, it seems reasonable to default overlaps to true rather than
false. I'll set it up that way when I add the modification below.
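
Whether a search string can overlap itself at all can be checked from the string alone, without looking at the text being searched. The sketch below is an illustration rather than code from the thread: selfOverlapShifts and its variable names are hypothetical, and the comparison is the same tail-versus-head test that allOffsets uses when it builds kList.

```livecode
function selfOverlapShifts pNeedle
   -- hypothetical helper: returns a comma-delimited list of the shifts at
   -- which pNeedle can overlap a copy of itself, or empty if it cannot
   local tLength, tShift, tShifts
   put length(pNeedle) into tLength
   repeat with tShift = 1 to tLength - 1
      -- the tail starting at char tShift + 1 must equal the head of the same length
      if (char tShift + 1 to -1 of pNeedle is char 1 to tLength - tShift of pNeedle) then
         put tShift & comma after tShifts
      end if
   end repeat
   -- drop the trailing comma; an empty list stays empty
   return char 1 to -2 of tShifts
end selfOverlapShifts
```

For "romeo" this returns empty, so no overlap is possible whatever the text contains. For "radar" it returns 4 and for "aba" it returns 2, meaning a second match can begin that many characters after the first. The comparison follows the current caseSensitive setting.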

On Sun, Nov 4, 2018 at 4:02 PM Brian Milby via use-livecode <
use-livecode@lists.runrev.com> wrote:

>
> put kList is not empty into pWithOverlaps
>

Good point -- I suppose it also makes sense (albeit that the speed
improvement would be trivial) to not bother even building kList if the term
to be found is a single character.

gc


Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)

2018-11-04 Thread Brian Milby via use-livecode
My updated solution always looks for overlap but if none are found it uses 
optimized versions of the search (private functions instead of inside the main 
function). I special case for no overlap and a single overlap in the delimiter. 
It is about the same speed as Geoff’s.

Thanks,
Brian

Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)

2018-11-04 Thread Mark Wieder via use-livecode

On 11/4/18 10:40 AM, Geoff Canyon via use-livecode wrote:

> I also added a "with overlaps" option.


My problem with the pWithOverlaps parameter is that it requires a priori
knowledge of the data being consumed. If you already know there are 
overlaps then you'd set the parameter to true. If you don't know whether 
or not there are overlaps, then you'd need to set it to true so you 
don't miss anything (aside, of course, for the trivial case where you 
don't care whether or not there are overlaps - is there a use case for 
this?).


The only time you would set it to false is after you've already 
determined that there are no overlaps, and the time spent on that would 
probably more than offset the extra processing in the function.


--
 Mark Wieder
 ahsoftw...@gmail.com



Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)

2018-11-04 Thread Brian Milby via use-livecode
Logic matches my solution.  I also validated my solution using just the
offset function.  The speed hit with overlaps enabled is similar.  One possible
optimization:

put kList is not empty into pWithOverlaps

If with overlaps was requested but the source delimiter did not contain any
overlaps, then the extra loops are skipped.

Adding a character to the end is clever.  I'll need to incorporate that and
see what it does to my method.

My take on the code updates is here:
https://github.com/bwmilby/alloffsets/blob/bwm/bwm/allOffsets_Scripts/stack_allOffsets_button_id_1026.livecodescript

Stack and index of scripts here:
https://github.com/bwmilby/alloffsets/tree/bwm/bwm
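
One way to check a result list independently is sketched below. Note that it swaps in a direct character comparison at every position rather than the offset function mentioned above, so it is a different technique from the validation described in this post. The handler name offsetsLookRight and its parameters are hypothetical, and it is intended for ordinary text at test sizes, not for speed.

```livecode
function offsetsLookRight pNeedle, pHaystack, pOffsets
   -- hypothetical validator, not from the thread: rebuilds the expected
   -- list (overlaps included) by brute force and compares it to pOffsets
   local tLength, tPos, tExpected
   put length(pNeedle) into tLength
   if tLength is 0 then return (pOffsets is 0)
   repeat with tPos = 1 to length(pHaystack) - tLength + 1
      -- compare the window starting at tPos against the needle
      if (char tPos to tPos + tLength - 1 of pHaystack is pNeedle) then
         put tPos & comma after tExpected
      end if
   end repeat
   if tExpected is empty then
      -- same convention as allOffsets: 0 means "not found"
      put 0 into tExpected
   else
      delete char -1 of tExpected
   end if
   return (pOffsets is tExpected)
end offsetsLookRight
```

For example, offsetsLookRight("aba", "abababa", "1,3,5") returns true, while passing "1,5" returns false. In practice the third argument would be the output of the function under test, called with overlaps enabled.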



Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)

2018-11-04 Thread Geoff Canyon via use-livecode
Alex, good catch! The code below and at
https://github.com/gcanyon/alloffsets now puts a stop character after the
string to prevent the error you found. I also added a "with overlaps"
option. I think this is correct, and about as efficient as possible, but
thanks to anyone who finds a bug or a faster way.

gc


function allOffsets D,S,pCase,pWithOverlaps
   -- returns a comma-delimited list of the offsets of D in S
   set the caseSensitive to pCase is true
   put length(D) into dLength
   put numtochar(chartonum(char -1 of D) mod 2 + 1) after S
   if pWithOverlaps then
      repeat with i = 1 to dLength - 1
         if not (char i + 1 to -1 of D is char 1 to dLength - i of D) then next repeat
         put char -i to -1 of D into OV[i]
         put i & cr after kList
      end repeat
   end if
   set the itemDel to D
   put 1 - dLength into C
   if pWithOverlaps then
      repeat for each item i in S
         repeat for each line K in kList
            if char 1 to K of (i & D) is OV[K] then put (C + K),"" after R
         end repeat
         add length(i) + dLength to C
         put C,"" after R
      end repeat
   else
      repeat for each item i in S
         add length(i) + dLength to C
         put C,"" after R
      end repeat
   end if
   set the itemDel to comma
   repeat until item 1 of R > 0
      delete item 1 of R
   end repeat
   delete item -1 of R
   if R is empty then return 0 else return char 1 to -2 of R
end allOffsets


Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)

2018-11-03 Thread Brian Milby via use-livecode
I've posted a binary stack version that includes my version.  I cloned and
made a "bwm" branch in my clone.  Here's the direct link to the script with
the posted code (updated to use private functions):

https://github.com/bwmilby/alloffsets/blob/bwm/bwm/allOffsets_Scripts/stack_allOffsets_button_id_1009.livecodescript

The binary stack can be found here:

https://github.com/bwmilby/alloffsets/tree/bwm/bwm

There are 3 buttons across the top.  The first is Geoff's version.  The
second is my combined version.  The third is the one with private functions
added.  The first button replaces the results field.  The second and third
add their results to the results field.

The top field is the string to find (needle), the second is the string to
search (haystack), the third is for the results.
Everything is in a background group so you can add cards for unique
searches.


Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)

2018-11-03 Thread Brian Milby via use-livecode
Good catch Alex.  My code was closer, but didn't handle repeating
characters correctly.  Here is an updated version.

function allOffsets2 D,S,pCase
   local dLength, C, R
   -- returns a comma-delimited list of the offsets of D in S
   set the caseSensitive to pCase is true
   set the itemDel to D
   put length(D) into dLength
   put 1 - dLength into C

   if dLength > 1 then
      local n, i, j, D2, L2
      put 0 into n
      repeat with i = 2 to dLength
         if char i to -1 of D is char 1 to -i of D then
            add 1 to n
            put char (1-i) to -1 of D into D2[n]
            put i-1 into L2[n]
         end if
      end repeat
   end if

   repeat for each item i in S
      if C > 0 and n > 0 then
         repeat with j = 1 to n
            if i&D begins with D2[j] then
               put C+L2[j],"" after R
            end if
         end repeat
      end if
      add length(i) + dLength to C
      put C,"" after R
   end repeat
   set the itemDel to comma
   delete char -1 of R

   if item -1 of R > len(S) then
      if the number of items of R is 1 then
         return 0
      else
         delete item -1 of R
      end if
   end if

   if len(i) > 0 then
      repeat with j = n down to len(i)+1
         if char -len(D2[j]) to -1 of S is D2[j] then
            delete item -1 of R
         end if
      end repeat
   end if
   return R
end allOffsets2



Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)

2018-11-03 Thread Alex Tweedly via use-livecode

Hi Geoff,

unfortunately the impact of overlapping delimiter strings is more severe 
than simply not finding them. The code on github gets the wrong answer 
if there is an overlapping string at the very end of the search string, e.g.


alloffsets("", "a")    wrongly gives  1,5,10

I suspect the test for

 if char -dLength to -1 of S is D then return char 1 to -2 of R
should be (something like)
  if item -1 of S is empty then return char 1 to -2 of R
but to be honest, I'm not 100% certain of that.
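
To make the failure concrete, here is a small Python model of the suffix test (a sketch only, not the LiveCode on GitHub). The name `offset_list_model` and the flag `use_suffix_test` are hypothetical, and I am assuming that Python's `str.split`, with a final empty piece dropped, is a fair stand-in for `repeat for each item`. D must be non-empty.

```python
def offset_list_model(d, s, use_suffix_test=True):
    """Python model of the posted offsetList; returns 1-based offsets as a list.

    Assumption: str.split(d) mimics the itemDelimiter loop, except that a
    trailing delimiter yields a final empty piece, which is dropped here.
    """
    pieces = s.split(d)
    ended_on_delimiter = pieces[-1] == ""
    if ended_on_delimiter:
        pieces.pop()

    c = 1 - len(d)              # same starting value as C in the LiveCode
    r = []
    for piece in pieces:
        c += len(piece) + len(d)
        r.append(c)             # start of the delimiter that would follow

    if use_suffix_test:
        keep_all = s.endswith(d)        # "char -dLength to -1 of S is D"
    else:
        keep_all = ended_on_delimiter   # did the split itself end on D?
    # When the string does not end on a delimiter, the last entry is a phantom.
    return r if keep_all else r[:-1]


print(offset_list_model("aba", "ababa"))                         # [1, 6]
print(offset_list_model("aba", "ababa", use_suffix_test=False))  # [1]
```

With the suffix test, "ababa" ends in "aba", so the phantom offset 6 (past the end of a 5-character string) is kept. Testing whether the split itself ended on a delimiter avoids that in this model; whether the equivalent LiveCode test behaves the same way would need checking.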

Alex.



On 03/11/2018 00:43, Geoff Canyon via use-livecode wrote:

I like that, changing it. Now available at
https://github.com/gcanyon/alloffsets

One thing I don't see how to do without significantly impacting performance
is to return all offsets if there are overlapping strings. For example:

allOffsets("aba","abababa")

would return 1,5, when it might be reasonable to expect it to return 1,3,5.
Using the offset function with numToSkip would make that easy; adapting
allOffsets to do so would be harder to do cleanly I think.

gc

On Fri, Nov 2, 2018 at 12:17 PM Bob Sneidar via use-livecode <
use-livecode@lists.runrev.com> wrote:


how about allOffsets?

Bob S



On Nov 2, 2018, at 09:16 , Geoff Canyon via use-livecode <
use-livecode@lists.runrev.com> wrote:

All of those return a single value; I wanted to convey the concept of
returning multiple values. To me listOffset implies it does the same thing
as itemOffset, since items come in a list. How about:

offsets -- not my favorite because it's almost indistinguishable from offset
offsetsOf -- seems a tad clumsy

On Fri, Nov 2, 2018 at 7:41 AM Bob Sneidar via use-livecode <
use-livecode@lists.runrev.com> wrote:


It probably should be named listOffset, like itemOffset or lineOffset.

Bob S



On Nov 1, 2018, at 17:04 , Geoff Canyon via use-livecode <
use-livecode@lists.runrev.com> wrote:

Nice! I *just* finished creating a github repository for it, and adding
support for multi-char search strings, much as you did. I was coming to the
list to post the update when I saw your post.

Here's the GitHub link: https://github.com/gcanyon/offsetlist

Here's my updated version:

function offsetList D,S,pCase
  -- returns a comma-delimited list of the offsets of D in S
  set the caseSensitive to pCase is true
  set the itemDel to D
  put length(D) into dLength
  put 1 - dLength into C
  repeat for each item i in S
 add length(i) + dLength to C
 put C,"" after R
  end repeat
  set the itemDel to comma
  if char -dLength to -1 of S is D then return char 1 to -2 of R
  put length(C) + 1 into lenC
  put length(R) into lenR
  if lenC = lenR then return 0
  return char 1 to lenR - lenC - 1 of R
end offsetList

On Thu, Nov 1, 2018 at 8:28 AM Niggemann, Bernd via use-livecode <
use-livecode@lists.runrev.com> wrote:


Hi Geoff,

thank you for this beautiful script.

I modified it a bit to accept multi-character search string and also for
case sensitivity.

It definitely is a lot faster for unicode text than anything I have seen.

-
function offsetList D,S, pCase
  -- returns a comma-delimited list of the offsets of D in S
  -- pCase is a boolean for caseSensitive
  set the caseSensitive to pCase
  set the itemDel to D
  put the length of D into tDelimLength
  repeat for each item i in S
 add length(i) + tDelimLength to C
 put C - (tDelimLength - 1),"" after R
  end repeat
  set the itemDel to comma
  if char -1 of S is D then return char 1 to -2 of R
  put length(C) + 1 into lenC
  put length(R) into lenR
  if lenC = lenR then return 0
  return char 1 to lenR - lenC - 1 of R
end offsetList
--

Kind regards
Bernd






Date: Thu, 1 Nov 2018 00:15:37 -0700
From: Geoff Canyon
To: How to use LiveCode 
Subject: Re: How to find the offset of the last instance of a
 repeating   character in a string?

I was curious if using the itemDelimiter might work for this, so I wrote
the below code out of curiosity; but in my quick testing with single-byte
characters it was only about 30% faster than the above methods, so I didn't
bother to post it.

But Ben Rubinstein just posted about a terrible slow-down doing pretty much
this same thing for text with unicode characters. So I ran a simple test
with 8000 character long strings that start with a single unicode
character, this is about 15x faster than offset() with skip. For
100,000-character lines it's about 300x faster, so it seems to be immune to
the line-painter issues skip is subject to. So for what it's worth:

function offsetList D,S
   -- returns a comma-delimited list of the offsets of D in S
   set the itemDel to D
   repeat for each item i in S
      add length(i) + 1 to C
      put C,"" after R
   end repeat
   set the itemDel to comma
   if char -1 of S is D then return char 1 to -2 of R
   put length(C) + 1 into lenC
   put length(R) into lenR
   if lenC = lenR then return 0
   return char 1 to lenR - lenC - 1 of R
end offsetList




Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)

2018-11-02 Thread Brian Milby via use-livecode
Here is something... probably needs some optimization

function allOffsets2 D,S,pCase
   local dLength, C, R
   -- returns a comma-delimited list of the offsets of D in S
   set the caseSensitive to pCase is true
   set the itemDel to D
   put length(D) into dLength
   put 1 - dLength into C

   if dLength > 1 then
      local n, i, j, D2, L2
      put 0 into n
      repeat with i = 2 to dLength
         if char i to -1 of D is char 1 to -i of D then
            add 1 to n
            put char (1-i) to -1 of D into D2[n]
            put i-1 into L2[n]
         end if
      end repeat
   end if

   repeat for each item i in S
      if C > 0 and n > 0 then
         repeat with j = 1 to n
            if i&D begins with D2[j] then
               put C+L2[j],"" after R
            end if
         end repeat
      end if
      add length(i) + dLength to C
      put C,"" after R
   end repeat
   set the itemDel to comma
   delete char -1 of R

   if item -1 of R > len(S) then
      if the number of items of R is 1 then
         return 0
      else
         delete item -1 of R
      end if
   end if

   if char -dLength to -1 of S is D then
      return R
   end if

   repeat with j = n down to 1
      if char -len(D2[j]) to -1 of S is D2[j] then
         delete item -1 of R
      end if
   end repeat
   return R
end allOffsets2


I think a couple of private functions would be good.  One for 0 overlap,
one for a single overlap, then a final general one for any number of
overlaps (the core of the above).  After the loop that generates D2/L2 I
would branch based on n to avoid the additional comparisons inside the loop.
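
The overlap idea can also be modelled outside LiveCode. What follows is a Python sketch, not the code from this thread: `self_overlaps` and `all_offsets_model` are hypothetical names, `str.split` stands in for the itemDelimiter loop, and the result is a list (empty when nothing is found, where the LiveCode functions return 0). It precomputes every shift at which D overlaps itself, which plays the role of D2/L2, and then checks the text following each split-found match for the corresponding tail.

```python
def self_overlaps(d):
    """Return (shift, tail) pairs: d shifted right by `shift` still agrees
    with itself, and `tail` is what must follow d for a second match."""
    n = len(d)
    return [(k, d[n - k:]) for k in range(1, n) if d[k:] == d[:n - k]]


def all_offsets_model(d, s):
    """1-based offsets of every occurrence of d in s, overlaps included."""
    pieces = s.split(d)          # non-overlapping, left-to-right split
    overlaps = self_overlaps(d)
    last = len(pieces) - 1
    found = []
    pos = 1 - len(d)
    for j in range(last):        # pieces[j] is followed by a real delimiter
        pos += len(pieces[j]) + len(d)
        found.append(pos)
        # Text right after this match: the next piece, plus d only if that
        # piece is itself followed by another delimiter.
        following = pieces[j + 1] + (d if j + 1 < last else "")
        for shift, tail in overlaps:
            if following.startswith(tail):
                found.append(pos + shift)
    return found


print(self_overlaps("aba"))                  # [(2, 'ba')]
print(all_offsets_model("aba", "abababa"))   # [1, 3, 5]
```

Appending d only when another delimiter really follows is what keeps the final piece from producing a false match, which is the case the clean-up code at the end of allOffsets2 has to deal with.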

On Fri, Nov 2, 2018 at 9:45 PM Alex Tweedly via use-livecode <
use-livecode@lists.runrev.com> wrote:

> Oh dear - answering my own posts  rarely a good sign :-)
>
>
> On 03/11/2018 02:10, Alex Tweedly via use-livecode wrote:
> >
> > On 03/11/2018 00:43, Geoff Canyon via use-livecode wrote:
> >> One thing I don't see how to do without significantly impacting
> >> performance
> >> is to return all offsets if there are overlapping strings. For example:
> >>
> >> allOffsets("aba","abababa")
> >>
> >> would return 1,5, when it might be reasonable to expect it to return
> >> 1,3,5.
> >> Using the offset function with numToSkip would make that easy; adapting
> >> allOffsets to do so would be harder to do cleanly I think.
> >>
> > Can I suggest changing it to "someOffsets()" :-) :-)
> >
> > But seriously, can you not iteratively run "allOffsets"?
> >
> Answer : NO. That doesn't work.
> However, there is a more efficient way that does work - but it needs to
> be tested before I post it.
>
> -- Alex.
>


Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)

2018-11-02 Thread Alex Tweedly via use-livecode

Oh dear - answering my own posts is rarely a good sign :-)


On 03/11/2018 02:10, Alex Tweedly via use-livecode wrote:


On 03/11/2018 00:43, Geoff Canyon via use-livecode wrote:
One thing I don't see how to do without significantly impacting performance
is to return all offsets if there are overlapping strings. For example:

allOffsets("aba","abababa")

would return 1,5, when it might be reasonable to expect it to return 1,3,5.
Using the offset function with numToSkip would make that easy; adapting
allOffsets to do so would be harder to do cleanly I think.


Can I suggest changing it to "someOffsets()" :-) :-)

But seriously, can you not iteratively run "allOffsets"?


Answer : NO. That doesn't work.
However, there is a more efficient way that does work - but it needs to 
be tested before I post it.
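
One way to see why the iterative idea fails, sketched in Python rather than LiveCode: the function names below are hypothetical, and I am assuming each pass overwrites the first character of every match with a character that does not occur in the search string. Masking the start of a later match can destroy an earlier overlapping match that extends into it.

```python
def non_overlapping(d, s):
    """1-based offsets of d in s, scanning left to right without overlaps."""
    found, start = [], 0
    while True:
        i = s.find(d, start)
        if i == -1:
            return found
        found.append(i + 1)
        start = i + len(d)


def iterative_masking(d, s, mask="#"):
    """Repeat the non-overlapping search, masking each hit's first char.

    Assumption: `mask` does not occur in d.
    """
    result = []
    while True:
        hits = non_overlapping(d, s)
        if not hits:
            return sorted(result)
        result.extend(hits)
        chars = list(s)
        for h in hits:
            chars[h - 1] = mask
        s = "".join(chars)


print(iterative_masking("aba", "abababa"))   # [1, 5]
```

The match at offset 3 covers characters 3 to 5, and character 5 is masked after the first pass, so the second pass finds nothing.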


-- Alex.



Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)

2018-11-02 Thread Alex Tweedly via use-livecode


On 03/11/2018 00:43, Geoff Canyon via use-livecode wrote:

I like that, changing it. Now available at
https://github.com/gcanyon/alloffsets

One thing I don't see how to do without significantly impacting performance
is to return all offsets if there are overlapping strings. For example:

allOffsets("aba","abababa")

would return 1,5, when it might be reasonable to expect it to return 1,3,5.
Using the offset function with numToSkip would make that easy; adapting
allOffsets to do so would be harder to do cleanly I think.


Can I suggest changing it to "someOffsets()" :-) :-)

But seriously, can you not iteratively run "allOffsets"?
something like  (typed straight into email - totally untested)

function allOffsets pDel, pStr
 repeat with c = 1 to 255  -- or some other upper limit ?
    if NOT pDel contains numtochar(c) then
       put numtochar(c) into c
       exit repeat
    end if
  end repeat
  repeat forever
    put someOffsets(pDel, pStr) into newR
    if the number of items in newR = 0 then exit repeat
    repeat for each item I in newR
       put c into char I of newR
    end repeat
    put newR after R
  end repeat
  sort items of R numeric
  return R
end alloffsets

-- Alex.


Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)

2018-11-02 Thread Geoff Canyon via use-livecode
I like that, changing it. Now available at
https://github.com/gcanyon/alloffsets

One thing I don't see how to do without significantly impacting performance
is to return all offsets if there are overlapping strings. For example:

allOffsets("aba","abababa")

would return 1,5, when it might be reasonable to expect it to return 1,3,5.
Using the offset function with numToSkip would make that easy; adapting
allOffsets to do so would be harder to do cleanly I think.
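
For comparison, the skip-based approach is short to write down. This is a Python sketch, not LiveCode: `all_offsets_by_skip` is a hypothetical name, and `str.find` with a start index plays the role of offset() with its skip parameter. D is assumed to be non-empty.

```python
def all_offsets_by_skip(d, s, overlaps=True):
    """1-based offsets of d in s, found by repeated searching from a
    moving start position."""
    found = []
    start = 0
    while True:
        i = s.find(d, start)
        if i == -1:
            return found
        found.append(i + 1)
        # Resume one char past the match start to allow overlaps,
        # or past the whole match to forbid them.
        start = i + 1 if overlaps else i + len(d)


print(all_offsets_by_skip("aba", "abababa"))                  # [1, 3, 5]
print(all_offsets_by_skip("aba", "abababa", overlaps=False))  # [1, 5]
```

The only difference between the two behaviours is how far the start position advances after each hit, which is why overlaps are easy here and awkward in the itemDelimiter version.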

gc

On Fri, Nov 2, 2018 at 12:17 PM Bob Sneidar via use-livecode <
use-livecode@lists.runrev.com> wrote:

> how about allOffsets?
>
> Bob S
>
>
> > On Nov 2, 2018, at 09:16 , Geoff Canyon via use-livecode <
> use-livecode@lists.runrev.com> wrote:
> >
> > All of those return a single value; I wanted to convey the concept of
> > returning multiple values. To me listOffset implies it does the same
> thing
> > as itemOffset, since items come in a list. How about:
> >
> > offsets -- not my favorite because it's almost indistinguishable from
> offset
> > offsetsOf -- seems a tad clumsy
> >
> > On Fri, Nov 2, 2018 at 7:41 AM Bob Sneidar via use-livecode <
> > use-livecode@lists.runrev.com> wrote:
> >
> >> It probably should be named listOffset, like itemOffset or lineOffset.
> >>
> >> Bob S
> >>
> >>
> >>> On Nov 1, 2018, at 17:04 , Geoff Canyon via use-livecode <
> >> use-livecode@lists.runrev.com> wrote:
> >>>
> >>> Nice! I *just* finished creating a github repository for it, and adding
> >>> support for multi-char search strings, much as you did. I was coming to
> >> the
> >>> list to post the update when I saw your post.
> >>>
> >>> Here's the GitHub link: https://github.com/gcanyon/offsetlist
> >>>
> >>> Here's my updated version:
> >>>
> >>> function offsetList D,S,pCase
> >>>  -- returns a comma-delimited list of the offsets of D in S
> >>>  set the caseSensitive to pCase is true
> >>>  set the itemDel to D
> >>>  put length(D) into dLength
> >>>  put 1 - dLength into C
> >>>  repeat for each item i in S
> >>> add length(i) + dLength to C
> >>> put C,"" after R
> >>>  end repeat
> >>>  set the itemDel to comma
> >>>  if char -dLength to -1 of S is D then return char 1 to -2 of R
> >>>  put length(C) + 1 into lenC
> >>>  put length(R) into lenR
> >>>  if lenC = lenR then return 0
> >>>  return char 1 to lenR - lenC - 1 of R
> >>> end offsetList
> >>>
> >>> On Thu, Nov 1, 2018 at 8:28 AM Niggemann, Bernd via use-livecode <
> >>> use-livecode@lists.runrev.com> wrote:
> >>>
>  Hi Geoff,
> 
>  thank you for this beautiful script.
> 
>  I modified it a bit to accept multi-character search string and also
> for
>  case sensitivity.
> 
>  It definitely is a lot faster for unicode text than anything I have
> >> seen.
> 
>  -
>  function offsetList D,S, pCase
>   -- returns a comma-delimited list of the offsets of D in S
>   -- pCase is a boolean for caseSensitive
>   set the caseSensitive to pCase
>   set the itemDel to D
>   put the length of D into tDelimLength
>   repeat for each item i in S
>  add length(i) + tDelimLength to C
>  put C - (tDelimLength - 1),"" after R
>   end repeat
>   set the itemDel to comma
>   if char -1 of S is D then return char 1 to -2 of R
>   put length(C) + 1 into lenC
>   put length(R) into lenR
>   if lenC = lenR then return 0
>   return char 1 to lenR - lenC - 1 of R
>  end offsetList
>  --
> 
>  Kind regards
>  Bernd
> 
> 
> 
> 
> 
> >
> > Date: Thu, 1 Nov 2018 00:15:37 -0700
> > From: Geoff Canyon
> > To: How to use LiveCode 
> > Subject: Re: How to find the offset of the last instance of a
> > repeating   character in a string?
> >
> > I was curious if using the itemDelimiter might work for this, so I
> >> wrote
> > the below code out of curiosity; but in my quick testing with
> >> single-byte
> > characters it was only about 30% faster than the above methods, so I
>  didn't
> > bother to post it.
> >
> > But Ben Rubinstein just posted about a terrible slow-down doing
> pretty
>  much
> > this same thing for text with unicode characters. So I ran a simple
> >> test
> > with 8000 character long strings that start with a single unicode
> > character, this is about 15x faster than offset() with skip. For
> > 100,000-character lines it's about 300x faster, so it seems to be
> >> immune
>  to
> > the line-painter issues skip is subject to. So for what it's worth:
> >
> > function offsetList D,S
> > -- returns a comma-delimited list of the offsets of D in S
> > set the itemDel to D
> > repeat for each item i in S
> >add length(i) + 1 to C
> >put C,"" after R
> > end repeat
> > set the itemDel to comma
> > if char -1 of S is D then return char 1 to -2 of R
> > put length(C)

Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)

2018-11-02 Thread Bob Sneidar via use-livecode
how about allOffsets?

Bob S


> On Nov 2, 2018, at 09:16 , Geoff Canyon via use-livecode 
>  wrote:
> 
> All of those return a single value; I wanted to convey the concept of
> returning multiple values. To me listOffset implies it does the same thing
> as itemOffset, since items come in a list. How about:
> 
> offsets -- not my favorite because it's almost indistinguishable from offset
> offsetsOf -- seems a tad clumsy
> 
> On Fri, Nov 2, 2018 at 7:41 AM Bob Sneidar via use-livecode <
> use-livecode@lists.runrev.com> wrote:
> 
>> It probably should be named listOffset, like itemOffset or lineOffset.
>> 
>> Bob S
>> 
>> 
>>> On Nov 1, 2018, at 17:04 , Geoff Canyon via use-livecode <
>> use-livecode@lists.runrev.com> wrote:
>>> 
>>> Nice! I *just* finished creating a github repository for it, and adding
>>> support for multi-char search strings, much as you did. I was coming to
>> the
>>> list to post the update when I saw your post.
>>> 
>>> Here's the GitHub link: https://github.com/gcanyon/offsetlist
>>> 
>>> Here's my updated version:
>>> 
>>> function offsetList D,S,pCase
>>>  -- returns a comma-delimited list of the offsets of D in S
>>>  set the caseSensitive to pCase is true
>>>  set the itemDel to D
>>>  put length(D) into dLength
>>>  put 1 - dLength into C
>>>  repeat for each item i in S
>>> add length(i) + dLength to C
>>> put C,"" after R
>>>  end repeat
>>>  set the itemDel to comma
>>>  if char -dLength to -1 of S is D then return char 1 to -2 of R
>>>  put length(C) + 1 into lenC
>>>  put length(R) into lenR
>>>  if lenC = lenR then return 0
>>>  return char 1 to lenR - lenC - 1 of R
>>> end offsetList
>>> 
>>> On Thu, Nov 1, 2018 at 8:28 AM Niggemann, Bernd via use-livecode <
>>> use-livecode@lists.runrev.com> wrote:
>>> 
 Hi Geoff,
 
 thank you for this beautiful script.
 
 I modified it a bit to accept multi-character search string and also for
 case sensitivity.
 
 It definitely is a lot faster for unicode text than anything I have
>> seen.
 
 -
 function offsetList D,S, pCase
  -- returns a comma-delimited list of the offsets of D in S
  -- pCase is a boolean for caseSensitive
  set the caseSensitive to pCase
  set the itemDel to D
  put the length of D into tDelimLength
  repeat for each item i in S
 add length(i) + tDelimLength to C
 put C - (tDelimLength - 1),"" after R
  end repeat
  set the itemDel to comma
  if char -1 of S is D then return char 1 to -2 of R
  put length(C) + 1 into lenC
  put length(R) into lenR
  if lenC = lenR then return 0
  return char 1 to lenR - lenC - 1 of R
 end offsetList
 --
 
 Kind regards
 Bernd
 
 
 
 
 
> 
> Date: Thu, 1 Nov 2018 00:15:37 -0700
> From: Geoff Canyon
> To: How to use LiveCode 
> Subject: Re: How to find the offset of the last instance of a
> repeating   character in a string?
> 
> I was curious if using the itemDelimiter might work for this, so I
>> wrote
> the below code out of curiosity; but in my quick testing with
>> single-byte
> characters it was only about 30% faster than the above methods, so I
 didn't
> bother to post it.
> 
> But Ben Rubinstein just posted about a terrible slow-down doing pretty
 much
> this same thing for text with unicode characters. So I ran a simple
>> test
> with 8000 character long strings that start with a single unicode
> character, this is about 15x faster than offset() with skip. For
> 100,000-character lines it's about 300x faster, so it seems to be
>> immune
 to
> the line-painter issues skip is subject to. So for what it's worth:
> 
> function offsetList D,S
> -- returns a comma-delimited list of the offsets of D in S
> set the itemDel to D
> repeat for each item i in S
>add length(i) + 1 to C
>put C,"" after R
> end repeat
> set the itemDel to comma
> if char -1 of S is D then return char 1 to -2 of R
> put length(C) + 1 into lenC
> put length(R) into lenR
> if lenC = lenR then return 0
> return char 1 to lenR - lenC - 1 of R
> end offsetList
> 
 
 

Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)

2018-11-02 Thread Geoff Canyon via use-livecode
All of those return a single value; I wanted to convey the concept of
returning multiple values. To me listOffset implies it does the same thing
as itemOffset, since items come in a list. How about:

offsets -- not my favorite because it's almost indistinguishable from offset
offsetsOf -- seems a tad clumsy

On Fri, Nov 2, 2018 at 7:41 AM Bob Sneidar via use-livecode <
use-livecode@lists.runrev.com> wrote:

> It probably should be named listOffset, like itemOffset or lineOffset.
>
> Bob S
>
>
> > On Nov 1, 2018, at 17:04 , Geoff Canyon via use-livecode <
> use-livecode@lists.runrev.com> wrote:
> >
> > Nice! I *just* finished creating a github repository for it, and adding
> > support for multi-char search strings, much as you did. I was coming to
> the
> > list to post the update when I saw your post.
> >
> > Here's the GitHub link: https://github.com/gcanyon/offsetlist
> >
> > Here's my updated version:
> >
> > function offsetList D,S,pCase
> >   -- returns a comma-delimited list of the offsets of D in S
> >   set the caseSensitive to pCase is true
> >   set the itemDel to D
> >   put length(D) into dLength
> >   put 1 - dLength into C
> >   repeat for each item i in S
> >  add length(i) + dLength to C
> >  put C,"" after R
> >   end repeat
> >   set the itemDel to comma
> >   if char -dLength to -1 of S is D then return char 1 to -2 of R
> >   put length(C) + 1 into lenC
> >   put length(R) into lenR
> >   if lenC = lenR then return 0
> >   return char 1 to lenR - lenC - 1 of R
> > end offsetList
> >
> > On Thu, Nov 1, 2018 at 8:28 AM Niggemann, Bernd via use-livecode <
> > use-livecode@lists.runrev.com> wrote:
> >
> >> Hi Geoff,
> >>
> >> thank you for this beautiful script.
> >>
> >> I modified it a bit to accept multi-character search string and also for
> >> case sensitivity.
> >>
> >> It definitely is a lot faster for unicode text than anything I have
> seen.
> >>
> >> -
> >> function offsetList D,S, pCase
> >>   -- returns a comma-delimited list of the offsets of D in S
> >>   -- pCase is a boolean for caseSensitive
> >>   set the caseSensitive to pCase
> >>   set the itemDel to D
> >>   put the length of D into tDelimLength
> >>   repeat for each item i in S
> >>  add length(i) + tDelimLength to C
> >>  put C - (tDelimLength - 1),"" after R
> >>   end repeat
> >>   set the itemDel to comma
> >>   if char -1 of S is D then return char 1 to -2 of R
> >>   put length(C) + 1 into lenC
> >>   put length(R) into lenR
> >>   if lenC = lenR then return 0
> >>   return char 1 to lenR - lenC - 1 of R
> >> end offsetList
> >> --
> >>
> >> Kind regards
> >> Bernd
> >>
> >>
> >>
> >>
> >>
> >>>
> >>> Date: Thu, 1 Nov 2018 00:15:37 -0700
> >>> From: Geoff Canyon
> >>> To: How to use LiveCode 
> >>> Subject: Re: How to find the offset of the last instance of a
> >>>  repeating   character in a string?
> >>>
> >>> I was curious if using the itemDelimiter might work for this, so I
> wrote
> >>> the below code out of curiosity; but in my quick testing with
> single-byte
> >>> characters it was only about 30% faster than the above methods, so I
> >> didn't
> >>> bother to post it.
> >>>
> >>> But Ben Rubinstein just posted about a terrible slow-down doing pretty
> >> much
> >>> this same thing for text with unicode characters. So I ran a simple
> test
> >>> with 8000 character long strings that start with a single unicode
> >>> character, this is about 15x faster than offset() with skip. For
> >>> 100,000-character lines it's about 300x faster, so it seems to be
> immune
> >> to
> >>> the line-painter issues skip is subject to. So for what it's worth:
> >>>
> >>> function offsetList D,S
> >>>  -- returns a comma-delimited list of the offsets of D in S
> >>>  set the itemDel to D
> >>>  repeat for each item i in S
> >>> add length(i) + 1 to C
> >>> put C,"" after R
> >>>  end repeat
> >>>  set the itemDel to comma
> >>>  if char -1 of S is D then return char 1 to -2 of R
> >>>  put length(C) + 1 into lenC
> >>>  put length(R) into lenR
> >>>  if lenC = lenR then return 0
> >>>  return char 1 to lenR - lenC - 1 of R
> >>> end offsetList
> >>>
> >>
> >>

Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)

2018-11-02 Thread Bob Sneidar via use-livecode
It probably should be named listOffset, like itemOffset or lineOffset. 

Bob S


> On Nov 1, 2018, at 17:04 , Geoff Canyon via use-livecode 
>  wrote:
> 
> Nice! I *just* finished creating a github repository for it, and adding
> support for multi-char search strings, much as you did. I was coming to the
> list to post the update when I saw your post.
> 
> Here's the GitHub link: https://github.com/gcanyon/offsetlist
> 
> Here's my updated version:
> 
> function offsetList D,S,pCase
>   -- returns a comma-delimited list of the offsets of D in S
>   set the caseSensitive to pCase is true
>   set the itemDel to D
>   put length(D) into dLength
>   put 1 - dLength into C
>   repeat for each item i in S
>  add length(i) + dLength to C
>  put C,"" after R
>   end repeat
>   set the itemDel to comma
>   if char -dLength to -1 of S is D then return char 1 to -2 of R
>   put length(C) + 1 into lenC
>   put length(R) into lenR
>   if lenC = lenR then return 0
>   return char 1 to lenR - lenC - 1 of R
> end offsetList
> 
> On Thu, Nov 1, 2018 at 8:28 AM Niggemann, Bernd via use-livecode <
> use-livecode@lists.runrev.com> wrote:
> 
>> Hi Geoff,
>> 
>> thank you for this beautiful script.
>> 
>> I modified it a bit to accept multi-character search string and also for
>> case sensitivity.
>> 
>> It definitely is a lot faster for unicode text than anything I have seen.
>> 
>> -
>> function offsetList D,S, pCase
>>   -- returns a comma-delimited list of the offsets of D in S
>>   -- pCase is a boolean for caseSensitive
>>   set the caseSensitive to pCase
>>   set the itemDel to D
>>   put the length of D into tDelimLength
>>   repeat for each item i in S
>>  add length(i) + tDelimLength to C
>>  put C - (tDelimLength - 1),"" after R
>>   end repeat
>>   set the itemDel to comma
>>   if char -1 of S is D then return char 1 to -2 of R
>>   put length(C) + 1 into lenC
>>   put length(R) into lenR
>>   if lenC = lenR then return 0
>>   return char 1 to lenR - lenC - 1 of R
>> end offsetList
>> --
>> 
>> Kind regards
>> Bernd
>> 
>> 
>> 
>> 
>> 
>>> 
>>> Date: Thu, 1 Nov 2018 00:15:37 -0700
>>> From: Geoff Canyon
>>> To: How to use LiveCode 
>>> Subject: Re: How to find the offset of the last instance of a
>>>  repeating   character in a string?
>>> 
>>> I was curious if using the itemDelimiter might work for this, so I wrote
>>> the below code out of curiosity; but in my quick testing with single-byte
>>> characters it was only about 30% faster than the above methods, so I
>> didn't
>>> bother to post it.
>>> 
>>> But Ben Rubinstein just posted about a terrible slow-down doing pretty
>> much
>>> this same thing for text with unicode characters. So I ran a simple test
>>> with 8000 character long strings that start with a single unicode
>>> character, this is about 15x faster than offset() with skip. For
>>> 100,000-character lines it's about 300x faster, so it seems to be immune
>> to
>>> the line-painter issues skip is subject to. So for what it's worth:
>>> 
>>> function offsetList D,S
>>>  -- returns a comma-delimited list of the offsets of D in S
>>>  set the itemDel to D
>>>  repeat for each item i in S
>>> add length(i) + 1 to C
>>> put C,"" after R
>>>  end repeat
>>>  set the itemDel to comma
>>>  if char -1 of S is D then return char 1 to -2 of R
>>>  put length(C) + 1 into lenC
>>>  put length(R) into lenR
>>>  if lenC = lenR then return 0
>>>  return char 1 to lenR - lenC - 1 of R
>>> end offsetList
>>> 
>> 
>> 


Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)

2018-11-01 Thread Geoff Canyon via use-livecode
Nice! I *just* finished creating a github repository for it, and adding
support for multi-char search strings, much as you did. I was coming to the
list to post the update when I saw your post.

Here's the GitHub link: https://github.com/gcanyon/offsetlist

Here's my updated version:

function offsetList D,S,pCase
   -- returns a comma-delimited list of the offsets of D in S
   set the caseSensitive to pCase is true
   set the itemDel to D
   put length(D) into dLength
   put 1 - dLength into C
   repeat for each item i in S
      add length(i) + dLength to C
      put C,"" after R
   end repeat
   set the itemDel to comma
   if char -dLength to -1 of S is D then return char 1 to -2 of R
   put length(C) + 1 into lenC
   put length(R) into lenR
   if lenC = lenR then return 0
   return char 1 to lenR - lenC - 1 of R
end offsetList
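
The running-count arithmetic is easy to check by hand. Here is a Python sketch of just that part (not the LiveCode itself; `delimiter_starts` is a hypothetical name, and `str.split` is assumed to behave like the item loop for this purpose). Starting the counter at 1 - dLength means that adding length(i) + dLength lands on the first character of the delimiter that follows each item.

```python
def delimiter_starts(d, s):
    """1-based start of each (non-overlapping) occurrence of d in s."""
    c = 1 - len(d)
    starts = []
    # Every piece except the last is followed by a delimiter.
    for piece in s.split(d)[:-1]:
        c += len(piece) + len(d)
        starts.append(c)
    return starts


print(delimiter_starts("--", "ab--c--d"))   # [3, 6]
```

For "ab--c--d" the counter goes -1, 3, 6: the item "ab" has length 2, so -1 + 2 + 2 = 3, and the item "c" has length 1, so 3 + 1 + 2 = 6.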

On Thu, Nov 1, 2018 at 8:28 AM Niggemann, Bernd via use-livecode <
use-livecode@lists.runrev.com> wrote:

> Hi Geoff,
>
> thank you for this beautiful script.
>
> I modified it a bit to accept multi-character search string and also for
> case sensitivity.
>
> It definitely is a lot faster for unicode text than anything I have seen.
>
> -
> function offsetList D,S, pCase
>-- returns a comma-delimited list of the offsets of D in S
>-- pCase is a boolean for caseSensitive
>set the caseSensitive to pCase
>set the itemDel to D
>put the length of D into tDelimLength
>repeat for each item i in S
>   add length(i) + tDelimLength to C
>   put C - (tDelimLength - 1),"" after R
>end repeat
>set the itemDel to comma
>if char -1 of S is D then return char 1 to -2 of R
>put length(C) + 1 into lenC
>put length(R) into lenR
>if lenC = lenR then return 0
>return char 1 to lenR - lenC - 1 of R
> end offsetList
> --
>
> Kind regards
> Bernd
>
>
>
>
>
> >
> > Date: Thu, 1 Nov 2018 00:15:37 -0700
> > From: Geoff Canyon
> > To: How to use LiveCode 
> > Subject: Re: How to find the offset of the last instance of a
> >   repeating   character in a string?
> >
> > I was curious whether using the itemDelimiter might work for this, so I
> > wrote the code below; but in my quick testing with single-byte characters
> > it was only about 30% faster than the above methods, so I didn't bother
> > to post it.
> >
> > But Ben Rubinstein just posted about a terrible slow-down doing pretty much
> > this same thing for text with Unicode characters. So I ran a simple test:
> > with 8000-character strings that start with a single Unicode character,
> > this is about 15x faster than offset() with skip. For 100,000-character
> > lines it's about 300x faster, so it seems to be immune to the
> > line-painter issues skip is subject to. So for what it's worth:
> >
> > function offsetList D,S
> >   -- returns a comma-delimited list of the offsets of D in S
> >   set the itemDel to D
> >   repeat for each item i in S
> >      add length(i) + 1 to C
> >      put C,"" after R
> >   end repeat
> >   set the itemDel to comma
> >   if char -1 of S is D then return char 1 to -2 of R
> >   put length(C) + 1 into lenC
> >   put length(R) into lenR
> >   if lenC = lenR then return 0
> >   return char 1 to lenR - lenC - 1 of R
> > end offsetList
> >
___
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode


Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)

2018-11-01 Thread Niggemann, Bernd via use-livecode
Hi Geoff,

thank you for this beautiful script.

I modified it a bit to accept a multi-character search string and to support
case sensitivity.

It definitely is a lot faster for Unicode text than anything I have seen.

-
function offsetList D,S, pCase
   -- returns a comma-delimited list of the offsets of D in S
   -- pCase is a boolean for caseSensitive
   set the caseSensitive to pCase
   set the itemDel to D
   put the length of D into tDelimLength
   repeat for each item i in S
      add length(i) + tDelimLength to C
      put C - (tDelimLength - 1),"" after R
   end repeat
   set the itemDel to comma
   if char -1 of S is D then return char 1 to -2 of R
   put length(C) + 1 into lenC
   put length(R) into lenR
   if lenC = lenR then return 0
   return char 1 to lenR - lenC - 1 of R
end offsetList
--
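
A minimal usage sketch for the handler above, assuming offsetList is in the message path of the calling script. The mouseUp handler, the test strings, and the variable names tHits, tNone, and tLast are hypothetical and only for illustration. Taking the last item of the returned list answers the question in the subject line.

```livecode
on mouseUp
   -- hypothetical test data; third parameter false = case-insensitive
   put offsetList("a", "banana", false) into tHits   -- 2,4,6
   put offsetList("z", "banana", false) into tNone   -- 0 (no match)
   -- the last item is the offset of the last instance
   put item -1 of tHits into tLast                   -- 6
   -- show all three results in the message box
   put tHits & cr & tNone & cr & tLast
end mouseUp
```

Note that when a multi-character D falls at the very end of S, the check "char -1 of S is D" cannot match, since it compares a single character against the whole of D; a later revision in this thread compares "char -dLength to -1 of S" instead.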

Kind regards
Bernd





> 
> Date: Thu, 1 Nov 2018 00:15:37 -0700
> From: Geoff Canyon
> To: How to use LiveCode 
> Subject: Re: How to find the offset of the last instance of a
>   repeating   character in a string?
> 
> I was curious whether using the itemDelimiter might work for this, so I
> wrote the code below; but in my quick testing with single-byte characters
> it was only about 30% faster than the above methods, so I didn't bother
> to post it.
> 
> But Ben Rubinstein just posted about a terrible slow-down doing pretty much
> this same thing for text with Unicode characters. So I ran a simple test:
> with 8000-character strings that start with a single Unicode character,
> this is about 15x faster than offset() with skip. For 100,000-character
> lines it's about 300x faster, so it seems to be immune to the
> line-painter issues skip is subject to. So for what it's worth:
> 
> function offsetList D,S
>   -- returns a comma-delimited list of the offsets of D in S
>   set the itemDel to D
>   repeat for each item i in S
>      add length(i) + 1 to C
>      put C,"" after R
>   end repeat
>   set the itemDel to comma
>   if char -1 of S is D then return char 1 to -2 of R
>   put length(C) + 1 into lenC
>   put length(R) into lenR
>   if lenC = lenR then return 0
>   return char 1 to lenR - lenC - 1 of R
> end offsetList
> 

