Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)
I've updated my GitHub to the following, which adopts Brian's "starts with" (I can't count how many times I've had to re-remember that "starts with" is faster than comparing to char 1 through ) and added minor optimizations to the wrapping-up code.

gc

function allOffsets D,S,pCase,pNoOverlaps
   -- returns a comma-delimited list of the offsets of D in S
   set the caseSensitive to pCase is true
   put length(D) into dLength
   put pNoOverlaps and dLength > 1 into pNoOverlaps
   put numtochar(chartonum(char -1 of D) mod 2 + 1) after S
   if not pNoOverlaps then
      repeat with i = 1 to dLength - 1
         if not (char i + 1 to -1 of D is char 1 to dLength - i of D) then next repeat
         put char -i to -1 of D into OV[i]
         put i & cr after kList
      end repeat
   end if
   set the itemDel to D
   put 1 - dLength into C
   if pNoOverlaps or kList is empty then
      repeat for each item i in S
         add length(i) + dLength to C
         put C,"" after R
      end repeat
   else
      repeat for each item i in S
         repeat for each line K in kList
            if i & D begins with OV[K] then put (C + K),"" after R
         end repeat
         add length(i) + dLength to C
         put C,"" after R
      end repeat
   end if
   set the itemDel to comma
   repeat with i = 1 to 999
      if item i of R > 0 then exit repeat
   end repeat
   delete item 1 to i - 1 of R
   if R begins with C then return 0
   return char 1 to -3 - length(C) of R
end allOffsets

___
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode
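For anyone who wants to cross-check results outside LiveCode, the split-on-delimiter idea behind allOffsets can be sketched in Python. This is an illustrative port under stated assumptions, not the code from the repository: the names `all_offsets` and `overlaps` are hypothetical, it returns an empty list rather than 0 when nothing is found, and it needs no stop character because Python's `str.split` always reports the trailing piece explicitly.

```python
def all_offsets(needle, haystack, overlaps=True):
    """Return the 1-based offsets of needle in haystack, in ascending order.

    Hypothetical Python port of the split-based technique from the thread.
    """
    n = len(needle)
    if n == 0:
        return []
    # Shifts k at which needle can overlap a copy of itself (the kList idea).
    shifts = []
    if overlaps:
        shifts = [k for k in range(1, n) if needle[k:] == needle[:n - k]]
    parts = haystack.split(needle)  # greedy, non-overlapping split
    result = []
    pos = 1 - n  # start of the previous greedy match; not positive before the first
    for index, part in enumerate(parts):
        is_last = index == len(parts) - 1
        if pos > 0:
            # Text that follows the previous greedy match.
            following = part if is_last else part + needle
            for k in shifts:
                # needle repeats at shift k only if its last k chars come next.
                if following.startswith(needle[n - k:]):
                    result.append(pos + k)
        if is_last:
            break
        pos += len(part) + n
        result.append(pos)
    return result


print(all_offsets("aba", "abababa"))                  # overlapping matches included
print(all_offsets("aba", "abababa", overlaps=False))  # greedy matches only
```

Each greedy match found by the split is checked for overlapping matches that begin inside it, which is why the result stays sorted without a separate sort step.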
Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)
On Sun, Nov 4, 2018 at 7:11 PM Mark Wieder via use-livecode <use-livecode@lists.runrev.com> wrote:

> If you're looking for 'romeo' in pText, would you set pOverlaps to true
> or to false?

I'd set it to false; there's no way for "romeo" to overlap itself. But even if I were looking for "radar", which could overlap, I'd set it to false if I were searching an English text document, because there's no word "radaradar". But as I said, I've switched it to default to finding overlaps.
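Whether a search string can overlap itself depends only on the string, not on the text being searched, so it can be checked up front. A minimal Python sketch of that check follows; the function names `overlap_shifts` and `can_overlap` are hypothetical and chosen only for illustration.

```python
def overlap_shifts(needle):
    """Return every shift k (1 <= k < len(needle)) at which needle,
    moved k characters to the right, still lines up with itself."""
    n = len(needle)
    return [k for k in range(1, n) if needle[k:] == needle[:n - k]]


def can_overlap(needle):
    """True if two occurrences of needle could ever overlap in some text."""
    return bool(overlap_shifts(needle))


print(overlap_shifts("romeo"))  # no shift works, so overlaps are impossible
print(overlap_shifts("radar"))  # the final "r" can start a second "radar"
```

A search string with no valid shift can safely take the faster no-overlap path whatever the caller asked for.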
Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)
On Sun, Nov 4, 2018 at 7:42 PM Bob Sneidar via use-livecode <use-livecode@lists.runrev.com> wrote:

> Simply add 1 to the last offset pointer. If after the first iteration you
> return 1, then set the charsToSkip to 2 instead of offset +
> len(searchString) if you take my meaning.
>
> Bob S

The method we're using avoids charsToSkip because it suffers mightily with multi-byte characters. But the latest updates handle overlapping results; see other posts in this thread.
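The skip-based approach Bob describes can be sketched in Python as a simple reference to compare against. This is only an illustration of the idea (advance by one character to allow overlaps, or by the length of the search string to forbid them). The name `offsets_by_scanning` is hypothetical, and the sketch says nothing about the multi-byte slowdown in LiveCode's offset() mentioned above.

```python
def offsets_by_scanning(needle, haystack, overlaps=True):
    """Return 1-based offsets of needle in haystack by repeated searching."""
    if not needle:
        return []
    # Resume one character after a match to allow overlaps,
    # or just past the match to forbid them.
    step = 1 if overlaps else len(needle)
    found = []
    start = haystack.find(needle)
    while start != -1:
        found.append(start + 1)  # convert 0-based index to 1-based offset
        start = haystack.find(needle, start + step)
    return found


print(offsets_by_scanning("aba", "abababa"))
print(offsets_by_scanning("aba", "abababa", overlaps=False))
```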
Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)
Here's an image of the stack in my fork of the repo:
https://github.com/bwmilby/alloffsets/blob/bwm/bwm/stack_allOffsets_card_id_1018.png

On Sun, Nov 4, 2018 at 10:07 PM Brian Milby wrote:

> I’m working on an update to the stack now. Moving buttons to the left side
> to make it easier to add more.
>
> Thanks,
> Brian
Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)
I’m working on an update to the stack now. Moving buttons to the left side to make it easier to add more.

Thanks,
Brian

On Nov 4, 2018, 10:02 PM -0600, Mark Wieder via use-livecode wrote:

> On 11/4/18 4:45 PM, Brian Milby via use-livecode wrote:
> > My updated solution always looks for overlap but if none are found it uses
> > optimized versions of the search (private functions instead of inside the
> > main function). I special case for no overlap and a single overlap in the
> > delimiter. It is about the same speed as Geoff’s.
>
> Nice. I tried to get tricky and replace that 'repeat with' loop with a
> 'repeat for each' loop, but ended up about 20% slower. Not at all what I
> expected.
>
> --
> Mark Wieder
> ahsoftw...@gmail.com
Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)
On 11/4/18 4:45 PM, Brian Milby via use-livecode wrote:

> My updated solution always looks for overlap but if none are found it uses
> optimized versions of the search (private functions instead of inside the
> main function). I special case for no overlap and a single overlap in the
> delimiter. It is about the same speed as Geoff’s.

Nice. I tried to get tricky and replace that 'repeat with' loop with a 'repeat for each' loop, but ended up about 20% slower. Not at all what I expected.

--
Mark Wieder
ahsoftw...@gmail.com
Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)
Simply add 1 to the last offset pointer. If after the first iteration you return 1, then set the charsToSkip to 2 instead of offset + len(searchString) if you take my meaning.

Bob S

> On Nov 2, 2018, at 17:43 , Geoff Canyon via use-livecode wrote:
>
> I like that, changing it. Now available at
> https://github.com/gcanyon/alloffsets
>
> One thing I don't see how to do without significantly impacting performance
> is to return all offsets if there are overlapping strings. For example:
>
> allOffsets("aba","abababa")
>
> would return 1,5, when it might be reasonable to expect it to return 1,3,5.
> Using the offset function with numToSkip would make that easy; adapting
> allOffsets to do so would be harder to do cleanly I think.
>
> gc
Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)
On 11/4/18 6:49 PM, Geoff Canyon via use-livecode wrote:

> I'm not sure I agree that it would be so unlikely to know that overlaps
> won't occur (or that it's unreasonable to not want them). If I'm looking
> for every instance of "romeo" in romeo and juliet, then obviously I'm not
> expecting, nor do I want, overlaps.

Sure, but in that case you'd be better off using the faster 'offset' function. Or do you mean every instance of 'romeo' in the play itself? There I can see why you'd want to set it to false for speed.

My point isn't really whether pOverlaps should default to true or false, but that you need detailed knowledge of the corpus of data before calling the function. If you're looking for 'romeo' in pText, would you set pOverlaps to true or to false?

--
Mark Wieder
ahsoftw...@gmail.com
Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)
On Sun, Nov 4, 2018 at 4:34 PM Mark Wieder via use-livecode <use-livecode@lists.runrev.com> wrote:

> On 11/4/18 10:40 AM, Geoff Canyon via use-livecode wrote:
> > I also added a "with overlaps" option.
>
> My problem with the pWithOverlaps parameter is that it requires a priori
> knowledge of the data being consumed. If you already know there are
> overlaps then you'd set the parameter to true. If you don't know whether
> or not there are overlaps, then you'd need to set it to true so you
> don't miss anything (aside, of course, for the trivial case where you
> don't care whether or not there are overlaps - is there a use case for
> this?).
>
> The only time you would set it to false is after you've already
> determined that there are no overlaps, and the time spent on that would
> probably more than offset the extra processing in the function.

I'm not sure I agree that it would be so unlikely to know that overlaps won't occur (or that it's unreasonable to not want them). If I'm looking for every instance of "romeo" in romeo and juliet, then obviously I'm not expecting, nor do I want, overlaps. Likewise, overlaps can only occur if the search string allows for them, so "romeo" makes it impossible from the get-go.

That said, it seems reasonable to default overlaps to true rather than false. I'll set it up that way when I add the modification below.

On Sun, Nov 4, 2018 at 4:02 PM Brian Milby via use-livecode <use-livecode@lists.runrev.com> wrote:

> put kList is not empty into pWithOverlaps

Good point -- I suppose it also makes sense (albeit that the speed improvement would be trivial) to not bother even building kList if the term to be found is a single character.

gc
Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)
My updated solution always looks for overlap but if none are found it uses optimized versions of the search (private functions instead of inside the main function). I special case for no overlap and a single overlap in the delimiter. It is about the same speed as Geoff’s.

Thanks,
Brian

On Nov 4, 2018, 6:34 PM -0600, Mark Wieder via use-livecode wrote:

> On 11/4/18 10:40 AM, Geoff Canyon via use-livecode wrote:
> > I also added a "with overlaps" option.
>
> My problem with the pWithOverlaps parameter is that it requires a priori
> knowledge of the data being consumed. If you already know there are
> overlaps then you'd set the parameter to true. If you don't know whether
> or not there are overlaps, then you'd need to set it to true so you
> don't miss anything (aside, of course, for the trivial case where you
> don't care whether or not there are overlaps - is there a use case for
> this?).
>
> The only time you would set it to false is after you've already
> determined that there are no overlaps, and the time spent on that would
> probably more than offset the extra processing in the function.
>
> --
> Mark Wieder
> ahsoftw...@gmail.com
Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)
On 11/4/18 10:40 AM, Geoff Canyon via use-livecode wrote:

> I also added a "with overlaps" option.

My problem with the pWithOverlaps parameter is that it requires a priori knowledge of the data being consumed. If you already know there are overlaps then you'd set the parameter to true. If you don't know whether or not there are overlaps, then you'd need to set it to true so you don't miss anything (aside, of course, for the trivial case where you don't care whether or not there are overlaps - is there a use case for this?).

The only time you would set it to false is after you've already determined that there are no overlaps, and the time spent on that would probably more than offset the extra processing in the function.

--
Mark Wieder
ahsoftw...@gmail.com
Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)
Logic matches my solution. I also validated my solution using just the offset function. Speed hit for with overlap is similar.

One possible optimization:

   put kList is not empty into pWithOverlaps

If with overlaps was requested but the source delimiter did not contain any overlaps, then the extra loops are skipped.

Adding a character to the end is clever. I'll need to incorporate that and see what it does to my method.

My take on the code updates is here:
https://github.com/bwmilby/alloffsets/blob/bwm/bwm/allOffsets_Scripts/stack_allOffsets_button_id_1026.livecodescript

Stack and index of scripts here:
https://github.com/bwmilby/alloffsets/tree/bwm/bwm

On Sun, Nov 4, 2018 at 12:42 PM Geoff Canyon via use-livecode <use-livecode@lists.runrev.com> wrote:

> Alex, good catch! The code below and at
> https://github.com/gcanyon/alloffsets now puts a stop character after the
> string to prevent the error you found. I also added a "with overlaps"
> option. I think this is correct, and about as efficient as possible, but
> thanks to anyone who finds a bug or a faster way.
>
> gc
Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)
Alex, good catch! The code below and at https://github.com/gcanyon/alloffsets now puts a stop character after the string to prevent the error you found. I also added a "with overlaps" option. I think this is correct, and about as efficient as possible, but thanks to anyone who finds a bug or a faster way.

gc

function allOffsets D,S,pCase,pWithOverlaps
   -- returns a comma-delimited list of the offsets of D in S
   set the caseSensitive to pCase is true
   put length(D) into dLength
   put numtochar(chartonum(char -1 of D) mod 2 + 1) after S
   if pWithOverlaps then
      repeat with i = 1 to dLength - 1
         if not (char i + 1 to -1 of D is char 1 to dLength - i of D) then next repeat
         put char -i to -1 of D into OV[i]
         put i & cr after kList
      end repeat
   end if
   set the itemDel to D
   put 1 - dLength into C
   if pWithOverlaps then
      repeat for each item i in S
         repeat for each line K in kList
            if char 1 to K of (i & D) is OV[K] then put (C + K),"" after R
         end repeat
         add length(i) + dLength to C
         put C,"" after R
      end repeat
   else
      repeat for each item i in S
         add length(i) + dLength to C
         put C,"" after R
      end repeat
   end if
   set the itemDel to comma
   repeat until item 1 of R > 0
      delete item 1 of R
   end repeat
   delete item -1 of R
   if R is empty then return 0 else return char 1 to -2 of R
end allOffsets
Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)
I've posted a binary stack version that includes my version. I cloned and made a "bwm" branch in my clone. Here's the direct link to the script with the posted code (updated to use private functions):
https://github.com/bwmilby/alloffsets/blob/bwm/bwm/allOffsets_Scripts/stack_allOffsets_button_id_1009.livecodescript

The binary stack can be found here:
https://github.com/bwmilby/alloffsets/tree/bwm/bwm

There are 3 buttons across the top. The first is Geoff's version. The second is my combined version. The third is the one with private functions added. The first button replaces the results field. The second and third add their results to the results field.

The top field is the string to find (needle), the second is the string to search (haystack), the third is for the results. Everything is in a background group so you can add cards for unique searches.

On Sat, Nov 3, 2018 at 9:17 AM Brian Milby wrote:

> Good catch Alex. My code was closer, but didn't handle repeating
> characters correctly. Here is an updated version.
Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)
Good catch Alex. My code was closer, but didn't handle repeating characters correctly. Here is an updated version.

function allOffsets2 D,S,pCase
   local dLength, C, R
   -- returns a comma-delimited list of the offsets of D in S
   set the caseSensitive to pCase is true
   set the itemDel to D
   put length(D) into dLength
   put 1 - dLength into C

   if dLength > 1 then
      local n, i, j, D2, L2
      put 0 into n
      repeat with i = 2 to dLength
         if char i to -1 of D is char 1 to -i of D then
            add 1 to n
            put char (1-i) to -1 of D into D2[n]
            put i-1 into L2[n]
         end if
      end repeat
   end if

   repeat for each item i in S
      if C > 0 and n > 0 then
         repeat with j = 1 to n
            if i&D begins with D2[j] then
               put C+L2[j],"" after R
            end if
         end repeat
      end if
      add length(i) + dLength to C
      put C,"" after R
   end repeat
   set the itemDel to comma
   delete char -1 of R

   if item -1 of R > len(S) then
      if the number of items of R is 1 then
         return 0
      else
         delete item -1 of R
      end if
   end if

   if len(i) > 0 then
      repeat with j = n down to len(i)+1
         if char -len(D2[j]) to -1 of S is D2[j] then
            delete item -1 of R
         end if
      end repeat
   end if
   return R
end allOffsets2

On Sat, Nov 3, 2018 at 8:33 AM Alex Tweedly via use-livecode <use-livecode@lists.runrev.com> wrote:

> Hi Geoff,
>
> unfortunately the impact of overlapping delimiter strings is more severe
> than simply not finding them. The code on github gets the wrong answer
> if there is an overlapping string at the very end of the search string,
> e.g.
>
> alloffsets("", "a")wrongly gives 1,5,10
>
> I suspect the test for
>
> if char -dLength to -1 of S is D then return char 1 to -2 of R
> should be (something like)
> if item -1 of S is empty then return char 1 to -2 of R
> but to be honest, I'm not 10% certain of that.
>
> Alex.
Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)
Hi Geoff,

unfortunately the impact of overlapping delimiter strings is more severe than simply not finding them. The code on github gets the wrong answer if there is an overlapping string at the very end of the search string, e.g.

   alloffsets("", "a")

wrongly gives 1,5,10

I suspect the test for

   if char -dLength to -1 of S is D then return char 1 to -2 of R

should be (something like)

   if item -1 of S is empty then return char 1 to -2 of R

but to be honest, I'm not 10% certain of that.

Alex.
Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)
Here is something... probably needs some optimization

function allOffsets2 D,S,pCase
   local dLength, C, R
   -- returns a comma-delimited list of the offsets of D in S
   set the caseSensitive to pCase is true
   set the itemDel to D
   put length(D) into dLength
   put 1 - dLength into C
   if dLength > 1 then
      local n, i, j, D2, L2
      put 0 into n
      repeat with i = 2 to dLength
         if char i to -1 of D is char 1 to -i of D then
            add 1 to n
            put char (1-i) to -1 of D into D2[n]
            put i-1 into L2[n]
         end if
      end repeat
   end if
   repeat for each item i in S
      if C > 0 and n > 0 then
         repeat with j = 1 to n
            if i&D begins with D2[j] then
               put C+L2[j],"" after R
            end if
         end repeat
      end if
      add length(i) + dLength to C
      put C,"" after R
   end repeat
   set the itemDel to comma
   delete char -1 of R
   if item -1 of R > len(S) then
      if the number of items of R is 1 then
         return 0
      else
         delete item -1 of R
      end if
   end if
   if char -dLength to -1 of S is D then
      return R
   end if
   repeat with j = n down to 1
      if char -len(D2[j]) to -1 of S is D2[j] then
         delete item -1 of R
      end if
   end repeat
   return R
end allOffsets2

I think a couple of private functions would be good. One for 0 overlap, one for a single overlap, then a final general one for any number of overlaps (the core of the above). After the loop that generates D2/L2 I would branch based on n to avoid the additional comparisons inside the loop.
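The loop that fills D2/L2 in allOffsets2 looks for the shifts at which the search string overlaps itself. As a rough illustration only, here is the same check in Python rather than LiveCode; the function name overlap_shifts is made up for this sketch and is not part of the posted script.

```python
def overlap_shifts(needle):
    """Return every shift k (1 <= k < len(needle)) at which needle overlaps itself.

    A shift k qualifies when the suffix starting k characters in equals the
    prefix of the same length, which mirrors the comparison
    "char i to -1 of D is char 1 to -i of D" with k = i - 1.
    """
    n = len(needle)
    return [k for k in range(1, n) if needle[k:] == needle[:n - k]]


print(overlap_shifts("aba"))  # [2]
print(overlap_shifts("aaa"))  # [1, 2]
print(overlap_shifts("abc"))  # []
```

An empty result corresponds to n = 0 in the script, the case where no overlap handling is needed at all.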
Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)
Oh dear - answering my own posts is rarely a good sign :-)

On 03/11/2018 02:10, Alex Tweedly via use-livecode wrote:
> But seriously, can you not iteratively run "allOffsets" ?

Answer : NO. That doesn't work.

However, there is a more efficient way that does work - but it needs to be tested before I post it.

-- Alex.
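One way to see why repeated passes fall short is to try one reading of the idea in Python: run a non-overlapping scan, blank out the first character of every match found, and scan again. This is a sketch under assumptions, not the code from the thread: the helper names some_offsets and iterated_offsets are invented here, and the mask character is assumed not to occur in either string.

```python
def some_offsets(needle, haystack):
    """1-based offsets of non-overlapping matches, scanning left to right."""
    found, start = [], 0
    while True:
        pos = haystack.find(needle, start)
        if pos == -1:
            return found
        found.append(pos + 1)
        start = pos + len(needle)  # jump past the whole match


def iterated_offsets(needle, haystack, mask="\x00"):
    """Repeat some_offsets, masking the first character of each match found."""
    chars = list(haystack)
    result = []
    while True:
        hits = some_offsets(needle, "".join(chars))
        if not hits:
            return sorted(result)
        for h in hits:
            chars[h - 1] = mask  # assumed: mask never appears in needle
        result.extend(hits)


print(iterated_offsets("aba", "abababa"))  # [1, 5]
```

Masking position 5 also destroys the match that starts at position 3, so the second pass finds nothing and 3 is never reported.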
Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)
On 03/11/2018 00:43, Geoff Canyon via use-livecode wrote:
> One thing I don't see how to do without significantly impacting performance
> is to return all offsets if there are overlapping strings. For example:
>
> allOffsets("aba","abababa")
>
> would return 1,5, when it might be reasonable to expect it to return 1,3,5.

Can I suggest changing it to "someOffsets()" :-) :-)

But seriously, can you not iteratively run "allOffsets" ?

something like (typed straight into email - totally untested)

function allOffsets pDel, pStr
   repeat with c = 1 to 255 -- or some other upper limit ?
      if NOT pDel contains numtochar(c) then
         put numtochar(c) into c
         exit repeat
      end if
   end repeat
   repeat forever
      put someOffsets(pDel, pStr) into newR
      if the number of items in newR = 0 then exit repeat
      repeat for each item I in newR
         put c into char I of newR
      end repeat
      put newR after R
   end repeat
   sort items of R numeric
   return R
end allOffsets

-- Alex.
Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)
I like that, changing it. Now available at https://github.com/gcanyon/alloffsets

One thing I don't see how to do without significantly impacting performance is to return all offsets if there are overlapping strings. For example:

   allOffsets("aba","abababa")

would return 1,5, when it might be reasonable to expect it to return 1,3,5. Using the offset function with numToSkip would make that easy; adapting allOffsets to do so would be harder to do cleanly I think.

gc

On Fri, Nov 2, 2018 at 12:17 PM Bob Sneidar via use-livecode <use-livecode@lists.runrev.com> wrote:
> how about allOffsets?
>
> Bob S
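For comparison, the "search again from just past the previous hit" approach mentioned above is short to write. The sketch below uses Python's str.find with a start index as a stand-in for LiveCode's offset with a skip count; it is an illustration of the idea, not the code in the repository, and the function name is invented here.

```python
def all_offsets_overlapping(needle, haystack):
    """1-based offsets of every occurrence of needle, overlapping ones included."""
    if not needle:
        return []
    found = []
    pos = haystack.find(needle)
    while pos != -1:
        found.append(pos + 1)                 # convert to 1-based
        pos = haystack.find(needle, pos + 1)  # resume one char past the hit
    return found


print(all_offsets_overlapping("aba", "abababa"))  # [1, 3, 5]
```

Resuming at pos + 1 rather than pos + len(needle) is what lets the match at 3 be found. A simple reference like this can be handy for checking the faster versions on small inputs.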
Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)
how about allOffsets?

Bob S
Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)
All of those return a single value; I wanted to convey the concept of returning multiple values. To me listOffset implies it does the same thing as itemOffset, since items come in a list. How about:

offsets -- not my favorite because it's almost indistinguishable from offset
offsetsOf -- seems a tad clumsy
Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)
It probably should be named listOffset, like itemOffset or lineOffset.

Bob S
Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)
Nice! I *just* finished creating a github repository for it, and adding support for multi-char search strings, much as you did. I was coming to the list to post the update when I saw your post.

Here's the GitHub link: https://github.com/gcanyon/offsetlist

Here's my updated version:

function offsetList D,S,pCase
   -- returns a comma-delimited list of the offsets of D in S
   set the caseSensitive to pCase is true
   set the itemDel to D
   put length(D) into dLength
   put 1 - dLength into C
   repeat for each item i in S
      add length(i) + dLength to C
      put C,"" after R
   end repeat
   set the itemDel to comma
   if char -dLength to -1 of S is D then return char 1 to -2 of R
   put length(C) + 1 into lenC
   put length(R) into lenR
   if lenC = lenR then return 0
   return char 1 to lenR - lenC - 1 of R
end offsetList
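The counter arithmetic in offsetList can be checked on paper with a small analogue. The Python sketch below uses str.split in place of the itemDelimiter loop; it is only an illustration of the running-total idea, and the name offset_list plus the choice to return an empty list (rather than 0) when nothing matches are assumptions of this sketch.

```python
def offset_list(delim, text):
    """1-based offsets of non-overlapping occurrences of delim in text."""
    if not delim:
        return []
    d_len = len(delim)
    counter = 1 - d_len        # same starting value as C in the script
    offsets = []
    pieces = text.split(delim)
    for piece in pieces[:-1]:  # the last piece has no delimiter after it
        counter += len(piece) + d_len
        offsets.append(counter)
    return offsets


print(offset_list(",", "a,b,c"))        # [2, 4]
print(offset_list("aba", "abababa"))    # [1, 5]
```

Starting the counter at 1 - d_len means that after adding the first piece's length plus d_len it lands exactly on the first character of the first delimiter. Because str.split always returns a final piece, this version needs none of the end-of-string trimming the script does.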
Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)
Hi Geoff,

thank you for this beautiful script.

I modified it a bit to accept multi-character search string and also for case sensitivity.

It definitely is a lot faster for unicode text than anything I have seen.

function offsetList D,S, pCase
   -- returns a comma-delimited list of the offsets of D in S
   -- pCase is a boolean for caseSensitive
   set the caseSensitive to pCase
   set the itemDel to D
   put the length of D into tDelimLength
   repeat for each item i in S
      add length(i) + tDelimLength to C
      put C - (tDelimLength - 1),"" after R
   end repeat
   set the itemDel to comma
   if char -1 of S is D then return char 1 to -2 of R
   put length(C) + 1 into lenC
   put length(R) into lenR
   if lenC = lenR then return 0
   return char 1 to lenR - lenC - 1 of R
end offsetList

Kind regards
Bernd

> Date: Thu, 1 Nov 2018 00:15:37 -0700
> From: Geoff Canyon
> To: How to use LiveCode
> Subject: Re: How to find the offset of the last instance of a
> repeating character in a string?
>
> I was curious if using the itemDelimiter might work for this, so I wrote
> the below code out of curiosity; but in my quick testing with single-byte
> characters it was only about 30% faster than the above methods, so I didn't
> bother to post it.
>
> But Ben Rubinstein just posted about a terrible slow-down doing pretty much
> this same thing for text with unicode characters. So I ran a simple test
> with 8000 character long strings that start with a single unicode
> character, this is about 15x faster than offset() with skip. For
> 100,000-character lines it's about 300x faster, so it seems to be immune to
> the line-painter issues skip is subject to. So for what it's worth:
>
> function offsetList D,S
>    -- returns a comma-delimited list of the offsets of D in S
>    set the itemDel to D
>    repeat for each item i in S
>       add length(i) + 1 to C
>       put C,"" after R
>    end repeat
>    set the itemDel to comma
>    if char -1 of S is D then return char 1 to -2 of R
>    put length(C) + 1 into lenC
>    put length(R) into lenR
>    if lenC = lenR then return 0
>    return char 1 to lenR - lenC - 1 of R
> end offsetList