Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)
I've updated my GitHub to the following, which adopts Brian's "starts with" (I can't count how many times I've had to re-remember that "starts with" is faster than comparing to char 1 through ) and added minor optimizations to the wrapping-up code.

gc

function allOffsets D,S,pCase,pNoOverlaps
   -- returns a comma-delimited list of the offsets of D in S
   set the caseSensitive to pCase is true
   put length(D) into dLength
   put pNoOverlaps and dLength > 1 into pNoOverlaps
   put numtochar(chartonum(char -1 of D) mod 2 + 1) after S
   if not pNoOverlaps then
      repeat with i = 1 to dLength - 1
         if not (char i + 1 to -1 of D is char 1 to dLength - i of D) then next repeat
         put char -i to -1 of D into OV[i]
         put i & cr after kList
      end repeat
   end if
   set the itemDel to D
   put 1 - dLength into C
   if pNoOverlaps or kList is empty then
      repeat for each item i in S
         add length(i) + dLength to C
         put C,"" after R
      end repeat
   else
      repeat for each item i in S
         repeat for each line K in kList
            if i & D begins with OV[K] then put (C + K),"" after R
         end repeat
         add length(i) + dLength to C
         put C,"" after R
      end repeat
   end if
   set the itemDel to comma
   repeat with i = 1 to 999
      if item i of R > 0 then exit repeat
   end repeat
   delete item 1 to i - 1 of R
   if R begins with C then return 0
   return char 1 to -3 - length(C) of R
end allOffsets

___
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode
Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)
On Sun, Nov 4, 2018 at 7:11 PM Mark Wieder via use-livecode <use-livecode@lists.runrev.com> wrote:

> If you're looking for 'romeo' in pText, would you set pOverlaps to true
> or to false?

I'd set it to false; there's no way for "romeo" to overlap. But even if I were looking for "radar", which could overlap, I'd set it to false if I were searching an English text document, because there's no word "radaradar". But as I said, I've switched it to default to finding overlaps.
Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)
On Sun, Nov 4, 2018 at 7:42 PM Bob Sneidar via use-livecode <use-livecode@lists.runrev.com> wrote:

> Simply add 1 to the last offset pointer. If after the first iteration you
> return 1, then set the charsToSkip to 2 instead of offset +
> len(searchString) if you take my meaning.
>
> Bob S

The method we're using avoids charsToSkip because it suffers mightily with multi-byte characters. But the latest updates handle overlapping results; see other posts in this thread.
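For readers comparing the two approaches, Bob's "add 1 to the last offset pointer" idea can be sketched outside LiveCode. The following is a minimal Python model, not code from the thread: the function name `all_offsets_find` and its `overlaps` flag are hypothetical, and Python's `str.find` does not reproduce the multi-byte slowdown of charsToSkip mentioned above. It only shows which offsets each stepping rule yields.

```python
def all_offsets_find(needle, text, overlaps=True):
    """Return the 1-based offsets of needle in text using a skip-style scan.

    Hypothetical model of the charsToSkip approach: after each hit the
    scan resumes 1 character later (overlaps allowed) or len(needle)
    characters later (overlaps not allowed).
    """
    if not needle:
        return []
    step = 1 if overlaps else len(needle)
    found = []
    start = 0
    while True:
        hit = text.find(needle, start)
        if hit == -1:
            return found
        found.append(hit + 1)  # LiveCode offsets are 1-based
        start = hit + step


print(all_offsets_find("aba", "abababa"))         # overlapping matches
print(all_offsets_find("aba", "abababa", False))  # non-overlapping matches
```

A model like this is mainly useful as a slow but obvious cross-check for the faster itemDelimiter-based handlers.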
Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)
Here's an image of the stack in my fork of the repo:
https://github.com/bwmilby/alloffsets/blob/bwm/bwm/stack_allOffsets_card_id_1018.png

On Sun, Nov 4, 2018 at 10:07 PM Brian Milby wrote:

> I’m working on an update to the stack now. Moving buttons to the left side
> to make it easier to add more.
>
> Thanks,
> Brian
Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)
I’m working on an update to the stack now. Moving buttons to the left side to make it easier to add more.

Thanks,
Brian

On Nov 4, 2018, 10:02 PM -0600, Mark Wieder via use-livecode, wrote:

> On 11/4/18 4:45 PM, Brian Milby via use-livecode wrote:
> > My updated solution always looks for overlap but if none are found it uses
> > optimized versions of the search (private functions instead of inside the
> > main function). I special case for no overlap and a single overlap in the
> > delimiter. It is about the same speed as Geoff’s.
>
> Nice. I tried to get tricky and replace that 'repeat with' loop with a
> 'repeat for each' loop, but ended up about 20% slower. Not at all what I
> expected.
Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)
On 11/4/18 4:45 PM, Brian Milby via use-livecode wrote:

> My updated solution always looks for overlap but if none are found it uses
> optimized versions of the search (private functions instead of inside the
> main function). I special case for no overlap and a single overlap in the
> delimiter. It is about the same speed as Geoff’s.

Nice. I tried to get tricky and replace that 'repeat with' loop with a 'repeat for each' loop, but ended up about 20% slower. Not at all what I expected.

--
Mark Wieder
ahsoftw...@gmail.com
Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)
Simply add 1 to the last offset pointer. If after the first iteration you return 1, then set the charsToSkip to 2 instead of offset + len(searchString) if you take my meaning.

Bob S

> On Nov 2, 2018, at 17:43 , Geoff Canyon via use-livecode wrote:
>
> I like that, changing it. Now available at
> https://github.com/gcanyon/alloffsets
>
> One thing I don't see how to do without significantly impacting performance
> is to return all offsets if there are overlapping strings. For example:
>
> allOffsets("aba","abababa")
>
> would return 1,5, when it might be reasonable to expect it to return 1,3,5.
> Using the offset function with numToSkip would make that easy; adapting
> allOffsets to do so would be harder to do cleanly I think.
>
> gc
>
> On Fri, Nov 2, 2018 at 12:17 PM Bob Sneidar via use-livecode wrote:
>
>> how about allOffsets?
>>
>> Bob S
>>
>>> On Nov 2, 2018, at 09:16 , Geoff Canyon via use-livecode wrote:
>>>
>>> All of those return a single value; I wanted to convey the concept of
>>> returning multiple values. To me listOffset implies it does the same thing
>>> as itemOffset, since items come in a list. How about:
>>>
>>> offsets -- not my favorite because it's almost indistinguishable from offset
>>> offsetsOf -- seems a tad clumsy
>>>
>>> On Fri, Nov 2, 2018 at 7:41 AM Bob Sneidar via use-livecode wrote:
>>>
>>>> It probably should be named listOffset, like itemOffset or lineOffset.
>>>>
>>>> Bob S
Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)
On 11/4/18 6:49 PM, Geoff Canyon via use-livecode wrote:

> I'm not sure I agree that it would be so unlikely to know that overlaps
> won't occur (or that it's unreasonable to not want them). If I'm looking
> for every instance of "romeo" in romeo and juliet, then obviously I'm not
> expecting, nor do I want, overlaps.

Sure, but in that case you'd be better off using the faster 'offset' function. Or do you mean every instance of 'romeo' in the play itself? There I can see why you'd want to set it to false for speed.

My point isn't really whether pOverlaps should default to true or false, but that you need detailed knowledge of the corpus of data before calling the function. If you're looking for 'romeo' in pText, would you set pOverlaps to true or to false?

--
Mark Wieder
ahsoftw...@gmail.com
Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)
On Sun, Nov 4, 2018 at 4:34 PM Mark Wieder via use-livecode <use-livecode@lists.runrev.com> wrote:

> On 11/4/18 10:40 AM, Geoff Canyon via use-livecode wrote:
> > I also added a "with overlaps" option.
>
> My problem with the pWithOverlaps parameter is that it requires a priori
> knowledge of the data being consumed. If you already know there are
> overlaps then you'd set the parameter to true. If you don't know whether
> or not there are overlaps, then you'd need to set it to true so you
> don't miss anything (aside, of course, from the trivial case where you
> don't care whether or not there are overlaps - is there a use case for
> this?).
>
> The only time you would set it to false is after you've already
> determined that there are no overlaps, and the time spent on that would
> probably more than offset the extra processing in the function.

I'm not sure I agree that it would be so unlikely to know that overlaps won't occur (or that it's unreasonable to not want them). If I'm looking for every instance of "romeo" in romeo and juliet, then obviously I'm not expecting, nor do I want, overlaps. Likewise, overlaps can only occur if the search string allows for them, so "romeo" makes it impossible from the get-go.

That said, it seems reasonable to default overlaps to true rather than false. I'll set it up that way when I add the modification below.

On Sun, Nov 4, 2018 at 4:02 PM Brian Milby via use-livecode <use-livecode@lists.runrev.com> wrote:

> put kList is not empty into pWithOverlaps

Good point -- I suppose it also makes sense (albeit that the speed improvement would be trivial) to not bother even building kList if the term to be found is a single character.

gc
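The point that overlaps can only occur if the search string allows for them can be checked from the search string alone, without looking at the data. Here is a small Python sketch of that check. It is an illustration only: the name `overlap_shifts` is hypothetical, and the comparison is intended to mirror the `char i + 1 to -1 of D is char 1 to dLength - i of D` test used to build kList in the handlers posted in this thread.

```python
def overlap_shifts(needle):
    """Return the shifts k (1 <= k < len(needle)) at which needle can
    overlap a copy of itself, i.e. the suffix starting at position k
    equals the prefix of the same length."""
    n = len(needle)
    return [k for k in range(1, n) if needle[k:] == needle[:n - k]]


for word in ("romeo", "radar", "aba", "aaa"):
    print(word, overlap_shifts(word))
```

An empty result means the string cannot overlap itself, so the cheaper non-overlapping path finds every match regardless of the text being searched.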
Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)
My updated solution always looks for overlap but if none are found it uses optimized versions of the search (private functions instead of inside the main function). I special case for no overlap and a single overlap in the delimiter. It is about the same speed as Geoff’s.

Thanks,
Brian

On Nov 4, 2018, 6:34 PM -0600, Mark Wieder via use-livecode, wrote:

> On 11/4/18 10:40 AM, Geoff Canyon via use-livecode wrote:
> > I also added a "with overlaps" option.
>
> My problem with the pWithOverlaps parameter is that it requires a priori
> knowledge of the data being consumed. If you already know there are
> overlaps then you'd set the parameter to true. If you don't know whether
> or not there are overlaps, then you'd need to set it to true so you
> don't miss anything (aside, of course, from the trivial case where you
> don't care whether or not there are overlaps - is there a use case for
> this?).
>
> The only time you would set it to false is after you've already
> determined that there are no overlaps, and the time spent on that would
> probably more than offset the extra processing in the function.
Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)
On 11/4/18 10:40 AM, Geoff Canyon via use-livecode wrote:

> I also added a "with overlaps" option.

My problem with the pWithOverlaps parameter is that it requires a priori knowledge of the data being consumed. If you already know there are overlaps then you'd set the parameter to true. If you don't know whether or not there are overlaps, then you'd need to set it to true so you don't miss anything (aside, of course, from the trivial case where you don't care whether or not there are overlaps - is there a use case for this?).

The only time you would set it to false is after you've already determined that there are no overlaps, and the time spent on that would probably more than offset the extra processing in the function.

--
Mark Wieder
ahsoftw...@gmail.com
Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)
Logic matches my solution. I also validated my solution using just the offset function. Speed hit for with overlap is similar.

One possible optimization:

   put kList is not empty into pWithOverlaps

If with overlaps was requested but the source delimiter did not contain any overlaps, then the extra loops are skipped.

Adding a character to the end is clever. I'll need to incorporate that and see what it does to my method.

My take on the code updates is here:
https://github.com/bwmilby/alloffsets/blob/bwm/bwm/allOffsets_Scripts/stack_allOffsets_button_id_1026.livecodescript

Stack and index of scripts here:
https://github.com/bwmilby/alloffsets/tree/bwm/bwm

On Sun, Nov 4, 2018 at 12:42 PM Geoff Canyon via use-livecode <use-livecode@lists.runrev.com> wrote:

> Alex, good catch! The code below and at
> https://github.com/gcanyon/alloffsets now puts a stop character after the
> string to prevent the error you found. I also added a "with overlaps"
> option. I think this is correct, and about as efficient as possible, but
> thanks to anyone who finds a bug or a faster way.
>
> gc
Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)
Alex, good catch! The code below and at https://github.com/gcanyon/alloffsets now puts a stop character after the string to prevent the error you found. I also added a "with overlaps" option. I think this is correct, and about as efficient as possible, but thanks to anyone who finds a bug or a faster way.

gc

function allOffsets D,S,pCase,pWithOverlaps
   -- returns a comma-delimited list of the offsets of D in S
   set the caseSensitive to pCase is true
   put length(D) into dLength
   put numtochar(chartonum(char -1 of D) mod 2 + 1) after S
   if pWithOverlaps then
      repeat with i = 1 to dLength - 1
         if not (char i + 1 to -1 of D is char 1 to dLength - i of D) then next repeat
         put char -i to -1 of D into OV[i]
         put i & cr after kList
      end repeat
   end if
   set the itemDel to D
   put 1 - dLength into C
   if pWithOverlaps then
      repeat for each item i in S
         repeat for each line K in kList
            if char 1 to K of (i & D) is OV[K] then put (C + K),"" after R
         end repeat
         add length(i) + dLength to C
         put C,"" after R
      end repeat
   else
      repeat for each item i in S
         add length(i) + dLength to C
         put C,"" after R
      end repeat
   end if
   set the itemDel to comma
   repeat until item 1 of R > 0
      delete item 1 of R
   end repeat
   delete item -1 of R
   if R is empty then return 0 else return char 1 to -2 of R
end allOffsets
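The stop-character expression `numtochar(chartonum(char -1 of D) mod 2 + 1)` is terse, so here is a hedged Python model of what it appears to do. The helper name `stop_char` is hypothetical, and Python's `ord`/`chr` are assumed to stand in for chartonum/numtochar, which may behave differently for non-ASCII text in LiveCode. The idea: the result is character 1 or 2, chosen with the opposite parity of the last character's code, so it can never equal the last character of D and the padded string can never end with D.

```python
def stop_char(needle):
    """Model of numtochar(chartonum(char -1 of D) mod 2 + 1).

    An odd code maps to chr(2) and an even code maps to chr(1), so the
    result always differs from the last character of needle.
    """
    return chr(ord(needle[-1]) % 2 + 1)


needle = "ab"
padded = "xab" + stop_char(needle)
# The final piece after splitting on the needle is never empty,
# because the padded text cannot end with the needle.
print(padded.split(needle))
```

With the stop character in place, the trailing piece is always present and non-empty, which is presumably why the handler can discard the final accumulated entry unconditionally.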
Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)
> offsets -- not my favorite because it's almost indistinguishable from offset
> offsetsOf -- seems a tad clumsy
>
> On Fri, Nov 2, 2018 at 7:41 AM Bob Sneidar via use-livecode <use-livecode@lists.runrev.com> wrote:
>
>> It probably should be named listOffset, like itemOffset or lineOffset.
>>
>> Bob S
>>
>>> On Nov 1, 2018, at 17:04 , Geoff Canyon via use-livecode <use-livecode@lists.runrev.com> wrote:
>>>
>>> Nice! I *just* finished creating a github repository for it, and adding
>>> support for multi-char search strings, much as you did. I was coming to
>>> the list to post the update when I saw your post.
>>>
>>> Here's the GitHub link: https://github.com/gcanyon/offsetlist
>>>
>>> Here's my updated version:
>>>
>>> function offsetList D,S,pCase
>>>    -- returns a comma-delimited list of the offsets of D in S
>>>    set the caseSensitive to pCase is true
>>>    set the itemDel to D
>>>    put length(D) into dLength
>>>    put 1 - dLength into C
>>>    repeat for each item i in S
>>>       add length(i) + dLength to C
>>>       put C,"" after R
>>>    end repeat
>>>    set the itemDel to comma
>>>    if char -dLength to -1 of S is D then return char 1 to -2 of R
>>>    put length(C) + 1 into lenC
>>>    put length(R) into lenR
>>>    if lenC = lenR then return 0
>>>    return char 1 to lenR - lenC - 1 of R
>>> end offsetList
Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)
> On Thu, Nov 1, 2018 at 8:28 AM Niggemann, Bernd via use-livecode <use-livecode@lists.runrev.com> wrote:
>
> Hi Geoff,
>
> thank you for this beautiful script.
>
> I modified it a bit to accept multi-character search string and also for
> case sensitivity.
>
> It definitely is a lot faster for unicode text than anything I have seen.
>
> function offsetList D,S, pCase
>    -- returns a comma-delimited list of the offsets of D in S
>    -- pCase is a boolean for caseSensitive
>    set the caseSensitive to pCase
>    set the itemDel to D
>    put the length of D into tDelimLength
>    repeat for each item i in S
>       add length(i) + tDelimLength to C
>       put C - (tDelimLength - 1),"" after R
>    end repeat
>    set the itemDel to comma
>    if char -1 of S is D then return char 1 to -2 of R
>    put length(C) + 1 into lenC
>    put length(R) into lenR
>    if lenC = lenR then return 0
>    return char 1 to lenR - lenC - 1 of R
> end offsetList
>
> Kind regards
> Bernd
>
>> Date: Thu, 1 Nov 2018 00:15:37 -0700
>> From: Geoff Canyon
>> To: How to use LiveCode
>> Subject: Re: How to find the offset of the last instance of a
>> repeating character in a string?
>>
>> I was curious if using the itemDelimiter might work for this, so I wrote
>> the below code out of curiosity; but in my quick testing with single-byte
>> characters it was only about 30% faster than the above methods, so I didn't
>> bother to post it.
>>
>> But Ben Rubinstein just posted about a terrible slow-down doing pretty much
>> this same thing for text with unicode characters. So I ran a simple test
>> with 8000 character long strings that start with a single unicode
>> character, this is about 15x faster than offset() with skip. For
>> 100,000-character lines it's about 300x faster, so it seems to be immune to
>> the line-painter issues skip is subject to. So for what it's worth:
>>
>> function offsetList D,S
>>    -- returns a comma-delimited list of the offsets of D in S
>>    set the itemDel to D
>>    repeat for each item i in S
>>       add length(i) + 1 to C
>>       put C,"" after R
>>    end repeat
>>    set the itemDel to comma
>>    if char -1 of S is D then return char 1 to -2 of R
>>    put length(C) + 1 into lenC
>>    put length(R) into lenR
>>    if lenC = lenR then return 0
>>    return char 1 to lenR - lenC - 1 of R
>> end offsetList
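The itemDelimiter technique quoted above (split the text on the search string, then accumulate the lengths of the pieces) can be modelled in a few lines of Python. This is a sketch under stated assumptions rather than a port: the name `all_offsets_split` is hypothetical, it returns a list instead of a comma-delimited string, it returns an empty list where the LiveCode handlers return 0, and it finds non-overlapping matches only. Python's `str.split` also keeps a trailing empty piece, so it needs none of the end-of-string special cases that the LiveCode versions handle.

```python
def all_offsets_split(needle, text):
    """Return 1-based offsets of non-overlapping occurrences of needle
    in text by splitting on needle and summing the piece lengths."""
    if not needle:
        return []
    pieces = text.split(needle)
    offsets = []
    position = 1  # 1-based position where the current piece starts
    for piece in pieces[:-1]:  # every piece except the last is followed by a match
        position += len(piece)
        offsets.append(position)
        position += len(needle)
    return offsets


print(all_offsets_split(",", "a,b,c"))
print(all_offsets_split("aba", "abababa"))
```

The second call illustrates the limitation discussed in this thread: splitting consumes each match, so the overlapping occurrence at offset 3 is not reported.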
Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)
Hi Geoff,

unfortunately the impact of overlapping delimiter strings is more severe than simply not finding them. The code on github gets the wrong answer if there is an overlapping string at the very end of the search string, e.g.

   alloffsets("", "a") wrongly gives 1,5,10

I suspect the test for

   if char -dLength to -1 of S is D then return char 1 to -2 of R

should be (something like)

   if item -1 of S is empty then return char 1 to -2 of R

but to be honest, I'm not 10% certain of that.

Alex.

On 03/11/2018 00:43, Geoff Canyon via use-livecode wrote:

> I like that, changing it. Now available at
> https://github.com/gcanyon/alloffsets
>
> One thing I don't see how to do without significantly impacting performance
> is to return all offsets if there are overlapping strings. For example:
>
> allOffsets("aba","abababa")
>
> would return 1,5, when it might be reasonable to expect it to return 1,3,5.
> Using the offset function with numToSkip would make that easy; adapting
> allOffsets to do so would be harder to do cleanly I think.
>
> gc
Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)
Here is something... probably needs some optimization

function allOffsets2 D,S,pCase
   local dLength, C, R
   -- returns a comma-delimited list of the offsets of D in S
   set the caseSensitive to pCase is true
   set the itemDel to D
   put length(D) into dLength
   put 1 - dLength into C
   if dLength > 1 then
      local n, i, j, D2, L2
      put 0 into n
      repeat with i = 2 to dLength
         if char i to -1 of D is char 1 to -i of D then
            add 1 to n
            put char (1-i) to -1 of D into D2[n]
            put i-1 into L2[n]
         end if
      end repeat
   end if
   repeat for each item i in S
      if C > 0 and n > 0 then
         repeat with j = 1 to n
            if i&D begins with D2[j] then
               put C+L2[j],"" after R
            end if
         end repeat
      end if
      add length(i) + dLength to C
      put C,"" after R
   end repeat
   set the itemDel to comma
   delete char -1 of R
   if item -1 of R > len(S) then
      if the number of items of R is 1 then
         return 0
      else
         delete item -1 of R
      end if
   end if
   if char -dLength to -1 of S is D then
      return R
   end if
   repeat with j = n down to 1
      if char -len(D2[j]) to -1 of S is D2[j] then
         delete item -1 of R
      end if
   end repeat
   return R
end allOffsets2

I think a couple of private functions would be good. One for 0 overlap, one for a single overlap, then a final general one for any number of overlaps (the core of the above). After the loop that generates D2/L2 I would branch based on n to avoid the additional comparisons inside the loop.

On Fri, Nov 2, 2018 at 9:45 PM Alex Tweedly via use-livecode wrote:
> Oh dear - answering my own posts rarely a good sign :-)
>
> Answer : NO. That doesn't work.
> However, there is a more efficient way that does work - but it needs to
> be tested before I post it.
>
> -- Alex.
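The table-building step in allOffsets2 (working out where the delimiter can overlap a copy of itself) can be looked at on its own. The following is only a sketch of that one step; the function name selfOverlapShifts and its return format are hypothetical and do not appear in the thread.

```livecode
-- Hypothetical helper (not from the thread): list the shifts at which
-- pDelim overlaps a copy of itself, e.g. "aba" overlaps itself at shift 2.
function selfOverlapShifts pDelim
   local tLen, tShift, tShifts
   put length(pDelim) into tLen
   repeat with tShift = 1 to tLen - 1
      -- the tail left after dropping tShift chars must equal
      -- the head of the same length
      if char (tShift + 1) to -1 of pDelim is char 1 to (tLen - tShift) of pDelim then
         put tShift & comma after tShifts
      end if
   end repeat
   -- strip the trailing comma; empty means the delimiter cannot overlap itself
   return char 1 to -2 of tShifts
end selfOverlapShifts
```

An empty result (as for "romeo") would mean the plain itemDelimiter pass is enough; a non-empty one (as for "aba" or "aaa") is the case where the extra "begins with" checks inside the loop are needed. The comparison follows the current caseSensitive setting, which this sketch does not set.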
Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)
Oh dear - answering my own posts rarely a good sign :-)

On 03/11/2018 02:10, Alex Tweedly via use-livecode wrote:
> On 03/11/2018 00:43, Geoff Canyon via use-livecode wrote:
>> One thing I don't see how to do without significantly impacting performance
>> is to return all offsets if there are overlapping strings. For example:
>>
>> allOffsets("aba","abababa")
>>
>> would return 1,5, when it might be reasonable to expect it to return 1,3,5.
>
> Can I suggest changing it to "someOffsets()" :-) :-)
>
> But seriously, can you not iteratively run "allOffsets" ?

Answer : NO. That doesn't work.
However, there is a more efficient way that does work - but it needs to be tested before I post it.

-- Alex.
Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)
On 03/11/2018 00:43, Geoff Canyon via use-livecode wrote:
> I like that, changing it. Now available at https://github.com/gcanyon/alloffsets
>
> One thing I don't see how to do without significantly impacting performance
> is to return all offsets if there are overlapping strings. For example:
>
> allOffsets("aba","abababa")
>
> would return 1,5, when it might be reasonable to expect it to return 1,3,5.
> Using the offset function with numToSkip would make that easy; adapting
> allOffsets to do so would be harder to do cleanly I think.

Can I suggest changing it to "someOffsets()" :-) :-)

But seriously, can you not iteratively run "allOffsets" ?

something like (typed straight into email - totally untested)

function allOffsets pDel, pStr
   repeat with c = 1 to 255 -- or some other upper limit ?
      if NOT pDel contains numtochar(c) then
         put numtochar(c) into c
         exit repeat
      end if
   end repeat
   repeat forever
      put someOffsets(pDel, pStr) into newR
      if the number of items in newR = 0 then exit repeat
      repeat for each item I in newR
         put c into char I of newR
      end repeat
      put newR after R
   end repeat
   sort items of R numeric
   return R
end allOffsets

-- Alex.
Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)
I like that, changing it. Now available at https://github.com/gcanyon/alloffsets

One thing I don't see how to do without significantly impacting performance is to return all offsets if there are overlapping strings. For example:

allOffsets("aba","abababa")

would return 1,5, when it might be reasonable to expect it to return 1,3,5. Using the offset function with numToSkip would make that easy; adapting allOffsets to do so would be harder to do cleanly I think.

gc

On Fri, Nov 2, 2018 at 12:17 PM Bob Sneidar via use-livecode wrote:
> how about allOffsets?
>
> Bob S
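For comparison, the offset-with-skip approach mentioned above can be sketched as follows. This is a minimal baseline, not the code from the GitHub repository; the name allOffsetsBySkip and the variable names are assumptions made for illustration. It relies on the optional third parameter of offset() (the number of chars to skip), with the result counted from the end of the skipped part.

```livecode
-- Hypothetical baseline (not from the thread): every offset of pNeedle
-- in pHaystack, overlapping matches included, using offset() with skip.
function allOffsetsBySkip pNeedle, pHaystack
   local tSkip, tFound, tResult
   put 0 into tSkip
   repeat forever
      -- tFound is relative to the skipped chars; 0 means no further match
      put offset(pNeedle, pHaystack, tSkip) into tFound
      if tFound = 0 then exit repeat
      -- absolute position of this match; skipping exactly that many chars
      -- next time lets a match start inside the current one
      add tFound to tSkip
      put tSkip & comma after tResult
   end repeat
   if tResult is empty then return 0
   return char 1 to -2 of tResult
end allOffsetsBySkip
```

Tracing allOffsetsBySkip("aba","abababa") by hand gives 1,3,5. A version that should not report overlaps could add length(pNeedle) - 1 to tSkip after recording each match. As noted in the thread, this style can be much slower than the itemDelimiter approach on long unicode text, so it is probably most useful as a reference for checking results.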
Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)
how about allOffsets?

Bob S

> On Nov 2, 2018, at 09:16 , Geoff Canyon via use-livecode wrote:
>
> All of those return a single value; I wanted to convey the concept of
> returning multiple values. To me listOffset implies it does the same thing
> as itemOffset, since items come in a list. How about:
>
> offsets -- not my favorite because it's almost indistinguishable from offset
> offsetsOf -- seems a tad clumsy
Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)
All of those return a single value; I wanted to convey the concept of returning multiple values. To me listOffset implies it does the same thing as itemOffset, since items come in a list. How about:

offsets -- not my favorite because it's almost indistinguishable from offset
offsetsOf -- seems a tad clumsy

On Fri, Nov 2, 2018 at 7:41 AM Bob Sneidar via use-livecode wrote:
> It probably should be named listOffset, like itemOffset or lineOffset.
>
> Bob S
Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)
It probably should be named listOffset, like itemOffset or lineOffset.

Bob S

> On Nov 1, 2018, at 17:04 , Geoff Canyon via use-livecode wrote:
>
> Nice! I *just* finished creating a github repository for it, and adding
> support for multi-char search strings, much as you did. I was coming to the
> list to post the update when I saw your post.
>
> Here's the GitHub link: https://github.com/gcanyon/offsetlist
Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)
Nice! I *just* finished creating a github repository for it, and adding support for multi-char search strings, much as you did. I was coming to the list to post the update when I saw your post.

Here's the GitHub link: https://github.com/gcanyon/offsetlist

Here's my updated version:

function offsetList D,S,pCase
   -- returns a comma-delimited list of the offsets of D in S
   set the caseSensitive to pCase is true
   set the itemDel to D
   put length(D) into dLength
   put 1 - dLength into C
   repeat for each item i in S
      add length(i) + dLength to C
      put C,"" after R
   end repeat
   set the itemDel to comma
   if char -dLength to -1 of S is D then return char 1 to -2 of R
   put length(C) + 1 into lenC
   put length(R) into lenR
   if lenC = lenR then return 0
   return char 1 to lenR - lenC - 1 of R
end offsetList

On Thu, Nov 1, 2018 at 8:28 AM Niggemann, Bernd via use-livecode wrote:
> Hi Geoff,
>
> thank you for this beautiful script.
>
> I modified it a bit to accept multi-character search string and also for
> case sensitivity.
>
> It definitely is a lot faster for unicode text than anything I have seen.
Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)
Hi Geoff,

thank you for this beautiful script.

I modified it a bit to accept multi-character search string and also for case sensitivity.

It definitely is a lot faster for unicode text than anything I have seen.

function offsetList D,S, pCase
   -- returns a comma-delimited list of the offsets of D in S
   -- pCase is a boolean for caseSensitive
   set the caseSensitive to pCase
   set the itemDel to D
   put the length of D into tDelimLength
   repeat for each item i in S
      add length(i) + tDelimLength to C
      put C - (tDelimLength - 1),"" after R
   end repeat
   set the itemDel to comma
   if char -1 of S is D then return char 1 to -2 of R
   put length(C) + 1 into lenC
   put length(R) into lenR
   if lenC = lenR then return 0
   return char 1 to lenR - lenC - 1 of R
end offsetList

Kind regards
Bernd

> Date: Thu, 1 Nov 2018 00:15:37 -0700
> From: Geoff Canyon
> To: How to use LiveCode
> Subject: Re: How to find the offset of the last instance of a
> repeating character in a string?
>
> I was curious if using the itemDelimiter might work for this, so I wrote
> the below code out of curiosity; but in my quick testing with single-byte
> characters it was only about 30% faster than the above methods, so I didn't
> bother to post it.
Re: How to find the offset of the last instance of a repeating character in a string?
I was curious if using the itemDelimiter might work for this, so I wrote the below code out of curiosity; but in my quick testing with single-byte characters it was only about 30% faster than the above methods, so I didn't bother to post it.

But Ben Rubinstein just posted about a terrible slow-down doing pretty much this same thing for text with unicode characters. So I ran a simple test with 8000 character long strings that start with a single unicode character, this is about 15x faster than offset() with skip. For 100,000-character lines it's about 300x faster, so it seems to be immune to the line-painter issues skip is subject to. So for what it's worth:

function offsetList D,S
   -- returns a comma-delimited list of the offsets of D in S
   set the itemDel to D
   repeat for each item i in S
      add length(i) + 1 to C
      put C,"" after R
   end repeat
   set the itemDel to comma
   if char -1 of S is D then return char 1 to -2 of R
   put length(C) + 1 into lenC
   put length(R) into lenR
   if lenC = lenR then return 0
   return char 1 to lenR - lenC - 1 of R
end offsetList
Re: How to find the offset of the last instance of a repeating character in a string?
On Tue, Oct 30, 2018 at 2:33 AM Keith Clarke via use-livecode wrote:
>
> I'm trying to separate paths & pages from a list of URLs and so looking to
> identify the position of the last '/' character.
>

If that is all you are after then I think setting the itemDelimiter to "/" and separating the 'item -1' (page) from 'items 1 to -2' (path) would give you a very simple and readable solution.

The only problem is if you have the unlikely but not impossible situation where you have paths that contain no pages. Because of the known gotcha with LC and how it counts items when the last item is empty, you may need to include an 'if' statement.

Try this: create a new Stack with a field and a button. Into the field load the following text:

https://www.my.org/assets/general/february/
https://www.my.org/assets/general/march/
https://www.my.org/assets/general/april/2018.zip
https://www.my.org/assets/general/may/2018.zip
https://www.my.org/assets/general/june/2018.zip
https://www.my.org/assets/general/july/2018.zip
https://www.my.org/assets/general/july/2017.html
https://www.my.org/assets/general/july/2016.text
https://www.my.org/assets/general/july/2015.jpg
https://www.my.org/assets/general/august/2018.zip
https://www.my.org/assets/general/september/2018.zip
https://www.my.org/assets/general/october/2018.zip
https://www.my.org/assets/general/november/
https://www.my.org/assets/general/december/

Into the button load the following script (be careful of line breaks, there are 16 lines of code):

on mouseUp
   put fld 1 into tText
   set the itemDelimiter to "/"
   repeat for each line tLine in tText
      if (char -1 of tLine = "/") then --usual problem with dealing with empty last items
         put empty into tPath[tLine]
      else
         if (tPath[item 1 to -2 of tLine] = empty) then --initial entry
            put item -1 of tLine into tPath[item 1 to -2 of tLine]
         else --multiple entries
            put tPath[item 1 to -2 of tLine] & cr & item -1 of tLine into tPath[item 1 to -2 of tLine]
         end if
      end if
   end repeat
   breakpoint
end mouseUp

There is a breakpoint at the end so the script will pause and you can inspect the variables. You'll see that an array is created with each unique path as a key and each page as its element. In the case of 'july' you will see that four pages are all listed, one per line. From there it should open a world of possibilities to arrange, sort and sift through the paths.

HTH
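The path/page split described above can also be sketched as a standalone function for a single URL. The name splitPathAndPage and the "path"/"page" array keys are hypothetical, chosen only for this illustration; the trailing-slash check comes first for the same reason as in the button script.

```livecode
-- Hypothetical helper (not from the thread): split one URL into
-- a path and a page, returned as an array with keys "path" and "page".
function splitPathAndPage pURL
   local tParts
   set the itemDelimiter to "/"
   if pURL ends with "/" then
      -- trailing slash: a path with no page; handled first because
      -- item -1 would otherwise skip the empty last item
      put pURL into tParts["path"]
      put empty into tParts["page"]
   else if the number of items in pURL < 2 then
      -- no slash at all: treat the whole string as the page
      put empty into tParts["path"]
      put pURL into tParts["page"]
   else
      put item 1 to -2 of pURL into tParts["path"]
      put item -1 of pURL into tParts["page"]
   end if
   return tParts
end splitPathAndPage
```

Called with "https://www.my.org/assets/general/july/2017.html", this should give "https://www.my.org/assets/general/july" as the path and "2017.html" as the page. Whether a URL without a final slash really ends in a page rather than a directory is something the string alone cannot tell, so that part is an assumption carried over from the post.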
Re: How to find the offset of the last instance of a repeating character in a string?
On 29/10/2018 22:32, Mark Wieder via use-livecode wrote:
> On 10/29/2018 08:32 AM, Keith Clarke via use-livecode wrote:
>> I'm trying to separate paths & pages from a list of URLs and so looking to
>> identify the position of the last '/' character.

How about

function rightmostSlashOf p
   set the itemdelimiter to "/"
   return (the number of chars in p) - (the number of chars in item -1 of p)
end rightmostSlashOf
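The length-arithmetic idea above can be extended to cover the two edge cases raised elsewhere in the thread: text with no slash at all, and text ending in a slash. This is only a sketch; the name lastSlashOffset and the choice to return 0 when there is no slash are assumptions, not something the posters specified.

```livecode
-- Hypothetical variant (not from the thread): position of the last "/"
-- in pText, or 0 if pText contains no "/".
function lastSlashOffset pText
   -- a trailing "/" is itself the last slash; check it before using items,
   -- since item -1 ignores an empty item after a final delimiter
   if pText ends with "/" then return the number of chars in pText
   set the itemDelimiter to "/"
   -- fewer than two items means there is no slash to find
   if the number of items in pText < 2 then return 0
   return (the number of chars in pText) - (the number of chars in item -1 of pText)
end lastSlashOffset
```

For "toplevel/somename/another/somename" (34 chars, last item 8 chars) this works out to 26, which is the last slash. An approach based on offset(item -1 of pText, pText) would instead report 10 for that string, because it finds the first "somename" rather than the last one.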
Re: How to find the offset of the last instance of a repeating character in a string?
"toplevel/somename/another/somename"

On 29/10/2018 22:32, Mark Wieder via use-livecode wrote:
> On 10/29/2018 08:32 AM, Keith Clarke via use-livecode wrote:
>> I’m trying to separate paths & pages from a list of URLs and so looking to
>> identify the position of the last ‘/’ character.
>
> function rightmostSlashOf pText
>    set the itemdelimiter to "/"
>    return offset(item -1 of pText, pText)
> end rightmostSlashOf
Re: How to find the offset of the last instance of a repeating character in a string?
Oh right you are!

Bob S

> On Oct 29, 2018, at 16:04, Mark Wieder via use-livecode wrote:
>
> On 10/29/2018 03:55 PM, Bob Sneidar via use-livecode wrote:
>> That will only give him the item, not the character position.
>
> Nope. It returns the position.
>
> --
> Mark Wieder
> ahsoftw...@gmail.com
Re: How to find the offset of the last instance of a repeating character in a string?
On 10/29/2018 03:55 PM, Bob Sneidar via use-livecode wrote:
> That will only give him the item, not the character position.

Nope. It returns the position.

--
Mark Wieder
ahsoftw...@gmail.com
Re: How to find the offset of the last instance of a repeating character in a string?
That will only give him the item, not the character position. But it's a start. You can now get the number of characters of item 1 to -2 of pText, plus 1. I didn't know the text you were searching had regular delimiters, and you were searching for the last delimiter. That makes things *much* easier.

Bob S

> On Oct 29, 2018, at 15:32, Mark Wieder via use-livecode wrote:
>
> On 10/29/2018 08:32 AM, Keith Clarke via use-livecode wrote:
>
>> I’m trying to separate paths & pages from a list of URLs and so looking to
>> identify the position of the last ‘/’ character.
>
> function rightmostSlashOf pText
>    set the itemdelimiter to "/"
>    return offset(item -1 of pText, pText)
> end rightmostSlashOf
>
> --
> Mark Wieder
> ahsoftw...@gmail.com
Re: How to find the offset of the last instance of a repeating character in a string?
On 10/29/2018 08:32 AM, Keith Clarke via use-livecode wrote:
> I’m trying to separate paths & pages from a list of URLs and so looking to
> identify the position of the last ‘/’ character.

function rightmostSlashOf pText
   set the itemdelimiter to "/"
   return offset(item -1 of pText, pText)
end rightmostSlashOf

--
Mark Wieder
ahsoftw...@gmail.com
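One caveat with offset(item -1 of pText, pText): offset reports the first match, so if the text of the last item also occurs earlier in the string, as in "toplevel/somename/another/somename", the result points at the earlier occurrence (10 here, while the last "/" is at 26). A minimal sketch that avoids items altogether by scanning backwards from the end; lastCharOffset is a made-up name, not a built-in.

```livecode
-- Hypothetical helper: position of the last occurrence of a single
-- character in pText, or 0 if it does not occur at all.
function lastCharOffset pChar, pText
   repeat with i = the number of chars in pText down to 1
      -- The first hit when walking backwards is the rightmost occurrence
      if char i of pText is pChar then return i
   end repeat
   return 0
end lastCharOffset
```

It only handles a one-character search string, which is all the URL case needs, and a trailing "/" is simply reported as the last position in the string.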
Re: How to find the offset of the last instance of a repeating character in a string?
In dBase/FoxPro they had an AT function roughly synonymous with our offset function. They also had a RAT (Reverse AT) function. I needed something like this many moons ago. To get all occurrences, I maintain a "pointer" variable holding the number of characters already searched, that is, the position of the last character of the most recent match. To get the actual position in the original text, you have to add the pointer to the offset, like so:

put 0 into tPointer
repeat
   put offset(tVar, tTextChunk, tPointer) into tNextPos
   if tNextPos = 0 then exit repeat
   -- offset counts from the end of the skipped chars, so make it absolute
   add tPointer to tNextPos
   put char tNextPos to tNextPos + length(tVar) - 1 of tTextChunk into aFoundChunks[tNextPos][length(tVar)]
   -- skip everything up to the end of this match on the next pass
   put tNextPos + length(tVar) - 1 into tPointer
end repeat

Something along those lines. Not tested, but you get the idea.

Bob S

> On Oct 29, 2018, at 08:32, Keith Clarke via use-livecode wrote:
>
> Folks,
> Is there a simple way to find the offset of a character from the ‘right’ end
> of a string, rather than the beginning - or alternatively get a list of all
> occurrences?
>
> I’m trying to separate paths & pages from a list of URLs and so looking to
> identify the position of the last ‘/’ character.
>
> Thanks & regards,
> Keith
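The same skip-ahead loop can stand in for a RAT-style function when only the last match is wanted. This is a sketch, not a built-in: lastOffset and its parameter names are made up here, and it relies on offset's third parameter being the number of characters to skip, with the result counted from the end of the skipped run.

```livecode
-- Hypothetical RAT-style helper: absolute position of the last
-- occurrence of pNeedle in pHaystack, or 0 if there is none.
function lastOffset pNeedle, pHaystack
   local tSkip, tFound, tLast
   put 0 into tSkip
   put 0 into tLast
   repeat
      put offset(pNeedle, pHaystack, tSkip) into tFound
      if tFound = 0 then exit repeat
      -- tFound is relative to the skipped chars; adding it makes
      -- tSkip the absolute start position of this match
      add tFound to tSkip
      put tSkip into tLast
   end repeat
   return tLast
end lastOffset
```

For example, lastOffset("/", "a/b/c") finds "/" at 2, then at 4, then nothing, and returns 4. Skipping only as far as the start of each match means overlapping matches of a longer search string are still seen.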
Re: How to find the offset of the last instance of a repeating character in a string?
Looks like Devin beat me to it. :-)

Bob S

> On Oct 29, 2018, at 08:49, Devin Asay via use-livecode wrote:
Re: How to find the offset of the last instance of a repeating character in a string?
Perfect, thanks Devin - I was hoping to see ‘offsets’ in the docs under ‘offset’, so this will do nicely! :-)

Best,
Keith

> On 29 Oct 2018, at 15:49, Devin Asay via use-livecode wrote:
Re: How to find the offset of the last instance of a repeating character in a string?
On Oct 29, 2018, at 9:32 AM, Keith Clarke via use-livecode wrote:
>
> Folks,
> Is there a simple way to find the offset of a character from the ‘right’ end
> of a string, rather than the beginning - or alternatively get a list of all
> occurrences?
>
> I’m trying to separate paths & pages from a list of URLs and so looking to
> identify the position of the last ‘/’ character.
>
> Thanks & regards,
> Keith

There was a discussion on this topic on the list a few years ago, and I saved these functions in my script library:

From Peter Brigham:
These are utility functions I use constantly for text processing. Offsets(str,cntr) returns a comma-delimited list of all the offsets of str in cntr. Lineoffsets(str,cntr) does the same with lineoffsets. Then you can iterate over the list of offsets to do whatever you want to each instance of str in cntr. I keep them in a utility stack that is in the stacksInUse, so it is available to all stacks. I don't use regex, as I have never gotten the regex syntax to stick in my head firmly enough to find it natural, and in any case doing it by script turns out to be as fast or faster.

Peter's lineOffsets function returns a line number for each found char offset. I added a function that returns only unique line numbers.

function offsets str,cntr
   -- returns a comma-delimited list of
   -- all the offsets of str in cntr
   put "" into oList
   put 0 into startPoint
   repeat
      put offset(str,cntr,startPoint) into os
      if os = 0 then exit repeat
      add os to startPoint
      put startPoint & "," after oList
   end repeat
   if oList = "" then return "0"
   return item 1 to -1 of oList
end offsets

function lineOffsetsAll str,cntr
   -- returns a comma-delimited list of
   -- all the lineoffsets of str in cntr
   # (returns a line number for ALL instances)
   put offsets(str,cntr) into charList
   if charList = "0" then return "0"
   put the number of items of charList into nbr
   put "" into oList
   repeat for each item n in charList
      put the number of lines of (char 1 to n of cntr) \
            & "," after oList
   end repeat
   return item 1 to -1 of oList
end lineOffsetsAll

# added by Devin Asay
function lineOffsets pStr,pSearchTxt
   # (returns only unique line numbers)
   put empty into tList
   put 0 into tStartLine
   repeat
      put lineOffset(pStr,pSearchTxt,tStartLine) into tLineNum
      if tLineNum = 0 then exit repeat
      add tLineNum to tStartLine
      put tStartLine & "," after tList
   end repeat
   if tList is empty then return "0"
   return item 1 to -1 of tList
end lineOffsets

Hope this helps.

Devin

Devin Asay
Director
Office of Digital Humanities
Brigham Young University
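For the original question, the last entry of the list that offsets() returns is the position of the last "/". A small sketch of how that might be used to split a URL; splitAtLastSlash is a made-up name, and it assumes the offsets() function from this message is available in the message path.

```livecode
-- Hypothetical wrapper around the offsets() function from this message.
-- Returns the path on line 1 and the page on line 2.
function splitAtLastSlash pURL
   local tLastSlash
   -- offsets() returns "0" when there is no match, so item -1 is then 0
   put item -1 of offsets("/", pURL) into tLastSlash
   -- No "/" at all: treat the whole string as the page
   if tLastSlash = 0 then return empty & cr & pURL
   return (char 1 to tLastSlash - 1 of pURL) & cr & (char tLastSlash + 1 to -1 of pURL)
end splitAtLastSlash
```

Because this works on character positions rather than items, a URL that ends in "/" simply comes back with an empty second line.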
How to find the offset of the last instance of a repeating character in a string?
Folks,
Is there a simple way to find the offset of a character from the ‘right’ end of a string, rather than the beginning - or alternatively get a list of all occurrences?

I’m trying to separate paths & pages from a list of URLs and so looking to identify the position of the last ‘/’ character.

Thanks & regards,
Keith