Re: Speaking of Filter and Match...
I just ran a speed test on Dick Kriesel’s method for removing duplicates while keeping the original sort order. It is indeed faster than Jacque’s method.

Deduplicating 11000 lines of text, including 1000 duplicated lines:

    Jacque’s method: 2.807 seconds [lineoffset is really slow, and in effect the text is scanned 3 times]
    Kriesel method:  0.017 seconds

But what really made my eyes pop: unless I am doing something wrong, it is also faster than the accepted wisdom of using the “split by cr as set; put the keys” method, which removes duplicates and leaves the text unsorted.

    Split as set, don’t care about sort order: 0.024 seconds

I can’t see anything wrong with my code, but can someone independently confirm this?

Neville
___
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode
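For anyone who wants to repeat the comparison, here is a rough timing sketch, not Neville's actual test code (which was not posted). It assumes Dick Kriesel's removeDuplicates handler from this thread is in the message path, that a field named "Data" (a hypothetical name) holds the test lines, and that `return ... for value` in a command handler leaves the value in `it`, as I understand the LiveCode 8.1+ behaviour.

```livecode
-- Hypothetical timing harness; field "Data" and the variable names are assumptions.
on mouseUp
   local tData, tStart, tOrdered, tSet, tReport
   put field "Data" into tData

   -- Method 1: Dick Kriesel's order-preserving array loop
   put the milliseconds into tStart
   removeDuplicates tData, cr
   put it into tOrdered -- "return ... for value" in a command lands in "it"
   put "Array loop:" && (the milliseconds - tStart) && "ms" & cr after tReport

   -- Method 2: split as set, original order not preserved
   put tData into tSet
   put the milliseconds into tStart
   split tSet by cr as set
   put the keys of tSet into tSet
   put "Split as set:" && (the milliseconds - tStart) && "ms" after tReport

   put tReport -- no destination given, so this goes to the message box
end mouseUp
```

Timings this small vary from run to run, so it is probably worth repeating each method several times and comparing averages before drawing conclusions.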
Re: Speaking of Filter and Match...
Good idea, thanks Dick. Your scripts are always so elegant.

--
Jacqueline Landman Gay | jac...@hyperactivesw.com
HyperActive Software | http://www.hyperactivesw.com

On March 14, 2022 7:11:13 PM Dick Kriesel via use-livecode wrote:
> Since order must be maintained, it’s probably faster not to split and sort, and faster not to scan the list repeatedly using lineOffset or contains. [...]
Re: Speaking of Filter and Match...
Hi Dick

Thank you so much for your time in sending me this solution. I’ve already learned a lot and I have yet to actually play with it!

Cheers,
Roger

> On Mar 14, 2022, at 5:08 PM, Dick Kriesel via use-livecode wrote:
>
> Since order must be maintained, it’s probably faster not to split and sort, and faster not to scan the list repeatedly using lineOffset or contains. [...]
Re: Speaking of Filter and Match...
> On Mar 13, 2022, at 1:05 PM, J. Landman Gay via use-livecode wrote:
>
> On 3/12/22 8:54 PM, Roger Guay via use-livecode wrote:
>> I have a field with about a thousand lines with many duplicate lines, and I want to delete the duplicates. Seems like this should be simple but I am running around in circles. Can anyone help me with this?
>
> Making the list into an array is the easiest way but as mentioned, it will destroy the original order. If the order is important then you can restore it with a custom sort function...

Since order must be maintained, it’s probably faster not to split and sort, and faster not to scan the list repeatedly using lineOffset or contains. You could do it like this:

    command removeDuplicates pDelimitedList, pDelimiter
       local tArray, tList
       set the lineDelimiter to pDelimiter
       repeat for each line tLine in pDelimitedList
          if not tArray[tLine] then -- i.e., if this line hasn't appeared already, then ...
             put true into tArray[tLine]
             put tLine & pDelimiter after tList
          end if
       end repeat
       delete last char of tList
       return tList for value
    end removeDuplicates

-- Dick
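A minimal usage sketch for the handler above, assuming it sits in the same script or elsewhere in the message path. The field name "Data" is hypothetical. Because the handler ends with `return tList for value`, my understanding is that calling it as a command leaves the deduplicated list in `it` rather than in `the result`.

```livecode
-- Hypothetical caller; field "Data" is an assumed name, not from the thread.
on mouseUp
   local tData
   put field "Data" into tData
   removeDuplicates tData, cr -- cr is the delimiter between list entries
   put it into field "Data" -- first occurrences only, original order kept
end mouseUp
```

The handler deletes only the last char of the rebuilt list, so this sketch sticks to a single-character delimiter such as cr, tab or comma.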
Re: Speaking of Filter and Match...
Ah, I see. Thank you again, Bob.

Roger

> On Mar 14, 2022, at 2:37 PM, Bob Sneidar via use-livecode wrote:
>
> The UNIQUE clause is the UNIQUE combination of ALL the columns put together. [...]
Re: Speaking of Filter and Match...
The UNIQUE clause (DISTINCT in standard SQL, placed right after SELECT) applies to the combination of ALL the selected columns put together. If I used:

    SELECT DISTINCT city, state FROM zipcodes WHERE state = 'CA'

I would get every unique city/state combination in CA, whereas if I used:

    SELECT DISTINCT state FROM zipcodes WHERE state = 'CA'

I would get a single row, 'CA'.

There is a way to get the city and state for the one record (why anyone would want to I don't know) by creating a join to the same table and using the UNIQUE clause in the join. I am not that good at join syntax, so I won't attempt it here and embarrass myself. :-)

BTW you can get the last matching record by doing a descending sort and using LIMIT 1, but I think MS SQL suffers from not having a LIMIT clause. Not sure why. Instead you use the TOP clause.

Bob S

> On Mar 14, 2022, at 12:14, Roger Guay via use-livecode wrote:
>
>> Actually I must correct myself. [...]
Re: Speaking of Filter and Match...
Thanks very much for your clarifications, Bob, although I’m not sure I understand your correction.

Roger

> On Mar 14, 2022, at 8:48 AM, Bob Sneidar via use-livecode wrote:
>
> Actually I must correct myself. That will not work [...]
Re: Speaking of Filter and Match...
Actually I must correct myself. That will not work, because the unique value column (typically an autoincrementing integer) will be different for every record, so no two rows would ever compare as duplicates. Instead, assuming your lines of text are in a column called "textdata":

    SELECT DISTINCT textdata FROM ...

Bob S

> On Mar 14, 2022, at 08:29, Bob Sneidar via use-livecode wrote:
>
> They depend on the fact that arrays cannot have duplicate keys. Dumping the data into an SQL database and querying using the UNIQUE statement would do it too. [...]
Re: Speaking of Filter and Match...
Those methods depend on the fact that arrays cannot have duplicate keys. Dumping the data into an SQL database and querying using the UNIQUE clause (DISTINCT in standard SQL) would do it too.

    SELECT DISTINCT * FROM ...

Bob S

> On Mar 13, 2022, at 13:16, Roger Guay via use-livecode wrote:
>
> Thank you Jacqueline, Alex and Terry. Very interesting new (for me) methods that I would never have come up with on my own.
Re: Speaking of Filter and Match...
Thank you Jacqueline, Alex and Terry. Very interesting new (for me) methods that I would never have come up with on my own.

Roger

> On Mar 13, 2022, at 1:05 PM, J. Landman Gay via use-livecode wrote:
>
> Making the list into an array is the easiest way but as mentioned, it will destroy the original order. If the order is important then you can restore it with a custom sort function. [...]
Re: Speaking of Filter and Match...
On 3/12/22 8:54 PM, Roger Guay via use-livecode wrote:
> I have a field with about a thousand lines with many duplicate lines, and I want to delete the duplicates. Seems like this should be simple but I am running around in circles. Can anyone help me with this?

Making the list into an array is the easiest way but as mentioned, it will destroy the original order. If the order is important then you can restore it with a custom sort function. Here are my test handlers:

    on mouseUp
       put fld 1 into tData -- we keep this as a reference to the original order
       put tData into tTrimmedData -- this one will change
       split tTrimmedData by cr as set -- removes duplicates
       put keys(tTrimmedData) into tTrimmedData -- convert to a text list
       sort tTrimmedData numeric by origOrder(each,tData)
       put tTrimmedData into fld 1
    end mouseUp

    function origOrder pWord, @pData
       set wholematches to true -- may not matter, depends on the data
       return lineoffset(pWord, pData)
    end origOrder

Field 1 contains lines in random order with duplicates.

--
Jacqueline Landman Gay | jac...@hyperactivesw.com
HyperActive Software | http://www.hyperactivesw.com
Re: Speaking of Filter and Match...
    split tVar by CR as set
    put the keys of tVar into tVar

If you don’t need to keep them in order. (The "as set" matters: a plain "split tVar by CR" builds a numerically keyed array, and combining that back just returns the original list with its duplicates.)

Alex

Sent from my iPad

> On 13 Mar 2022, at 04:14, Terry Judd via use-livecode wrote:
>
> There are sure to be more elegant ways but you could just rebuild the list skipping the duplicates as you go [...]
Re: Speaking of Filter and Match...
There are sure to be more elegant ways but you could just rebuild the list skipping the duplicates as you go

    # tList1 contains original list
    put cr into tList2
    repeat for each line x in tList1
       if tList2 contains cr&x&cr then # ensures you check whole not partial lines
          # do nothing
       else
          put x&cr after tList2
       end if
    end repeat
    put char 2 to -2 of tList2 into tList2 # delete the leading and trailing returns

If you need to retain line specific formatting in the field though you’ll need a different approach.

Terry…

Terry Judd | Senior Lecturer in Medical Education
Department of Medical Education, The University of Melbourne

From: use-livecode on behalf of Roger Guay via use-livecode
Date: Sunday, 13 March 2022 at 1:55 pm
To: use-livecode@lists.runrev.com
Subject: Speaking of Filter and Match...

I have a field with about a thousand lines with many duplicate lines, and I want to delete the duplicates. Seems like this should be simple but I am running around in circles. Can anyone help me with this?

Thanks,
Roger
Speaking of Filter and Match...
I have a field with about a thousand lines with many duplicate lines, and I want to delete the duplicates. Seems like this should be simple but I am running around in circles. Can anyone help me with this?

Thanks,
Roger