On 3 June 2015, at 17:04, Paul Bian wrote:

> Thanks for all the help guys, it really helps with my understanding. 
> 
> So it seems to me there's no real easy way to do this while keeping all the 
> restrictions intact. 
> 
> I sat here thinking about it for quite a while before posting, as I thought I 
> missed some simple solution, as I'm inexperienced.
> 



While developing under theoretical restrictions is good for practice, in 
practice I suggest relying on high-level list-handling routines like sort, 
filter, map and for-each, as these are usually optimized.  You might want to 
have a look at different implementations of SRFI 1's delete-duplicates, or 
think about the problem in terms of sets with SRFI 113.  Since the problem is 
related to sorting, you might also have a look at the by now deprecated 
SRFI 32: its reference implementation includes heap sort, quick sort and 
perhaps others.
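
For what it's worth, Racket itself already ships such a high-level routine; 
a minimal sketch (remove-duplicates lives in racket/list, which #lang racket 
pulls in):

  #lang racket
  ;; remove-duplicates keeps the first occurrence of each element,
  ;; comparing with equal? unless you pass another equality predicate.
  (remove-duplicates '(1 2 1 3 2 4))               ; => '(1 2 3 4)
  (remove-duplicates '("a" "A" "b") string-ci=?)   ; => '("a" "b")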

See http://docs.racket-lang.org/srfi/srfi-std/srfi-1.html#Deletion

delete-duplicates lis [=] -> list
"Be aware that, in general, delete-duplicates runs in time O(n2) for n-element 
lists. Uniquifying long lists can be accomplished in O(n lg n) time by sorting 
the list to bring equal elements together, then using a linear-time algorithm 
to remove equal elements. Alternatively, one can use algorithms based on 
element-marking, with linear-time results."
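
A rough Racket sketch of that sort-then-scan idea (the names are my own; note 
that the result comes back in sorted order, unlike delete-duplicates, which 
keeps the original order):

  #lang racket
  ;; Sort to bring equal elements together, then drop adjacent
  ;; duplicates in a single linear pass: O(n lg n) overall.
  (define (delete-duplicates/sorted lst <?)
    (let loop ([xs (sort lst <?)] [acc '()])
      (cond
        [(null? xs) (reverse acc)]
        [(and (pair? acc) (equal? (car xs) (car acc)))
         (loop (cdr xs) acc)]                      ; adjacent duplicate: skip it
        [else (loop (cdr xs) (cons (car xs) acc))])))

  (delete-duplicates/sorted '(3 1 2 3 1 4) <)      ; => '(1 2 3 4)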


See http://srfi.schemers.org/srfi-113/srfi-113.html#Copyingandconversion

(list->set comparator list)
"Returns a newly allocated set, created as if by set using comparator, that 
contains the elements of list. Duplicate elements (in the sense of the equality 
predicate) are omitted."
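
For a quick try in Racket you don't even need SRFI 113; the built-in 
racket/set library gives you the same idea (a sketch; the order of the 
result is unspecified):

  #lang racket
  (require racket/set)
  ;; list->set drops duplicates under equal?; set->list converts back.
  (set->list (list->set '(1 2 1 3 2 4)))   ; e.g. '(1 2 3 4), order not preserved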


http://srfi.schemers.org/srfi-95/srfi-95.html 
(http://srfi.schemers.org/srfi-32/srfi-32.txt)
"To choose optimal sort algorithms requires nearly as much understanding as to 
write them. Most users don't."

If you have "big data" with duplicates you should try to use a directed graph 
instead of well formed lists. It's fast and has a small memory footprint when 
there are many duplicates, i.e. a lot of redundancy in the data. Language 
contains a lot of redundancy (both at the word and at the letter level). This 
direction leads to searching (sth like the whole WWW in O(n)) and compression 
algorithms ...
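
A loose sketch of what I mean by exploiting that redundancy: intern each 
value once in a hash table so every later duplicate shares the same node 
instead of being stored again (hash-consing in miniature, not a full graph 
library; the names are mine):

  #lang racket
  (define interned (make-hash))

  ;; First occurrence is stored; later occurrences return the shared value.
  (define (intern! v)
    (hash-ref! interned v (lambda () v)))

  (define words '("the" "cat" "sat" "on" "the" "mat" "the"))
  (define shared (map intern! words))
  (hash-count interned)   ; => 5 distinct strings, however long the input grows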

Have fun,
Michael
