Re: Looking for a set that sorts and deduplicates

2017-11-25 Thread alfrednewman
@Araq - B-Tree in the stdlib will be cool !! Thanks in advance !

Re: Looking for a set that sorts and deduplicates

2017-11-24 Thread cblake
Memory hierarchies since the 1990s have broadened the relevance of 1970s practical wisdom for disk storage. A solid hash table with locality and solid B-Tree often win today. A great stdlib should have both. A more obscure bonus of a B-Tree (when compared to, say, balanced binary trees) is chea

Re: Looking for a set that sorts and deduplicates

2017-11-24 Thread Udiknedormin
@cblake I entirely agree, B-trees are pretty awesome. When learning about Rust iterators I discovered that it's often faster to turn a container into a B-tree and then collect it into the starting container type than to operate directly on the initial container.

Re: Looking for a set that sorts and deduplicates

2017-11-23 Thread cblake
@Araq - fabulous. I think a B-Tree in the stdlib would be great. @mratsim - I think you meant `collections.intsets` with a 't' which does _not_ yield items() in key order. The built-in `set[up to int16]` type _does_ (implicitly) order its keys, though. @cdome also did not mention how large the

Re: Looking for a set that sorts and deduplicates

2017-11-23 Thread mratsim
`insets` dedups for sure and seems extremely fast. I don't know if it's items is sorted though.

Re: Looking for a set that sorts and deduplicates

2017-11-23 Thread Araq
I have a B-Tree implementation. Remind me to release it.

Re: Looking for a set that sorts and deduplicates

2017-11-23 Thread cblake
`CountTable` is just a histogram and the sorting is by the bin counts not by the keys. Keys are visited in "hash order". `hashes` does use an identity hash for integers which could create some confusion from a simple test program if the modulo the table size post hash transform doesn't change th

Re: Looking for a set that sorts and deduplicates

2017-11-23 Thread jlp765
`CountTable` from the `tables` library can also do this. I don't know how it compares speed-wise with the previous solution The `keys()` iterator will retrieve the sorted numbers (it sorts on insert??), and the value is the number of occurrences. If you already have a sequence of data, you can

Re: Looking for a set that sorts and deduplicates

2017-11-22 Thread cblake
`collections.heapqueue` does mostly what you want. You would have to remove duplicates yourself, like below, but probably with an iterator called something like `unique`. import collections.heapqueue var heap = newHeapQueue[int]() let data = [1, 3, 5, 1, 7, 9, 2, 4, 6,

Looking for a set that sorts and deduplicates

2017-11-22 Thread cdome
I have bunch of ints that I need to insert into set and read in increasing order avoiding duplicates. They are produced at a different time so I don't have them in a seq. I can probably knock-up a quick SortedSeq as a distinct seq in my code. I have a good capacity estimate for my sue case, so