Re: [swift-users] Collection underestimatedCount() does _?

Brent Royal-Gordon via swift-users Sat, 19 Mar 2016 12:54:33 -0700

>> I might have missed something since a lot of the results are for tests 
>> <https://github.com/apple/swift/search?utf8=✓&q=underestimatedCount&type=Code>
>> But why would `underestimatedCount` ever be used instead of `count`? Don’t 
>> most collections have O(1) `count` anyway?
> 
> Hi Will,
> 
> This API comes from Sequence, which does not have a count that you can
> inspect without possibly consuming the sequence.


To add some color:

A sequence merely says that it has a bunch of elements and you can walk through 
them. It does *not* promise several important things:

* That you can get the elements in the sequence more than once
* That it knows the number of elements in the sequence and can quickly return 
them
* That you can get the number of elements in the sequence without walking 
through and counting them one by one

This adds up to an ugly conclusion: With just a sequence, it is impossible to 
efficiently discover the count, and doing so might destroy the sequence.

That's obviously not so great. In particular, it's not so great when you want 
to load a sequence into an array. The simple, naive way is to do this:

        // Notionally
        struct Array<Element> {
                init<Seq: SequenceType where Seq.Generator.Element == 
Element>(_ seq: Seq) {
                        self.init()
                        
                        for elem in seq {
                                appendElement(elem)
                        }
                }
        }

But this may resize the array several times, which would be very slow. The 
fastest way to do that will be to pre-allocate the exact right number of 
elements so you don't have to keep resizing the array:

        // Notionally
        struct Array<Element> {
                init<Seq: SequenceType where Seq.Generator.Element == 
Element>(_ seq: Seq) {
                        self.init()
                        
                        reserveCapacity(seq.count)
                        for elem in seq {
                                
_appendElementQuicklyBecauseWeAlreadyKnowWeHaveTheCapacity(elem)
                        }
                }
        }

But `seq.count` might consume the sequence, leaving you unable to add any 
elements. What do we do?

Enter `underestimateCount()`. We can always call `underestimateCount()` and 
size to whatever it says we should be. For sequences with easily-calculated 
counts, this should give us a size that's just right. For sequences where they 
can kind of estimate the right count (for instance, if you're decoding a 
fixed-size buffer of UTF-8 bytes, and you know how many bytes there are but not 
exactly how many characters they'll decode to), it will get us at least part of 
the way there. For sequences whose size is completely unknown, it won't help 
but it won't hurt much either.

So you do this:

        // Notionally
        struct Array<Element> {
                init<Seq: SequenceType where Seq.Generator.Element == 
Element>(_ seq: Seq) {
                        self.init()
                        
                        let fastCount = seq.underestimateCount()
                        reserveCapacity(fastCount)
                        
                        // We'll have to pull elements out manually.
                        let generator = seq.generate()
                        
                        // Load all the fast elements
                        for _ in 0..<fastCount {
                                let elem = generator.next()!
                                
_appendElementQuicklyBecauseWeAlreadyKnowWeHaveTheCapacity(elem)
                        }
                        
                        // Get anything else that's left
                        while let elem = generator.next() {
                                appendElement(elem)
                        }
                }
        }

(If you're looking at Swift 3, remember: SequenceType is now Sequence, 
Generator is now Iterator, generate() is now makeIterator(), and 
underestimateCount() is now underestimatedCount().)

And in fact, this is quite similar to some real source code inside Array:

        @warn_unused_result
        internal func _copySequenceToNativeArrayBuffer<
          S : SequenceType
        >(source: S) -> _ContiguousArrayBuffer<S.Generator.Element> {
          let initialCapacity = source.underestimateCount()
          var builder =
            
_UnsafePartiallyInitializedContiguousArrayBuffer<S.Generator.Element>(
              initialCapacity: initialCapacity)
        
          var generator = source.generate()
        
          // FIXME(performance): use _initializeTo().
        
          // Add elements up to the initial capacity without checking for 
regrowth.
          for _ in 0..<initialCapacity {
            builder.addWithExistingCapacity(generator.next()!)
          }
        
          // Add remaining elements, if any.
          while let element = generator.next() {
            builder.add(element)
          }
        
          return builder.finish()
        }

Now, collections *do* promise that you can walk through a sequence more than 
once, and some collections (ones that promise "random access") promise that 
`count` can be calculated quickly. But because CollectionType conforms to 
SequenceType, some code may receive a collection but only know it's a sequence. 
Fortunately, the standard library provides a default implementation of 
`underestimateCount` for collections:

          public func underestimateCount() -> Int {
            return numericCast(count)
          }

So you just don't have to worry about this on collections. Implement `count` 
and `understimateCount` will take care of itself.

HTH,
-- 
Brent Royal-Gordon
Architechies

_______________________________________________
swift-users mailing list
swift-users@swift.org
https://lists.swift.org/mailman/listinfo/swift-users

Re: [swift-users] Collection underestimatedCount() does _?

Reply via email to