On Mar 19, 2016, at 2:53 PM, Brent Royal-Gordon via swift-users 
<swift-users@swift.org> wrote:

>>> I might have missed something since a lot of the results are for tests 
>>> <https://github.com/apple/swift/search?utf8=✓&q=underestimatedCount&type=Code>
>>> But why would `underestimatedCount` ever be used instead of `count`? Don’t 
>>> most collections have O(1) `count` anyway?
>> 
>> Hi Will,
>> 
>> This API comes from Sequence, which does not have a count that you can
>> inspect without possibly consuming the sequence.
> 
> To add some color:
> 
> A sequence merely says that it has a bunch of elements and you can walk 
> through them. It does *not* promise several important things:
> 
> * That you can get the elements in the sequence more than once
> * That it knows the number of elements in the sequence and can quickly return 
> them
> * That you can get the number of elements in the sequence without walking 
> through and counting them one by one
> 
> This adds up to an ugly conclusion: With just a sequence, it is impossible to 
> efficiently discover the count, and doing so might destroy the sequence.
> 
> That's obviously not so great. In particular, it's not so great when you want 
> to load a sequence into an array. The simple, naive way is to do this:
> 
>    // Notionally
>    struct Array<Element> {
>        init<Seq: SequenceType where Seq.Generator.Element == Element>(_ seq: 
> Seq) {
>            self.init()
>            
>            for elem in seq {
>                appendElement(elem)
>            }
>        }
>    }
> 
> But this may resize the array several times, which would be very slow. The 
> fastest way to do that will be to pre-allocate the exact right number of 
> elements so you don't have to keep resizing the array:
> 
>    // Notionally
>    struct Array<Element> {
>        init<Seq: SequenceType where Seq.Generator.Element == Element>(_ seq: 
> Seq) {
>            self.init()
>            
>            reserveCapacity(seq.count)
>            for elem in seq {
>                
> _appendElementQuicklyBecauseWeAlreadyKnowWeHaveTheCapacity(elem)
>            }
>        }
>    }
> 
> But `seq.count` might consume the sequence, leaving you unable to add any 
> elements. What do we do?
> 
> Enter `underestimateCount()`. We can always call `underestimateCount()` and 
> size to whatever it says we should be. For sequences with easily-calculated 
> counts, this should give us a size that's just right. For sequences where 
> they can kind of estimate the right count (for instance, if you're decoding a 
> fixed-size buffer of UTF-8 bytes, and you know how many bytes there are but 
> not exactly how many characters they'll decode to), it will get us at least 
> part of the way there. For sequences whose size is completely unknown, it 
> won't help but it won't hurt much either.
> 
> So you do this:
> 
>    // Notionally
>    struct Array<Element> {
>        init<Seq: SequenceType where Seq.Generator.Element == Element>(_ seq: 
> Seq) {
>            self.init()
>            
>            let fastCount = seq.underestimateCount()
>            reserveCapacity(fastCount)
>            
>            // We'll have to pull elements out manually.
>            let generator = seq.generate()
>            
>            // Load all the fast elements
>            for _ in 0..<fastCount {
>                let elem = generator.next()!
>                
> _appendElementQuicklyBecauseWeAlreadyKnowWeHaveTheCapacity(elem)
>            }
>            
>            // Get anything else that's left
>            while let elem = generator.next() {
>                appendElement(elem)
>            }
>        }
>    }
> 
> (If you're looking at Swift 3, remember: SequenceType is now Sequence, 
> Generator is now Iterator, generate() is now makeIterator(), and 
> underestimateCount() is now underestimatedCount().)
> 
> And in fact, this is quite similar to some real source code inside Array:
> 
>    @warn_unused_result
>    internal func _copySequenceToNativeArrayBuffer<
>      S : SequenceType
>    >(source: S) -> _ContiguousArrayBuffer<S.Generator.Element> {
>      let initialCapacity = source.underestimateCount()
>      var builder =
>        _UnsafePartiallyInitializedContiguousArrayBuffer<S.Generator.Element>(
>          initialCapacity: initialCapacity)
>    
>      var generator = source.generate()
>    
>      // FIXME(performance): use _initializeTo().
>    
>      // Add elements up to the initial capacity without checking for regrowth.
>      for _ in 0..<initialCapacity {
>        builder.addWithExistingCapacity(generator.next()!)
>      }
>    
>      // Add remaining elements, if any.
>      while let element = generator.next() {
>        builder.add(element)
>      }
>    
>      return builder.finish()
>    }
> 
> Now, collections *do* promise that you can walk through a sequence more than 
> once, and some collections (ones that promise "random access") promise that 
> `count` can be calculated quickly. But because CollectionType conforms to 
> SequenceType, some code may receive a collection but only know it's a 
> sequence. Fortunately, the standard library provides a default implementation 
> of `underestimateCount` for collections:
> 
>      public func underestimateCount() -> Int {
>        return numericCast(count)
>      }
> 
> So you just don't have to worry about this on collections. Implement `count` 
> and `understimateCount` will take care of itself.
> 

Okay, so… maybe I am unfairly treating your summary as the whole story… 
returning to the example of the contents of /dev/random, or quadrature data 
from a wireless receiver, which are supposed to be neither finite nor 
reproducible, but are indisputably one-thing-after-another — sequences in the 
common sense of the word …

Am I to gather

1: that the standard API requires that 
every Sequence type can be instantiated so as to replicate the contents of 
another? The whole value of Sequence to the examples is that it should be 
impossible. 

2: that the standard API requires of every Sequence that it should serve the 
expectation of the supposedly-uncoupled API of Array that it is free to buffer 
the entire contents of any Sequence of the same Element? Need it or not? Bad 
news for memory and performance.

The answer may be that if the user knows enough to build a Sequence for which 
laziness is indispensable, she should have better sense than to rely on those 
two things, required or not.

But a lifetime of Law and debugging have given me an exceptionally dirty mind. 
If those really are API, I can avoid them, but I can't expect others — 
including the standard library — to.

    — F

_______________________________________________
swift-users mailing list
swift-users@swift.org
https://lists.swift.org/mailman/listinfo/swift-users

Reply via email to