>> If these types are for implementation sharing, should they be
>> underscored to discourage their use? Or is the position they occupy in
>> the type hierarchy important because Range and ClosedRange will
>> eventually occupy them?
> 
> Underscoring hides names from users completely, and IMO that would not
> be appropriate here.  If we underscored them, it would be mysterious
> where an implementation of various methods came from.

If the concrete types show these members, does it matter that they actually 
came from a protocol?

>> On the other hand, it's not like the SubSequence subscript now takes a
>> RangeProtocol. Should it?
> 
> No; subscript can't be generic (language limitation).

Right, and RangeProtocol isn't existential. Ouch.

>>> func successor(of i: Index) -> Index
>> 
>> Two things:
>> 
>> 1. I would really like a version of this which returns Optional and is
>> guaranteed to go `nil` once it hits `endIndex`. 
> 
> The primary question to answer when exploring this idea is, IMO, “what
> does that do to the code in algorithms?”  I can't imagine writing binary
> search, partition, or rotate if I had to check for nil every time I
> moved an index.

If you're confident you're remaining in bounds, you should force-unwrap it.
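
Here's a sketch of what I mean, layered on the proposed `successor(of:)`. The 
name `checkedSuccessor(of:)` and the exact condition under which it returns 
`nil` are mine, purely for illustration:

        extension Collection {
          /// Hypothetical: like `successor(of:)`, but returns `nil` instead of
          /// trapping when `i` is already `endIndex`.
          func checkedSuccessor(of i: Index) -> Index? {
            return i == endIndex ? nil : successor(of: i)
          }
        }

        // An algorithm that has already established its bounds just unwraps,
        // which records the invariant at the point of use:
        func visitEach<C: Collection>(in c: C, _ body: (C.Iterator.Element) -> Void) {
          var i = c.startIndex
          while i != c.endIndex {
            body(c[i])
            i = c.checkedSuccessor(of: i)!   // i != endIndex, so this can't be nil
          }
        }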

>> There can be a non-optional version too, or it might even be a feature
>> of the `index` family of methods instead of `successor` itself, but I
>> think it would be valuable to let the collection worry about the
>> bounds check.
> 
> Why would that be valuable?
> 
>> It seems silly to check the index before calling `successor(of:)` when
>> `successor(of:)` is going to immediately perform the same check again
>> as a precondition.
> 
> Preconditions are not necessarily checked.  We don't promise to trap
> every precondition violation.  We promise memory and type safety in the
> absence of data races, and some precondition failures will trap in order
> to ensure that.  Others will trap, where we think it is affordable, just
> to provide a better programmer experience.

I understand that, but many—perhaps most—clients of APIs like `successor(of:)` 
will need to perform a bounds check. I think we would be better off if the 
check were implicit in the call. That would force all clients, or at least all 
clients which used the bounds-checking variants (which would be encouraged), to 
explicitly handle out-of-bounds conditions, in much the same way that 
`index(of:)` forces its clients to explicitly handle the possibility that a 
matching element might not exist, rather than returning `endIndex` (which would 
be easier).
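
To make the analogy concrete, here's how a client of the bounds-checking 
variant ends up looking, reusing the hypothetical `checkedSuccessor(of:)` 
sketched above (the helper itself is illustration only):

        /// The index `n` steps after `i` in `c`, or `nil` if that would run past the end.
        func index<C: Collection>(_ n: Int, after i: C.Index, in c: C) -> C.Index? {
          precondition(n >= 0)
          var result = i
          for _ in 0..<n {
            guard let next = c.checkedSuccessor(of: result) else {
              return nil   // out of bounds: the caller must decide what that means
            }
            result = next
          }
          return result
        }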

>> (Actually, in general, I'm a little bit dismayed that the collection
>> API does so little to help you include bounds checks in your
>> code. Especially when iterating through a collection, bounds checks
>> are absolutely mandatory, and the collection API's solution to the
>> problem is "eh, just use `<`, it's not like you might mess something
>> like that up".)
> 
> What do you mean?  Could you show an example of what you think should be
> better supported?

The simplest thing would be to have calls like these in Collection:

        func validate(i: Index) -> Bool
        func validateIncreasing(i: Index) -> Bool
        func validateDecreasing(i: Index) -> Bool

Usage:

        while collection.validateIncreasing(i) {
                collection[i] = repeatedValue
                i = collection.successor(of: i)
        }

These methods would be one-liners, sure, but they would make your code say what 
was *meant*, not what happened to be true.
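
Compare the status-quo spelling of the same loop, which states a fact about 
`i` rather than the intent that `i` is still a valid position:

        while i < collection.endIndex {
                collection[i] = repeatedValue
                i = collection.successor(of: i)
        }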

However, we could go further than this and perhaps reduce the amount of 
bounds-checking we do, even across optimization barriers.

To explain what I mean, let's take a look at `IndexingIterator`. With various 
less-interesting bits stripped out, the pull request defines it like this:

        public struct IndexingIterator<Elements: IndexableBase>: IteratorProtocol, Sequence {
          init(_elements: Elements) {
            self._elements = _elements
            self._position = _elements.startIndex
          }

          mutating func next() -> Elements._Element? {
            if _position == _elements.endIndex { return nil }
            let element = _elements[_position]
            _elements.formSuccessor(&_position)
            return element
          }

          internal let _elements: Elements
          internal var _position: Elements.Index
        }

Let's annotate the `next()` method with its bounds-checking behavior.

          mutating func next() -> Elements._Element? {
            if _position == _elements.endIndex { return nil }   // Explicit bounds check performed here
            let element = _elements[_position]                  // Usually precondition()s a bounds check to preserve memory safety
            _elements.formSuccessor(&_position)                 // May perform a bounds check on the incremented value
            return element
          }

Depending on the implementation, at least two, and possibly all three, of these 
lines will involve a bounds check. Now, perhaps the optimizer is good at 
eliding redundant checks involving an IndexingIterator because the standard 
library is transparent, but similar user code wouldn't get that benefit.

So, imagine that we have a type like this in the standard library:

        /// Represents a pre-validated index. A pre-validated index received from a given
        /// collection is guaranteed to refer to a valid element in that collection, as long
        /// as the collection is not mutated.
        ///
        /// - Warning: Operations which accept a Valid<Index> assume it is in bounds and
        ///   do not perform bounds checks. Using a Valid<Index> on a collection other than
        ///   the one which created it, or on a collection which has been mutated since it
        ///   was created, may be unsafe.
        struct Valid<Index: Comparable> {
                init(unsafeIndex index: Index) { self.index = index }
                private(set) var index: Index
        }

And a few members like these in the Collection protocols (there would be 
others; I'm just defining the ones we're using here):

        /// Validates an index, returning `nil` if it is invalid, or a Valid<Index> for
        /// that index otherwise.
        func validate(i: Index) -> Valid<Index>?

        /// Transforms `i` to represent the next valid index after itself, or `nil` if
        /// that index would be out of bounds.
        func formValidSuccessor(i: inout Valid<Index>?)

        /// Accesses the element corresponding to the prevalidated index.
        subscript (i: Valid<Index>) -> Element { get }

With those in hand, you can rewrite IndexingIterator like so:

        public struct IndexingIterator<Elements: IndexableBase>: IteratorProtocol, Sequence {
          init(_elements: Elements) {
            self._elements = _elements
            self._position = _elements.validate(_elements.startIndex)
          }

          mutating func next() -> Elements._Element? {
            if _position == nil { return nil }             // This merely checks the result of the previous bounds check
            let element = _elements[_position!]            // No bounds check needed
            _elements.formValidSuccessor(&_position)       // This is the only bounds check in the loop
            return element
          }

          internal let _elements: Elements
          internal var _position: Valid<Elements.Index>?
        }

This version only tests the bounds once per element, inside 
`formValidSuccessor(_:)` (or in `validate` for the start index), rather than 
several times.
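
The same pattern works in ordinary user code, where we can't count on the 
optimizer to elide redundant checks. A sketch against the hypothetical API 
above, with `numbers` standing in for any collection of Ints that adopts it:

        var total = 0
        var position = numbers.validate(numbers.startIndex)
        while let valid = position {
          total += numbers[valid]                  // no bounds check needed
          numbers.formValidSuccessor(&position)    // the only check per step
        }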

(Incidentally, `Valid<Int>?` could theoretically be expressed in the size of a 
normal `Int`, since `endIndex <= Int.max`, and `endIndex` is not valid. I don't 
see a good way to tell Swift about this, though.)

>>      collection.index(5, from: i)
> 
> I don't think this one reads clearly enough.

"The index 5 from i" reads fine to me, but I guess that's a matter of opinion.

>>      collection.traveling(5, from: i)
>>      collection.striding(5, from: i)
>>      collection.advancing(i, by: 5)
> 
> None of the “ing” names work, IMO because that suffix suggests you're
> returning a modified version of the receiver.

Huh, that clarifies something. How about the non-`ing` variants?

        collection.travel(5, from: i)
        collection.stride(5, from: i)
        collection.advance(i, by: 5)

I'd say `stride` might be an attractive nuisance in this form (although if 
Strideable's public face becomes `Self.striding(by:)`, only to Swift 2 users), 
but the others look fine to me.

>>> func index(n: IndexDistance, stepsFrom i: Index, limitedBy limit: Index) -> 
>>> Index
>> 
>> 2. This method can move the index either forwards or backwards, but
>> only one limit is specified. Would we be better off having the `limit`
>> be a range?
> 
> That would add a cost for checking that one doesn't want to pay in
> algorithms that need this method.

I'm a little uncomfortable with the way `limit`'s interpretation changes based 
on the sign of `n` here; a stray arithmetic error could turn a maximum into a 
minimum, with potentially bizarre results. In the code, it looks like we end up 
branching to accommodate that interpretation, too, which isn't ideal.

I might feel a little better with a variant for each limit semantic:

        func index(n: IndexDistance, stepsFrom: Index, belowLimit: Index) -> Index          // for when you expect to move up
        func index(n: IndexDistance, stepsFrom: Index, aboveLimit: Index) -> Index          // for when you expect to move down
        func index(n: IndexDistance, stepsFrom: Index, withinLimits: Range<Index>) -> Index // for when you don't know which direction you're going
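
A typical forward-moving call site would then read something like this (sketch 
only):

        // the argument label documents both the intended direction and which bound applies
        let end = collection.index(n, stepsFrom: i, belowLimit: collection.endIndex)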

>> 3. What is the use case for returning the `limit` instead of returning
>> the fact that we crossed it? I have a hard time thinking of a case
>> where I would want to just bump up against the limit and use it rather
>> than *detect* that I've hit the limit (which would probably call for a
>> return type of `Index?`). Are there common needs that I'm just not
>> thinking of? 
> 
> Sure, for example
> 
>  x[i..<x.index(n, stepsFrom: i, limitedBy: x.endIndex)].sort()

Let me restate that. Given that Swift doesn't usually seem to truncate ranges 
that are too long (though there are exceptions), are we more frequently going 
to want to stop at the limit, or to detect that the limit has been hit?

>> Should we offer both?
> 
> Definitely not, IMO!  They are utterly redundant, are they not?

Well, an Optional-returning `index(_:stepsFrom:limitedBy:)` can be treated as a 
truncating version pretty easily. For instance, `prefix(_:)` can do this:

        let end = index(numericCast(maxLength), stepsFrom: startIndex, limitedBy: endIndex) ?? endIndex
        return self[startIndex..<end]

This is a bit of a hassle, and might be less efficient (an optional return, an 
extra test), but I don't think you can reliably go in the other direction.
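
The obvious attempt at the reverse, sketched below, shows why: comparing the 
truncating result against the limit can't distinguish "stopped short at the 
limit" from "landed exactly on the limit", so the caller loses information.

        let end = index(numericCast(maxLength), stepsFrom: startIndex, limitedBy: endIndex)
        let reached: Index? = (end == endIndex) ? nil : end   // wrongly nil when maxLength lands exactly on endIndex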

(On the other hand, it might be that I'm conceiving of the purpose of 
`limitedBy` differently from you—I think of it as a safety measure, but you may 
be thinking of it specifically as an automatic truncation mechanism.)

-- 
Brent Royal-Gordon
Architechies
