Re: [swift-evolution] Proposal: CollectionType.cycle property for an infinite sequence

plx via swift-evolution Wed, 30 Dec 2015 11:02:34 -0800

> On Dec 29, 2015, at 6:38 PM, Dave Abrahams <dabrah...@apple.com> wrote:
> 
> 
>> On Dec 29, 2015, at 3:39 PM, plx via swift-evolution 
>> <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:
>> 
>>> 
>>> On Dec 29, 2015, at 1:17 PM, Kevin Ballard via swift-evolution 
>>> <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:
>>> 
>>> On Tue, Dec 29, 2015, at 08:14 AM, plx via swift-evolution wrote:
>>>> Personally I’d say this should be a -1 for standard-library inclusion.
>>>> 
>>>> Swift’s not really built to handle infinite sequences right now; until 
>>>> they are handed better by the standard library convenience methods for 
>>>> creating them shouldn’t be in the standard library.
>>> 
>>> As far as I can tell, the only way in which it's "not really built" to 
>>> handle this is that there are multiple constructs that attempt to eagerly 
>>> consume an entire sequence, and using these with an infinite sequence would 
>>> end up looping forever. But I don't consider that to be a particularly 
>>> serious problem. You could "fix" it by refactoring SequenceType into two 
>>> protocols, SequenceType (for possibly-infinite sequences) and 
>>> FiniteSequenceType (for known-finite sequences) and then going over the 
>>> entire standard library and updating various spots to use 
>>> FiniteSequenceType, except this would be very limiting (sequences that are 
>>> not known if they're infinite to the compiler could still be valid for 
>>> various eager algorithms if the programmer knows it will be finite in 
>>> practice).
>> 
>> Indeed, on my wishlist I would like to see the standard protocols refactored 
>> to something more like this:
>> 
>> SequenceType // can be iterated
>> FiniteSequenceType : SequenceType, // of finite length
>> StableSequenceType : SequenceType, // can be re-iterated identically
>> CollectionType : StableSequenceType, FiniteSequenceType, (etc.) // otherwise 
>> as it is now
> 
> This is interesting. A few concerns:
> 
> First, we have tried not to create any distinct protocols with identical 
> syntactic requirements, because we think it makes the world clearer; we think 
> people are more likely to assign incorrect protocols when all the operations 
> they want are available, but don’t have the right semantics.  That isn’t to 
> say we shouldn’t start doing it, but it would be a break from the past.
> 
> Higher protocol granularity has a high comprehensibility cost.  
> Distinguishing protocols based on semantic requirements alone may make the 
> library harder to understand.  I’ve heard some people’s heads have exploded 
> from simply encountering CollectionType. 
> 
> Next, it’s a principle of generic programming that every protocol should be 
> justified by both algorithms that exploit its requirements (e.g. extension 
> methods) and some real-world models that can’t reasonably also conform to a 
> more-refined protocol.  For example, we have a ForwardIndexType because a 
> singly-linked list has multipass forward traversal and can’t do bidirectional 
> traversal.  In order to evaluate any proposal for new protocols, we’d need to 
> see all of these things.


Speaking frankly, I’d say that there are few benefits to a refactoring along 
the lines sketched above until/unless you want to have solid support for things 
like a product-sequence or product-collection; you would gain *significant* 
advantages from such a refactored hierarchy in such scenarios, but I can’t 
think of any meaningful improvement the refactoring would offer in the 
essentially-linear case. (Note that this is distinct from examples of concrete 
models that are naturally e.g. finite-but-not-stable and vice-versa; they 
exist, but the rest is already long enough as it is).

A full example is beyond my time-budget here, but I can give the flavor of 
where it makes a difference somewhat quickly.

Consider a hypothetical `ProductSequence2<A:SequenceType,B:SequenceType>` that 
enumerates the cartesian product of two sequences (in an unspecified order); 
the naive implementation has problems with non-stable sequences and with 
infinite-sequences, but I’ll only discuss the infinite case b/c it’s more 
relevant here.

When either—or both—of the sequences are infinite, the order-of-iteration has 
actual observable consequences; as a concrete example, if you want this to work 
out:

        ProductSequence2([1,2].cycle,[1,2].cycle).contains((2,2)) == true

…then we have this:

- if `a` is finite and `b` is not, you want to iterate like (a0,b0), (a1,b0), 
(a2, b0) … , (a0, b1), (a1, b1), …, etc
- if `b` is finite and `a` is not, you want to iterate like (a0,b0), (a0,b1), 
(a0, b2) … , (a1, b0), (a1, b1), …, etc 
- if neither `a` nor `b` is finite, you want to iterate like (a0, b0), (a1,b0), 
(a0,b1), (a2, b0), (a1, b1), (a0, b2), … etc

…as with those orderings you *will* eventually reach each pair in the product 
(which seems strictly superior to an iteration order which will leave many 
members of the product forever un-visited).

Importantly, the third ordering is inefficient in the general case, and thus 
isn’t a suitable default; you’d only want it in places where you’re 
*intentionally* using infinite series.

This is a concrete example of the sort of thing that leaves me preferring that 
the standard library *not* contain infinite sequences at this time: such 
sequences *highly* benefit from special handling, but there’s no good way at 
this time to generically provide such handling in a general context within the 
bounds of the current language and standard library .

If there’s a realistic chance of such a refactoring being accepted if 
sufficiently-well-proposed I can prepare something, but I’m admittedly 
skeptical given the likely magnitude of the associated changes. In particular, 
the API on types like `ForwardIndexType` is rather unfortunate for things like 
a hypothetical product-collection scenario; since such products are among the 
stronger motivating uses, this means it’s a protocol refactoring + a lot of API 
changes to the existing standard library, and probably making things clunkier 
in the common / linear cases.

> 
>> …but can understand the wish to not overly-complicate the basic protocol 
>> hierarchy (and also to avoid baking-in nice-to-have, but 
>> impossible-to-really-enforce semantic requirements; I’d trust the standard 
>> library to use them properly, but not typical 3rd party code, somewhat 
>> defeating the purpose).
> 
> Well, I guess I should have read ahead and most of my lecture above was 
> needless!  I’m posting it anyway because I think it spells out some important 
> principles we’ll need to refer to later.
> 
>> Everything else is a difference of outlook; we agree on the facts and differ 
>> in interpretation.
>> 
>> I consider concrete types that adopt a protocol only to simply call 
>> `fatalError` for most of the protocol methods to be pathological — often 
>> useful, but still pathological — and thus far the standard library doesn’t 
>> contain any such pathological types (to my knowledge).
>> 
>> `cycle` is useful but not useful enough to be the standard library’s first 
>> “pathological” type, so it’s still a -1 as proposed.
>> 
>> This is nothing specific to `cycle` and my opinion here could change were 
>> the language or the standard library to evolve in various ways.
>>> 
>>>> You’d also want to call `fatalError` for at least `reduce`, `reverse`, 
>>>> `sort`, `split`(?), `flatMap`, `dropLast`, `suffix`, and `forEach`.
>>> 
>>> You can only do it for the ones defined in the protocol, not ones defined 
>>> in extensions. This means map, filter, forEach, and suffix.
>>> 
>>> split is actually perfectly implementable for a CycleSequence, although it 
>>> does need a custom implementation. split is bounded by at most Int.max 
>>> splits, which means it is guaranteed to terminate even for an infinite 
>>> sequence (although the final subsequence does need to be infinite[1]). Even 
>>> if there are no separators in the cycle, it can just return the cycle 
>>> itself.
>>> 
>>> [1] Interestingly, the current implementation actually dumps the remainder 
>>> into an Array and returns that (wrapped in AnySequence), which is curious 
>>> because it would be perfectly legal for it to just wrap the generator up in 
>>> AnySequence and return that instead. I'm tempted to submit a PR to change 
>>> that now, as it just seems like unnecessary work to use an array.
>>> 
>>>> `startsWith` and `elementsEqual` and `lexicographicComparison` are all 
>>>> broken if you call them like e.g. `self.startsWith(self)`.
>>> 
>>> That's true, but what do you really expect when you're calling them with 
>>> two infinite sequences? It's not so much that they're broken as it is that 
>>> you're creating an infinite loop without any way to break out. And FWIW, 
>>> lexicographicalCompare is actually something you might want to explicitly 
>>> support on infinite sequences if you know your sequences aren't equal and 
>>> want to find out which one is less than the other.
>>> 
>>> There are plenty of ways to shoot yourself in the foot. I don't think 
>>> infinite sequences are really the big problem you're making them out to be.
>>> 
>>>> You can conceivably implement a non-crashing `contains`, `minElement` and 
>>>> `maxElement` on `CycleSequence` by calling through to the wrapped 
>>>> collection, but that’ll seemingly evaporate as soon as your 
>>>> `CycleSequence` winds up hidden inside an `AnySequence`.
>>> 
>>> You can't override those anyway in a generic context, because they're not 
>>> members of the protocol, they're just extensions. You could implement them 
>>> such that your implementation is called when working on the concrete 
>>> CycleSequence type, but I'm not sure if that's a great idea to do that when 
>>> the actual behavior differs from calling it generically on SequenceType 
>>> (well, triggering a fatalError() instead of an infinite loop is fine 
>>> because they're both Bottom, but returning a valid result in one context 
>>> and looping infinitely in the other seems bad).
>>> 
>>> Of course, these methods could actually be moved into the protocol itself, 
>>> which would let us override them. I'm not entirely sure why they aren't in 
>>> the protocol to begin with, I guess because there's not much need for 
>>> overriding these outside of infinite sequences (well, finite sorted 
>>> sequences could provide an optimized min/maxElement, and an optimized 
>>> version of contains(_: Self.Generator.Element), but maybe there's tradeoffs 
>>> to doing this, e.g. maybe there's some reason why having a large protocol 
>>> witness table is a bad idea).
>> 
>> I don’t think `contains` or `minElement/maxElement` *can* be part of the 
>> protocol (in the sense of overridable) at this time (they require `Element` 
>> satisfy certain type constraints), but they certainly should be if the type 
>> system someday would support that.
>> 
>>> 
>>>> Which illustrates why this is a -1 for me; there's nothing wrong with the 
>>>> functionality in isolation and there’s nothing wrong with infinite 
>>>> sequences, but the standard library should play well with itself, and this 
>>>> wouldn’t play well with the rest of the standard library.
>>> 
>>> Ultimately, there's not much difference between an infinite sequence and a 
>>> sequence of Int.max elements. The latter is finite, but it's so massive 
>>> (especially on 64-bit) that any kind of eager processing is going to hit 
>>> the same problems as an infinite sequence. Every problem you describe will 
>>> be a problem with the simple sequence `(0..<Int.max)` as well.
>>> 
>>> -Kevin Ballard
>>> 
>>>> That opinion could change as the language changes or the standard library 
>>>> evolves.
>>>> 
>>>>> On Dec 28, 2015, at 1:20 AM, Kevin Ballard via swift-evolution 
>>>>> <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:
>>>>> 
>>>>> ## Introduction
>>>>> 
>>>>> Add a new property `cycle` to CollectionType that returns an infinite 
>>>>> SequenceType that yields the elements of the collection in a loop.
>>>>> 
>>>>> ## Motivation
>>>>> 
>>>>> It's sometimes useful to be able to have an infinite sequence. For 
>>>>> example, `CollectionOfOne(x).cycle` could be used to have an infinite 
>>>>> sequence of a single element (similar to Repeat but without a count). A 
>>>>> common use for infinite sequences is zipping with a finite sequence. As 
>>>>> far as I'm aware, the stdlib does not currently provide any way to create 
>>>>> such an infinite sequence.
>>>>> 
>>>>> ## Proposed solution
>>>>> 
>>>>> Extend CollectionType with a new property `cycle` that yields a type that 
>>>>> conforms to SequenceType. This sequence yields each element of the 
>>>>> collection in an infinite loop.
>>>>> 
>>>>> ## Detailed design
>>>>> 
>>>>> 2 new types would be added:
>>>>> 
>>>>> struct CycleSequence<Base : CollectionType> : LazySequenceType { ... }
>>>>> struct CycleGenerator<Base : CollectionType> : GeneratorType { ... }
>>>>> 
>>>>> CollectionType would be extended with a property:
>>>>> 
>>>>> extension CollectionType {
>>>>>   public var cycle: CycleSequence<Self> { get }
>>>>> }
>>>>> 
>>>>> This is an extension of CollectionType instead of SequenceType because it 
>>>>> requires a multi-pass sequence (and SequenceType does not provide that 
>>>>> guarantee). The returned type conforms to SequenceType instead of 
>>>>> CollectionType because there is no possible `endIndex` that satisfies the 
>>>>> requirement of being reachable from `startIndex` by zero or more 
>>>>> applications of `successor()`.
>>>>> 
>>>>> Because the default eager versions of map and filter will execute forever 
>>>>> on an infinite sequence, CycleSequence conforms to LazySequenceType 
>>>>> instead of SequenceType in order to provide lazy versions of those 
>>>>> functions. Additionally, it will provide implementations of the eager 
>>>>> versions that simply trigger a fatalError(), as the alternative is an 
>>>>> infinite loop that consumes more and more memory.
>>>>> 
>>>>> ## Impact on existing code
>>>>> 
>>>>> None
>>>>> 
>>>>> -Kevin Ballard
>>>>> _______________________________________________
>>>>> swift-evolution mailing list
>>>>> swift-evolution@swift.org <mailto:swift-evolution@swift.org>
>>>>> https://lists.swift.org/mailman/listinfo/swift-evolution 
>>>>> <https://lists.swift.org/mailman/listinfo/swift-evolution>
>>>> 
>>>> _______________________________________________
>>>> swift-evolution mailing list
>>>> swift-evolution@swift.org <mailto:swift-evolution@swift.org>
>>>> https://lists.swift.org/mailman/listinfo/swift-evolution 
>>>> <https://lists.swift.org/mailman/listinfo/swift-evolution>
>>> _______________________________________________
>>> swift-evolution mailing list
>>> swift-evolution@swift.org <mailto:swift-evolution@swift.org>
>>> https://lists.swift.org/mailman/listinfo/swift-evolution 
>>> <https://lists.swift.org/mailman/listinfo/swift-evolution>
>>  _______________________________________________
>> swift-evolution mailing list
>> swift-evolution@swift.org <mailto:swift-evolution@swift.org>
>> https://lists.swift.org/mailman/listinfo/swift-evolution 
>> <https://lists.swift.org/mailman/listinfo/swift-evolution>

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

Re: [swift-evolution] Proposal: CollectionType.cycle property for an infinite sequence

Reply via email to