Re: [swift-evolution] Proposal: CollectionType.cycle property for an infinite sequence

plx via swift-evolution Wed, 30 Dec 2015 14:28:51 -0800

> On Dec 30, 2015, at 3:39 PM, Kevin Ballard via swift-evolution 
> <[email protected]> wrote:
> 
> Refactoring like that can always be done later; introducing infinite 
> sequences now shouldn't make it any harder to refactor the protocols. If 
> anything, it will provide more practical experience with how infinite 
> sequences interact with the current protocol hierarchy that would help guide 
> the design of any refactoring (or, perhaps more importantly, help determine 
> if such a refactoring is worthwhile).
>  
> A big concern I have with refactoring like that is you'd rarely ever be able 
> to actually bound an algorithm on FiniteSequenceType, because there will 
> always be ways of constructing sequences that the compiler can't tell if it's 
> infinite. Trivial example is `ary.cycle.takeWhile(pred)`. This is infinite if 
> `pred(elt)` is true for every element of `ary` and finite otherwise. But 
> there's tons of other ways of creating such indeterminate sequences. Bounding 
> algorithms (such as `map(..) -> [T]` or `reduce()`) on finite sequences only 
> would be rather limiting, as users who know their sequence must be finite but 
> can't prove it to the compiler would be unable to use methods that are in 
> fact perfectly safe.


If you will accept that I would prefer the standard library types to interpret 
the standard library protocols as having already made such a binding — as, 
indeed, the type signatures necessarily imply — the rest of my objection should 
be comprehensible.

I don’t find the repeated points about what can happen due to user error to be 
interesting.

> 
> What you could do with such a refactoring is optimize certain algorithms if 
> they use a more precise sequence type, but I'm not actually sure what sort of 
> optimizations you can make with your proposed protocol hierarchy. In fact, 
> the only real optimizations that come to mind are optimizing when you know 
> you're working with a CycleSequence, because for various algorithms you 
> really only have to iterate the underlying elements once (e.g. assuming a 
> pure predicate, CycleSequence can implement contains() safely, although we 
> don't actually have pure in the language right now so that's not necessarily 
> a safe assumption; also, contains() isn't a protocol method, just an 
> extension method, so we can't actually do that anyway).

I already gave the example of a product sequence. There’s a simple 
implementation that is correct when both input sequences are finite-and-stable, 
but not necessarily if otherwise; the proposed refactoring would statically 
provides enough information to allow one to choose the minimally-*pessimal* 
implementation that is still correct for such inputs. 

How much correctness matters is, of course, essentially subjective.

>  
> On that note, core language team, anyone know offhand why SequenceType has a 
> bunch of extension methods that aren't part of the protocol? For example, 
> contains(_ predicate:). The only real reason I can think of is to shrink the 
> protocol witness table, but surely that's not particularly meaningful. I 
> warrant that contains(_ predicate:) doesn't really have any reason to be 
> overridden by anything except sequences that knowingly repeat elements (e.g. 
> CycleSequence and Repeat), and even that's only if you assume the predicate 
> is pure, but there's some other methods that make sense to override on some 
> sequences (like minElement(_ isOrderedBefore:) for any sequence that has a 
> defined ordering).

I am not sure how, precisely, you would propose to override the closure-taking 
variant of `minElement` to take advantage of an intrinsic ordering, but I’d be 
curious to see it.

I’d hope that the non-closure-accepting variants would be made overridable once 
the type system supports it.

>  
> -Kevin Ballard
>  
> On Wed, Dec 30, 2015, at 11:01 AM, plx via swift-evolution wrote:
>>  
>>> On Dec 29, 2015, at 6:38 PM, Dave Abrahams <[email protected] 
>>> <mailto:[email protected]>> wrote:
>>>  
>>>  
>>>> On Dec 29, 2015, at 3:39 PM, plx via swift-evolution 
>>>> <[email protected] <mailto:[email protected]>> wrote:
>>>>  
>>>>>  
>>>>> On Dec 29, 2015, at 1:17 PM, Kevin Ballard via swift-evolution 
>>>>> <[email protected] <mailto:[email protected]>> wrote:
>>>>>  
>>>>> On Tue, Dec 29, 2015, at 08:14 AM, plx via swift-evolution wrote:
>>>>>> Personally I’d say this should be a -1 for standard-library inclusion.
>>>>>>  
>>>>>> Swift’s not really built to handle infinite sequences right now; until 
>>>>>> they are handed better by the standard library convenience methods for 
>>>>>> creating them shouldn’t be in the standard library.
>>>>>  
>>>>> As far as I can tell, the only way in which it's "not really built" to 
>>>>> handle this is that there are multiple constructs that attempt to eagerly 
>>>>> consume an entire sequence, and using these with an infinite sequence 
>>>>> would end up looping forever. But I don't consider that to be a 
>>>>> particularly serious problem. You could "fix" it by refactoring 
>>>>> SequenceType into two protocols, SequenceType (for possibly-infinite 
>>>>> sequences) and FiniteSequenceType (for known-finite sequences) and then 
>>>>> going over the entire standard library and updating various spots to use 
>>>>> FiniteSequenceType, except this would be very limiting (sequences that 
>>>>> are not known if they're infinite to the compiler could still be valid 
>>>>> for various eager algorithms if the programmer knows it will be finite in 
>>>>> practice).
>>>>  
>>>> Indeed, on my wishlist I would like to see the standard protocols 
>>>> refactored to something more like this:
>>>>  
>>>> SequenceType // can be iterated
>>>> FiniteSequenceType : SequenceType, // of finite length
>>>> StableSequenceType : SequenceType, // can be re-iterated identically
>>>> CollectionType : StableSequenceType, FiniteSequenceType, (etc.) // 
>>>> otherwise as it is now
>>>  
>>> This is interesting. A few concerns:
>>>  
>>> First, we have tried not to create any distinct protocols with identical 
>>> syntactic requirements, because we think it makes the world clearer; we 
>>> think people are more likely to assign incorrect protocols when all the 
>>> operations they want are available, but don’t have the right semantics.  
>>> That isn’t to say we shouldn’t start doing it, but it would be a break from 
>>> the past.
>>>  
>>> Higher protocol granularity has a high comprehensibility cost.  
>>> Distinguishing protocols based on semantic requirements alone may make the 
>>> library harder to understand.  I’ve heard some people’s heads have exploded 
>>> from simply encountering CollectionType. 
>>>  
>>> Next, it’s a principle of generic programming that every protocol should be 
>>> justified by both algorithms that exploit its requirements (e.g. extension 
>>> methods) and some real-world models that can’t reasonably also conform to a 
>>> more-refined protocol.  For example, we have a ForwardIndexType because a 
>>> singly-linked list has multipass forward traversal and can’t do 
>>> bidirectional traversal.  In order to evaluate any proposal for new 
>>> protocols, we’d need to see all of these things.
>>  
>> Speaking frankly, I’d say that there are few benefits to a refactoring along 
>> the lines sketched above until/unless you want to have solid support for 
>> things like a product-sequence or product-collection; you would gain 
>> *significant* advantages from such a refactored hierarchy in such scenarios, 
>> but I can’t think of any meaningful improvement the refactoring would offer 
>> in the essentially-linear case. (Note that this is distinct from examples of 
>> concrete models that are naturally e.g. finite-but-not-stable and 
>> vice-versa; they exist, but the rest is already long enough as it is).
>>  
>> A full example is beyond my time-budget here, but I can give the flavor of 
>> where it makes a difference somewhat quickly.
>>  
>> Consider a hypothetical `ProductSequence2<A:SequenceType,B:SequenceType>` 
>> that enumerates the cartesian product of two sequences (in an unspecified 
>> order); the naive implementation has problems with non-stable sequences and 
>> with infinite-sequences, but I’ll only discuss the infinite case b/c it’s 
>> more relevant here.
>>  
>> When either—or both—of the sequences are infinite, the order-of-iteration 
>> has actual observable consequences; as a concrete example, if you want this 
>> to work out:
>>  
>> ProductSequence2([1,2].cycle,[1,2].cycle).contains((2,2)) == true
>>  
>> …then we have this:
>>  
>> - if `a` is finite and `b` is not, you want to iterate like (a0,b0), 
>> (a1,b0), (a2, b0) … , (a0, b1), (a1, b1), …, etc
>> - if `b` is finite and `a` is not, you want to iterate like (a0,b0), 
>> (a0,b1), (a0, b2) … , (a1, b0), (a1, b1), …, etc 
>> - if neither `a` nor `b` is finite, you want to iterate like (a0, b0), 
>> (a1,b0), (a0,b1), (a2, b0), (a1, b1), (a0, b2), … etc
>>  
>> …as with those orderings you *will* eventually reach each pair in the 
>> product (which seems strictly superior to an iteration order which will 
>> leave many members of the product forever un-visited).
>>  
>> Importantly, the third ordering is inefficient in the general case, and thus 
>> isn’t a suitable default; you’d only want it in places where you’re 
>> *intentionally* using infinite series.
>>  
>> This is a concrete example of the sort of thing that leaves me preferring 
>> that the standard library *not* contain infinite sequences at this time: 
>> such sequences *highly* benefit from special handling, but there’s no good 
>> way at this time to generically provide such handling in a general context 
>> within the bounds of the current language and standard library .
>>  
>> If there’s a realistic chance of such a refactoring being accepted if 
>> sufficiently-well-proposed I can prepare something, but I’m admittedly 
>> skeptical given the likely magnitude of the associated changes. In 
>> particular, the API on types like `ForwardIndexType` is rather unfortunate 
>> for things like a hypothetical product-collection scenario; since such 
>> products are among the stronger motivating uses, this means it’s a protocol 
>> refactoring + a lot of API changes to the existing standard library, and 
>> probably making things clunkier in the common / linear cases.
>>  
>>>  
>>>> …but can understand the wish to not overly-complicate the basic protocol 
>>>> hierarchy (and also to avoid baking-in nice-to-have, but 
>>>> impossible-to-really-enforce semantic requirements; I’d trust the standard 
>>>> library to use them properly, but not typical 3rd party code, somewhat 
>>>> defeating the purpose).
>>>  
>>> Well, I guess I should have read ahead and most of my lecture above was 
>>> needless!  I’m posting it anyway because I think it spells out some 
>>> important principles we’ll need to refer to later.
>>>  
>>>> Everything else is a difference of outlook; we agree on the facts and 
>>>> differ in interpretation.
>>>>  
>>>> I consider concrete types that adopt a protocol only to simply call 
>>>> `fatalError` for most of the protocol methods to be pathological — often 
>>>> useful, but still pathological — and thus far the standard library doesn’t 
>>>> contain any such pathological types (to my knowledge).
>>>>  
>>>> `cycle` is useful but not useful enough to be the standard library’s first 
>>>> “pathological” type, so it’s still a -1 as proposed.
>>>>  
>>>> This is nothing specific to `cycle` and my opinion here could change were 
>>>> the language or the standard library to evolve in various ways.
>>>>>  
>>>>>> You’d also want to call `fatalError` for at least `reduce`, `reverse`, 
>>>>>> `sort`, `split`(?), `flatMap`, `dropLast`, `suffix`, and `forEach`.
>>>>>  
>>>>> You can only do it for the ones defined in the protocol, not ones defined 
>>>>> in extensions. This means map, filter, forEach, and suffix.
>>>>>  
>>>>> split is actually perfectly implementable for a CycleSequence, although 
>>>>> it does need a custom implementation. split is bounded by at most Int.max 
>>>>> splits, which means it is guaranteed to terminate even for an infinite 
>>>>> sequence (although the final subsequence does need to be infinite[1]). 
>>>>> Even if there are no separators in the cycle, it can just return the 
>>>>> cycle itself.
>>>>>  
>>>>> [1] Interestingly, the current implementation actually dumps the 
>>>>> remainder into an Array and returns that (wrapped in AnySequence), which 
>>>>> is curious because it would be perfectly legal for it to just wrap the 
>>>>> generator up in AnySequence and return that instead. I'm tempted to 
>>>>> submit a PR to change that now, as it just seems like unnecessary work to 
>>>>> use an array.
>>>>>  
>>>>>> `startsWith` and `elementsEqual` and `lexicographicComparison` are all 
>>>>>> broken if you call them like e.g. `self.startsWith(self)`.
>>>>>  
>>>>> That's true, but what do you really expect when you're calling them with 
>>>>> two infinite sequences? It's not so much that they're broken as it is 
>>>>> that you're creating an infinite loop without any way to break out. And 
>>>>> FWIW, lexicographicalCompare is actually something you might want to 
>>>>> explicitly support on infinite sequences if you know your sequences 
>>>>> aren't equal and want to find out which one is less than the other.
>>>>>  
>>>>> There are plenty of ways to shoot yourself in the foot. I don't think 
>>>>> infinite sequences are really the big problem you're making them out to 
>>>>> be.
>>>>>  
>>>>>> You can conceivably implement a non-crashing `contains`, `minElement` 
>>>>>> and `maxElement` on `CycleSequence` by calling through to the wrapped 
>>>>>> collection, but that’ll seemingly evaporate as soon as your 
>>>>>> `CycleSequence` winds up hidden inside an `AnySequence`.
>>>>>  
>>>>> You can't override those anyway in a generic context, because they're not 
>>>>> members of the protocol, they're just extensions. You could implement 
>>>>> them such that your implementation is called when working on the concrete 
>>>>> CycleSequence type, but I'm not sure if that's a great idea to do that 
>>>>> when the actual behavior differs from calling it generically on 
>>>>> SequenceType (well, triggering a fatalError() instead of an infinite loop 
>>>>> is fine because they're both Bottom, but returning a valid result in one 
>>>>> context and looping infinitely in the other seems bad).
>>>>>  
>>>>> Of course, these methods could actually be moved into the protocol 
>>>>> itself, which would let us override them. I'm not entirely sure why they 
>>>>> aren't in the protocol to begin with, I guess because there's not much 
>>>>> need for overriding these outside of infinite sequences (well, finite 
>>>>> sorted sequences could provide an optimized min/maxElement, and an 
>>>>> optimized version of contains(_: Self.Generator.Element), but maybe 
>>>>> there's tradeoffs to doing this, e.g. maybe there's some reason why 
>>>>> having a large protocol witness table is a bad idea).
>>>>  
>>>> I don’t think `contains` or `minElement/maxElement` *can* be part of the 
>>>> protocol (in the sense of overridable) at this time (they require 
>>>> `Element` satisfy certain type constraints), but they certainly should be 
>>>> if the type system someday would support that.
>>>>  
>>>>>  
>>>>>> Which illustrates why this is a -1 for me; there's nothing wrong with 
>>>>>> the functionality in isolation and there’s nothing wrong with infinite 
>>>>>> sequences, but the standard library should play well with itself, and 
>>>>>> this wouldn’t play well with the rest of the standard library.
>>>>>  
>>>>> Ultimately, there's not much difference between an infinite sequence and 
>>>>> a sequence of Int.max elements. The latter is finite, but it's so massive 
>>>>> (especially on 64-bit) that any kind of eager processing is going to hit 
>>>>> the same problems as an infinite sequence. Every problem you describe 
>>>>> will be a problem with the simple sequence `(0..<Int.max)` as well.
>>>>>  
>>>>> -Kevin Ballard
>>>>>  
>>>>>> That opinion could change as the language changes or the standard 
>>>>>> library evolves.
>>>>>>  
>>>>>>> On Dec 28, 2015, at 1:20 AM, Kevin Ballard via swift-evolution 
>>>>>>> <[email protected] <mailto:[email protected]>> wrote:
>>>>>>>  
>>>>>>> ## Introduction
>>>>>>>  
>>>>>>> Add a new property `cycle` to CollectionType that returns an infinite 
>>>>>>> SequenceType that yields the elements of the collection in a loop.
>>>>>>>  
>>>>>>> ## Motivation
>>>>>>>  
>>>>>>> It's sometimes useful to be able to have an infinite sequence. For 
>>>>>>> example, `CollectionOfOne(x).cycle` could be used to have an infinite 
>>>>>>> sequence of a single element (similar to Repeat but without a count). A 
>>>>>>> common use for infinite sequences is zipping with a finite sequence. As 
>>>>>>> far as I'm aware, the stdlib does not currently provide any way to 
>>>>>>> create such an infinite sequence.
>>>>>>>  
>>>>>>> ## Proposed solution
>>>>>>>  
>>>>>>> Extend CollectionType with a new property `cycle` that yields a type 
>>>>>>> that conforms to SequenceType. This sequence yields each element of the 
>>>>>>> collection in an infinite loop.
>>>>>>>  
>>>>>>> ## Detailed design
>>>>>>>  
>>>>>>> 2 new types would be added:
>>>>>>>  
>>>>>>> struct CycleSequence<Base : CollectionType> : LazySequenceType { ... }
>>>>>>> struct CycleGenerator<Base : CollectionType> : GeneratorType { ... }
>>>>>>>  
>>>>>>> CollectionType would be extended with a property:
>>>>>>>  
>>>>>>> extension CollectionType {
>>>>>>>   public var cycle: CycleSequence<Self> { get }
>>>>>>> }
>>>>>>>  
>>>>>>> This is an extension of CollectionType instead of SequenceType because 
>>>>>>> it requires a multi-pass sequence (and SequenceType does not provide 
>>>>>>> that guarantee). The returned type conforms to SequenceType instead of 
>>>>>>> CollectionType because there is no possible `endIndex` that satisfies 
>>>>>>> the requirement of being reachable from `startIndex` by zero or more 
>>>>>>> applications of `successor()`.
>>>>>>>  
>>>>>>> Because the default eager versions of map and filter will execute 
>>>>>>> forever on an infinite sequence, CycleSequence conforms to 
>>>>>>> LazySequenceType instead of SequenceType in order to provide lazy 
>>>>>>> versions of those functions. Additionally, it will provide 
>>>>>>> implementations of the eager versions that simply trigger a 
>>>>>>> fatalError(), as the alternative is an infinite loop that consumes more 
>>>>>>> and more memory.
>>>>>>>  
>>>>>>> ## Impact on existing code
>>>>>>>  
>>>>>>> None
>>>>>>>  
>>>>>>> -Kevin Ballard
>>>>>>> _______________________________________________
>>>>>>> swift-evolution mailing list
>>>>>>> [email protected] <mailto:[email protected]>
>>>>>>> https://lists.swift.org/mailman/listinfo/swift-evolution 
>>>>>>> <https://lists.swift.org/mailman/listinfo/swift-evolution>
>>>>>>  
>>>>>> _______________________________________________
>>>>>> swift-evolution mailing list
>>>>>> [email protected] <mailto:[email protected]>
>>>>>> https://lists.swift.org/mailman/listinfo/swift-evolution 
>>>>>> <https://lists.swift.org/mailman/listinfo/swift-evolution>
>>>>> _______________________________________________
>>>>> swift-evolution mailing list
>>>>> [email protected] <mailto:[email protected]>
>>>>> https://lists.swift.org/mailman/listinfo/swift-evolution 
>>>>> <https://lists.swift.org/mailman/listinfo/swift-evolution>
>>>> 
>>>>  
>>>> _______________________________________________
>>>> swift-evolution mailing list
>>>> [email protected] <mailto:[email protected]>
>>>> https://lists.swift.org/mailman/listinfo/swift-evolution 
>>>> <https://lists.swift.org/mailman/listinfo/swift-evolution>
>>> 
>>>  
>> 
>>  
>> 
>> _______________________________________________
>> swift-evolution mailing list
>> [email protected] <mailto:[email protected]>
>> https://lists.swift.org/mailman/listinfo/swift-evolution 
>> <https://lists.swift.org/mailman/listinfo/swift-evolution>
>  
> 
> _______________________________________________
> swift-evolution mailing list
> [email protected]
> https://lists.swift.org/mailman/listinfo/swift-evolution

_______________________________________________
swift-evolution mailing list
[email protected]
https://lists.swift.org/mailman/listinfo/swift-evolution

Re: [swift-evolution] Proposal: CollectionType.cycle property for an infinite sequence

Reply via email to