On Tue, Oct 17, 2017 at 14:41 Jonathan Hull <jh...@gbis.com> wrote: > On Oct 17, 2017, at 11:46 AM, Xiaodi Wu <xiaodi...@gmail.com> wrote: > > > On Tue, Oct 17, 2017 at 12:54 Jonathan Hull <jh...@gbis.com> wrote: > >> On Oct 17, 2017, at 9:34 AM, Xiaodi Wu <xiaodi...@gmail.com> wrote: >> >> >> On Tue, Oct 17, 2017 at 09:42 Jonathan Hull <jh...@gbis.com> wrote: >> >>> On Oct 17, 2017, at 5:44 AM, Xiaodi Wu <xiaodi...@gmail.com> wrote: >>> >>> >>> On Tue, Oct 17, 2017 at 00:56 Thorsten Seitz <tseit...@icloud.com> >>> wrote: >>> >>>> >>>> >>>> Am 17.10.2017 um 00:13 schrieb Xiaodi Wu <xiaodi...@gmail.com>: >>>> >>>> >>>> On Mon, Oct 16, 2017 at 14:21 Thorsten Seitz <tseit...@icloud.com> >>>> wrote: >>>> >>>>> Am 16.10.2017 um 16:20 schrieb Xiaodi Wu via swift-evolution < >>>>> swift-evolution@swift.org>: >>>>> >>>>> >>>>> On Mon, Oct 16, 2017 at 05:48 Jonathan Hull <jh...@gbis.com> wrote: >>>>> >>>>>> >>>>>> On Oct 15, 2017, at 9:58 PM, Xiaodi Wu <xiaodi...@gmail.com> wrote: >>>>>> >>>>>> On Sun, Oct 15, 2017 at 8:51 PM, Jonathan Hull <jh...@gbis.com> >>>>>> wrote: >>>>>> >>>>>>> >>>>>>> On Oct 14, 2017, at 10:48 PM, Xiaodi Wu <xiaodi...@gmail.com> wrote: >>>>>>> >>>>>>>> That ordering can be arbitrary, but it shouldn’t leak internal >>>>>>>>> representation such that the method used to create identical things >>>>>>>>> affects >>>>>>>>> the outcome of generic methods because of differences in internal >>>>>>>>> representation. >>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> It would be better to say that the iteration order is >>>>>>>>>> well-defined. That will almost always mean documented, and usually >>>>>>>>>> predictable though obviously e.g. RNGs and iterating in random order >>>>>>>>>> will >>>>>>>>>> not be predictable by design. >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> That's actually more semantically constrained than what Swift >>>>>>>>>>> calls a `Collection` (which requires conforming types to be >>>>>>>>>>> multi-pass >>>>>>>>>>> and(?) finite). By contrast, Swift's `SpongeBob` protocol explicitly >>>>>>>>>>> permits conforming single-pass, infinite, and/or unordered types. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> I think you’re talking about Sequence here, I’ve lost track of >>>>>>>>>>> your nonsense by now. Yes, the current Swift protocol named >>>>>>>>>>> Sequence allows >>>>>>>>>>> unordered types. You seem to keep asserting that but not actually >>>>>>>>>>> addressing my argument, which is *that allowing Sequences to be >>>>>>>>>>> unordered with the current API is undesired and actively harmful, >>>>>>>>>>> and >>>>>>>>>>> should* *therefore** be changed*. >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> What is harmful about it? >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> After thinking about it, I think the harmful bit is that >>>>>>>>>> unordered sequences are leaking internal representation (In your >>>>>>>>>> example, >>>>>>>>>> this is causing people to be surprised when two sets with identical >>>>>>>>>> elements are generating different sequences/orderings based on how >>>>>>>>>> they >>>>>>>>>> were created). You are correct when you say that this problem is >>>>>>>>>> even true >>>>>>>>>> for for-in. >>>>>>>>>> >>>>>>>>> >>>>>>>>> I would not say it is a problem. Rather, by definition, iteration >>>>>>>>> involves retrieving one element after another; if you're allowed to >>>>>>>>> do that >>>>>>>>> with Set, then the elements of a Set are observably ordered in some >>>>>>>>> way. 
>>>>>>>>> Since it's not an OrderedSet--i.e., order doesn't matter--then the >>>>>>>>> only >>>>>>>>> sensible conclusion is that the order of elements obtained in a >>>>>>>>> for...in >>>>>>>>> loop must be arbitrary. If you think this is harmful, then you must >>>>>>>>> believe >>>>>>>>> that one should be prohibited from iterating over an instance of Set. >>>>>>>>> Otherwise, Set is inescapably a Sequence by the Swift definition of >>>>>>>>> Sequence. All extension methods on Sequence like drop(while:) are >>>>>>>>> really >>>>>>>>> just conveniences for common things that you can do with iterated >>>>>>>>> access; >>>>>>>>> to my mind, they're essentially just alternative ways of spelling >>>>>>>>> various >>>>>>>>> for...in loops. >>>>>>>>> >>>>>>>>> >>>>>>>>> I think an argument could be made that you shouldn’t be able to >>>>>>>>> iterate over a set without first defining an ordering on it (even if >>>>>>>>> that >>>>>>>>> ordering is somewhat arbitrary). Maybe we have something like a >>>>>>>>> “Sequenc(e)able” protocol which defines things which can be turned >>>>>>>>> into a >>>>>>>>> sequence when combined with some sort of ordering. One possible >>>>>>>>> ordering >>>>>>>>> could be the internal representation (At least in that case we are >>>>>>>>> calling >>>>>>>>> it out specifically). If I had to say >>>>>>>>> “setA.arbitraryOrder.elementsEqual(setB.arbitraryOrder)” I would >>>>>>>>> definitely >>>>>>>>> be less surprised when it returns false even though setA == setB. >>>>>>>>> >>>>>>>> >>>>>>>> Well, that's a totally different direction, then; you're arguing >>>>>>>> that `Set` and `Dictionary` should not conform to `Sequence` >>>>>>>> altogether. >>>>>>>> That's fine (it's also a direction that some of us explored off-list a >>>>>>>> while ago), but at this point in Swift's evolution, realistically, >>>>>>>> it's not >>>>>>>> within the realm of possible changes. >>>>>>>> >>>>>>>> >>>>>>>> I am actually suggesting something slightly different. Basically, >>>>>>>> Set and Dictionary’s conformance to Collection would have a different >>>>>>>> implementation. They would conform to another protocol declaring that >>>>>>>> they >>>>>>>> are unordered. That protocol would fill in part of the conformance to >>>>>>>> sequence/collection using a default ordering, which is mostly >>>>>>>> arbitrary, >>>>>>>> but guaranteed to produce the same ordering for the same list of >>>>>>>> elements >>>>>>>> (even across collection types). This would be safer, but a tiny bit >>>>>>>> slower >>>>>>>> than what we have now (We could also potentially develop a way for >>>>>>>> collections like set to amortize the cost). For those who need to >>>>>>>> recover >>>>>>>> speed, the new protocol would also define a property which quickly >>>>>>>> returns >>>>>>>> a sequence/iterator using the internal ordering (I arbitrarily called >>>>>>>> it >>>>>>>> .arbitraryOrder). >>>>>>>> >>>>>>>> I believe it would not be source breaking. >>>>>>>> >>>>>>> >>>>>>> That is indeed something slightly different. >>>>>>> >>>>>>> In an ideal world--and my initial understanding of what you were >>>>>>> suggesting--Set and Dictionary would each have a member like >>>>>>> `collection`, >>>>>>> which would expose the underlying data as a `SetCollection` or >>>>>>> `DictionaryCollection` that in turn would conform to `Collection`; >>>>>>> meanwhile, Set and Dictionary themselves would not offer methods such as >>>>>>> `prefix`, or indexing by subscript, which are not compatible with being >>>>>>> unordered. 
For those who want a particular ordering, there'd be >>>>>>> something >>>>>>> like `collection(ordered areInIncreasingOrder: (T, T) -> Bool) -> >>>>>>> {Set|Dictionary}Collection`. >>>>>>> >>>>>>> What you suggest here instead would be minimally source-breaking. >>>>>>> However, I'm unsure of where these guarantees provide benefit to justify >>>>>>> the performance cost. Certainly not for `first` or `dropFirst(_:)`, >>>>>>> which >>>>>>> still yields an arbitrary result which doesn't make sense for something >>>>>>> _unordered_. We *could* have an underscored customization point named >>>>>>> something like `_customOrderingPass` that is only invoked from >>>>>>> `elementsEqual` or other such methods to pre-rearrange the internal >>>>>>> ordering of unordered collections in some deterministic way before >>>>>>> comparison. Is that what you have in mind? >>>>>>> >>>>>>> >>>>>>> >>>>>>> Something like that. Whatever we do, there will be a tradeoff >>>>>>> between speed, correctness, and ergonomics. >>>>>>> >>>>>>> My suggestion trades speed for correctness, and provides a way to >>>>>>> recover speed through additional typing (which is slightly less >>>>>>> ergonomic). >>>>>>> >>>>>> >>>>>> You haven't convinced me that this is at all improved in >>>>>> "correctness." It trades one arbitrary iteration order for another on a >>>>>> type that tries to model an unordered collection. >>>>>> >>>>>> >>>>>>> We could do something like you suggest. I don’t think the method >>>>>>> would need to be underscored… the ordering pass could just be a method >>>>>>> on >>>>>>> the protocol which defines it as unordered. Then we could provide a >>>>>>> special conformance for things where order really matters based on >>>>>>> adherence to that protocol. That might be an acceptable tradeoff. It >>>>>>> would give us speed at the cost of having the correct implementation >>>>>>> being >>>>>>> less ergonomic and more error prone (you have to remember to check that >>>>>>> it >>>>>>> is unordered and call the ordering method when it mattered). >>>>>>> >>>>>>> I’d still be a bit worried that people would make incorrect generic >>>>>>> algorithms based on expecting an order from unordered things, but at >>>>>>> least >>>>>>> it would be possible for them check and handle it correctly. I think I >>>>>>> could get behind that tradeoff/compromise, given where we are in the >>>>>>> swift >>>>>>> process and Swift's obsession with speed (though I still slightly prefer >>>>>>> the safer default). At least the standard library would handle all the >>>>>>> things correctly, and that is what will affect the majority of >>>>>>> programmers. >>>>>>> >>>>>> >>>>>> What is an example of such an "incorrect" generic algorithm that >>>>>> would be made correct by such a scheme? >>>>>> >>>>>> >>>>>> To start with, the one you gave as an example at the beginning of >>>>>> this discussion: Two sets with identical elements which have different >>>>>> internal storage and thus give different orderings as sequences. You >>>>>> yourself have argued that the confusion around this is enough of a >>>>>> problem >>>>>> that we need to make a source-breaking change (renaming it) to warn >>>>>> people >>>>>> that the results of the ‘elementsEqual’ algorithm are undefined for sets >>>>>> and dictionaries. 
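(For concreteness, a rough sketch of the kind of design being discussed above. None of these names exist in the standard library; UnorderedCollection, arbitraryOrder, and canonicalOrder are placeholders for the ideas in this thread, and sorting is just one possible history-independent ordering.)

protocol UnorderedCollection: Collection {
    // Fast path: iterate in whatever order the internal storage happens to use.
    var arbitraryOrder: AnySequence<Element> { get }
}

extension UnorderedCollection where Element: Comparable {
    // A history-independent ordering: the same elements always produce the
    // same sequence, no matter how the collection was built up.
    var canonicalOrder: [Element] { return sorted() }
}

extension Set: UnorderedCollection {
    var arbitraryOrder: AnySequence<Element> { return AnySequence(self) }
}

// Under such a scheme, the comparison that surprises people is spelled out explicitly:
// Set([1, 2, 3]).canonicalOrder.elementsEqual(Set([3, 2, 1]).canonicalOrder)  // true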
>>>>>> >>>>> No, I am arguing that the confusion about ‘elementsEqual’ is foremost >>>>> a problem with its name; the result of this operation is not at all >>>>> undefined for two sets but actually clearly defined: it returns true if >>>>> two >>>>> sets have the same elements in the same iteration order, which is a >>>>> publicly observable behavior of sets (likewise dictionaries). >>>>> >>>>> >>>>> But it is a behavior which has absolutely no meaning at all because >>>>> the order does not depend on the elements of the set but on the history of >>>>> how the set has reached its current state. >>>>> So why should I ever use this method on a set? >>>>> What is the use case? >>>> >>>> One example: you can use it to check an instance of Set<Float> to >>>> determine if it has a NaN value. (The “obvious” way of doing it is not >>>> guaranteed to work since NaN != NaN.) >>>> >>>> >>>> How would I do that? I'd rather expect to use a property isNaN on Float >>>> to do that. >>> >>> set.elementsEqual(set) >>> >>> >>> I see why that would work (thanks to Set being a collection vs a >>> sequence), but it still feels like a hack. I definitely wouldn’t want to >>> be maintaining code with that in it. Especially when compared to something >>> like: >>> >>> set.contains(where: {$0.isNaN}) >> >> The purpose of protocols is to enable useful generic code. You cannot use >> isNaN for code that works on generic Collection, or for that matter, even >> code that works on Set where Element : Numeric. >> >> Much generic code (for example, generic sorting) easily blows up when >> encountering NaN. If you want an algorithm that is robust enough to handle >> (or at least reject) that scenario, then you will need to check for it. It >> is not a “hack” but a very common and accepted way of determining the >> presence of NaN. >> >> >> Just because a hack is commonly accepted or common doesn’t mean it isn’t >> a hack. >> > > It’s not a “hack.” NaN is required by IEEE fiat not to be equivalent to > itself. Asking whether a value is reflexively equivalent to itself is > guaranteed to be sufficient to detect the presence of NaN in all > IEEE-compliant contexts, which is to say all past, present, and future > versions of Swift. > > > The problem isn’t NaN != NaN, but the fact that you are comparing T != T > (where T may not even be a floating point thing) and assuming it means NaN. >
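(To spell out the two spellings being compared here, a small sketch; the sample values and the containsNonReflexiveElement helper are made up for illustration, and only contains(where:) and elementsEqual are existing API.)

let values: Set<Float> = [1.0, .nan, 2.5]

// Concrete context: Float exposes isNaN directly.
values.contains(where: { $0.isNaN })  // true

// Generic context: isNaN is unavailable, but IEEE 754 requires NaN != NaN,
// so comparing a collection elementwise against itself exposes any value
// that is not equivalent to itself. This relies on Set being a Collection,
// so iterating the same instance twice visits elements in the same order.
func containsNonReflexiveElement<C: Collection>(_ c: C) -> Bool where C.Element: Equatable {
    return !c.elementsEqual(c)
}
containsNonReflexiveElement(values)   // true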
I’m not assuming that at all; the example I gave was the use case of generic sorting, where NaN is problematic *because* NaN != NaN. In a generic context, I would want to detect the case where *any value*, NaN or not, is not equivalent to itself. NaN is just a good example, because it is a value of a built-in type that exhibits this behavior. Using elementsEqual in this way (in a generic context which may or may not >> involve floats) is definitely a hack, and is likely to bite you at some >> point. >> > > How is it definitely a hack when it is blessed by IEEE, and how can it > bite me? > > > Because it may return false for reasons other than NaN. > Which, again, is fine. If I cared about NaN specifically, then I must write a generic algorithm that’s constrained to floating point. I don’t see why a non-source-breaking change is suddenly off-limits. >>>>>> >>>>>> But more than that, any generic algorithm which is assuming that the >>>>>> sequence is coming from an ordered source (i.e. many things using >>>>>> first/last). Some uses of first are ok because the programmer actually >>>>>> means ‘any’, but anywhere where they actually mean first/last may be >>>>>> problematic. >>>>>> >>>>> >>>>> Such as...? >>>>> >>>>> Currently, there is no way to test for ordered-ness, so there is no >>>>>> way for even a careful programmer to mitigate this problem. By adding a >>>>>> protocol which states that something is unordered, we can either branch >>>>>> on >>>>>> it, or create a separate version of an algorithm for things which >>>>>> conform. >>>>>> >>>>> >>>>> It is clearly the case that Swift’s protocol hierarchy fits sets and >>>>> collections imperfectly; however, it is in the nature of modeling that >>>>> imperfections are present. The question is not whether it is possible to >>>>> incur performance, API surface area, and other trade-offs to make the >>>>> model >>>>> more faithful, but rather whether this usefully solves any problem. What >>>>> is >>>>> the problem being mitigated? As I write above, Swift’s Set and Dictionary >>>>> types meet the semantic requirements for Collection and moonlight as >>>>> ordered collections. What is a generic algorithm on an ordered collection >>>>> that is “not OK” for Set and Dictionary? (“elementsEqual”, as I’ve said, >>>>> is not such an example.) >>>>> >>>>> >>>>> On the contrary, `elementsEqual` is exactly such an example, because >>>>> it makes no sense to use it on a Set. >>>>> >>>>> let s1 = Set([1,2,3,4,5,6]) >>>>> let s2 = Set([6,5,4,3,2,1]) >>>>> >>>>> Both sets have different iteration orders. Comparing those sets with >>>>> some other collection using `elementsEqual` will give no meaningful result >>>>> because the order - and therefore the result of `elementsEqual` - is in >>>>> effect random. >>>>> >>>> >>>> No, it is not such an example; it’s misleadingly named but works >>>> correctly—that is, its behavior matches exactly the documented behavior, >>>> which relies on only the semantic guarantees of Sequence, which Set >>>> correctly fulfills. >>>> >>>> >>>> Fulfills to the letter. Again, what can you do with it if the result is >>>> random?? >>>> >>> >>> The result is not random. >>> >>> >>> It is undefined though. As you said earlier, by the guarantees we have >>> been given, it may shift over different builds/runs of a program. Thus in >>> one run, it might return true and then false in another (without changing >>> our code). 
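(Restating the example above in code, with what each call is actually guaranteed to do; the comment on elementsEqual describes the documented behavior, not the outcome of any particular run.)

let s1 = Set([1, 2, 3, 4, 5, 6])
let s2 = Set([6, 5, 4, 3, 2, 1])

s1 == s2                                // true: Set equality ignores order
s1.elementsEqual(s2)                    // unspecified: compares the two arbitrary iteration orders
s1.sorted().elementsEqual(s2.sorted())  // true: comparison after imposing an explicit order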
As Greg pointed out, it is also possible with the guarantees we >>> are given, for set/dict to have different orderings with copies of >>> themselves. (It will happen for sure when deep copying a dictionary with >>> reference-type keys). >>> >> >> As I wrote to Greg, if that is a possibility for Set, then the >> implementation is problematic for conformance to Collection and needs to be >> re-examined. >> >> >> So why not fix both things at once with a single change? >> >> >> I’m not sure why you claim the order of a dictionary will definitely >> change on deep copying. But in any case, we are not talking about deep >> copying here, or at least, I’m not; we’re talking about the notional >> copying involved in the semantics of passing a set as an argument. >> >> >> Because the ordering of some dictionaries depends on the memory address >> of the key. There is no way to copy the key in those cases without >> changing the ordering. You can watch their ordering shift in the debugger >> from run to run. >> > > I’m fine with that. Instances of reference types have identity. A deep > copy of a dictionary *has different keys* from the original. This may not > be so from the perspective of a custom == operator, but it most certainly > is so from the perspective of reference type semantics. So to me it is fine > (correct, even?) that a deep copy of a dictionary has a different iteration > order—it has different keys. > > Again, this is something entirely different from notional copies made in > the course of passing values as arguments. > > As I keep saying, relying on the behavior of a leaking internal >>> implementation is a bad plan. >>> >> >> Again, it is not a leaking internal implementation. It is a public API >> guarantee. >> >> >> The public API guarantee is that there will be some ordering that is >> stable for a particular instance on a particular run (as long as it is not >> mutated/copied). The actual ordering is undefined. >> >> Any generic code which depends on a particular ordering will have output >> which is undefined. >> > > Generic code that “depends on a particular ordering” is incorrect because > it assumes semantics that are not guaranteed. > > Just because a function returns a seemingly predictable value doesn’t mean >> it is entirely deterministic. There are lots of vulnerabilities in C/C++ >> due to reliance on undefined behavior. >> >> >> We should add an additional guarantee to set/dict that the order returned >>> will be the same for the same contents regardless of history (but can be >>> otherwise arbitrary). That will fix the behavior for algorithms like >>> elementsEqual (i.e., they will return the same result across builds/runs). >>> It will also implicitly provide, as a result, the constraint you were >>> arguing is needed across copies of a collection type. I agree that that is >>> an important guarantee. Why not fix both issues with a single >>> non-source-breaking change? >>> >> >> As I wrote, the behavior of elementsEqual is not broken and does not >> require fixing. >> >> >> Your argument for that is tautological though. You could argue for >> keeping any bug in the codebase using the same reasoning. >> >> Why was elementsEqual created? Isn’t it meant to check equality of two >> sequences in a generic context where == isn’t available? >> > > No no no no no no no no. That’s precisely why the name is misleading. > elementsEqual is *not* simply a mixed-type version of ==.
Remember that == > as implemented on a concrete sequence type *has no obligation* to use > elementwise comparison using the element’s implementation of ==. This is > not merely theoretical: [Float].== does *not* do an elementwise comparison > using Float.==. By contrast, you are guaranteed a true elementwise > comparison with elementsEqual regardless of how equivalence is defined for > the sequence. > > > elementsEqual shouldn’t fully imply == of the original type (only that the > elements are “equivalent in the same order” which is what it means for > generic sequences to be equivalent). > > But ‘==‘ does NEED to imply elementsEqual. Otherwise you break the > meaning of ‘==' > No, this is not required. For example: +0.0 == -0.0 but +0.0.sign != -0.0.sign Equivalence for the purposes of “==“ does *not* require substitutability for the purposes of all public APIs. The conforming type gets to decide what aspects are “salient.” Again, imperfections of modeling and all that. > > Isn’t it problematic that two equivalent sets may not be equivalent in a >> generic context? >> > > Again, this shows why the name is unfortunate. elementsEqual is not > supposed to be a generic version of ==, and its name is misleading. Repeat: > elementsEqual is *not* an alternative for == in either the generic or > concrete context. Neither one implies the other. > > > What are you basing this very strong belief on? Has the core team stated > that elementsEqual should not be used to check the equality of sequences? > See Michael Ilseman’s reply for more information on the difference between the two. > elementsEqual can only provide a notion of equivalence within the generic > context, of course (and not equivalence of the original types overall), but > it is still a notion of equivalence within the context. > No, it is **not** a notion of equivalence of the sequence type, but of the elements and their iteration order. Again, elementsEqual is poorly named because **it is not a test of equivalence of the receiver to the argument in any context whatsoever.** As I said above (and in another email), ‘==‘ has to imply elementsEqual. > Otherwise you break substitutability. > Again, substitutability for the purposes of == does *not* require substitutability with respect to all public APIs. What is the benefit of the guarantee you propose that justifies the >> performance cost? >> >> >> The benefit is that generic algorithms which rely on ordering of >> unordered things would be much more robust. It would solve your copy issue >> as well. >> >> You are acting like there is a huge performance cost. There isn’t. For >> Set, it would still have the traditional performance characteristics for >> Sets. Any slowdown (by un-cutting corners) would be amortized over >> insertions. (note: I am talking option #3 from my summary here, not #5, >> which was the one with an actual performance cost) >> > > I see you skipped responding to this. > It’s kind of beyond the scope of the discussion. What’s not disputed is that there is a performance cost. The question as to whether the degree of performance hit can be justified by the improvement in how “robust” the order-dependent methods on Set become presumes that it is at all a goal to make them “robust” in any way; I disagree. The reason Set has order-dependent methods is that it is a necessary consequence of supporting sequential iteration in a protocol-oriented way. It is currently documented that a user should choose Set only if they explicitly do not care about the order of iteration. 
I see no need to parse degrees of not caring. Why is the source-breaking change of renaming things better? >>> >> >> Because, in my analysis, the problem is that the method is incorrectly >> named. The problem affects all types that conform to Sequence and not just >> Set and Dictionary; elementsEqual is a distinct function from ==, and it >> must either continue to be distinct or cease to exist, but its name does >> nothing to clarify any distinction. >> >> >> It also won’t fix the original problem you mention as motivation. People >> will still be surprised at equivalent sets having different iteration >> orders regardless of what the function is named. >> > > That is not the motivating problem. The motivating problem is that people > think that elementsEqual is semantically equivalent to ==. It is a distinct > method that *does not test* equivalence of two sequences. > > > The motivating problem you originally gave was that people are surprised > when Set([1,2,3]).elementsEqual(Set([3,2,1])) returns false. You believe it > is because they don’t get that elementsEqual works sequentially (on > Sequence), > No, I believe it is because people believe that it is a test of equivalence of two sets, which it is not. and I say it is because people expect equivalent sets to produce equivalent > element orders. > The documentation for Set makes it clear that users of the type should have no such expectation. > Thanks, > Jon > > >
_______________________________________________ swift-evolution mailing list swift-evolution@swift.org https://lists.swift.org/mailman/listinfo/swift-evolution