Re: [swift-evolution] [Draft] Rename Sequence.elementsEqual

Xiaodi Wu via swift-evolution Tue, 17 Oct 2017 11:47:13 -0700

On Tue, Oct 17, 2017 at 12:54 Jonathan Hull <jh...@gbis.com> wrote:

> On Oct 17, 2017, at 9:34 AM, Xiaodi Wu <xiaodi...@gmail.com> wrote:
>
>
> On Tue, Oct 17, 2017 at 09:42 Jonathan Hull <jh...@gbis.com> wrote:
>
>> On Oct 17, 2017, at 5:44 AM, Xiaodi Wu <xiaodi...@gmail.com> wrote:
>>
>>
>> On Tue, Oct 17, 2017 at 00:56 Thorsten Seitz <tseit...@icloud.com> wrote:
>>
>>>
>>>
>>> Am 17.10.2017 um 00:13 schrieb Xiaodi Wu <xiaodi...@gmail.com>:
>>>
>>>
>>> On Mon, Oct 16, 2017 at 14:21 Thorsten Seitz <tseit...@icloud.com>
>>> wrote:
>>>
>>>> Am 16.10.2017 um 16:20 schrieb Xiaodi Wu via swift-evolution <
>>>> swift-evolution@swift.org>:
>>>>
>>>>
>>>> On Mon, Oct 16, 2017 at 05:48 Jonathan Hull <jh...@gbis.com> wrote:
>>>>
>>>>>
>>>>> On Oct 15, 2017, at 9:58 PM, Xiaodi Wu <xiaodi...@gmail.com> wrote:
>>>>>
>>>>> On Sun, Oct 15, 2017 at 8:51 PM, Jonathan Hull <jh...@gbis.com> wrote:
>>>>>
>>>>>>
>>>>>> On Oct 14, 2017, at 10:48 PM, Xiaodi Wu <xiaodi...@gmail.com> wrote:
>>>>>>
>>>>>>>  That ordering can be arbitrary, but it shouldn’t leak internal
>>>>>>>> representation such that the method used to create identical things 
>>>>>>>> affects
>>>>>>>> the outcome of generic methods because of differences in internal
>>>>>>>> representation.
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>  It would be better to say that the iteration order is
>>>>>>>>> well-defined. That will almost always mean documented, and usually
>>>>>>>>> predictable though obviously e.g. RNGs and iterating in random order 
>>>>>>>>> will
>>>>>>>>> not be predictable by design.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> That's actually more semantically constrained than what Swift
>>>>>>>>>> calls a `Collection` (which requires conforming types to be 
>>>>>>>>>> multi-pass
>>>>>>>>>> and(?) finite). By contrast, Swift's `SpongeBob` protocol explicitly
>>>>>>>>>> permits conforming single-pass, infinite, and/or unordered types.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I think you’re talking about Sequence here, I’ve lost track of
>>>>>>>>>> your nonsense by now. Yes, the current Swift protocol named Sequence 
>>>>>>>>>> allows
>>>>>>>>>> unordered types. You seem to keep asserting that but not actually
>>>>>>>>>> addressing my argument, which is *that allowing Sequences to be
>>>>>>>>>> unordered with the current API is undesired and actively harmful, and
>>>>>>>>>> should* *therefore** be changed*.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> What is harmful about it?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> After thinking about it, I think the harmful bit is that unordered
>>>>>>>>> sequences are leaking internal representation (In your example, this 
>>>>>>>>> is
>>>>>>>>> causing people to be surprised when two sets with identical elements 
>>>>>>>>> are
>>>>>>>>> generating different sequences/orderings based on how they were 
>>>>>>>>> created).
>>>>>>>>> You are correct when you say that this problem is even true for 
>>>>>>>>> for-in.
>>>>>>>>>
>>>>>>>>
>>>>>>>> I would not say it is a problem. Rather, by definition, iteration
>>>>>>>> involves retrieving one element after another; if you're allowed to do 
>>>>>>>> that
>>>>>>>> with Set, then the elements of a Set are observably ordered in some 
>>>>>>>> way.
>>>>>>>> Since it's not an OrderedSet--i.e., order doesn't matter--then the only
>>>>>>>> sensible conclusion is that the order of elements obtained in a 
>>>>>>>> for...in
>>>>>>>> loop must be arbitrary. If you think this is harmful, then you must 
>>>>>>>> believe
>>>>>>>> that one should be prohibited from iterating over an instance of Set.
>>>>>>>> Otherwise, Set is inescapably a Sequence by the Swift definition of
>>>>>>>> Sequence. All extension methods on Sequence like drop(while:) are 
>>>>>>>> really
>>>>>>>> just conveniences for common things that you can do with iterated 
>>>>>>>> access;
>>>>>>>> to my mind, they're essentially just alternative ways of spelling 
>>>>>>>> various
>>>>>>>> for...in loops.
>>>>>>>>
>>>>>>>>
>>>>>>>> I think an argument could be made that you shouldn’t be able to
>>>>>>>> iterate over a set without first defining an ordering on it (even if 
>>>>>>>> that
>>>>>>>> ordering is somewhat arbitrary).  Maybe we have something like a
>>>>>>>> “Sequenc(e)able” protocol which defines things which can be turned 
>>>>>>>> into a
>>>>>>>> sequence when combined with some sort of ordering.  One possible 
>>>>>>>> ordering
>>>>>>>> could be the internal representation (At least in that case we are 
>>>>>>>> calling
>>>>>>>> it out specifically).  If I had to say
>>>>>>>> “setA.arbitraryOrder.elementsEqual(setB.arbitraryOrder)” I would 
>>>>>>>> definitely
>>>>>>>> be less surprised when it returns false even though setA == setB.
>>>>>>>>
>>>>>>>
>>>>>>> Well, that's a totally different direction, then; you're arguing
>>>>>>> that `Set` and `Dictionary` should not conform to `Sequence` altogether.
>>>>>>> That's fine (it's also a direction that some of us explored off-list a
>>>>>>> while ago), but at this point in Swift's evolution, realistically, it's 
>>>>>>> not
>>>>>>> within the realm of possible changes.
>>>>>>>
>>>>>>>
>>>>>>> I am actually suggesting something slightly different.  Basically,
>>>>>>> Set and Dictionary’s conformance to Collection would have a different
>>>>>>> implementation.  They would conform to another protocol declaring that 
>>>>>>> they
>>>>>>> are unordered. That protocol would fill in part of the conformance to
>>>>>>> sequence/collection using a default ordering, which is mostly arbitrary,
>>>>>>> but guaranteed to produce the same ordering for the same list of 
>>>>>>> elements
>>>>>>> (even across collection types).  This would be safer, but a tiny bit 
>>>>>>> slower
>>>>>>> than what we have now (We could also potentially develop a way for
>>>>>>> collections like set to amortize the cost). For those who need to 
>>>>>>> recover
>>>>>>> speed, the new protocol would also define a property which quickly 
>>>>>>> returns
>>>>>>> a sequence/iterator using the internal ordering (I arbitrarily called it
>>>>>>> .arbitraryOrder).
>>>>>>>
>>>>>>> I believe it would not be source breaking.
>>>>>>>
>>>>>>
>>>>>> That is indeed something slightly different.
>>>>>>
>>>>>> In an ideal world--and my initial understanding of what you were
>>>>>> suggesting--Set and Dictionary would each have a member like 
>>>>>> `collection`,
>>>>>> which would expose the underlying data as a `SetCollection` or
>>>>>> `DictionaryCollection` that in turn would conform to `Collection`;
>>>>>> meanwhile, Set and Dictionary themselves would not offer methods such as
>>>>>> `prefix`, or indexing by subscript, which are not compatible with being
>>>>>> unordered. For those who want a particular ordering, there'd be something
>>>>>> like `collection(ordered areInIncreasingOrder: (T, T) -> Bool) ->
>>>>>> {Set|Dictionary}Collection`.
>>>>>>
>>>>>> What you suggest here instead would be minimally source-breaking.
>>>>>> However, I'm unsure of where these guarantees provide benefit to justify
>>>>>> the performance cost. Certainly not for `first` or `dropFirst(_:)`, which
>>>>>> still yields an arbitrary result which doesn't make sense for something
>>>>>> _unordered_. We *could* have an underscored customization point named
>>>>>> something like `_customOrderingPass` that is only invoked from
>>>>>> `elementsEqual` or other such methods to pre-rearrange the internal
>>>>>> ordering of unordered collections in some deterministic way before
>>>>>> comparison. Is that what you have in mind?
>>>>>>
>>>>>>
>>>>>>
>>>>>> Something like that.  Whatever we do, there will be a tradeoff
>>>>>> between speed, correctness, and ergonomics.
>>>>>>
>>>>>> My suggestion trades speed for correctness, and provides a way to
>>>>>> recover speed through additional typing (which is slightly less 
>>>>>> ergonomic).
>>>>>>
>>>>>
>>>>> You haven't convinced me that this is at all improved in
>>>>> "correctness." It trades one arbitrary iteration order for another on a
>>>>> type that tries to model an unordered collection.
>>>>>
>>>>>
>>>>>> We could do something like you suggest. I don’t think the method
>>>>>> would need to be underscored… the ordering pass could just be a method on
>>>>>> the protocol which defines it as unordered.  Then we could provide a
>>>>>> special conformance for things where order really matters based on
>>>>>> adherence to that protocol.  That might be an acceptable tradeoff.  It
>>>>>> would give us speed at the cost of having the correct implementation 
>>>>>> being
>>>>>> less ergonomic and more error prone (you have to remember to check that 
>>>>>> it
>>>>>> is unordered and call the ordering method when it mattered).
>>>>>>
>>>>>> I’d still be a bit worried that people would make incorrect generic
>>>>>> algorithms based on expecting an order from unordered things, but at 
>>>>>> least
>>>>>> it would be possible for them check and handle it correctly.  I think I
>>>>>> could get behind that tradeoff/compromise, given where we are in the 
>>>>>> swift
>>>>>> process and Swift's obsession with speed (though I still slightly prefer
>>>>>> the safer default).  At least the standard library would handle all the
>>>>>> things correctly, and that is what will affect the majority of 
>>>>>> programmers.
>>>>>>
>>>>>
>>>>> What is an example of such an "incorrect" generic algorithm that would
>>>>> be made correct by such a scheme?
>>>>>
>>>>>
>>>>> To start with, the one you gave as an example at the beginning of this
>>>>> discussion: Two sets with identical elements which have different internal
>>>>> storage and thus give different orderings as sequences.  You yourself have
>>>>> argued that the confusion around this is enough of a problem that we need
>>>>> to make a source-breaking change (renaming it) to warn people that the
>>>>> results of the ‘elementsEqual’ algorithm are undefined for sets and
>>>>> dictionaries.
>>>>>
>>>>
>>>> No, I am arguing that the confusion about ‘elementsEqual’ is foremost a
>>>> problem with its name; the result of this operation is not at all undefined
>>>> for two sets but actually clearly defined: it returns true if two sets have
>>>> the same elements in the same iteration order, which is a publicly
>>>> observable behavior of sets (likewise dictionaries).
>>>>
>>>>
>>>> But it is a behavior which has absolutely no meaning at all because the
>>>> order does not depend on the elements of the set but on the history of how
>>>> the set has been reached its current state.
>>>> So why should I ever use this method on a set?
>>>> What is the use case?
>>>>
>>>
>>> One example: you can use it to check an instance of Set<Float> to
>>> determine if it has a NaN value. (The “obvious” way of doing it is not
>>> guaranteed to work since NaN != NaN.)
>>>
>>>
>>> How would I do that? I'd rather expect to use a property isNaN on Float
>>> to do that.
>>>
>>
>> set.elementsEqual(set)
>>
>>
>> I see why that would work (thanks to Set being a collection vs a
>> sequence), but it still feels like a hack.  I definitely wouldn’t want to
>> be maintaining code with that in it. Especially when compared to something
>> like:
>>
>> set.contains(where: {$0.isNaN})
>>
>
> The purpose of protocols is to enable useful generic code. You cannot use
> isNaN for code that works on generic Collection, or for that matter, even
> code that works on Set where Element : Numeric.
>
> Much generic code (for example, generic sorting) easily blows up when
> encountering NaN. If you want an algorithm that is robust enough to handle
> (or at least reject) that scenario, then you will need to check for it. It
> is not a “hack” but a very common and accepted way of determining the
> presence of NaN.
>
>
> Just because a hack is commonly accepted or common doesn’t mean it isn’t a
> hack.
>


It’s not a “hack.” NaN is required by IEEE fiat not to be equivalent to
itself. Asking whether a value is reflexively equivalent to itself is
guaranteed to be sufficient to detect the presence of NaN in all
IEEE-compliant contexts, which is to say all past, present, and future
versions of Swift.

Using elementsEqual in this way (in a generic context which may or may not
> involve floats) is definitely a hack, and is likely to bite you at some
> point.
>

How is it definitely a hack when it is blessed by IEEE, and how can it bite
me?

I don’t see why a non-source-breaking change is suddenly off-limits.
>>>>>
>>>>> But more than that, any generic algorithm which is assuming that the
>>>>> sequence is coming from an ordered source (i.e. many things using
>>>>> first/last).  Some uses of first are ok because the programmer actually
>>>>> means ‘any’, but anywhere where they actually mean first/last may be
>>>>> problematic.
>>>>>
>>>>
>>>> Such as...?
>>>>
>>>> Currently, there is no way to test for ordered-ness, so there is no way
>>>>> for even a careful programmer to mitigate this problem.  By adding a
>>>>> protocol which states that something is unordered, we can either branch on
>>>>> it, or create a separate version of an algorithm for things which conform.
>>>>>
>>>>
>>>> It is clearly the case that Swift’s protocol hierarchy fits sets and
>>>> collections imperfectly; however, it is in the nature of modeling that
>>>> imperfections are present. The question is not whether it is possible to
>>>> incur performance, API surface area, and other trade-offs to make the model
>>>> more faithful, but rather whether this usefully solves any problem. What is
>>>> the problem being mitigated? As I write above, Swift’s Set and Dictionary
>>>> types meet the semantic requirements for Collection and moonlight as
>>>> ordered collections. What is a generic algorithm on an ordered collection
>>>> that is  “not OK” for Set and Dictionary? (“elementsEqual”, as I’ve said,
>>>> is not such an example.)
>>>>
>>>>
>>>> On the contrary, `elementsEqual` is exactly such an example, because it
>>>> makes no sense to use it on a Set.
>>>>
>>>> let s1 = Set([1,2,3,4,5,6])
>>>> let s2 = Set([6,5,4,3,2,1])
>>>>
>>>> Both sets have different iteration orders. Comparing those sets with
>>>> some other collection using `elementsEqual` will give no meaningful result
>>>> because the order - and therefore the result of `elementsEqual` - is in
>>>> effect random.
>>>>
>>>
>>> No, it is not such an example; it’s misleadingly named but works
>>> correctly—that is, its behavior matches exactly the documented behavior,
>>> which relies on only the semantic guarantees of Sequence, which Set
>>> correctly fulfills.
>>>
>>>
>>> Fulfills to the letter. Again, what can you do with it if the result is
>>> random??
>>>
>>
>> The result is not random.
>>
>>
>> It is undefined though.  As you said earlier, by the guarantees we have
>> been given, it may shift over different builds/runs of a program.  Thus in
>> one run, it might return true and then false in another (without changing
>> our code).  As Greg pointed out, it is also possible with the guarantees we
>> are given, for set/dict to have different orderings with copies of
>> themselves. (It will happen for sure when deep copying a dictionary with
>> reference-type keys).
>>
>
> As I wrote to Greg, if that is a possibility for Set, then the
> implementation is problematic for conformance to Collection and needs to be
> re-examined.
>
>
> So why not fix both things at once with a single change?
>
>
> I’m not sure why you claim the order of a dictionary will definitiely
> change on deep copying. But in any case, we are not talking about deep
> copying here, or at least, I’m not; we’re talking about the notional
> copying involved in the semantics of passing a set as an argument.
>
>
> Because the ordering of some dictionaries depends on the memory address of
> the key.  There is no way to copy the key in those cases without changing
> the ordering.  You can watch their ordering shift in the debugger from run
> to run.
>

I’m fine with that. Instances of reference types have identity. A deep copy
of a dictionary *has different keys* from the original. This may not be so
from the perspective of a custom == operator, but it most certainly is so
from the perspective of reference type semantics. So to me it is fine
(correct, even?) that a deep copy of a dictionary has a different iteration
order—it has different keys.

Again, this is something entirely different from notional copies made in
the course of passing values as arguments.

As I keep saying, relying on the behavior of a leaking internal
>> implementation is a bad plan.
>>
>
> Again, it is not a leaking internal implementation. It is a public API
> guarantee.
>
>
> The public API guarantee is that there will be some ordering that is
> stable for a particular instance on a particular run (as long as it is not
> mutated/copied).  The actual ordering is undefined.
>
> Any generic code which depends on a particular ordering will have output
> which is undefined.
>

Generic code is incorrect that “depends on a particular ordering” because
it assumes semantics that are not guaranteed.

Just because a function returns a seemingly predictable value doesn’t mean
> it is entirely deterministic.  There are lots of vulnerabilities in C/C++
> due to reliance on undefined behavior.
>
>
> We should add an additional guarantee to set/dict that the order returned
>> will be the same for the same contents regardless of history (but can be
>> otherwise arbitrary).  That will fix the behavior for algorithms like
>> elementsEqual (i.e. They will return the same result across builds/runs).
>> It will also implicitly provide as a result, the constraint you were
>> arguing is needed across copies of a collection type.  I agree that that is
>> an important guarantee.  Why not fix both issues with a single
>> non-source-breaking change?
>>
>
> As I wrote, the behavior of elementsEqual is not broken and does not
> require fixing.
>
>
> Your argument for that is tautological though.  You could argue for
> keeping any bug in the codebase using the same reasoning.
>
> Why was elementsEqual created?  Isn’t it meant to check equality of two
> sequences in a generic context where == isn’t available?
>

No no no no no no no no. That’s precisely why the name is misleading.
elementsEqual is *not* simply a mixed-type version of ==. Remember that ==
as implemented on a concrete sequence type *has no obligation* to use
elementwise comparison using the element’s implementation of ==. This is
not merely theoretical: [Float].== does *not* do an elementwise comparison
using Float.==. By contrast, you are guaranteed a true elementwise
comparison with elementsEqual regardless of how equivalence is defined for
the sequence.

Isn’t it problematic that two equivalent sets may not be equivalent in a
> generic context?
>

Again, this shows why the name is unfortunate. elementsEqual is not
supposed to be a generic version of ==, and its name is misleading. Repeat:
elementsEqual is *not* an alternative for == in either the generic or
concrete context. Neither one implies the other.

What is the benefit of the guarantee you propose that justifies the
> performance cost?
>
>
> The benefit is that generic algorithms which rely on ordering of unordered
> things would be much more robust.  It would solve your copy issue as well.
>
> You are acting like there is a huge performance cost. There isn’t. For
> Set, it would still have the traditional performance characteristics for
> Sets. Any slowdown (by un-cutting corners) would be amortized over
> insertions. (note: I am talking option #3 from my summary here, not #5,
> which was the one with an actual performance cost)
>
>
> Why is the source-breaking change of renaming things better?
>>
>
> Because, in my analysis, the problem is that the method is incorrectly
> named. The problem affects all types that conform to Sequence and not just
> Set and Dictionary; elementsEqual is a distinct function from ==, and it
> must either continue to be distinct or cease to exist, but its name does
> nothing to clarify any distinction.
>
>
> It also won’t fix the original problem you mention as motivation.  People
> will still be surprised at equivalent sets having different iteration
> orders regardless of what the function is named.
>

That is not the motivating problem. The motivating problem is that people
think that elementsEqual is semantically equivalent to ==. It is a distinct
method that *does not test* equivalence of two sequences.

Thanks,
> Jon
>
>
>

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

Re: [swift-evolution] [Draft] Rename Sequence.elementsEqual

Reply via email to