On Wed, Oct 25, 2017 at 1:24 AM, David Sweeris <daveswee...@mac.com> wrote:
> On Oct 24, 2017, at 9:06 PM, Xiaodi Wu via swift-dev <swift-dev@swift.org> wrote:
>
>> On Tue, Oct 24, 2017 at 10:08 PM, Ben Cohen <ben_co...@apple.com> wrote:
>>
>>> On Oct 24, 2017, at 6:48 PM, Xiaodi Wu <xiaodi...@gmail.com> wrote:
>>>
>>>> On Tue, Oct 24, 2017 at 1:55 PM, Ben Cohen <ben_co...@apple.com> wrote:
>>>>
>>>>> On Oct 19, 2017, at 4:29 PM, Xiaodi Wu via swift-dev <swift-dev@swift.org> wrote:
>>>>>
>>>>>> Differing behavior in generic and concrete contexts is simply too subtle to be understandable to the reader.
>>>>>
>>>>> Hardly more subtle than the current “Equatable works like this, with these strong guarantees. Oh, except for some cases it doesn’t, in which case ¯\_(ツ)_/¯”
>>>>
>>>> I'm not saying that the status quo is a superior alternative.
>>>>
>>>> However, one option is to _weaken_ the guarantees of Equatable such that it guarantees only partial equivalence for `==`. From the perspective of documented semantics, it's not subtle at all but a giant hammer of a change. However, from an actual what-does-the-implementation-do standpoint, it would be acknowledging what is already true. Only code that is already broken when used with floating-point values would become formally "incorrect" in the sense of relying on semantics that are then no longer guaranteed.
>>>>
>>>> Such a solution would avoid, as you might say, perpetuating the ¯\_(ツ)_/¯ approach to floating point.
>>>>
>>>>> I realize that Comparable admits an exception for FP. This is, IMO, a serious mistake and needs to be reversed. Equatable has no such exception and rightly so.
>>>>>
>>>>> The clearest demonstrations of how flawed this approach is can be found in the Standard Library. You can throw a brick at it and hit an example of something that’s broken by the presence of .nan: random sort orders, == implementations that differ based on the identity of the buffer,
>>>>
>>>> In my view, if a sort algorithm cannot accommodate NaN, it's entirely acceptable to trap on NaN--and that is a trivial change.
>>>
>>> I think this would be user-hostile. This isn’t like out-of-bounds subscript where it’s just not possible to reasonably proceed. NaNs crop up and people don’t expect them to trap when you sort – they expect them to sort to one end, like in Excel.
>>
>> Honestly, I don't know that most users have thought about this possibility at all. Sure, a sort that matches IEEE total order _might_ be justifiable. But users are as likely to expect that the last item in the sorted collection will be the greatest and that the first item in the sorted collection will be the smallest. Now, you can say that NaN compares larger than everything, everywhere. But the moment that they try to plug that last element into, say, an AppKit UI function, they're toast.
>>
>> I certainly disagree with ideas of trapping on NaN inside `==` or similar functions, but I really do think that an argument can be made that it is not reasonable to proceed with sorting an array that contains NaN.
>>
>>>> After all, NaN is unordered with respect to everything and one cannot sort the unsortable. And, as shown, the `Array.==` implementation is trivially fixable. The entire standard library can be made NaN-safe in like manner.
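
(To make "trivially fixable" concrete, here is a rough sketch of the shape of the fix -- not the actual stdlib patch, and `elementwiseEqual` is just a stand-in name for `Array.==`. The point is only that equality stays element-wise, with no buffer-identity fast path:)

func elementwiseEqual<T: Equatable>(_ lhs: [T], _ rhs: [T]) -> Bool {
    guard lhs.count == rhs.count else { return false }
    // Deliberately no "same buffer, so equal" shortcut: that shortcut is
    // what makes today's `a == a` disagree with element-wise IEEE ==
    // when `a` contains NaN.
    for (l, r) in zip(lhs, rhs) where l != r {
        return false
    }
    return true
}

let a: [Double] = [.nan]
print(elementwiseEqual(a, a))  // false, consistent with NaN != NaN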
>>> My point was, it’s not about what we can do in the standard library. The std lib only has a handful of methods and sure, we can fix them one by one. It’s about whether the standard library defines types and protocols such that it’s reasonable for programmers to use them to write and use generic algorithms correctly. I’m citing the existing std lib implementations as proof that it’s easy to make mistakes. And I think a more complicated approach, with more operators, more properties, more rules, won’t fix this problem.
>>
>> Well, to my mind, this problem you state really works out to:
>>
>> (a) People expect generic algorithms that operate on Comparable types to work correctly with floating-point types
>> (b) Generic algorithms that operate on Comparable types don't work correctly with floating-point types unless the author is very, very careful
>> (c) People shouldn't have to be very, very careful to write working generic algorithms that work with floating-point types
>>
>> Which, in turn, really boils down to:
>>
>> (d) People expect floating-point types not to have numerous unintuitive (but learnable) properties, including NaN being unordered
>> (e) Floating-point types have numerous unintuitive (but learnable) properties, including NaN being unordered
>>
>> The reason I'm writing to swift-dev (rather than evolution) is that my interest is in fixing the standard library. I'm not even convinced that this problem you state is fixable, at least on your terms. In the interest of not increasing the API surface area, you would propose to blow away (e) in the generic but not concrete context. Now, while it's true that an alternative to increasing the API surface area is to have the same API exhibit context-specific behaviors, that certainly isn't any less complicated conceptually, as we would then be faced with the notion that floating-point types both have and do not have numerous unintuitive properties, depending on the context in which they are used.
>>
>>>>> arbitrary duplication in Set/Dictionary etc.
>>>>
>>>> (I disagree that it's arbitrary. If NaN != NaN, then every NaN is properly unique.)
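
(For anyone following along, the Set/Dictionary behavior being referred to is easy to reproduce -- illustrative only, behavior as of Swift 4:)

var s: Set<Double> = []
s.insert(.nan)
s.insert(.nan)
print(s.count)          // 2 -- each NaN is "unique" because NaN != NaN
print(s.contains(.nan)) // false -- a NaN you inserted can never be found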
>>>>> The important point to take from this is not “how do we fix the Standard Library?” but rather “these errors are easy to make” by anyone writing generic code using standard protocols. If the Standard Library can’t get these right, how can we expect others to? There are potentially far worse bugs that could result. A differently-written sorting algorithm could corrupt elements (because it relied on substitutability). Other sorting or searching algorithms could easily go into an infinite loop. These problems exist because the code relies on the documented behavior of the protocol -- and if you can't rely on that, then what is the point in documenting that behavior?
>>>>
>>>> It's not that the standard library *can't* get these right, but that it currently *doesn't*, because it documents one set of semantics but implements another, then relies on documented semantics that it knows it does not implement. We both agree that this needs to be fixed.
>>>>
>>>> The question here is whether it is to be fixed by sticking to the documented semantic guarantees of `==` and bringing all implementations into proper conformance, or alternatively by sticking to the implemented behavior of `==` and aligning the documented semantic guarantees to that.
>>>>
>>>>> I don’t support solutions such as adding a property indicating “containsExceptionalValues” (whatever that means), and expecting every author of a generic algorithm that uses Equatable to remember to call it, and craft custom paranoid behavior (if there is any reasonable behavior) based on it. With recursive conformance landed on master, we finally have a generics system where writing algorithms against Collection can be considered approachable by ordinary users. You no longer have to know things like how Collection.SubSequence needs to be constrained to also be a Collection – it just is. We would be negating this good work to now introduce a whole new set of gotchas that we expect people to know (without the type system even helping them in this case) about how some types, including standard library types, flout the documented rules for Equatable and Comparable, and that you need to use one of a handful of properties to hack in special cases to handle it.
>>>>
>>>> The gotchas aren't new; they arise when using floating-point values, originate with the IEEE definition of floating-point equivalence, and exist in some form in every language that has implemented collections of floating-point values. Crucially, they exist today in Swift; only, we haven't documented it.
>>>>
>>>>> And as a user of algorithms, what should you do? If a generic algorithm doesn’t document how it handles these special cases, should you assume it doesn’t? Check the code? Experiment to find out?
>>>>>
>>>>> This problem also spreads, virus-like, once we have conditional conformance that makes containers equatable when their elements are. [Double] would need to propagate its elements’ “exceptionality”, to avoid problems with [[Double]]. Double? will have to do the same.
>>>>
>>>> This isn't a _problem_. In fact, I consider this to be a key _feature_. Naturally, every protocol conformance (conditional or not) must implement all protocol requirements, so if we add additional requirements they must be implemented. What I'm saying here is that *it may be desirable* to have some protocol-based API to distinguish partial from full equivalence relations. If you accept that premise, then it is the logical consequence that if you conditionally conform `Array` to `Equatable`, you will have to implement any new APIs, and in so doing, document how equivalence of arrays of floating-point values relates to floating-point equivalence. For me, this is a _good thing_: it documents _in code_ something that today is muddled through.
>>>>
>>>>>> The explanation that a method on `Float` is a "floating-point context" but a method on `[Float]` is *not a "floating-point context"* is, IMO, indefensible.
>>>>>
>>>>> Nevertheless, I will attempt to defend it :)
>>>>>
>>>>> I find it odd that violating the documented requirements of a protocol is considered defensible, but expecting types to comply with those requirements is indefensible. A principled stance would be to say that Float shouldn’t conform to Equatable (because… it doesn’t!) and requiring all calls to supply a predicate (and maybe amending types like Dictionary to allow you to supply one). That won’t fly though – users would complain – so instead we are in this murky ground.
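
(The "because… it doesn't!" above is concretely observable today -- a quick illustration of the documented-vs-implemented mismatch, behavior as of Swift 4:)

let x = Float.nan
print(x == x)        // false -- IEEE-754 ==, so the reflexivity that
                     // Equatable's documentation requires does not hold
let arr = [x]
print(arr == arr)    // true -- Array.== short-circuits on identical buffers
print(arr.contains(x)) // false -- element-wise IEEE comparison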
>>>> I don't think we should defend violating the documented requirements of a protocol. Either (a) Float should not conform to Equatable (agree, this is a non-starter); (b) how Float conforms to Equatable should be brought into conformance with documented semantics (your stance); or (c) what semantics are documented should be brought into alignment with how conformance is actually implemented (my stance). Naturally, in the last case, additional APIs should be added as needed to make such reduced semantic guarantees useful for generic algorithms.
>>>>
>>>>> Later in the thread, you mention a possible fix for sort:
>>>>>
>>>>>> `sort()` is problematic, but not if a custom predicate is supplied.
>>>>>
>>>>> So, we are potentially trading off one subtlety (that < behaves differently in generic and non-generic contexts) for another (that you need to know that you need to pass in a special predicate for sorting, or you get nonsense results). Knowing when an algorithm requires you to supply a predicate (like sort) vs when handling for the special case is built in (like equatable) seems a far worse complication to me than knowing one rule: that generically, when constrained to Comparable, Float adheres to the requirements of Comparable. Always. That is a consistent rule that you need to learn once and that doesn’t vary depending on which algorithm you’re using.
>>>>
>>>> I would argue that Float should _always_ adhere to the requirements of Comparable, in all contexts. The question is, rather: what can the requirements of Comparable be such that Float can always adhere to them?
>>>>
>>>>> Another alternative proposed in previous threads is to give Comparable an additional operator (<=> or .compare(to:)) that will always enforce a total ordering, and sort can use that. This is, afaict, C#’s solution – double.NaN < 1.0, 1.0 < double.NaN and double.NaN == double.NaN all return false, but Comparer<double>.Default.compare returns -1, 1 and 0 respectively.
>>>>
>>>> This is, essentially, the endpoint of what I'm proposing.
>>>>
>>>> Equatable would vend (modulo bikeshedding):
>>>> `==`, a partial equivalence relation
>>>> `~`, a full equivalence relation
>>>> `containsExceptionalValues` (yes, this is a deliberately terrible name, because it's meant to go through bikeshedding), a Boolean value to indicate whether `==` is the same as `~`
>>>>
>>>> Comparable would vend (modulo bikeshedding):
>>>> `<`, `>`, `<=`, `>=`, defined as now
>>>> `<=>`, as in C#'s `compare` (or maybe, to emphasize the point, `<~>`)
>>>> `containsExceptionalValues`, inherited from `Equatable`, to document the relationship between `<` (etc.) and the spaceship operator
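
(A compilable strawman of that vending, with a fresh protocol name so it can sit alongside today's actual Equatable -- every name here is a placeholder, per the "modulo bikeshedding" caveat, and the particular full equivalence chosen is just one candidate:)

infix operator ~ : ComparisonPrecedence

protocol PartiallyEquatable {
    static func == (lhs: Self, rhs: Self) -> Bool  // partial equivalence
    static func ~  (lhs: Self, rhs: Self) -> Bool  // full equivalence
    // true when `==` and `~` can disagree (e.g. floating point)
    static var containsExceptionalValues: Bool { get }
}

extension Double: PartiallyEquatable {
    // One candidate full equivalence: same bit pattern, or both NaN.
    // (Note this particular choice distinguishes +0.0 from -0.0.)
    static func ~ (lhs: Double, rhs: Double) -> Bool {
        return lhs.bitPattern == rhs.bitPattern || (lhs.isNaN && rhs.isNaN)
    }
    static var containsExceptionalValues: Bool { return true }
}

print(Double.nan ~ Double.nan)   // true
print(Double.nan == Double.nan)  // false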
>>> This looks to me to be an absurd mess of operations, none of which will have much hope of being used in a coherent fashion by most people. Should I use == or ~ here? What are the rules again? Will people remember to not use < when they really need <=>? Probably not. Did the author of this framework I’m using remember? Dunno.
>>
>> The syntax here is not the point (or if it is, it can be bikeshedded). The point I'm trying to make is that what you're criticizing as _incoherent_ is also _inescapable_. Floating-point types have a notion of equivalence that isn't full equivalence. For certain use cases (both concrete and generic), we want that partial equivalence, while for other use cases (both concrete and generic), we truly want full equivalence. To work with floating-point types correctly, a user must know that there is a difference between the two. If there is no hope of "most people" understanding this distinction when one relation is named `==` and the other is named `~`, then _a fortiori_ there is no hope of "most people" understanding the distinction when they're conflated into one operator `==` that has different behaviors in different contexts.
>>
>>> The C# model of compare works because < is not available generically. There is no choice between < and <=>, and so the model is simple and easily understood by both algorithm implementors and users. And if you need a different ordering, you can supply your own custom comparator. As far as I can tell, it’s a good model and users are happy with it. Swift is different, since the concrete < *is* exposed to the generic implementation, but having two possibilities and expecting users to pick is IMO a bad idea. Hence the proposed fix that Float’s Comparable.< is required to be a total order, per the requirements of Comparable, essentially giving us the C# model.
>>
>> A true C# model would be fine, but the key point of that model, to my mind, is that partial equivalence and full equivalence are spelled differently (that is, `==` and `Equals`, respectively). It would not work with IEEE `==` being spelled the same way as Comparable `==`. If we were to rename the IEEE operation `&==` instead, then we'd functionally have a design that's broadly similar to the earlier version, only with different names:
>>
>> Equatable would vend `==`, a full equivalence relation (and `!=`)
>> Comparable would vend `<`, `>`, `<=`, `>=`, now operators that reflect a total order over the set of all values; and maybe `<=>`
>> Floating point would additionally vend `&==` and `&<` (and `&!=`, `&>`, `&<=`, `&>=`)
>>
>> One key difference here would be that the partial equivalence relation would now only be found on floating-point types, and it would not be possible to write a generic algorithm that operates on any partially equatable or equatable type. But the other--and major--issues would be (a) that all concrete uses of floating-point comparison operators would have to be migrated to append an extra `&`; and (b) that this syntax suggests that most users want to use `==` *instead of* `&==`, which I'm not sure is the case--and certainly isn't the case if they're trying to do the same things they're used to doing with floating-point values in other languages.
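
(Sketching that renamed-operator design concretely -- `&==` is the hypothetical spelling from the paragraph above, and the total order is built on the stdlib's existing IEEE-754 totalOrder hook, `isTotallyOrdered(belowOrEqualTo:)`:)

infix operator &== : ComparisonPrecedence

extension Double {
    // IEEE-754 equality under the proposed renaming: today's `==` semantics.
    static func &== (lhs: Double, rhs: Double) -> Bool {
        return lhs == rhs
    }
    // One candidate for the total-order `<` that Comparable would then
    // guarantee: the strict version of IEEE-754 totalOrder.
    func totallyPrecedes(_ other: Double) -> Bool {
        return self.isTotallyOrdered(belowOrEqualTo: other)
            && !other.isTotallyOrdered(belowOrEqualTo: self)
    }
}

print(Double.nan &== Double.nan)  // false -- IEEE behavior preserved
print(1.0.totallyPrecedes(.nan))  // true  -- NaN sorts after +infinity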
> What about having the protocol hierarchy look like this? (All names subject to bikeshedding, of course)

Please see earlier replies summarizing core team members' persuasive arguments as to why not multiple protocol hierarchies like this.

> // Operator declarations (implied by the sketch):
> infix operator ?== : ComparisonPrecedence
> infix operator ?<  : ComparisonPrecedence
>
> protocol MaybeEquatable {
>     static func ?== (lhs: Self, rhs: Self) -> Bool?
> }
>
> protocol MostlyEquatable : MaybeEquatable {
>     static func == (lhs: Self, rhs: Self) -> Bool
> }
>
> extension MostlyEquatable {
>     // allows a `MostlyEquatable` or `Equatable` to function as a
>     // `MaybeEquatable` without any extra code
>     static func ?== (lhs: Self, rhs: Self) -> Bool? { return lhs == rhs }
> }
>
> // purely a semantic difference, no extra syntax
> protocol Equatable : MostlyEquatable {}
>
> protocol MaybeComparable : MaybeEquatable {
>     static func ?< (lhs: Self, rhs: Self) -> Bool?
>     // plus the rest of them
> }
>
> protocol MostlyComparable : MaybeComparable, MostlyEquatable {
>     static func < (lhs: Self, rhs: Self) -> Bool
>     // plus the rest of them
> }
>
> extension MostlyComparable {
>     // allows a `MostlyComparable` or `Comparable` to function as a
>     // `MaybeComparable` without any extra code
>     static func ?< (lhs: Self, rhs: Self) -> Bool? { return lhs < rhs }
>     // plus the rest of them
> }
>
> // purely a semantic difference, no extra syntax
> protocol Comparable : MostlyComparable, Equatable {}
>
> extension Double : MostlyComparable {
>     static func ?== (lhs: Double, rhs: Double) -> Bool? {
>         return lhs.isNaN || rhs.isNaN ? nil : lhs == rhs
>     }
>     static func ?< (lhs: Double, rhs: Double) -> Bool? {
>         return lhs.isNaN || rhs.isNaN || (lhs.isInfinite && rhs.isInfinite && lhs.sign == rhs.sign)
>             ? nil
>             : lhs < rhs
>     }
>     static func == (lhs: Double, rhs: Double) -> Bool {
>         // whatever current impl is
>     }
>     static func < (lhs: Double, rhs: Double) -> Bool {
>         // whatever current impl is
>     }
> }
>
> This would let people easily switch between the two kinds of "correct" generic comparisons (returning a `Bool?` for types that could have invalid comparisons and a `Bool` for types that can't), as well as easily opting into using IEEE-754's current "arguably incompatible with sane generic programming" behavior (by constraining T to `MostlyComparable` instead of `Comparable`) if that's what they really want.
>
> `Collection` could have different implementations of `==` depending on whether `Element` conforms to `MaybeEquatable` or `MostlyEquatable`/`Equatable`, solving the "a = [Double.nan]; print(a == a) // prints true" issue.
>
> This doesn't involve trapping on anything, so dunno what the performance implications would be. All the extra work would be contained in the "?*" functions, though, so at least existing code wouldn't suddenly get slower.
>
> - Dave Sweeris
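
(For what it's worth, a generic caller under that hierarchy, as sketched and with the placeholder bodies filled in, would look something like this -- `leastMaybe` is a hypothetical helper, not part of the sketch above, showing how `Bool?` pushes the unordered case onto the caller:)

func leastMaybe<T: MaybeComparable>(_ a: T, _ b: T) -> T? {
    guard let aPrecedesB = (a ?< b) else { return nil }  // unordered pair
    return aPrecedesB ? a : b
}

leastMaybe(1.0, 2.0)         // Optional(1.0)
leastMaybe(1.0, Double.nan)  // nil -- the caller must decide what to do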
_______________________________________________
swift-dev mailing list
swift-dev@swift.org
https://lists.swift.org/mailman/listinfo/swift-dev