[swift-evolution] [pitch] Comparison Reform

Ben Cohen via swift-evolution Thu, 13 Apr 2017 13:17:18 -0700

Hi swift evolution,

Here’s another pitch, for The Propoosal Formerly Known As Spaceship.
Comparison Reform


Proposal: SE-NNNN 
<file:///Users/ben_cohen/Documents/swift-evolution/proposals/NNNN-filename.md>
Authors: Robert Widmann <https://github.com/codafi>, Jaden Geller 
<https://github.com/jadengeller>, Harlan Haskins 
<https://github.com/harlanhaskins>, Alexis Beingessner 
<https://github.com/Gankro>, Ben Cohen <https://github.com/airspeedswift>
Status: Awaiting review
Review manager: TBD
Introduction

This proposal is for changes that we believe should be made to the existing 
comparison system by:

Making FloatingPoint comparison context sensitive, so that its Comparable 
conformance provides a proper total ordering.
Introducing a new ternary-valued compare(_ other: Self) -> ComparisonResult 
method.
Removing unnecessary customization points from Comparable.
Motivation

The motivation comes from several independent points:

1: The standard comparison operators have an intuitive meaning to programmers. 
Swift encourages encoding that in an implementation of Comparable that respects 
the rules of a total order <https://en.wikipedia.org/wiki/Total_order>. The 
standard library takes advantage of these rules to provide consistent 
implementations for sorting and searching generic collections of Comparable 
types.

Not all types behave so well in this framework, unfortunately. There are cases 
where the semantics of a total order cannot apply while still maintaining the 
traditional definition of “comparison” for these types. Take, for example, 
sorting an array of Floats. Today, Float’s instance of Comparable follows 
IEEE-754 and returns false for all comparisons of NaN. In order to sort this 
array, NaN s are considered outside the domain of <, and the order of a sorted 
array containing them is unspecified. Similarly, a Dictionary keyed off floats 
can leak entries and memory.

2: Generic algorithms in the Swift Standard Library that make use of the 
current Comparable protocol may have to make extra comparisons to determine the 
ordering of values when <, ==, and > should have different behaviours. Having a 
central operation to return complete ordering information should provide a 
speedup for these operations.

3: The existing comparison operators don’t “generalize” well. There’s no clean 
way to add a third or fourth argument to < to ask for non-default semantics. An 
example where this would be desirable would be specifying the locale or 
case-sensitivity when comparing Strings.

4: Comparable is over-engineered in the customization points it provides: to 
our knowledge, there’s no good reason to ever override >=, >, or <=. Each 
customization point bloats vtables and mandates additional dynamic dispatch.

5: When quickly writing a Comparable type, it is easier to implement a single 
ternary statement than to separately implement == and <.

Proposed solution

ComparisonResult

Foundation’s ComparisonResult type will be mapped into Swift as

@objc public enum ComparisonResult : Int {
  case orderedAscending = -1
  case orderedSame = 0
  case orderedDescending = 1
}
Comparable

Comparable will be changed to have a new ternary comparison method: compare(_ 
other: Self) -> ComparisonResult. x.compare(y) specifies where to place x 
relative to y. So if it yields .orderedAscending, then x comes before y. This 
will be considered the new “main” dispatch point of Comparable that 
implementors should provide.

Most code will continue to use < or ==, as it will be optimal for their 
purposes. However code that needs to make a three-way branch on comparison can 
use the potentially more efficient compare. Note that compare is only expected 
to be more efficient in this specific case. If a two-way branch is all that’s 
being done, < will be more efficient in many cases (if only because it’s easier 
for the optimizer).

For backwards compatibility reasons, compare will have a default implementation 
defined in terms of <, but to enable only using compare, < and == will also 
have default implementations in terms of compare.

The compiler will verify that either compare, or < and ==, are provided by 
every type that claims to conform to Comparable. This will be done in some 
unspecified way unavailable outside the standard library (it can be made 
available to in the future, but that’s an unnecessary distraction for this 
proposal).

Types that wish to provide comparison “variants” can do so naturally by adding 
compare methods with additional arguments. e.g. String.compare(_ other: Self, 
in: Locale) -> ComparisonResult. These have no language-level connection to 
Comparable, but are still syntactically connected, implying the same total 
order semantics. This makes them easier to discover, learn, and migrate to.

To reduce bloat, the operators <=, >=, and > will be removed from the set of 
requirements that the Comparable protocol declares. These operators will 
however continue to exist with the current default implementations.

FloatingPoint

No changes will be made to the FloatingPoint protocol itself. Instead, new 
extensions will be added to it to change the behaviour of comparison.

The new behaviour centers around the fact that compare(_: Self) -> 
ComparisonResult will provide a total ordering that’s consistent with Level 2 
in the IEEE 754 (2008) spec. This is mostly the same as the standard (Level 1) 
IEEE ordering, except:

-0 < +0
NaN == NaN
NaN > +Inf (an arbitrary choice, NaN can be placed anywhere in the number line)
Level 2’s distinguishing of -0 and +0 is a bit strange, but is consistent with 
Equatable’s Substitutability requirement. -0 and +0 have different behaviours: 
1/-0 = -Inf while 1/+0 = +Inf. The main problem this can lead to is that a 
keyed collection may have two “0” entries. In practice this probably won’t be a 
problem because it’s fairly difficult for the same algorithm to produce both -0 
and +0. Any algorithm that does is also probably concerned with the fact that 
1.0E-128 and 2.0E-128 are considered distinct values.

Note: IEEE specifies several other potential total orderings: level 3, level 4, 
and the totalOrder predicate. For our purposes, these orderings are too 
aggressive in distinguishing values that are semantically equivalent in Swift. 
For most cases, the relevant issue is that they distinguish different encodings 
of NaN. For more exotic encodings that don’t guarantee normalization, these 
predicates also consider 10.0e0 < 1.0e1 to be true. An example where this can 
occur is IEEE-754 decimal coded floating point, which FloatingPoint is intended 
to support.

We will then make the comparison operators (<, <=, ==, !=, >=, >) dispatch to 
one of compare(_:) or FloatingPoint’s IEEE comparison methods (isLess, isEqual, 
isLessThanOrEqualTo) based on the context.

If the context knows the type is FloatingPoint, then level 1 ordering will be 
used.
If the context only knows the type is Comparable or Equatable, then level 2 
ordering will be used.
This results in code that is explicitly designed to work with FloatingPoint 
types getting the expected IEEE behaviour, while code that is only designed to 
work with Comparable types (e.g. sort and Dictionary) gets more reasonable 
total ordering behaviour.

To clarify: Dictionary and sort won’t somehow detect that they’re being used 
with FloatingPoint types and use level 1 comparisons. Instead they will 
unconditional use level 2 behaviour. For example:

let nan = 0.0/0.0

func printEqual<T: Equatable>(_ x: T, _ y: T) {
  print(x == y)
}

func printEqualFloats<T: FloatingPoint>(_ x: T, _ y: T) {
  print(x == y)
}

print(nan == nan)          // false, (concrete)
printEqual(nan, nan)       // true,  (generic Equatable but not FloatingPoint)
printEqualFloats(nan, nan) // false, (generic FloatingPoint)
If one wishes to have a method that works with all Equatable/Comparable types, 
but uses level 1 semantics for FloatingPoint types, then they can simply 
provide two identical implementations that differ only in the bounds:

let nan = 0.0/0.0

func printEqual<T: Equatable>(_ x: T, _ y: T) {
  print(x == y)
}

func printEqual<T: FloatingPoint>(_ x: T, _ y: T) {
  print(x == y)
}

printEqual(0, 0)           // true (integers use `<T: Equatable>` overload)
printEqual(nan, nan)       // false (floats use `<T: FloatingPoint>` overload)
As a result of this change, hashing of floats must be updated to make all NaNs 
hash equally. -0 and +0 will also no longer be expected to hash equally. 
(Although they might as an implementation detail – equal values must hash the 
same, unequal values may hash the same.)

Misc Standard Library

Types that conform to Comparable should be audited for places where 
implementing or using Comparable would be a win. This update can be done 
incrementally, as the only potential impact should be performance. As an 
example, a default implementation of compare(_:) for Array will likely be 
suboptimal, performing two linear scans to determine the result in the 
worst-case. (See the default implementation provided in the detailed design.)

Some free functions will have <T: FloatingPoint> overloads to better align with 
IEEE-754 semantics. This will be addressed in a follow-up proposal. (example: 
min and max)

Detailed Design

The protocols will be changed as follows:

ComparisonResult, currently a type found in Foundation, will be sunk into the 
Swift Standard Library:

@objc public enum ComparisonResult: Int, Equatable {
  case orderedAscending = -1
  case orderedSame = 0
  case orderedDescending = 1
}

public protocol Comparable: Equatable {
  func compare(_ other: Self) -> ComparisonResult

  static func < (lhs: Self, rhs: Self) -> Bool
}

extension Comparable {
  func compare(_ other: Self) -> ComparisonResult {
    if self == other {
      return .orderedSame
    } else if self < other {
      return .orderedAscending
    } else {
      return .orderedDescending
    }
  }
}

public func < <T: Comparable>(lhs: T, rhs: T) -> Bool {
  return lhs.compare(rhs) == .orderedAscending
}

// IEEE comparison operators (these implementations already exist in std)
extension FloatingPoint {
  public static func == (lhs: T, rhs: T) -> Bool {
    return lhs.isEqual(to: rhs)
  }

  public static func < (lhs: T, rhs: T) -> Bool {
    return lhs.isLess(than: rhs)
  }

  public static func <= (lhs: T, rhs: T) -> Bool {
    return lhs.isLessThanOrEqualTo(rhs)
  }

  public static func > (lhs: T, rhs: T) -> Bool {
    return rhs.isLess(than: lhs)
  }

  public static func >= (lhs: T, rhs: T) -> Bool {
    return rhs.isLessThanOrEqualTo(lhs)
  }
}


// Comparable comparison operators (provides a total ordering)
extension FloatingPoint {
  @_inline
  public func compare(_ other: Self) -> ComparisonResult {
    // Can potentially be implemented more efficiently -- this is just the 
clearest version
    if self.isLess(than: other) {
      return .orderedAscending
    } else if other.isLess(than: self) {
      return .orderedDescending
    } else {
      // Special cases

      // -0 < +0
      if self.isZero && other.isZero {
        // .plus == 0 and .minus == 1, so flip ordering to get - < +
        return (other.sign as Int).compare(self.sign as Int)
      }

      // NaN == NaN, NaN > +Inf
      if self.isNaN {
        if other.isNaN {
          return .orderedSame
        } else {
          return .orderedDescending
        }
      } else if other.isNaN {
        return .orderedAscending
      } 

      // Otherwise equality agrees with normal IEEE
      return .orderedSame
    }
  }

  @_implements(Equatable.==)
  public static func _comparableEqual(lhs: Self, rhs: Self) -> Bool {
    lhs.compare(rhs) == .orderedSame
  }

  @_implements(Comparable.<)
  public static func _comparableLessThan(lhs: Self, rhs: Self) -> Bool {
    lhs.compare(rhs) == .orderedDescending
  }
}
Note that this design mandates changes to the compiler:

@_implements (or an equivalent mechanism) must be implemented to get the 
context-sensitive FloatingPoint behaviour.
The compiler must verify that either == and <, or compare(_:) is overridden by 
every type that conforms to Comparable.
Source compatibility

Users of ComparisonResult will be able to use it as normal once it becomes a 
standard library type.

Existing implementors of Comparable will be unaffected, though they should 
consider implementing the new compare method as the default implementation may 
be suboptimal.

Consumers of Comparable will be unaffected, though they should consider calling 
the compare method if it offers a performance advantage for their particular 
algorithm.

Existing implementors of FloatingPoint should be unaffected – they will 
automatically get the new behaviour as long as they aren’t manually 
implementing the requirements of Equatable/Comparable.

Existing code that works with floats may break if it’s relying on some code 
bounded on <T: Equatable/Comparable>providing IEEE semantics. For most 
algorithms, NaNs would essentially lead to unspecified behaviour, so the 
primary concern is whether -0.0 == +0.0 matters.

ABI stability

This must be implemented before ABI stability is declared.

Effect on API resilience

N/A

Alternatives Considered

Spaceship

Early versions of this proposal aimed to instead provide a <=> operator in 
place of compare. The only reason we moved away from this was that it didn’t 
solve the problem that comparison didn’t generalize.

Spaceship as an operator has a two concrete benefits over compare today:

It can be passed to a higher-order function
Tuples can implement it
In our opinion, these aren’t serious problems, especially in the long term.

Passing <=> as a higher order function basically allows types that aren’t 
Comparable, but do provide <=>, to be very ergonomically handled by algorithms 
which take an optional ordering function. Types which provide the comparable 
operators but don’t conform to Comparable are only pervasive due to the absence 
of conditional conformance. We shouldn’t be designing our APIs around the 
assumption that conditional conformance doesn’t exist.

When conditional conformance is implemented, the only 
should-be-comparable-but-aren’t types that will remain are tuples, which we 
should potentially have the compiler synthesize conformances for.

Similarly, it should one day be possible to extend tuples, although this is a 
more “far future” matter. Until then, the (T, T) -> Bool predicate will always 
also be available, and < can be used there with the only downside being a 
potential performance hit.

Just Leave Floats Alone

The fact that sorting floats leads to a mess, and storing floats can lead to 
memory leaks and data loss isn’t acceptable.

Just Make Floats Only Have A Total Order

This was deemed too surprising for anyone familiar with floats from any other 
language. It would also probably break a lot more code than this change will.

Just Make Floats Not Comparable

Although floats are more subtle than integers, having places where integers 
work but floats don’t is a poor state of affairs. One should be able to sort an 
array of floats and use floats as keys in data structures, even if the latter 
is difficult to do correctly.

PartialComparable

PartialComparable would essentially just be Comparable without any stated 
ordering requirements, that Comparable extends to provide ordering 
requirements. This would be a protocol that standard IEEE comparison could 
satisfy, but in the absence of total ordering requirements, PartialComparable 
is effectively useless. Either everyone would consume PartialComparable (to 
accept floats) or Comparable (to have reasonable behaviour).

The Rust community adopted this strategy to little benefit. The Rust libs team 
has frequently considered removing the distinction, but hasn’t because doing it 
backwards compatibly would be complicated. Also because merging the two would 
just lead to the problems Swift has today.

Different Names For compare and ComparisonResult

A few different variants for ComparisonResult and its variants were considered:

Dropping the ordered part of ComparisonResult’s cases e.g. .ascending
Naming of ComparisonResult as SortOrder
enum Ordering { case less, equal, greater } (as used by Rust 
<https://doc.rust-lang.org/std/cmp/enum.Ordering.html>)
Case values of inOrder, same, outOfOrder
The choice of case names is non-trivial because the enum shows up in different 
contexts where different names makes more sense. Effectively, one needs to keep 
in mind that the “default” sort order is ascending to map between the concept 
of “before” and “less”.

The before/after naming to provide the most intuitive model for custom sorts – 
referring to ascending or less is confusing when trying to implement a 
descending ordering. Similarly the inOrder/outOfOrder naming was too indirect – 
it’s more natural to just say where to put the element. If the enum should 
focus on the sorting case, calling it SortOrder would help to emphasize this 
fact.

This proposal elects to leave the existing Foundation name in-place. The 
primary motivation for this is that use of the compare function will be 
relatively rare. It is expected that in most cases users will continue to make 
use of == or <, returning boolean values (the main exception to this will be in 
use of the parameterized String comparisons). As such, the source compatibility 
consequences of introducing naming changes to an existing type seems of 
insufficient benefit.

The method compare(_:) does not fully comport with the API naming guidelines. 
However, it is firmly established with current usage in Objective-C APIs, will 
be fairly rarely seen/used (users will usually prefer <, == etc), and 
alternatives considered, for example compared(to:), were not a significant 
improvement.

Add Overloads for (T, T) -> ComparisonResult

It would be slightly more ergonomic to work with ComparisonResult if existing 
methods that took an ordering predicate also had an overload for (T, T) -> 
ComparisonResult. As it stands, a case-insensitive sort must be written as 
follows:

myStrings.sort { $0.compare(_ other: $1, case: .insensitive) == 
.orderedAscending }
With the overload, one could write:

myStrings.sort { $0.compare($1, case: .insensitive) }
we decided against providing these overloads because:

The existing algorithms in the standard library can’t benefit from them (only 
binary comparisons).
They bloat up the standard library (and any library which intends to match our 
API guidelines).
They potentially introduce confusion over “which” comparison overload to use.
And because we can change our mind later without concern for source or ABI 
stability, as these overloads would be additive.

Future Work

This section covers some topics which were briefly considered, but were 
identified as reasonable and possible to defer to future releases. Specifically 
they should be backwards compatible to introduce even after ABI stability. Two 
paths that are worth exploring:

Ergonomic Generalized Comparison for Keyed Containers

Can we make it ergonomic to use an (arbitrary) alternative comparison strategy 
for a Dictionary or a BinaryTree? Should they be type-level Comparators, or 
should those types always store a (Key, Key) -> ComparisonResult closure?

We can avoid answering this question because Dictionary is expected to keep a 
relatively opaque (resilient) ABI for the foreseeable future, as many 
interesting optimizations will change its internal layout. Although if the 
answer is type-level, then Default Generic Parameters must be accepted to 
proceed down this path.

ComparisonResult Conveniences

There are a few conveniences we could consider providing to make 
ComparisonResult more ergonomic to manipulate. Such as:

// A way to combine orderings
func ComparisonResult.breakingTiesWith(_ order: () -> ComparisonResult) -> 
ComparisonResult

array.sort {
  $0.x.compare($0.y)
  .breakingTiesWith { $0.y.compare($1.y) }
  == .orderedAscending 
}
and

var inverted: ComparisonResult

// A perhaps more "clear" way to express reversing order than `y.compared(to: 
x)`
x.compare(y).inverted
But these can all be added later once everyone has had a chance to use them.

_______________________________________________
swift-evolution mailing list
[email protected]
https://lists.swift.org/mailman/listinfo/swift-evolution

[swift-evolution] [pitch] Comparison Reform

Reply via email to