#20246: Use `with strict_equality(True)` to work around hashing of p-adics
-------------------------------------+-------------------------------------
       Reporter:  saraedum           |        Owner:
           Type:  enhancement        |       Status:  needs_work
       Priority:  major              |    Milestone:  sage-7.2
      Component:  padics             |   Resolution:
       Keywords:  days71             |    Merged in:
        Authors:  Julian RĂ¼th        |    Reviewers:
Report Upstream:  N/A                |  Work issues:
         Branch:                     |       Commit:
  
u/saraedum/use__with_strict_equality_true___to_work_around_hashing_of_p_adics|  
f4c5a9db729ccf911517039fdced282588210133
   Dependencies:  #16342, #16339     |     Stopgaps:
-------------------------------------+-------------------------------------

Comment (by nbruin):

 Replying to [comment:24 saraedum]:
 > However, most parents get used as cache keys in our factories and
 `UniqueRepresentation`. And parents may depend on p-adic numbers in some
 way.

 Parents that are used as cache-keys are usually not a problem: they are
 usually `EqualityByID`, so they can basically hash by their ID. If p-adics
 are part of the construction parameters (e.g., a p-adic extension field,
 perhaps?) of a `unique` parent structure, a little more is needed, of
 course. There either we would need `cache_key` or we'd need to transform
 the construction parameters into something that can be hashed. For
 extension fields, that's actually not such an unreasonable thing to do:
 fields defined by a minimal polynomial that is only known to limited
 precision is a real pain. It's much more convenient to lift it to an exact
 polynomial.

 In any case, I would be very surprised if there are any structures out
 there that depend on anything else than sequences, polynomials, and
 matrices. If those have cache_key strategies, we'd be OK.

 > No but a lot of our code assumes that everything is usable as a key in a
 dictionary for the sake of caching. I believe that this is a reasonable
 assumption: Running a method on indistinguishable inputs should produce
 the same output. And in some cases it should even produce the exact same
 object, for example when creating parents; in those cases you can not do
 without caching.

 But I am pretty sure this is limited to polynomials and matrices.

 > I am not sure what you mean with "high up". For correctness you have to
 implement it in any class that gets instantiated with a parameter that may
 depend (in some way) on something unhashable, because essentially you just
 need to write this as a method that produces the constructor parameters
 that serve to produce an indistinguishable element.

 I meant: you could start sticking implementations into the more generic
 classes for polynomials and matrices. I don't think we need to define
 "correctness" as "p-adic numbers are hashable".

 > Though I originally implemented `_cache_key` what I dislike about it is
 that you just have to add silly boilerplate to many classes. The new
 approach is much better in that sense. The changes are where the actual
 problem arises, namely in the classes that can not define a usual
 `__hash__`. For everybody else it just works and it is actually correct.
 The current `_cache_key` is calling for trouble in the long run: People
 add a constructor parameter but don't add it to `_cache_key`; the doctests
 pass but caching eventually produces incorrect results.

 Most parents do not need a `_cache_key`, because they are actually
 hashable thanks to `EqualityByID`. I'd expect that only some element
 classes need a `_cache_key`. So the scenario you sketch would only play
 out if the identity-determining components of an element change.

 > Of course, if the end user decides to run all its code `with
 strict_equality(True)` then many things break. But the docstring tries to
 be clear about this.

 The fundamental point is that you make equality context dependent. It
 means you now need to know global state in order to reason about code that
 otherwise looks completely local. I think that's an incredibly big
 complexification and a major step back in the understandability and
 maintainability of sage code. Whatever you write in the docstring doesn't
 change that fact. You can't fix bad design by good documentation.

 > I think the "strategic places" is a problem here. What this says it that
 using p-adics (or fraction field elements) just works in places where
 there happened to be a doctest that implied p-adics (and usually extension
 fields to actually trigger `__hash__`.) Why not make it work everywhere?

 I think you are too pessimistic about the multitude of ways in which padic
 numbers can end up being hashed in legitimate ways. I think you can get
 this covered pretty quickly (polys and matrices). Plus: a lot of places
 where p-adics aren't initially foreseen will actually break with p-adics,
 because they are imprecise objects. That usually breaks algorithms that
 weren't built with that in mind.

 re. why not everywhere: because the proposed solution (making equality
 testing depend on global state) incurs too high a price on the
 maintainability of sage.

 > Yes, global state is bad. The implementation is very clear on that and I
 can elaborate on this further in the docstrings.

 Being clear about it doesn't fix a bad design decision (an, I think,
 justified technical qualification of the proposal which in no way reflects
 on the proposers)

 I'm sympathetic about the problem. The issue is really that equality tests
 for p-adics are probably bad in all cases. We're finding two
 interpretations that sort-of work in some circumstances:
  - difference is indistinguishable from 0: if you start out with
 sufficient starting precision, you can sort-of run "precise" algorithms
 and get a more-or-less correct answer
  - equal only if numbers are functionally the same: the safer notion for
 hashing.

 Option 1 at least allows us to do naive linear algebra on small systems,
 with a ridiculous amount of excess precision. (in a way that works better
 than with floats, say)

 Option 2 is required to sort-of properly handle routines that assume
 hashability.

 We can't have both. Option 1 is what people expect when they start working
 with p-adics initially. For option 2 we have a work-around in the form of
 `cache_key` (which I thought was a stroke of brilliance!). So I think the
 best way forward is to try and make `cache_key` work.

 I agree it would be nicer to have another solution, but we're dealing with
 two different concepts of equality here, and we can't cram them into the
 same interface.

--
Ticket URL: <http://trac.sagemath.org/ticket/20246#comment:25>
Sage <http://www.sagemath.org>
Sage: Creating a Viable Open Source Alternative to Magma, Maple, Mathematica, 
and MATLAB

-- 
You received this message because you are subscribed to the Google Groups 
"sage-trac" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/sage-trac.
For more options, visit https://groups.google.com/d/optout.

Reply via email to