[perl #129019] [BUG] Range.WHICH fails on many kinds of endpoints

Brian S. Julin via RT Wed, 13 Sep 2017 12:33:14 -0700

On Wed, 13 Sep 2017 10:29:05 -0700, zef...@fysh.org wrote:
> Brian S. Julin via RT wrote:
> > it would be OK for there to be some tiny chance
> > of a collision between two WHICH.Str's as long as the actual WHICHs
> > do not collide.
> 
> One could make that distinction, but then the .Str of the .WHICH would
> not fulfill the purposes for which .WHICH is used, and would seem
> pretty pointless.


I don't think so... and also in answer to:

> Key question: what value does a colliding .WHICH.Str add?

For example, it provides a way to tell, with very high probability,
whether two 500,000-element sets that differ only in one element, or
only in sort order, are the same set when you are not in a position to
just === them: you compare only two lines of output.  Such comparisons
during debugging are pretty much the most common way it is used, though
usually with less problematic non-value types.
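
To make that debugging use concrete, here is a rough sketch in Python
rather than Perl 6, so it says nothing about how rakudo actually builds
a WHICH.  The helper name set_digest and the choice of SHA-1 over the
sorted text of each element are assumptions made purely for illustration.

```python
import hashlib

def set_digest(items):
    """Order-independent digest of a collection (hypothetical helper)."""
    h = hashlib.sha1()
    # Sort the text form of each element so insertion order cannot matter.
    for text in sorted(repr(x) for x in set(items)):
        h.update(text.encode("utf-8"))
        h.update(b"\x00")  # separator, so "1","23" differs from "12","3"
    return h.hexdigest()

a = set_digest(range(500_000))
b = set_digest(reversed(range(500_000)))     # same elements, other order
c = set_digest(list(range(499_999)) + [-1])  # exactly one element differs

print(a == b)  # True: same set, same 40-character line of output
print(a == c)  # False, barring a SHA-1 collision
```

Two short strings stand in for two half-million-element dumps, which is
the whole point when all you have is log output.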

It can also, if well constructed, tell you whether an unseeded hash
would put the two in the same bucket, though that is less useful.

> Consider the Set class, as an example user of .WHICH.  It needs something
> that it can easily hash and compare, and whose equality corresponds
> precisely to object identity.

I think for initial hash generation that pretty much boils down (recursively)
to its serialization/.freeze format for value types, and for subsequent hash
caching, if implemented, there is a case to be made that that cache should
be stored in the .WHICH but not considered a hashable part of the WHICH,
so you can just take it and use it if it has already been calculated.
Or at least you should be able to look up hashes in a table of WHICHs for
which caching has been performed.
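
A minimal sketch of that "cache rides along but is not part of the
identity" idea, again in Python as an analogy only.  The class name
Which and its attributes are hypothetical and do not correspond to
ObjAt or anything else in rakudo.

```python
class Which:
    """Hypothetical identity token: `key` is the identity, `_hash` is a cache."""

    def __init__(self, key):
        self.key = key      # the hashable part of the identity
        self._hash = None   # cached hash; never consulted for equality

    def __eq__(self, other):
        # Equality looks only at the key, so a filled or empty cache is irrelevant.
        return isinstance(other, Which) and self.key == other.key

    def __hash__(self):
        # Compute once, then just take it and use it.
        if self._hash is None:
            self._hash = hash(self.key)
        return self._hash

a = Which(("Range", 1, 10))
b = Which(("Range", 1, 10))
hash(a)                   # fills a's cache only
print(a == b)             # True, even though b's cache is still empty
print(b._hash is None)    # True
```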

>  That doesn't have to be a string, but a
> string is a convenient format for that.  So OK, if .WHICH.Str doesn't do
> the job then Set can go to extra effort to use the `real' non-colliding
> .WHICH value.  In this context, a colliding .WHICH.Str is worse than
> useless: it's an attractive nuisance, because it'll function well enough
> as a .WHICH substitute to pass most test suites but then fail in real
> use.

I'd hardly rate "being wrong once in a lifetime" at a "nuisance" level
of utility.

Given that WHICH's implementation is purposefully unspecified so it can be
implementation-specific, any tests in roast looking at WHICH.Str or dissecting
.WHICH values should probably be suspect: you compare .WHICH values of two things
that should or should not be different, and that's about it.  Now, if rakudo
wants to test the contents of .WHICH or .WHICH.Str in its test suite, that's
up to rakudo.

> Even if there were some demand for yet another stringification method,
> .WHICH.Str would be the wrong place to put it.  Half-hearted inspection
> is entirely contrary to the basic concept of .WHICH.  If such a method
> is to be added, it should be a method directly on the principal object.
> .WHICH doesn't have to supply a string directly, or even just a string
> wrapped in a funny class, but the object it supplies should be concerned
> entirely with the precise identity of the principal object.  For the
> .WHICH value to stringify to anything that doesn't have the same identity
> properties would be misleading.

> If you're interested in a human inspecting the .WHICH value itself,
> rather than inspecting the principal object, then the most important
> method to consider is .WHICH.perl.  By the intent of .perl, this ought
> to produce a string that fully represents the actual .WHICH value.
> Ellipsis is not useful here.  It would be acceptable for .WHICH.gist
> to provide a lossy representation, but .gist is so loosely defined that
> almost anything is acceptable.

We agree on WHICH.perl and WHICH.gist, and I'm happy to let others
argue over which of those WHICH.Str would spit out.

> > there's room left open for not requiring WHICH implementation at all
> > on value types).
> > For value types, .WHICH could very well be just identity
> 
> That too would break Set and anything else that needs a way to hash
> object identity.

No, not really.  It's just that the spec specifies that .WHICH is to
be used for hashing, and does not specify how that hashing is to
occur.  There's no specified API presented uniformly by everything
that comes out of a WHICH, and this seems to be intentional.

> A .WHICH method producing a consistent type of output is useful on *all* 
> types.

I think it is a double-edged sword: it can be useful but has its perils.
It certainly *can* be done this way...

...or your implementation could demand all .WHICH produce an object with a
.HASH-ME-BABYCAKES method.  Or just demand all value types and ObjAt have such
a method and have WHICH be a no-op on value types.  Or add a dash of canonicalism
to .freeze, which you need to implement anyway, and an adverb to gen a hash.
Or it could simply demand a type-specific === multi candidate is present in-scope
and that object-keyed hashes are a transparent layered construct mapping types
to subhashes keyed by that specific type, and let that type define how those
subhashes work.
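
The last of those options can be sketched roughly as follows.  This is
Python, not Perl 6, and TypedKeyMap is a made-up name.  It also
simplifies: every subhash here is an ordinary dict, whereas the idea
above would let each type decide how its own subhash works.

```python
class TypedKeyMap:
    """Hypothetical layered map: outer level keyed by type, inner by value."""

    def __init__(self):
        self._by_type = {}  # type -> subhash for keys of exactly that type

    def __setitem__(self, key, value):
        # Pick (or create) the subhash for this key's exact type first.
        self._by_type.setdefault(type(key), {})[key] = value

    def __getitem__(self, key):
        return self._by_type[type(key)][key]

    def __len__(self):
        return sum(len(sub) for sub in self._by_type.values())

m = TypedKeyMap()
m[1] = "int"
m[1.0] = "float"
m[True] = "bool"

plain = {1: "int", 1.0: "float", True: "bool"}

print(len(m))      # 3: the three keys live in three separate subhashes
print(len(plain))  # 1: Python treats 1, 1.0 and True as the same dict key
```

The outer layer never has to hash a value across types, so no common
string format is needed to keep keys of different types apart.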

There are various ways to skin that cat and unless someone comes up with a
compelling reason to do it one particular way, I think this has been left
open for "laboratory of democracy" purposes.  I'm not finding the case for
Str being a common go-to format very compelling so far, personally.

Anyway, I don't want to derail this or your other tickets since the .WHICH
values themselves are definitely dysfunctional no matter how they stringify.
I'd say that's more important to fix.
