Re: Valid hash keys?
Alex Burr writes: > On Sun, Feb 27, 2005 at 03:36:42PM -0700, Luke Palmer wrote: > > But the biggest problem is that if the user overloads 'equal' on two > > objects, the hash should consider them equal. We could require that to > > overload 'equal', you also have to overload .hash so that you've given > > some thought to the thing. The worry I have is that people will do: > > > > method hash() { 0 } > > This is why I think a 'canonicalize' method is better than a .hash' > method: if canonicalize is defined correctly, then .equal is just > {$a.canon() is_bitwise_identical_to $b.canon()}, so people don't have > to supply both. Sure. And you know what's better than all of that? Transitive ordering on all objects, so that .equal is just: !($a < $b && $b < $a) But the reason we're not going with that is because it's very hard (both from a user perspective and computationally) to do that in the general case. And the same is true for canonicalization. Sure, when you have certain kinds of objects in mind, like strings, or trees, it's very easy to canonicalize. But when you have other objects in mind, like filehandles, graphs, or references to remote objects, it's hard or impossible to canonicalize. And in fact, there are some objects for which it's hard or impossible to define an .equal method. And those objects can't be hashed, plain and simple. If you need to hash them, you can define a type of hash that uses =:= as comparison, so that things are only equal if they are referentially equal. But I expect that we have can have a standard role that defines .hash and .equal in terms of .canonicalize, if it exists. We can also have a magical role that defines .canonicalize when it can. If .hash isn't defined when an object is used as a key of a hash (but .equal is), it will have to bite the speed bullet and use 0, probably along with a warning. Luke
Re: Valid hash keys?
On Sun, Feb 27, 2005 at 03:36:42PM -0700, Luke Palmer wrote: > But the biggest problem is that if the user overloads 'equal' on two > objects, the hash should consider them equal. We could require that to > overload 'equal', you also have to overload .hash so that you've given > some thought to the thing. The worry I have is that people will do: > > method hash() { 0 } This is why I think a 'canonicalize' method is better than a .hash' method: if canonicalize is defined correctly, then .equal is just {$a.canon() is_bitwise_identical_to $b.canon()}, so people don't have to supply both. Quite often when you would ordinarily define a .equal method, it is of that form anyhow. And then .hash can be straightforwardly defined as {$_.canon().binaryhash()}, where binaryhash() just means hash all the memory of this object. Stringify would be an instance of a canonicalization function; and perhaps the default. Alex
Re: Valid hash keys?
Luke Palmer wrote: The object model that I'm working on actually identifies 2 and "2" as the same object, indistinguishable in every respect. Okay, that's fine, since C< 2 eq "2" > and C< 2 == "2" >. But what about 2.0 and "2.0"? In Perl5, C< 2.0 == "2.0" >, but C< 2.0 ne "2.0" >. -- Rod Adams
Re: Valid hash keys?
On Sun, 27 Feb 2005 15:36:42 -0700, [EMAIL PROTECTED] (Luke Palmer) wrote: > Nigel Sandever writes: > > When we're talking about hashes of everything, there are a couple of > routes we can take. We can go Java's way and define a .hash method on > every object. We can go C++'s way and not hash at all, but instead > define a transitive ordering on every object. The more I think about this, please, no. The reason hashes are called hashes is because they hash. If we need bags, sets, or orderThingies with overloadable transitive ordering, they can be written as classes--that possibly overload the hash syntax obj<> = value or whatever, but don't tack all that overhead on to the basic primitive. > We can go Perl's way and > find a string representation of the object and map the problem back to > string hashing, which we can do well. The only question is why does the thing that gets hashed have to be stringyfied first? In p5, I often use hash{ pack'V', $key } = $value. # or 'd' 1) Because for large hashes using numeric keys it use up less space for the keys. 4-bytes rather than 10 for 2**32. 2) By using all 256 values of each byte, it tends to spread the keys more even across fewer buckets; use Devel::Size qw[total_size size]; undef $h{ pack 'V', $_ } for map{ $_ * 1 } 0 .. 99; print total_size \%h; 18418136 print scalar %h; 292754/524288 versus use Devel::Size qw[total_size size]; undef $h{ $_ } for map{ $_ * 1 } 0 .. 99; print total_size \%h; 48083250 print scalar %h; 644301/1048576 It would also avoid the need for hacks like Tie:RefHash, by hashing the address of the ref rather than the stringyfied ref and forcing the key to be stored twice and the creation of zillions of anonymous arrays to hold the unstrigyfied ref+value. The same could be extended to hashing the composite binary representations of whole structures and objects. njs.
Re: Valid hash keys?
On Sun, Feb 27, 2005 at 11:57:30PM +0100, Thomas Sandlaß wrote: > Alex Burr wrote: > > >[..] Actually, it would be useful sometimes > >to be able to give a hash an explicit canonicalizer: > > > >my %msdos_files is canonicalized_by lc; > > > >my %fractions is canonicalized_by gcd; > > Shouldn't that be handled by container subclasses of Hash? > Like PersitentScalar or SparseArray? Possibly. Clearly that's what one would do in any other language. What I was thinking was that if hashes are going to have a canonicalizer function *anyway*, maybe the default implementation could be overridable with a trait (or role?). But I can't actually claim to have followed perl6 development enough to be able to argue that it really makes sense. Alex
Re: Valid hash keys?
Nigel Sandever writes: > On Sun, 27 Feb 2005 15:36:42 -0700, [EMAIL PROTECTED] (Luke Palmer) wrote: > > As far as getting 2, 2.0, and "2" to hash to the same object, well, we > > know they're 'equal', so we just need to know how to hash them the same > > way. In fact, I don't believe 2.0 can be represented as a Num. At > > least in Perl 5, it translates itself back to an int. So we can just > > stringify and hash for the scalar types. > > My thought is that if c uses the stringyfied values > of > the keys, then it is no different to C, Indeed, but I meant just for our non-reference scalar types, such as Num, Int, and Str. > I think it would be useful for shape(Any) be different to an ordinary > hash, and hashing the binary representation of the key, so that > > (Int)2, (Num)2, (String)2, (uint)2 (uint4)2 etc. > > would be a useful way of collating things according to their "type" > rather than their value? That may indeed be a useful kind of map to have, but it's hardly what people expect when they declare a hash keyed by Any. The reason I want to identify 2 and "2" is because they are identical everywhere else in the language. Also, if you hash on the binary representation, what's to say that uint hashes differently from uint4. And even if we do guarantee that they hash differently, there will be collisions. And then uint(2) equal uint4(2) (for any reasonable implementation of equal), and they are the same, but only for collision values. A better way to make a type-collated hash would be to: class TypeHash is Hash[shape => [Class; Any]] { method postcircumfix:<{ }> (Any $arg) { .SUPER::{$arg.type; $arg} } } And if you really think that it's going to be that common, then you can write a module that implements that. It would be about five lines long. Luke
Re: Valid hash keys?
Alex Burr wrote: [..] Actually, it would be useful sometimes to be able to give a hash an explicit canonicalizer: my %msdos_files is canonicalized_by lc; my %fractions is canonicalized_by gcd; Shouldn't that be handled by container subclasses of Hash? Like PersitentScalar or SparseArray? Regards, -- TSa (Thomas Sandlaß)
Re: Valid hash keys?
On Sun, 27 Feb 2005 15:36:42 -0700, [EMAIL PROTECTED] (Luke Palmer) wrote: > Nigel Sandever writes: > > On Sun, 27 Feb 2005 02:20:59 -0700, [EMAIL PROTECTED] (Luke Palmer) wrote: > > > I forgot an important concretity. Hashes should compare based on the > > > generic "equal" operator, which knows how to compare apples and apples, > > > and oranges and oranges, and occasionally a red orange to an apple. > > > > > > That is: > > > > > > 3 equal 3 ==> true > > > 3 equal { codeblock } ==> false > > > 3 equal "3"==> true > > > > > > > I would have assumed a hash who shape was defined as C to perform > > the hashing function directly on the (nominally 32-bit) binary > > representation Int 2. > > I wasn't even thinking about implementation. Sometimes it's good to let > implementation drive language, but I don't think it's appropriate here. > > When we're talking about hashes of everything, there are a couple of > routes we can take. We can go Java's way and define a .hash method on > every object. We can go C++'s way and not hash at all, but instead > define a transitive ordering on every object. We can go Perl's way and > find a string representation of the object and map the problem back to > string hashing, which we can do well. > > But the biggest problem is that if the user overloads 'equal' on two > objects, the hash should consider them equal. We could require that to > overload 'equal', you also have to overload .hash so that you've given > some thought to the thing. The worry I have is that people will do: > > method hash() { 0 } > > But I suppose that's okay. That just punts the work off to 'equal', > albeit in linear time. > > That may be the right way to go. Use a Javaesque .hash method with a > sane default (an introspecting default, perhaps), and use a sane > equivalent default for 'equal'. > > As far as getting 2, 2.0, and "2" to hash to the same object, well, we > know they're 'equal', so we just need to know how to hash them the same > way. In fact, I don't believe 2.0 can be represented as a Num. At > least in Perl 5, it translates itself back to an int. So we can just > stringify and hash for the scalar types. > My thought is that if c uses the stringyfied values of the keys, then it is no different to C, I think it would be useful for shape(Any) be different to an ordinary hash, and hashing the binary representation of the key, so that (Int)2, (Num)2, (String)2, (uint)2 (uint4)2 etc. would be a useful way of collating things according to their "type" rather than their value? > > Luke njs.
Re: Valid hash keys?
Nigel Sandever writes: > On Sun, 27 Feb 2005 02:20:59 -0700, [EMAIL PROTECTED] (Luke Palmer) wrote: > > I forgot an important concretity. Hashes should compare based on the > > generic "equal" operator, which knows how to compare apples and apples, > > and oranges and oranges, and occasionally a red orange to an apple. > > > > That is: > > > > 3 equal 3 ==> true > > 3 equal { codeblock } ==> false > > 3 equal "3"==> true > > > > I would have assumed a hash who shape was defined as C to perform > the hashing function directly on the (nominally 32-bit) binary > representation Int 2. I wasn't even thinking about implementation. Sometimes it's good to let implementation drive language, but I don't think it's appropriate here. When we're talking about hashes of everything, there are a couple of routes we can take. We can go Java's way and define a .hash method on every object. We can go C++'s way and not hash at all, but instead define a transitive ordering on every object. We can go Perl's way and find a string representation of the object and map the problem back to string hashing, which we can do well. But the biggest problem is that if the user overloads 'equal' on two objects, the hash should consider them equal. We could require that to overload 'equal', you also have to overload .hash so that you've given some thought to the thing. The worry I have is that people will do: method hash() { 0 } But I suppose that's okay. That just punts the work off to 'equal', albeit in linear time. That may be the right way to go. Use a Javaesque .hash method with a sane default (an introspecting default, perhaps), and use a sane equivalent default for 'equal'. As far as getting 2, 2.0, and "2" to hash to the same object, well, we know they're 'equal', so we just need to know how to hash them the same way. In fact, I don't believe 2.0 can be represented as a Num. At least in Perl 5, it translates itself back to an int. So we can just stringify and hash for the scalar types. Luke
Re: Valid hash keys?
On Sun, Feb 27, 2005 at 02:20:59AM -0700, Luke Palmer wrote: > I forgot an important concretity. Hashes should compare based on the > generic "equal" operator, which knows how to compare apples and apples, > and oranges and oranges, and occasionally a red orange to an apple. Um. Hashes don't really compare, though, do they? Maybe you just mean a notional equals operator, which isn't really used; but it seems to me that what hashes acutally implement is more of a 'canonicalize' operator. Actually, it would be useful sometimes to be able to give a hash an explicit canonicalizer: my %msdos_files is canonicalized_by lc; my %fractions is canonicalized_by gcd; Alex
Re: Valid hash keys?
On Sun, 27 Feb 2005 02:20:59 -0700, [EMAIL PROTECTED] (Luke Palmer) wrote: > Luke Palmer writes: > > Autrijus Tang writes: > > > Just a quick question: Is Hash keys still Strings, or can they be > > > arbitary values? > > > > They can be declared to be arbitrary: > > > > my %hash is shape(Any); > > > > > > > If the latter, can Int 2, Num 2.0 and Str "2" point to different > > > values? > > > > That's an interesting question. Some people would want them to, and > > some people would definitely not want them to. I think the general > > consensus is that people would not want them to be different, since in > > the rest of perl, 2 and "2" are the same. > > I forgot an important concretity. Hashes should compare based on the > generic "equal" operator, which knows how to compare apples and apples, > and oranges and oranges, and occasionally a red orange to an apple. > > That is: > > 3 equal 3 ==> true > 3 equal { codeblock } ==> false > 3 equal "3"==> true > I would have assumed a hash who shape was defined as C to perform the hashing function directly on the (nominally 32-bit) binary representation Int 2. Likewise, c, would perform the hashing on the binary rep of the (nom.64-bit) Double value. And C, on the address of the key passed? By extension, a C<%hash is shape( Any )> would hash the binary representation of whatever (type of) key it was given, which would make keys of 2, 2.0, '2', '2.0', (Int2)2 etc. all map to different keys. If C<%hash is shape(Any)> maps all the above representation of 2 to the same value, then C becomes a synonym for C<%hash is shape(Scalar)>. (is that the same as C?). If my auumption is correct, then that would imply that each type hash or inherits a .binary or .raw method to allow access to the internal representation? > Luke
Re: Valid hash keys?
Luke Palmer writes: > Autrijus Tang writes: > > Just a quick question: Is Hash keys still Strings, or can they be > > arbitary values? > > They can be declared to be arbitrary: > > my %hash is shape(Any); > > > > If the latter, can Int 2, Num 2.0 and Str "2" point to different > > values? > > That's an interesting question. Some people would want them to, and > some people would definitely not want them to. I think the general > consensus is that people would not want them to be different, since in > the rest of perl, 2 and "2" are the same. I forgot an important concretity. Hashes should compare based on the generic "equal" operator, which knows how to compare apples and apples, and oranges and oranges, and occasionally a red orange to an apple. That is: 3 equal 3 ==> true 3 equal { codeblock } ==> false 3 equal "3"==> true Luke
Re: Valid hash keys?
Autrijus Tang writes: > Just a quick question: Is Hash keys still Strings, or can they be > arbitary values? They can be declared to be arbitrary: my %hash is shape(Any); > If the latter, can Int 2, Num 2.0 and Str "2" point to different > values? That's an interesting question. Some people would want them to, and some people would definitely not want them to. I think the general consensus is that people would not want them to be different, since in the rest of perl, 2 and "2" are the same. The object model that I'm working on actually identifies 2 and "2" as the same object, indistinguishable in every respect. But that hasn't been accepted (or even proposed)... yet. Luke