Hashes, Stringification, Hashing and Strings
In this example: %hash = ($a=$b); $a can be anything. In fact, since Perl6 promises to retain the original value of $a, we're rather encouraged to store complex data there. But, this poses a problem. The key to use for hashing might not ideally be the string representation. For example, if I'm hashing a collection of special file objects, it might be useful in one context to do: method operator:qq () { return read_whole_file(); } But, if I have 100 of these objects, and I try to put them in a hash, life could get ugly fast! More helpful would be to have this builtin: method operator:hash () { return operator:qq(); } and then allow me to override it in my class like so: method operator:hash () { return filename(); } This allows me to specify separate hashing and stringification methods, but retains Perl's original default of combining the two.
Re: Hashes, Stringification, Hashing and Strings
Aaron Sherman writes: : In this example: : : %hash = ($a=$b); : : $a can be anything. In fact, since Perl6 promises to retain the original : value of $a, we're rather encouraged to store complex data there. But, : this poses a problem. The key to use for hashing might not ideally be : the string representation. : : For example, if I'm hashing a collection of special file objects, it : might be useful in one context to do: : : method operator:qq () { : return read_whole_file(); : } : : But, if I have 100 of these objects, and I try to put them in a hash, : life could get ugly fast! : : More helpful would be to have this builtin: : : method operator:hash () { : return operator:qq(); : } : : and then allow me to override it in my class like so: : : method operator:hash () { : return filename(); : } : : This allows me to specify separate hashing and stringification methods, : but retains Perl's original default of combining the two. Yes, that's what we intend to do. Larry
Re: Hashes, Stringification, Hashing and Strings
Speaking of which, how do we ensure the immutability of keys being put into the hash? I think Perl copied the string, so that: $b = aa; $a{$b} = 1; chop $b; print $a{aa}; still works. If we start storing full thingies into the keys of a hash, we either need to make deep copies of these, or copy enough to ensure the hashing function has all that it needs. Say we do: $b = new Cat(); $a{$b} = 1; $b-somefunctionthatchangesthehashvalue(); $a{$b} doesn't find anything, since $b was hashed under it's old identity. Also, Perl allowed this before: $b = aa; %a{$b} = 1; print $a{aa}; Will it allow this? $b = new Cat(); %a{$b} = 1; print $a{new Cat()}; If it does, how does it determine equality of two objects? What if Cat includes a counter that determines which number it is in some total ordering of cats. Will the above still work? Java had all sorts of problems where they force immutability of an object onto the user of their Collection API. It's a pain in the ass, and the source of stupid bugs, where your object changes it's hash value without being rehashed. How will perl handle this? If it automatically rehashed objects in whatever tables they are stored in (forget the speed hit for now), then we still have the problem of: $a = a; %c{$a} = 1; $b = aa; %c{$b} = 2; chop $b; print %c{a}; #what gets printed? According to perl 5 semantics, this would print 1. But according to the rehashing rule above, this is ambiguous. So that's not an option. I personally liked the stringification of keys. It made things a LOT simpler. :) Finally, one last option is to hash based on some memory address, so that when we store objects in there, we can be assured of no two of them pointing to the same place, or worrying about the hash function changing. Assuming we work around our copying GC some way to get a unique object identity value, we still have the problem of two equivalent strings hashing to different places, or two equivalent arrays hashing to different values. I can make a case for the latter being a good thing, but not the former, as it breaks perl5 semantics. Two options I see are: a) stringify everything, and make life simpler. I never really had a problem with stringified keys... b) strings get hashed like perl5. everything else gets hashed based on some unique object identifier (memory address) that's constant throughout the life of the object. This means that in order for an object to be able to have equality work in hashtables, from two objects constructed at different points in time, they'd need to overload the operator:hash() function to return a string. Or has this matter been thought through already? Thanks, Mike Lambert Aaron Sherman wrote: Date: 16 Apr 2002 12:13:55 -0400 From: Aaron Sherman [EMAIL PROTECTED] To: Perl6 Language List [EMAIL PROTECTED] Subject: Hashes, Stringification, Hashing and Strings In this example: %hash = ($a=$b); $a can be anything. In fact, since Perl6 promises to retain the original value of $a, we're rather encouraged to store complex data there. But, this poses a problem. The key to use for hashing might not ideally be the string representation. For example, if I'm hashing a collection of special file objects, it might be useful in one context to do: method operator:qq () { return read_whole_file(); } But, if I have 100 of these objects, and I try to put them in a hash, life could get ugly fast! More helpful would be to have this builtin: method operator:hash () { return operator:qq(); } and then allow me to override it in my class like so: method operator:hash () { return filename(); } This allows me to specify separate hashing and stringification methods, but retains Perl's original default of combining the two.
Re: Hashes, Stringification, Hashing and Strings
On Tue, 2002-04-16 at 14:00, Mike Lambert wrote: Speaking of which, how do we ensure the immutability of keys being put into the hash? I think Perl copied the string, so that: $b = aa; $a{$b} = 1; chop $b; print $a{aa}; still works. If we start storing full thingies into the keys of a hash, we either need to make deep copies of these, or copy enough to ensure the hashing function has all that it needs. I thought about this myself, and I don't think Perl would do it that way. Please, Larry, et al. correct me if I'm wrong. I suspect it would involve: 1. Copying the key (which might be a reference) on insertion. 2. Hashing once, and caching the hash. This means a minimum of overhead, so it's a good thing. It also means that the structure of your hash would never need to re-compute if the hash of a particular object changes (which I would imagine it could easily do in any number of cases).
Re: Hashes, Stringification, Hashing and Strings
On 4/16/02 11:00 AM, Mike Lambert [EMAIL PROTECTED] claimed: Speaking of which, how do we ensure the immutability of keys being put into the hash? I think Perl copied the string, so that: $b = aa; $a{$b} = 1; chop $b; print $a{aa}; still works. If we start storing full thingies into the keys of a hash, we either need to make deep copies of these, or copy enough to ensure the hashing function has all that it needs. Say we do: $b = new Cat(); $a{$b} = 1; $b-somefunctionthatchangesthehashvalue(); $a{$b} doesn't find anything, since $b was hashed under it's old identity. But these aren't really equivalent, are they? In the first, you use a variable ($b) to create the key, and then a constant (aa) to reference it. In the second example, you use the variable to both create and reference the hash value. So really you should do the same for the Perl 5 example: $b = aa; $a{$b} = 1; chop $b; print $a{$b}; # No key found (will it be autovivified, BTW?) Or you should do the opposite, make the Perl 6 example equivalent. # aa will be used for the hash key, as specified by Cat's # Cmethod operator:hash spec. $b = new Cat(aa); $a{$b} = 1; $b-somefunctionthatchangesthehashvalue(); $a{aa}; # Key found. I personally liked the stringification of keys. It made things a LOT simpler. :) I suspect that you'll still be able to do this. David -- David Wheeler AIM: dwTheory [EMAIL PROTECTED] ICQ: 15726394 http://david.wheeler.net/ Yahoo!: dew7e Jabber: [EMAIL PROTECTED]
Re: Hashes, Stringification, Hashing and Strings
On 4/16/02 11:57 AM, Piers Cawley [EMAIL PROTECTED] claimed: Personally I'd like the default hash to return some immutable, unique and probably opaque object id (something the like 'Foo=HASH(0x81e2a3c)' you get from unoverloaded objects in Perl5, but probably not identical). This isn't going to change as an object's contents change. I would agree that such a default would be preferable, as long as I could overload it with my own idea of what the hash key should be. This will be useful for database applications, where I could have two separate objects that loaded the same data from the database, and are therefore the same object, even though they wouldn't have the same OID. So I'd want to be able to say, for hash keys, use a key I define (probably including a primary key ID from the database). Regards, David -- David Wheeler AIM: dwTheory [EMAIL PROTECTED] ICQ: 15726394 http://david.wheeler.net/ Yahoo!: dew7e Jabber: [EMAIL PROTECTED]
Re: Hashes, Stringification, Hashing and Strings
On Tue, Apr 16, 2002 at 02:00:33PM -0400, Mike Lambert wrote: Speaking of which, how do we ensure the immutability of keys being put into the hash? I think Perl copied the string, so that: RFC266 talks about these issues, though it was just really my take on the problem at the time. http://dev.perl.org/rfc/266.pod
Re: Hashes, Stringification, Hashing and Strings
Piers Cawley writes: : Aaron Sherman [EMAIL PROTECTED] writes: : : On Tue, 2002-04-16 at 14:00, Mike Lambert wrote: : Speaking of which, how do we ensure the immutability of keys being put : into the hash? I think Perl copied the string, so that: : : $b = aa; : $a{$b} = 1; : chop $b; : print $a{aa}; : : still works. : : If we start storing full thingies into the keys of a hash, we either need : to make deep copies of these, or copy enough to ensure the hashing : function has all that it needs. : : : I thought about this myself, and I don't think Perl would do it that : way. Please, Larry, et al. correct me if I'm wrong. : : I suspect it would involve: : : 1. Copying the key (which might be a reference) on insertion. : 2. Hashing once, and caching the hash. : : This means a minimum of overhead, so it's a good thing. It also means : that the structure of your hash would never need to re-compute if the : hash of a particular object changes (which I would imagine it could : easily do in any number of cases). : : So you'd have: : :%hash{$some_obj} = $aValue; :$some_obj.mutator; :exists %hash{$some_obj} # returns undef. : : Somehow I *really* don't think that's going to fly. : : Personally I'd like the default hash to return some immutable, unique : and probably opaque object id (something the like : 'Foo=HASH(0x81e2a3c)' you get from unoverloaded objects in Perl5, but : probably not identical). This isn't going to change as an object's : contents change. You guys are thinking in terms of a single $obj.hash method. I think there will be more than one hashish (er...) method available, and each hash will be able to choose at least whether it wants to hash by $obj._ (the default), by $obj.hash (mutable), or by $obj.id (immutable). The hash function is not sufficient in cases of hash collision, so you also need to think in terms of multiple comparison methods. Since these are just properties of the hash, the hash could even have properties containing closures that define the hash and/or comparison functions. These properties can force the issue, regardless of whether the object in question cooperates. So you could hash objects on any attribute or combination of attributes, for instance. But the default is gonna look a lot like Perl 5. Larry
Re: Hashes, Stringification, Hashing and Strings
On 4/16/02 12:27 PM, Larry Wall [EMAIL PROTECTED] claimed: You guys are thinking in terms of a single $obj.hash method. I think there will be more than one hashish (er...) method available, and each hash will be able to choose at least whether it wants to hash by $obj._ (the default), by $obj.hash (mutable), or by $obj.id (immutable). The hash function is not sufficient in cases of hash collision, so you also need to think in terms of multiple comparison methods. Wow, that will be pretty cool, I think. Since these are just properties of the hash, the hash could even have properties containing closures that define the hash and/or comparison functions. These properties can force the issue, regardless of whether the object in question cooperates. So you could hash objects on any attribute or combination of attributes, for instance. Yes, this sounds ideal to me. But the default is gonna look a lot like Perl 5. That's certainly good enough for me! Regards, David -- David Wheeler AIM: dwTheory [EMAIL PROTECTED] ICQ: 15726394 http://david.wheeler.net/ Yahoo!: dew7e Jabber: [EMAIL PROTECTED]
Re: Hashes, Stringification, Hashing and Strings
David Wheeler [EMAIL PROTECTED] writes: On 4/16/02 11:57 AM, Piers Cawley [EMAIL PROTECTED] claimed: Personally I'd like the default hash to return some immutable, unique and probably opaque object id (something the like 'Foo=HASH(0x81e2a3c)' you get from unoverloaded objects in Perl5, but probably not identical). This isn't going to change as an object's contents change. I would agree that such a default would be preferable, as long as I could overload it with my own idea of what the hash key should be. This will be useful for database applications, where I could have two separate objects that loaded the same data from the database, and are therefore the same object, even though they wouldn't have the same OID. So I'd want to be able to say, for hash keys, use a key I define (probably including a primary key ID from the database). Ah yes. I'm currently doing something like that trick with a home rolled object store. -- Piers It is a truth universally acknowledged that a language in possession of a rich syntax must be in need of a rewrite. -- Jane Austen?
Re: Hashes, Stringification, Hashing and Strings
On Tue, 2002-04-16 at 14:57, Piers Cawley wrote: Aaron Sherman [EMAIL PROTECTED] writes: I suspect it would involve: 1. Copying the key (which might be a reference) on insertion. 2. Hashing once, and caching the hash. This means a minimum of overhead, so it's a good thing. It also means that the structure of your hash would never need to re-compute if the hash of a particular object changes (which I would imagine it could easily do in any number of cases). So you'd have: %hash{$some_obj} = $aValue; $some_obj.mutator; exists %hash{$some_obj} # returns undef. Somehow I *really* don't think that's going to fly. Personally I'd like the default hash to return some immutable, unique and probably opaque object id (something the like 'Foo=HASH(0x81e2a3c)' you get from unoverloaded objects in Perl5, but probably not identical). This isn't going to change as an object's contents change. I don't see why your above example would behave as you suggest. Larry has already put in his TMTOWTDI comment, but let's look back at what I said, using your example: %hash{$some_obj} = $aValue; 1. Copy the key (possibly a reference) %hash.magic_key_store($some_obj) 2. Cache the hash of that key %hash.magic_hash_store($some_obj.hash()) Now, you suggest: $some_obj.mutator; exists %hash{$some_obj} # returns undef. Well, if you want that behavior, I'm sure you can get it by redefining hash() (or whatever the operator/method/whatever is called), but I don't suggest it in the normal case. By default, as I've suggested in previous mail the string representation should be used for hashing, which would lead to that second statement returning true, since $some_obj still has the same string representation. Then again, if you're hashing by (as in my previous example) file name, then you might want to change the hash of your object when it represents a different file (say, you call an open method on it). There's good cause to want to do that. What I was saying above, however, was that you don't want to have to re-compute your internal hash structure on the fly, hence caching the hash of your object. That way, if an object returns a 1k block from /dev/random as it's hash, you don't re-build the entire hash every time you access an element, which would be absurdly unfortunate. Is there ever a reason to want to do that?