Hashes, Stringification, Hashing and Strings

2002-04-16 Thread Aaron Sherman

In this example:

%hash = ($a=$b);

$a can be anything. In fact, since Perl6 promises to retain the original
value of $a, we're rather encouraged to store complex data there. But,
this poses a problem. The key to use for hashing might not ideally be
the string representation.

For example, if I'm hashing a collection of special file objects, it
might be useful in one context to do:

method operator:qq () {
return read_whole_file();
}

But, if I have 100 of these objects, and I try to put them in a hash,
life could get ugly fast!

More helpful would be to have this builtin:

method operator:hash () {
return operator:qq();
}

and then allow me to override it in my class like so:

method operator:hash () {
return filename();
}

This allows me to specify separate hashing and stringification methods,
but retains Perl's original default of combining the two.





Re: Hashes, Stringification, Hashing and Strings

2002-04-16 Thread Larry Wall

Aaron Sherman writes:
: In this example:
: 
:   %hash = ($a=$b);
: 
: $a can be anything. In fact, since Perl6 promises to retain the original
: value of $a, we're rather encouraged to store complex data there. But,
: this poses a problem. The key to use for hashing might not ideally be
: the string representation.
: 
: For example, if I'm hashing a collection of special file objects, it
: might be useful in one context to do:
: 
:   method operator:qq () {
:   return read_whole_file();
:   }
: 
: But, if I have 100 of these objects, and I try to put them in a hash,
: life could get ugly fast!
: 
: More helpful would be to have this builtin:
: 
:   method operator:hash () {
:   return operator:qq();
:   }
: 
: and then allow me to override it in my class like so:
: 
:   method operator:hash () {
:   return filename();
:   }
: 
: This allows me to specify separate hashing and stringification methods,
: but retains Perl's original default of combining the two.

Yes, that's what we intend to do.

Larry



Re: Hashes, Stringification, Hashing and Strings

2002-04-16 Thread Mike Lambert

Speaking of which, how do we ensure the immutability of keys being put
into the hash? I think Perl copied the string, so that:

$b = aa;
$a{$b} = 1;
chop $b;
print $a{aa};

still works.

If we start storing full thingies into the keys of a hash, we either need
to make deep copies of these, or copy enough to ensure the hashing
function has all that it needs.

Say we do:
$b = new Cat();
$a{$b} = 1;
$b-somefunctionthatchangesthehashvalue();
$a{$b} doesn't find anything, since $b was hashed under it's old identity.

Also, Perl allowed this before:
$b = aa;
%a{$b} = 1;
print $a{aa};

Will it allow this?
$b = new Cat();
%a{$b} = 1;
print $a{new Cat()};

If it does, how does it determine equality of two objects? What if Cat
includes a counter that determines which number it is in some total
ordering of cats. Will the above still work?

Java had all sorts of problems where they force immutability of an object
onto the user of their Collection API. It's a pain in the ass, and the
source of stupid bugs, where your object changes it's hash value without
being rehashed. How will perl handle this?

If it automatically rehashed objects in whatever tables they are stored in
(forget the speed hit for now), then we still have the problem of:

$a = a;
%c{$a} = 1;
$b = aa;
%c{$b} = 2;
chop $b;
print %c{a}; #what gets printed?

According to perl 5 semantics, this would print 1. But according to the
rehashing rule above, this is ambiguous. So that's not an option.

I personally liked the stringification of keys. It made things a LOT
simpler. :)

Finally, one last option is to hash based on some memory address, so that
when we store objects in there, we can be assured of no two of them
pointing to the same place, or worrying about the hash function changing.
Assuming we work around our copying GC some way to get a unique object
identity value, we still have the problem of two equivalent strings
hashing to different places, or two equivalent arrays hashing to different
values. I can make a case for the latter being a good thing, but not the
former, as it breaks perl5 semantics.


Two options I see are:
a) stringify everything, and make life simpler. I never really had a
problem with stringified keys...
b) strings get hashed like perl5. everything else gets hashed based
on some unique object identifier (memory address) that's constant
throughout the life of the object. This means that in order for an object
to be able to have equality work in hashtables, from two objects
constructed at different points in time, they'd need to overload the
operator:hash() function to return a string.

Or has this matter been thought through already?

Thanks,
Mike Lambert

Aaron Sherman wrote:

 Date: 16 Apr 2002 12:13:55 -0400
 From: Aaron Sherman [EMAIL PROTECTED]
 To: Perl6 Language List [EMAIL PROTECTED]
 Subject: Hashes, Stringification, Hashing and Strings

 In this example:

   %hash = ($a=$b);

 $a can be anything. In fact, since Perl6 promises to retain the original
 value of $a, we're rather encouraged to store complex data there. But,
 this poses a problem. The key to use for hashing might not ideally be
 the string representation.

 For example, if I'm hashing a collection of special file objects, it
 might be useful in one context to do:

   method operator:qq () {
   return read_whole_file();
   }

 But, if I have 100 of these objects, and I try to put them in a hash,
 life could get ugly fast!

 More helpful would be to have this builtin:

   method operator:hash () {
   return operator:qq();
   }

 and then allow me to override it in my class like so:

   method operator:hash () {
   return filename();
   }

 This allows me to specify separate hashing and stringification methods,
 but retains Perl's original default of combining the two.







Re: Hashes, Stringification, Hashing and Strings

2002-04-16 Thread Aaron Sherman

On Tue, 2002-04-16 at 14:00, Mike Lambert wrote:
 Speaking of which, how do we ensure the immutability of keys being put
 into the hash? I think Perl copied the string, so that:
 
 $b = aa;
 $a{$b} = 1;
 chop $b;
 print $a{aa};
 
 still works.
 
 If we start storing full thingies into the keys of a hash, we either need
 to make deep copies of these, or copy enough to ensure the hashing
 function has all that it needs.


I thought about this myself, and I don't think Perl would do it that
way. Please, Larry, et al. correct me if I'm wrong.

I suspect it would involve:

1. Copying the key (which might be a reference) on insertion.
2. Hashing once, and caching the hash.

This means a minimum of overhead, so it's a good thing. It also means
that the structure of your hash would never need to re-compute if the
hash of a particular object changes (which I would imagine it could
easily do in any number of cases).





Re: Hashes, Stringification, Hashing and Strings

2002-04-16 Thread David Wheeler

On 4/16/02 11:00 AM, Mike Lambert [EMAIL PROTECTED] claimed:

 Speaking of which, how do we ensure the immutability of keys being put
 into the hash? I think Perl copied the string, so that:
 
 $b = aa;
 $a{$b} = 1;
 chop $b;
 print $a{aa};
 
 still works.
 
 If we start storing full thingies into the keys of a hash, we either need
 to make deep copies of these, or copy enough to ensure the hashing
 function has all that it needs.
 
 Say we do:
 $b = new Cat();
 $a{$b} = 1;
 $b-somefunctionthatchangesthehashvalue();
 $a{$b} doesn't find anything, since $b was hashed under it's old identity.

But these aren't really equivalent, are they? In the first, you use a
variable ($b) to create the key, and then a constant (aa) to reference it.
In the second example, you use the variable to both create and reference the
hash value. So really you should do the same for the Perl 5 example:

$b = aa;
$a{$b} = 1;
chop $b;
print $a{$b}; # No key found (will it be autovivified, BTW?)

Or you should do the opposite, make the Perl 6 example equivalent.

# aa will be used for the hash key, as specified by Cat's
# Cmethod operator:hash spec.
$b = new Cat(aa);
$a{$b} = 1;
$b-somefunctionthatchangesthehashvalue();
$a{aa}; # Key found.

 I personally liked the stringification of keys. It made things a LOT
 simpler. :)

I suspect that you'll still be able to do this.

David

-- 
David Wheeler AIM: dwTheory
[EMAIL PROTECTED] ICQ: 15726394
http://david.wheeler.net/  Yahoo!: dew7e
   Jabber: [EMAIL PROTECTED]





Re: Hashes, Stringification, Hashing and Strings

2002-04-16 Thread David Wheeler

On 4/16/02 11:57 AM, Piers Cawley [EMAIL PROTECTED] claimed:

 Personally I'd like the default hash to return some immutable, unique
 and probably opaque object id (something the like
 'Foo=HASH(0x81e2a3c)' you get from unoverloaded objects in Perl5, but
 probably not identical). This isn't going to change as an object's
 contents change.

I would agree that such a default would be preferable, as long as I could
overload it with my own idea of what the hash key should be. This will be
useful for database applications, where I could have two separate objects
that loaded the same data from the database, and are therefore the same
object, even though they wouldn't have the same OID. So I'd want to be able
to say, for hash keys, use a key I define (probably including a primary key
ID from the database).

Regards,

David

-- 
David Wheeler AIM: dwTheory
[EMAIL PROTECTED] ICQ: 15726394
http://david.wheeler.net/  Yahoo!: dew7e
   Jabber: [EMAIL PROTECTED]





Re: Hashes, Stringification, Hashing and Strings

2002-04-16 Thread Steve Fink

On Tue, Apr 16, 2002 at 02:00:33PM -0400, Mike Lambert wrote:
 Speaking of which, how do we ensure the immutability of keys being put
 into the hash? I think Perl copied the string, so that:

RFC266 talks about these issues, though it was just really my take on
the problem at the time. http://dev.perl.org/rfc/266.pod




Re: Hashes, Stringification, Hashing and Strings

2002-04-16 Thread Larry Wall

Piers Cawley writes:
: Aaron Sherman [EMAIL PROTECTED] writes:
: 
:  On Tue, 2002-04-16 at 14:00, Mike Lambert wrote:
:  Speaking of which, how do we ensure the immutability of keys being put
:  into the hash? I think Perl copied the string, so that:
:  
:  $b = aa;
:  $a{$b} = 1;
:  chop $b;
:  print $a{aa};
:  
:  still works.
:  
:  If we start storing full thingies into the keys of a hash, we either need
:  to make deep copies of these, or copy enough to ensure the hashing
:  function has all that it needs.
: 
: 
:  I thought about this myself, and I don't think Perl would do it that
:  way. Please, Larry, et al. correct me if I'm wrong.
: 
:  I suspect it would involve:
: 
:  1. Copying the key (which might be a reference) on insertion.
:  2. Hashing once, and caching the hash.
: 
:  This means a minimum of overhead, so it's a good thing. It also means
:  that the structure of your hash would never need to re-compute if the
:  hash of a particular object changes (which I would imagine it could
:  easily do in any number of cases).
: 
: So you'd have:
: 
:%hash{$some_obj} = $aValue;
:$some_obj.mutator;
:exists %hash{$some_obj} # returns undef. 
: 
: Somehow I *really* don't think that's going to fly.
: 
: Personally I'd like the default hash to return some immutable, unique
: and probably opaque object id (something the like
: 'Foo=HASH(0x81e2a3c)' you get from unoverloaded objects in Perl5, but
: probably not identical). This isn't going to change as an object's
: contents change.

You guys are thinking in terms of a single $obj.hash method.  I think
there will be more than one hashish (er...) method available, and each
hash will be able to choose at least whether it wants to hash by $obj._
(the default), by $obj.hash (mutable), or by $obj.id (immutable).  The
hash function is not sufficient in cases of hash collision, so you also
need to think in terms of multiple comparison methods.

Since these are just properties of the hash, the hash could even have
properties containing closures that define the hash and/or comparison
functions.  These properties can force the issue, regardless of whether
the object in question cooperates.  So you could hash objects on any
attribute or combination of attributes, for instance.

But the default is gonna look a lot like Perl 5.

Larry



Re: Hashes, Stringification, Hashing and Strings

2002-04-16 Thread David Wheeler

On 4/16/02 12:27 PM, Larry Wall [EMAIL PROTECTED] claimed:

 You guys are thinking in terms of a single $obj.hash method.  I think
 there will be more than one hashish (er...) method available, and each
 hash will be able to choose at least whether it wants to hash by $obj._
 (the default), by $obj.hash (mutable), or by $obj.id (immutable).  The
 hash function is not sufficient in cases of hash collision, so you also
 need to think in terms of multiple comparison methods.

Wow, that will be pretty cool, I think.

 Since these are just properties of the hash, the hash could even have
 properties containing closures that define the hash and/or comparison
 functions.  These properties can force the issue, regardless of whether
 the object in question cooperates.  So you could hash objects on any
 attribute or combination of attributes, for instance.

Yes, this sounds ideal to me.

 But the default is gonna look a lot like Perl 5.

That's certainly good enough for me!

Regards,

David 

-- 
David Wheeler AIM: dwTheory
[EMAIL PROTECTED] ICQ: 15726394
http://david.wheeler.net/  Yahoo!: dew7e
   Jabber: [EMAIL PROTECTED]





Re: Hashes, Stringification, Hashing and Strings

2002-04-16 Thread Piers Cawley

David Wheeler [EMAIL PROTECTED] writes:

 On 4/16/02 11:57 AM, Piers Cawley [EMAIL PROTECTED] claimed:

 Personally I'd like the default hash to return some immutable, unique
 and probably opaque object id (something the like
 'Foo=HASH(0x81e2a3c)' you get from unoverloaded objects in Perl5, but
 probably not identical). This isn't going to change as an object's
 contents change.

 I would agree that such a default would be preferable, as long as I could
 overload it with my own idea of what the hash key should be. This will be
 useful for database applications, where I could have two separate objects
 that loaded the same data from the database, and are therefore the same
 object, even though they wouldn't have the same OID. So I'd want to be able
 to say, for hash keys, use a key I define (probably including a primary key
 ID from the database).

Ah yes. I'm currently doing something like that trick with a home
rolled object store.

-- 
Piers

   It is a truth universally acknowledged that a language in
possession of a rich syntax must be in need of a rewrite.
 -- Jane Austen?




Re: Hashes, Stringification, Hashing and Strings

2002-04-16 Thread Aaron Sherman

On Tue, 2002-04-16 at 14:57, Piers Cawley wrote:
 Aaron Sherman [EMAIL PROTECTED] writes:

  I suspect it would involve:
 
  1. Copying the key (which might be a reference) on insertion.
  2. Hashing once, and caching the hash.
 
  This means a minimum of overhead, so it's a good thing. It also means
  that the structure of your hash would never need to re-compute if the
  hash of a particular object changes (which I would imagine it could
  easily do in any number of cases).
 
 So you'd have:
 
%hash{$some_obj} = $aValue;
$some_obj.mutator;
exists %hash{$some_obj} # returns undef. 
 
 Somehow I *really* don't think that's going to fly.
 
 Personally I'd like the default hash to return some immutable, unique
 and probably opaque object id (something the like
 'Foo=HASH(0x81e2a3c)' you get from unoverloaded objects in Perl5, but
 probably not identical). This isn't going to change as an object's
 contents change.

I don't see why your above example would behave as you suggest. Larry
has already put in his TMTOWTDI comment, but let's look back at what I
said, using your example:

%hash{$some_obj} = $aValue;

1. Copy the key (possibly a reference)

%hash.magic_key_store($some_obj)

2. Cache the hash of that key

%hash.magic_hash_store($some_obj.hash())

Now, you suggest:

$some_obj.mutator;
exists %hash{$some_obj} # returns undef. 

Well, if you want that behavior, I'm sure you can get it by redefining
hash() (or whatever the operator/method/whatever is called), but I don't
suggest it in the normal case. By default, as I've suggested in previous
mail the string representation should be used for hashing, which would
lead to that second statement returning true, since $some_obj still has
the same string representation.

Then again, if you're hashing by (as in my previous example) file name,
then you might want to change the hash of your object when it represents
a different file (say, you call an open method on it). There's good
cause to want to do that.

What I was saying above, however, was that you don't want to have to
re-compute your internal hash structure on the fly, hence caching the
hash of your object. That way, if an object returns a 1k block from
/dev/random as it's hash, you don't re-build the entire hash every time
you access an element, which would be absurdly unfortunate. Is there
ever a reason to want to do that?