My understanding was that Hadley wanted 'digest' to operate on part of
an object rather than on the entire object, which might contain
uninteresting or irrelevant details.  For example, if we had

a <- structure(list(x = 1, y = 2), class = "foo")
b <- structure(list(x = 2342342, y = 2), class = "foo")

digest.foo <- function(object, ...) digest(object$y)

Then 'digest(a)' and 'digest(b)' would return the same value in this
case, even though 'a' and 'b' are different objects.  I can see why
someone *might* want digest to return the same hash for 'a' and 'b'
but I would personally find this behavior a little surprising.
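
A minimal, self-contained sketch of the situation, using base R only.
The helper 'raw_hash' is a hypothetical stand-in for the current
non-generic digest function (in practice digest::digest would take its
place), and 'digest0' here is just one possible reading of the "raw"
variant discussed further down in this thread, not an existing
function in the package.

```r
# Hypothetical stand-in for the current, non-generic digest(): hash the
# serialized bytes of the whole object. Uses only base R and the bundled
# 'tools' package; it is NOT the digest package's implementation.
raw_hash <- function(object) {
  tmp <- tempfile()
  on.exit(unlink(tmp))
  writeBin(serialize(object, connection = NULL), tmp)
  unname(tools::md5sum(tmp))
}

# The proposed patch: 'digest' becomes an S3 generic whose default
# method keeps the existing whole-object behavior.
digest <- function(object, ...) UseMethod("digest")
digest.default <- function(object, ...) raw_hash(object)

# A class-specific method that hashes only the 'y' component.
digest.foo <- function(object, ...) digest(object$y)

# Assumed escape hatch: always hash the whole object, ignoring any
# class-specific method.
digest0 <- function(object, ...) digest.default(object, ...)

a <- structure(list(x = 1, y = 2), class = "foo")
b <- structure(list(x = 2342342, y = 2), class = "foo")

identical(digest(a), digest(b))    # TRUE: only 'y' is hashed
identical(digest0(a), digest0(b))  # FALSE: the whole objects differ
```

With something like 'digest0' available, a caller who needs the
whole-object guarantee can still get it even when a class author has
supplied a narrower method.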

-roger

On 10/16/07, Dirk Eddelbuettel <[EMAIL PROTECTED]> wrote:
>
> Hi Roger,
>
> On 16 October 2007 at 08:25, Roger Peng wrote:
> | Sorry, I forgot the 'reply-all'.
> |
> | -roger
> |
> | ---------- Forwarded message ----------
> | From: Roger Peng <[EMAIL PROTECTED]>
> | Date: Oct 16, 2007 8:24 AM
> | Subject: Re: [Rd] Digest package - make digest generic?
> | To: Henrik Bengtsson <[EMAIL PROTECTED]>
> |
> |
> | Would it be possible to instead create a function with a name like
> | 'digest0' which is the current function, and then create a generic
> | function with the name 'digest'?  In this case 'digest0' always
> | returns the digest of the "raw" object.
> |
> | My one concern is that my current expectation is that 'digest' takes
> | an object and hashes the entire object, regardless of class.  So if
> | two objects are different (even in their internal representation),
> | they should return different digests.  I would be a little worried if
> | 'digest' had a different (and perhaps unpredictable) behavior
> | depending on the class of the object where two objects that were in
> | fact different could lead to the same digest.
>
> But haven't the cryptographers taken care of that argument?
>
> To my layman's understanding, the consensus is that hash collisions are
> possible but very very unlikely. And we already have that problem with digest
> as it stands: if collisions are possible, identical hashes could result
> from two different inputs whether or not digest is generic.
>
> Or am I missing what you were trying to get at?
>
> | I can see why one might want class-specific behavior, but what a class
> | author wants from 'digest' may be different from what other users
> | of 'digest' on that object want.
> |
> | A simple approach might be
> |
> | digest0 <- function(x, ...) digest(unclass(x), ...)
>
> Or, just for argument's sake, we go full circle, digest stays as it is and
> Hadley implements his own generic, say, 'Digest()', around digest?  Naa....
>
> I think I like the idea of making it generic, but I really would like to
> know more about possible downsides.
>
> Dirk
>
> | although I don't think this works for S4 objects.
> |
> | -roger
> |
> | On 10/15/07, Henrik Bengtsson <[EMAIL PROTECTED]> wrote:
> | > On 10/15/07, hadley wickham <[EMAIL PROTECTED]> wrote:
> | > > On 10/15/07, Henrik Bengtsson <[EMAIL PROTECTED]> wrote:
> | > > > [As agreed, CC:ing r-devel since others might be interested in this as well.]
> | > > >
> | > > > Hi.
> | > > >
> | > > > On 10/15/07, Dirk Eddelbuettel <[EMAIL PROTECTED]> wrote:
> | > > > >
> | > > > > Hi Hadley,
> | > > > >
> | > > > > On 15 October 2007 at 09:51, hadley wickham wrote:
> | > > > > | Would you consider making digest a generic function?  That way I
> | > > > > | could (e.g.) make a generic method for ggplot objects which didn't
> | > > > > | depend (so much) on their internal representation.
> | > > > >
> | > > > > Well, generally speaking, I always take patches :)
> | > > >
> | > > > I see no problems in doing this.  The patch would be:
> | > > >
> | > > > digest <- function(...) UseMethod("digest");
> | > > > digest.default <- <current digest function>.
> | > > >
> | > > > I think that should do, and I don't think it has any surprising side
> | > > > effects so it could be added in the next release.  Dirk, can you do
> | > > > that?
> | > > >
> | > > > >
> | > > > > I have to admit that I am fairly weak on these aspects of the S
> | > > > > language.  One question is:  how do the current users of digest
> | > > > > (i.e. Henrik's and Seth's caching mechanism, for example) use it on
> | > > > > arbitrary objects _without_ it being generic?
> | > > >
> | > > > I basically put everything I want into a list() and pass that to
> | > > > digest::digest().
> | > >
> | > > Yes, that's what I'm doing too.
> | > >
> | > > > >
> | > > > > | The reason I ask is that I'm using digest as a way of coming up
> | > > > > | with a unique file name for each example graphic.  I want to be
> | > > > > | able to easily compare the appearance of examples between
> | > > > > | versions, but currently the digest depends on internal details,
> | > > > > | so it's hard to match up graphics between versions.
> | > > >
> | > > > See loadCache(key) and saveCache(object, key) in R.cache, which
> | > > > basically loads and saves results from and to a file cache based on a
> | > > > key object - no need to specify paths or filenames.  You can specify
> | > > > paths etc if you want to, but by default it is just transparent.
> | > >
> | > > The problem is I need to refer to the image from the documentation, so
> | > > I do need to know its path.  I also want to be able to look at the
> | > > image, so if the digests are different I can see what the difference
> | > > is (I'm planning to automate this with the imagemagick compare command
> | > > line tool).
> | >
> | > See ?findCache.  That will give you the pathname given a key.  It is
> | > on purpose that I do not list this function in the HTML help index - I
> | > want to keep the "public" API to a minimum.
> | >
> | > /Henrik
> | >
> | > >
> | > > > However, I think Hadley is referring to a different problem.
> | > > > Basically, he has an object containing a lot of fields, but for his
> | > > > purposes it is only a subset of the fields that he wants to use to
> | > > > generate a consistent hashcode.  If he passes any other field, that
> | > >
> | > > Yes, exactly.
> | > >
> | > > > will break the consistency.  In that case, the designer of the class
> | > > > has to identify the fields that uniquely identify the state of
> | > > > the object.  I do that for many of my objects and pass them down in a
> | > > > list() structure to digest().  I agree, by making digest() generic,
> | > > > one can make the code nicer.  [If there is a need to dispatch on
> | > > > multiple arguments, we have to go for S4, but otherwise S3 gives the
> | > > > minimal modification].
> | > > >
> | > > > Side comment: This basically comes down to how for instance Java deals
> | > > > with hashCode() and equals() etc.  By default the object as-is is used
> | > > > to generate the hashcode (and can be used by equals() to compare objects).
> | > >
> | > > Yes, that's the model I was thinking of too.
> | > >
> | > > Hadley
> | > >
> | > > --
> | > > http://had.co.nz/
> | > >
> | > > ______________________________________________
> | > > R-devel@r-project.org mailing list
> | > > https://stat.ethz.ch/mailman/listinfo/r-devel
> | > >
> | >
> |
> |
> | --
> | Roger D. Peng  |  http://www.biostat.jhsph.edu/~rpeng/
> |
> |
>
> --
> Three out of two people have difficulties with fractions.
>


-- 
Roger D. Peng  |  http://www.biostat.jhsph.edu/~rpeng/