My understanding was that Hadley wanted 'digest' to operate on part of an object rather than on the entire object, which might contain uninteresting or irrelevant details. For example, if we had

    a <- structure(list(x = 1, y = 2), class = "foo")
    b <- structure(list(x = 2342342, y = 2), class = "foo")
    digest.foo <- function(object, ...) digest(object$y)

then 'digest(a)' and 'digest(b)' would return the same value, even
though 'a' and 'b' are different objects.  I can see why someone
*might* want digest to return the same hash for 'a' and 'b', but I
would personally find this behavior a little surprising.

-roger

On 10/16/07, Dirk Eddelbuettel <[EMAIL PROTECTED]> wrote:
>
> Hi Roger,
>
> On 16 October 2007 at 08:25, Roger Peng wrote:
> | Sorry, I forgot the 'reply-all'.
> |
> | -roger
> |
> | ---------- Forwarded message ----------
> | From: Roger Peng <[EMAIL PROTECTED]>
> | Date: Oct 16, 2007 8:24 AM
> | Subject: Re: [Rd] Digest package - make digest generic?
> | To: Henrik Bengtsson <[EMAIL PROTECTED]>
> |
> |
> | Would it be possible to instead create a function with a name like
> | 'digest0' which is the current function, and then create a generic
> | function with the name 'digest'?  In this case 'digest0' always
> | returns the digest of the "raw" object.
> |
> | My one concern is that my current expectation is that 'digest' takes
> | an object and hashes the entire object, regardless of class.  So if
> | two objects are different (even in their internal representation),
> | they should return different digests.  I would be a little worried if
> | 'digest' had a different (and perhaps unpredictable) behavior
> | depending on the class of the object where two objects that were in
> | fact different could lead to the same digest.
>
> But haven't the cryptographers taken care of that argument?
>
> To my layman's understanding, the consensus is that hash collisions are
> possible but very very unlikely.  And we already have that problem with
> digest as it stands: if collisions are possible, identical hashes could
> result from two different inputs whether or not digest is generic.
>
> Or am I missing what you were trying to get at?
>
> | I can see why one might want class-specific behavior, but what a class
> | author wants from 'digest' may not be different from what other users
> | of 'digest' on that object want.
> |
> | A simple approach might be
> |
> | digest0 <- function(x, ...) digest(unclass(x), ...)
>
> Or, just for argument's sake, we go full circle, digest stays as it is and
> Hadley implements his own generic, say, 'Digest()', around digest?  Naa....
>
> I think I like the idea of making it generic, but I really would like to
> know more about possible downsides.
>
> Dirk
>
> | although this doesn't work for S4 objects I don't think.
> |
> | -roger
> |
> | On 10/15/07, Henrik Bengtsson <[EMAIL PROTECTED]> wrote:
> | > On 10/15/07, hadley wickham <[EMAIL PROTECTED]> wrote:
> | > > On 10/15/07, Henrik Bengtsson <[EMAIL PROTECTED]> wrote:
> | > > > [As agreed, CC:ing r-devel since others might be interested in
> | > > > this as well.]
> | > > >
> | > > > Hi.
> | > > >
> | > > > On 10/15/07, Dirk Eddelbuettel <[EMAIL PROTECTED]> wrote:
> | > > > >
> | > > > > Hi Hadley,
> | > > > >
> | > > > > On 15 October 2007 at 09:51, hadley wickham wrote:
> | > > > > | Would you consider making digest a generic function?  That
> | > > > > | way I could (e.g.) make a generic method for ggplot objects
> | > > > > | which didn't depend (so much) on their internal
> | > > > > | representation.
> | > > > >
> | > > > > Well, generally speaking, I always take patches :)
> | > > >
> | > > > I see no problems in doing this.  The patch would be:
> | > > >
> | > > > digest <- function(...) UseMethod("digest");
> | > > > digest.default <- <current digest function>.
> | > > >
> | > > > I think that should do, and I don't think it has any surprising side
> | > > > effects so it could be added in the next release.  Dirk, can you do
> | > > > that?
> | > > >
> | > > > >
> | > > > > I have to admit that I am fairly weak on these aspects of the
> | > > > > S language.  One question is: how do the current users of
> | > > > > digest (i.e. Henrik's and Seth's caching mechanism, for
> | > > > > example) use it on arbitrary objects _without_ it being
> | > > > > generic?
> | > > >
> | > > > I basically put everything I want into a list() and pass that to
> | > > > digest::digest().
> | > >
> | > > Yes, that's what I'm doing too.
> | > >
> | > > > >
> | > > > > | The reason I ask is that I'm using digest as a way of coming
> | > > > > | up with a unique file name for each example graphic.  I want
> | > > > > | to be able to easily compare the appearance of examples
> | > > > > | between versions, but currently the digest depends on
> | > > > > | internal details, so it's hard to match up graphics between
> | > > > > | versions.
> | > > >
> | > > > See loadCache(key) and saveCache(object, key) in R.cache, which
> | > > > basically loads and saves results from and to a file cache based on a
> | > > > key object - no need to specify paths or filenames.  You can specify
> | > > > paths etc if you want to, but by default it is just transparent.
> | > >
> | > > The problem is I need to refer to the image from the documentation, so
> | > > I do need to know its path.  I also want to be able to look at the
> | > > image, so if the digests are different I can see what the difference
> | > > is (I'm planning to automate this with the imagemagick compare command
> | > > line tool).
> | >
> | > See ?findCache.  That will give you the pathname given a key.  It is
> | > on purpose that I do not list this function in the HTML help index - I
> | > want to keep the "public" API to a minimum.
> | >
> | > /Henrik
> | >
> | > >
> | > > > However, I think Hadley is referring to a different problem.
> | > > > Basically, he has an object containing a lot of fields, but for his
> | > > > purposes it is only a subset of the fields that he wants to use to
> | > > > generate a consistent hashcode.  If he passes any other field, that
> | > >
> | > > Yes, exactly.
> | > >
> | > > > will break the consistency.  In that case, the designer of the class
> | > > > has to identify the fields that uniquely identify the state of
> | > > > the object.  I do that for many of my objects and pass them down in a
> | > > > list() structure to digest().  I agree, by making digest() generic,
> | > > > one can make the code nicer.  [If there is a need to dispatch on
> | > > > multiple arguments, we have to go for S4, but otherwise S3 gives the
> | > > > minimal modification].
> | > > >
> | > > > Side comment: This basically comes down to how for instance Java deals
> | > > > with hashCode() and equals() etc.  By default the object itself is
> | > > > used to generate the hashcode (and can be used by equals() to
> | > > > compare objects).
> | > >
> | > > Yes, that's the model I was thinking of too.
> | > >
> | > > Hadley
> | > >
> | > > --
> | > > http://had.co.nz/
> | > >
> | > > ______________________________________________
> | > > R-devel@r-project.org mailing list
> | > > https://stat.ethz.ch/mailman/listinfo/r-devel
> | > >
> | >
> | --
> | Roger D. Peng  |  http://www.biostat.jhsph.edu/~rpeng/
>
> --
> Three out of two people have difficulties with fractions.
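
For reference, the two proposals in this thread can be put side by side in a few lines. This is only a rough sketch and assumes the digest package is installed; it is not how the package itself is implemented. The name 'digest0' follows the suggestion above for a function that always hashes the raw object, 'Digest' is the hypothetical generic from Dirk's aside (capitalised so it does not mask digest::digest), and the class "foo" is the toy class from the example at the top.

```r
library(digest)

# Whole-object hash: always hashes the object as given, regardless of class.
digest0 <- function(object, ...) digest::digest(object, ...)

# Hypothetical S3 generic wrapped around digest0 (illustrative name only).
Digest <- function(object, ...) UseMethod("Digest")

# Default method: fall back to hashing the entire object.
Digest.default <- function(object, ...) digest0(object, ...)

# Class-specific method: only the 'y' field contributes to the hash.
Digest.foo <- function(object, ...) digest0(object$y, ...)

a <- structure(list(x = 1, y = 2), class = "foo")
b <- structure(list(x = 2342342, y = 2), class = "foo")

same_generic <- identical(Digest(a), Digest(b))    # 'x' is ignored by Digest.foo
same_raw     <- identical(digest0(a), digest0(b))  # 'x' is part of the raw hash
```

With both functions available, a caller who wants the class author's notion of identity uses 'Digest', while a caller who expects different objects to give different hashes keeps using 'digest0'.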