Peter McMahan <[EMAIL PROTECTED]> writes:

> That's a good point.

What's a good point?  [This is why top-posting isn't so helpful.]

> What's the overhead on digests like that?

It depends on the digest algorithm, the implementation, etc.  To some
extent you can just try it and see, or you can compute the digest of an
average-sized subgraph node label list in a loop and estimate the cost
that way.
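Untested, but a loop along these lines should give you a ballpark
per-call cost (it assumes the digest package, and the labels below are
made up -- substitute your actual subgraph node labels):

    library(digest)

    ## Made-up node labels, roughly the size you expect per subgraph.
    labels <- paste("node", seq_len(50), sep = "")

    ## Average over many repetitions to get a per-digest estimate.
    n <- 10000
    elapsed <- system.time(for (i in seq_len(n)) digest(labels))["elapsed"]
    elapsed / n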
> Also, does that open up the possibility, exceedingly small though it
> may be, of misidentifying a branch as already searched and missing a
> qualifying subgraph?

Yes, and the size of "exceedingly small" depends on the digest; with a
128-bit digest like MD5, an accidental collision among even millions of
label lists is vanishingly unlikely.  I don't think this is worth
worrying about.

>>> Also, is it better to over-estimate or under-estimate the
>>> size parameter?

I perhaps should have stressed that over-estimating is better.  The way
hashed environments work is that a vector is initialized to the desired
size and collisions are resolved using chaining.  To reduce collisions
and get a more efficient hashtable, you want more slots in the vector
than items, since the hash function is rarely perfect for your data.
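For example (the item count here is made up):

    ## If you expect to store roughly 5000 keys, give the hashed
    ## environment at least that many slots up front; collisions within
    ## a slot are resolved by chaining.
    n.items <- 5000
    e <- new.env(hash = TRUE, size = as.integer(1.5 * n.items))

    ## Assignment and lookup work as usual.
    assign("node1", TRUE, envir = e)
    exists("node1", envir = e, inherits = FALSE)

Under-sizing isn't an error, it just means longer chains and slower
lookups, so erring on the large side costs little.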
+ seth

--
Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
http://bioconductor.org