hash uniqueKey generation?
I think the deduplication signature field will work as a multiValued field. So
you can do copyField to it from all of the source fields.
Dan Lynn wrote:
Hi,
I just finished reading on the wiki about deduplication and the
solr.UUIDField type. What I'd like to do
Thanks for the feedback, guys!
On 11/15/2010 10:14 AM, Dan Lynn wrote:
Hi,
I just finished reading on the wiki about deduplication and the
solr.UUIDField type. What I'd like to do is generate an ID for a
document by hashing a subset of its fields. One route I thought would
be to do this
On Tue, Nov 16, 2010 at 5:31 AM, Dennis Gearon gear...@sbcglobal.net wrote:
hashing is not 100% guaranteed to produce unique values.
But if you go to enough bits with a good hash function, you can get
the odds lower than the odds of something else changing the value like
cosmic rays flipping a
To Life,
otherwise we all die.
----- Original Message -----
From: Yonik Seeley yo...@lucidimagination.com
To: solr-user@lucene.apache.org
Sent: Tue, November 16, 2010 1:46:43 PM
Subject: Re: hash uniqueKey generation?
On Tue, Nov 16, 2010 at 5:31 AM, Dennis Gearon gear...@sbcglobal.net wrote:
hashing
On Tue, Nov 16, 2010 at 9:05 PM, Dennis Gearon gear...@sbcglobal.net wrote:
Read up on Wikipedia, but I believe that no hash function is much good above
50% of the address space it generates.
50% is way too high - collisions will happen before that.
But given that something like MD5 has 128
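To put rough numbers on "collisions will happen before that", here is a minimal sketch of the standard birthday approximation. The function names are made up for illustration, and it assumes the hash output is uniformly distributed, which is only approximately true of real hash functions.

```python
import math


def collision_probability(num_docs, hash_bits):
    """Approximate chance that at least two of num_docs hashes collide.

    Assumes a uniformly distributed hash of hash_bits bits.
    """
    # Birthday approximation: p ~= 1 - exp(-n(n-1) / 2^(bits+1))
    exponent = -num_docs * (num_docs - 1) / 2.0 ** (hash_bits + 1)
    # expm1 keeps precision when the probability is tiny.
    return -math.expm1(exponent)


def docs_for_probability(p, hash_bits):
    """Rough number of documents at which the collision chance reaches p."""
    return math.sqrt(2.0 ** (hash_bits + 1) * -math.log1p(-p))


if __name__ == "__main__":
    # A 32-bit hash reaches even odds of a collision long before
    # half of its 2^32 address space is used.
    print(docs_for_probability(0.5, 32))
    # A 128-bit hash over a billion documents.
    print(collision_probability(10**9, 128))
```

Under that assumption a 32-bit hash reaches even odds of a collision at roughly 77,000 documents, nowhere near 50% of its address space, while a 128-bit hash over a billion documents stays far below any practical concern.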
Nobody has ever reported seeing a collision 'in the wild' with MD5. It
is broken, but producing a collision takes a deliberate algorithm.
As to cosmic rays: it's a real problem. A recent Google paper reported
that some RAM chips will have 1 bit error per gigabit per century, while
others have that much per hour. I've
Hi,
I just finished reading on the wiki about deduplication and the
solr.UUIDField type. What I'd like to do is generate an ID for a
document by hashing a subset of its fields. One route I thought would be
to do this ahead of time to CSV data, but I would think sticking
something into the
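For the "ahead of time to CSV data" route, a minimal sketch might look like the following. The field names in KEY_FIELDS and both helper functions are hypothetical, MD5 is just one possible choice of hash, and the sketch assumes the input CSV has a header row and no existing "id" column.

```python
import csv
import hashlib
import io

# Hypothetical field names; the thread does not say which fields are hashed.
KEY_FIELDS = ["title", "author"]


def make_id(row, key_fields=KEY_FIELDS):
    """Return a hex MD5 digest of the selected fields of one CSV row."""
    h = hashlib.md5()
    for name in key_fields:
        data = (row.get(name) or "").encode("utf-8")
        # Length-prefix each value so ("ab", "c") and ("a", "bc") differ.
        h.update(str(len(data)).encode("ascii") + b":" + data)
    return h.hexdigest()


def add_ids(src, dst, key_fields=KEY_FIELDS):
    """Copy CSV from src to dst, prepending an 'id' column."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=["id"] + list(reader.fieldnames))
    writer.writeheader()
    for row in reader:
        row["id"] = make_id(row, key_fields)
        writer.writerow(row)


if __name__ == "__main__":
    sample = io.StringIO("title,author,year\nDune,Herbert,1965\n")
    out = io.StringIO()
    add_ids(sample, out)
    print(out.getvalue())
```

Rows that agree on every key field get the same id, so if that column is the uniqueKey a later row would replace an earlier one at index time.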
: I just finished reading on the wiki about deduplication and the solr.UUIDField
: type. What I'd like to do is generate an ID for a document by hashing a subset
: of its fields. One route I thought would be to do this ahead of time to CSV
: data, but I would think sticking something into the