Item (text) deduplication

xdcfff Mon, 15 Apr 2013 06:39:20 -0700

Hi all,

I'm looking for some general guidance on how to approach this task.


Given two datasets of items, what is currently the best way to detect
duplicates between them using Mahout? To begin with, I intend to match on
the text similarity of item names.
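
To make "item name text similarity" concrete, here is a minimal sketch in
plain Java of one possible measure: Jaccard similarity over character
trigrams of normalized names. It does not use Mahout. The class name, method
names, trigram size, and threshold parameter are all illustrative
assumptions, not part of any library.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, for illustration only (not a Mahout class).
public class NameSimilarity {

    // Lowercase the name, collapse runs of non-alphanumerics to one space,
    // then collect the set of overlapping 3-character substrings.
    static Set<String> trigrams(String name) {
        String s = name.toLowerCase(Locale.ROOT)
                       .replaceAll("[^a-z0-9]+", " ")
                       .trim();
        Set<String> grams = new HashSet<>();
        if (s.length() < 3) {
            // Names shorter than one trigram are kept whole (if non-empty).
            if (!s.isEmpty()) {
                grams.add(s);
            }
            return grams;
        }
        for (int i = 0; i + 3 <= s.length(); i++) {
            grams.add(s.substring(i, i + 3));
        }
        return grams;
    }

    // Jaccard similarity: |intersection| / |union| of the two trigram sets.
    // Returns 0.0 when both names normalize to nothing.
    static double jaccard(String a, String b) {
        Set<String> ga = trigrams(a);
        Set<String> gb = trigrams(b);
        Set<String> union = new HashSet<>(ga);
        union.addAll(gb);
        if (union.isEmpty()) {
            return 0.0;
        }
        Set<String> intersection = new HashSet<>(ga);
        intersection.retainAll(gb);
        return (double) intersection.size() / union.size();
    }

    // Assumed decision rule: flag a pair whose similarity meets the threshold.
    static boolean isLikelyDuplicate(String a, String b, double threshold) {
        return jaccard(a, b) >= threshold;
    }
}
```

Comparing every item in one dataset against every item in the other this way
is quadratic in the number of items, which is why a scalable approach matters
for larger datasets.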

I'm willing to write Java wherever necessary, but I want to be sure to
avoid "re-coding the wheel", so to speak.

Cheers,
-dcf
