Something we are working on for purely content-based similarity is using a KNN
engine (search engine) but creating features from word2vec and an NER (Named
Entity Recognizer).
Putting the generated features into fields of a doc can really help with
similarity because w2v and NER create
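
A minimal sketch of the idea, not the actual setup described in this thread: each doc carries fields holding generated features, and a brute-force cosine KNN stands in for the search engine. The field names (`entities`, `w2v_terms`), the toy docs, and the helpers `field_vector` and `most_similar` are all hypothetical. The sketch assumes the NER output and the word2vec-derived terms were already computed elsewhere.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def field_vector(doc):
    """Flatten all fields of a doc into one sparse vector.

    Keys are prefixed with the field name so that the same string in two
    different fields counts as two different features.
    """
    vec = Counter()
    for field, values in doc.items():
        for value in values:
            vec[f"{field}:{value}"] += 1.0
    return vec


def most_similar(query_id, docs, n=10):
    """Return the ids of the n docs most similar to docs[query_id]."""
    query_vec = field_vector(docs[query_id])
    scored = [
        (cosine(query_vec, field_vector(doc)), doc_id)
        for doc_id, doc in docs.items()
        if doc_id != query_id
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:n]]


# Hypothetical toy docs. In practice "entities" would come from an NER
# and "w2v_terms" from word2vec (for example, terms close to the doc's words).
docs = {
    "d1": {"entities": ["Apache Mahout", "Ted Dunning"],
           "w2v_terms": ["clustering", "recommender"]},
    "d2": {"entities": ["Apache Mahout"],
           "w2v_terms": ["clustering", "similarity"]},
    "d3": {"entities": ["Paris"],
           "w2v_terms": ["travel", "hotel"]},
}

print(most_similar("d1", docs, n=2))  # ['d2', 'd3']
```

A real search engine would index each field separately and allow per-field weighting; this sketch gives every feature equal weight.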
Charles, thank you, I will check that out.
Ted, I am looking for semantic similarity. Unfortunately, I do not have any
data on the usage of the documents (if by usage you mean user behavior).
On Sun, Feb 14, 2016 at 4:04 PM, Ted Dunning wrote:
> Did you want textual
Hi,
I need to build a system to determine the N (e.g. 10) most similar documents to
a given document. I have some (theoretical) knowledge of Mahout algorithms,
but not enough to build the system. Can you give me some suggestions?
At first I was researching Latent Semantic Analysis for the task, but
Did you want textual similarity?
Or semantic similarity?
The actual semantics of a message can be opaque from the content, but clear
from the usage.
On Sun, Feb 14, 2016 at 5:29 AM, Charles Earl wrote:
> David,
> LDA or LSI can work quite nicely for similarity (YMMV of