Boosting documents with terms derived from clustering - good idea?

David Parks Tue, 14 May 2013 04:04:53 -0700

We have a number of queries that produce good results based on the textual
data but are contextually wrong. For example, an "SSD hard drive" search
matches the music album "SSD hip hop drives us crazy".

Textually it is a fair match, but "SSD" is a term that relates strongly to
technical documents.

We'd like to steer this query more firmly toward the technical documents
based on the term "SSD". I am considering whether it would be worth trying
the following: cluster all documents, which should tend to group the music
with the music and the tech items with the tech items; pull out the term
vectors that define each cluster; have a human review that data; and plug
the reviewed terms back into the documents of each cluster as a separate
search field that gets boosted.
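
To make the proposed pipeline concrete, here is a minimal sketch in plain
Python. It is only an illustration under stated assumptions, not a tested
recipe: the post does not name a clustering algorithm, so a small spherical
k-means over TF-IDF vectors is swapped in, tokenisation is a bare whitespace
split, and the names tfidf_vectors, cluster, top_terms, seed_indices and the
cluster_terms field are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def normalise(vec):
    """Scale a sparse {term: weight} vector to unit length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)


def tfidf_vectors(docs):
    """Return one unit-length {term: tf-idf} dict per document."""
    tokenised = [doc.lower().split() for doc in docs]  # assumption: naive tokeniser
    df = Counter(term for tokens in tokenised for term in set(tokens))
    n = len(docs)
    return [
        normalise({t: tf * (math.log(n / df[t]) + 1.0)
                   for t, tf in Counter(tokens).items()})
        for tokens in tokenised
    ]


def dot(a, b):
    """Dot product of two sparse vectors (cosine, since both are unit length)."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def cluster(vectors, seed_indices, max_iterations=20):
    """Spherical k-means; the initial centroids are the chosen seed documents."""
    centroids = [vectors[i] for i in seed_indices]
    labels = None
    for _ in range(max_iterations):
        new_labels = [
            max(range(len(centroids)), key=lambda k: dot(v, centroids[k]))
            for v in vectors
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        sums = [defaultdict(float) for _ in centroids]
        for v, k in zip(vectors, labels):
            for t, w in v.items():
                sums[k][t] += w
        # Keep the old centroid if a cluster ended up empty.
        centroids = [normalise(s) if s else centroids[k]
                     for k, s in enumerate(sums)]
    return labels, centroids


def top_terms(centroid, n):
    """The n highest-weighted terms of a cluster centroid."""
    ranked = sorted(centroid.items(), key=lambda kv: kv[1], reverse=True)
    return [t for t, _ in ranked[:n]]


docs = [
    "ssd hard drive sata storage",
    "ssd laptop drive storage upgrade",
    "hip hop album music ssd",
    "hip hop music album rap",
]
labels, centroids = cluster(tfidf_vectors(docs), seed_indices=[0, 3])

# Candidate terms per cluster; these would go to a human reviewer first.
candidates = [top_terms(c, 3) for c in centroids]

# Copy each cluster's terms into a separate field that the search engine
# could then boost at query time (field name is hypothetical).
enriched = [
    {"text": d, "cluster_terms": " ".join(candidates[k])}
    for d, k in zip(docs, labels)
]
```

On this toy data "ssd" lands among the candidate terms of the hardware
cluster only, so the album that merely mentions "SSD" would not carry it in
its cluster_terms field. Whether that holds on a real catalogue depends on
how cleanly the documents actually cluster, which is the open question here.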

 

In my head this seems like a plausible way to weight terms like "SSD" toward
the cluster of items they are most closely associated with.

Should I spend the effort to find out?

Yea or nay?
