Peyman,

Did you have a look at this?

https://issues.apache.org/jira/browse/LUCENE-2959

It covers pluggable ranking functions and could be a good starting point for you.

Dmitry

On Mon, Apr 23, 2012 at 7:29 PM, Peyman Faratin <pey...@robustlinks.com> wrote:

> Hi
>
> Has there been any work that tries to integrate kernel methods [1] with
> Solr? I am interested in using kernel methods to address synonymy, hyponymy
> and polysemy (disambiguation) problems, which Solr's vector space model
> ("bag of words") does not capture.
>
> For example, imagine we have only 3 words in our corpus: "puma", "cougar"
> and "feline". The 3 words obviously have interdependencies (puma
> disambiguates to cougar; cougar and puma are instances of felines, i.e.
> hyponyms). Now, imagine 2 docs, d1 and d2, that have the following TF-IDF
> vectors.
>
>                 puma, cougar, feline
> d1       =   [  2,        0,         0]
> d2       =   [  0,        1,         0]
>
> i.e. d1 has no mention of the terms cougar or feline and, conversely, d2
> has no mention of the terms puma or feline. Hence, under the vector
> approach, d1 and d2 are not related at all (and each interpretation of the
> terms has a unique vector), which is not what we want to conclude.
>
> What I need is to include a kernel matrix (as data) such as the following
> that captures these relationships:
>
>                       puma, cougar, feline
> puma    =   [  1,        1,         0.4]
> cougar  =   [  1,        1,         0.4]
> feline  =   [  0.4,     0.4,         1]
>
> then recompute the TF-IDF vector as a product of (1) the original vector
> and (2) the kernel matrix, resulting in
>
>                 puma, cougar, feline
> d1       =   [  2,        2,         0.8]
> d2       =   [  1,        1,         0.4]
>
> (note, the new vectors are much less sparse).
>
> I can solve this problem (inefficiently) at the application layer, but I
> was wondering if there have been any attempts within the community to solve
> similar problems efficiently, without paying a hefty response time price?
>
> thank you
>
> Peyman
>
> [1] http://en.wikipedia.org/wiki/Kernel_methods
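
For reference, the arithmetic in the quoted example can be checked with a few lines of Python. This is only a sketch of the application-layer computation described above, not anything Solr or Lucene provides: the names TERMS, KERNEL, apply_kernel and cosine are made up for illustration, and cosine similarity is assumed as the relatedness measure (the original message does not name one).

```python
import math

# Hypothetical names throughout; values are copied from the quoted example.
TERMS = ["puma", "cougar", "feline"]

# Term-by-term kernel matrix: KERNEL[i][j] is the relatedness of
# TERMS[i] and TERMS[j].
KERNEL = [
    [1.0, 1.0, 0.4],
    [1.0, 1.0, 0.4],
    [0.4, 0.4, 1.0],
]


def apply_kernel(vec, kernel):
    """Multiply a row vector by the kernel matrix (vec x kernel)."""
    return [
        sum(vec[i] * kernel[i][j] for i in range(len(vec)))
        for j in range(len(kernel[0]))
    ]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Original TF-IDF vectors from the example.
d1 = [2.0, 0.0, 0.0]
d2 = [0.0, 1.0, 0.0]

# Vectors after applying the kernel.
d1_k = apply_kernel(d1, KERNEL)
d2_k = apply_kernel(d2, KERNEL)

print("d1 ->", d1_k)
print("d2 ->", d2_k)
print("similarity before:", cosine(d1, d2))
print("similarity after: ", cosine(d1_k, d2_k))
```

With these inputs the original vectors are orthogonal, while the transformed vectors point in the same direction, which is the effect the kernel is meant to have. Note that the dense product costs O(n^2) per document for a vocabulary of n terms, which is the efficiency concern raised in the question.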




-- 
Regards,

Dmitry Kan