Lucene provides these vectors as 'term vectors' (also called 'term frequency
vectors'). The MoreLikeThis feature builds its queries against these (I
think).

http://www.lucidimagination.com/search/?q=term+vectors
http://www.lucidimagination.com/search/?q=MoreLikeThis
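
Roughly, reading one back looks like this. This is only a sketch, written
against a newer Lucene API line (around 5.x) than the 3.x of this thread;
the index path and the "body" field name are placeholders, and exact method
names vary by version:

  // Illustrative only: dump the term vector of one field of one document.
  // Assumes the field was indexed with term vectors enabled.
  import org.apache.lucene.index.DirectoryReader;
  import org.apache.lucene.index.IndexReader;
  import org.apache.lucene.index.Terms;
  import org.apache.lucene.index.TermsEnum;
  import org.apache.lucene.store.FSDirectory;
  import org.apache.lucene.util.BytesRef;

  import java.nio.file.Paths;

  public class TermVectorDump {
      public static void main(String[] args) throws Exception {
          try (IndexReader reader = DirectoryReader.open(
                  FSDirectory.open(Paths.get("/path/to/index")))) {  // placeholder path
              Terms terms = reader.getTermVector(0, "body");          // doc 0, field "body" (assumed)
              if (terms == null) {
                  System.out.println("no term vector stored for this doc/field");
                  return;
              }
              TermsEnum te = terms.iterator();
              BytesRef term;
              while ((term = te.next()) != null) {
                  // for a term vector, totalTermFreq() is the frequency within this one document
                  System.out.println(term.utf8ToString() + "\ttf=" + te.totalTermFreq());
              }
          }
      }
  }

MoreLikeThis builds its "interesting terms" query from the same
per-document term/frequency data.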

On Mon, May 14, 2012 at 11:07 AM, Dmitry Kan <dmitry....@gmail.com> wrote:
> Peyman,
>
> Did you have a look at this?
>
> https://issues.apache.org/jira/browse/LUCENE-2959
>
> The pluggable ranking functions added there could be a good starting point
> for you.
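>
> A rough sketch of what such a pluggable function looks like, written against
> the SimilarityBase extension point that work introduced (roughly the Lucene
> 4.x API); the class name and the scoring formula are only illustrative:
>
>   import org.apache.lucene.search.similarities.BasicStats;
>   import org.apache.lucene.search.similarities.SimilarityBase;
>
>   // Illustrative only: a custom ranking function plugged in through
>   // SimilarityBase. The formula is a toy TF-IDF variant, not a recommendation.
>   public class ToySimilarity extends SimilarityBase {
>
>       @Override
>       protected float score(BasicStats stats, float freq, float docLen) {
>           // term frequency * a smoothed inverse document frequency
>           double idf = Math.log(1.0 + (stats.getNumberOfDocuments() + 1.0)
>                                         / (stats.getDocFreq() + 1.0));
>           return (float) (freq * idf);
>       }
>
>       @Override
>       public String toString() {
>           return "ToySimilarity";
>       }
>   }
>
> In Solr it can then be wired in via the <similarity> element in schema.xml.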
>
> Dmitry
>
> On Mon, Apr 23, 2012 at 7:29 PM, Peyman Faratin <pey...@robustlinks.com>wrote:
>
>> Hi
>>
>> Has there been any work that tries to integrate kernel methods [1] with
>> SOLR? I am interested in using kernel methods to solve synonymy, hyponymy
>> and polysemy (disambiguation) problems, which SOLR's vector space model
>> ("bag of words") does not capture.
>>
>> For example, imagine we have only 3 words in our corpus, "puma", "cougar"
>> and "feline". The 3 words obviously have interdependencies (puma
>> disambiguates to cougar; cougar and puma are instances of felines, i.e.
>> hyponyms). Now, imagine 2 docs, d1 and d2, with the following TF-IDF
>> vectors.
>>
>>                 puma, cougar, feline
>> d1       =   [  2,        0,         0]
>> d2       =   [  0,        1,         0]
>>
>> i.e. d1 has no mention of the terms cougar or feline and, conversely, d2 has
>> no mention of the terms puma or feline. Hence, under the vector approach, d1
>> and d2 are not related at all (each interpretation of the terms has its own
>> unique vector), which is not what we want to conclude.
>>
>> What I need is to include a kernel matrix (as data) such as the following
>> that captures these relationships:
>>
>>                       puma, cougar, feline
>> puma    =   [  1,        1,         0.4]
>> cougar  =   [  1,        1,         0.4]
>> feline  =   [  0.4,     0.4,         1]
>>
>> then recompute each TF-IDF vector as the product of (1) the original vector
>> and (2) the kernel matrix, resulting in
>>
>>                 puma, cougar, feline
>> d1       =   [  2,        2,         0.8]
>> d2       =   [  1,        1,         0.4]
>>
>> (note, the new vectors are much less sparse).
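>>
>> For concreteness, this recomputation is just a dense vector-matrix product;
>> a toy sketch with the numbers above hard-coded (class name illustrative):
>>
>>   // Toy sketch of the recomputation above: expand a document's TF-IDF vector
>>   // by multiplying it with the term-term kernel matrix K (row vector * matrix).
>>   public class KernelExpand {
>>
>>       static double[] expand(double[] doc, double[][] kernel) {
>>           double[] out = new double[kernel[0].length];
>>           for (int j = 0; j < out.length; j++) {
>>               for (int i = 0; i < doc.length; i++) {
>>                   out[j] += doc[i] * kernel[i][j];
>>               }
>>           }
>>           return out;
>>       }
>>
>>       public static void main(String[] args) {
>>           double[][] kernel = {              // puma, cougar, feline
>>                   {1.0, 1.0, 0.4},           // puma
>>                   {1.0, 1.0, 0.4},           // cougar
>>                   {0.4, 0.4, 1.0}};          // feline
>>           double[] d1 = {2, 0, 0};
>>           double[] d2 = {0, 1, 0};
>>           System.out.println(java.util.Arrays.toString(expand(d1, kernel)));  // [2.0, 2.0, 0.8]
>>           System.out.println(java.util.Arrays.toString(expand(d2, kernel)));  // [1.0, 1.0, 0.4]
>>       }
>>   }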
>>
>> I can solve this problem (inefficiently) at the application layer, but I was
>> wondering whether there have been any attempts within the community to solve
>> similar problems efficiently, without paying a hefty response-time price?
>>
>> thank you
>>
>> Peyman
>>
>> [1] http://en.wikipedia.org/wiki/Kernel_methods
>
>
>
>
> --
> Regards,
>
> Dmitry Kan



-- 
Lance Norskog
goks...@gmail.com
