[ 
https://issues.apache.org/jira/browse/SOLR-651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12643291#action_12643291
 ] 

Grant Ingersoll commented on SOLR-651:
--------------------------------------

{quote}
# Adding the uniqueKeyFieldName seems out of place.... it's just one element of 
the schema and it doesn't seem like it belongs in this component.
# How about using the "id" as the key, as is done in other places like 
highlighting.
{quote}
That's fine.  My thinking was that by using a "constant" for the name, one 
could ask explicitly for that property in the NamedList.  That is, 
namedList.get("uniqueKey");
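
To illustrate the two layouts under discussion, here is a minimal sketch that 
uses an insertion-ordered map as a stand-in for Solr's NamedList. The class 
name, the "termVectors" label, and the sample id are hypothetical and are not 
taken from the patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class UniqueKeyLookupSketch {
    // Constant label from the comment; a caller can always ask for this name.
    static final String UNIQUE_KEY = "uniqueKey";

    // Layout A: each document entry carries its id under a constant label.
    static Map<String, Object> entryWithConstantLabel(String id) {
        Map<String, Object> entry = new LinkedHashMap<>();
        entry.put(UNIQUE_KEY, id);
        entry.put("termVectors", new LinkedHashMap<String, Object>()); // hypothetical label
        return entry;
    }

    // Layout B: the document id itself is the key of the entry, as in highlighting.
    static Map<String, Object> responseKeyedById(String id) {
        Map<String, Object> response = new LinkedHashMap<>();
        response.put(id, new LinkedHashMap<String, Object>());
        return response;
    }

    public static void main(String[] args) {
        String id = "doc-1"; // hypothetical document id
        System.out.println(entryWithConstantLabel(id).get(UNIQUE_KEY));
        System.out.println(responseKeyedById(id).containsKey(id));
    }
}
```

With layout A the client looks up a fixed name. With layout B the client must 
already know the id, or iterate over the entries.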

{quote}
It doesn't seem like we should link the ability to return term vectors with 
term vectors being stored. Like highlighting, they should be used when 
available for speed, but stored fields should also be possible. It's fine for 
the impl of that to wait, but perhaps the interface should support that via a 
tv.fl parameter. update: just looked at the code again, and I see there is a 
tv.fl param.... so I guess the only discussion point is if the default is right 
(all fields with term vectors stored).
{quote}

That's reasonable.  We can open a separate issue for it if anyone wants it.

{quote}
# "idf" actually isn't the idf, it's the doc freq that is being returned. The 
label should probably be changed to "df"
# instead of "freq", how about just using the shorter and well-known "tf"?
# the docs say that tf_idf "Calculates tf*idf for each term.", but the code is 
actually returning "freq"/"idf" (but the idf is actually a df, so it is a 
straight tf * idf). But this doesn't seem that useful because the user could 
trivially do tf/df themselves. What would seem useful is to get the actual 
scoring tf-idf (via the Similarity). For better language mappings, I think we 
should avoid dashes in parameter names too.... perhaps tv.tfidf or tv.tf_idf?
{quote}
All fine as well.  I just added the tf*idf computation based on 
Vaijanath's comments.  I'll update these and the wiki.
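
As a rough sketch of the distinction raised in the quoted review (not the 
patch's code): the first method below is the plain tf/df ratio a client could 
compute itself, and the second is a Similarity-style weight. The formula 
sqrt(tf) * (1 + ln(numDocs / (df + 1))) is assumed here as the classic 
Lucene-style shape. The class and method names are hypothetical.

```java
class TfIdfSketch {
    // Plain ratio: term frequency in the document divided by document frequency.
    public static double rawRatio(int tf, int df) {
        if (df <= 0) {
            throw new IllegalArgumentException("df must be positive");
        }
        return (double) tf / df;
    }

    // Similarity-style weight, ASSUMING sqrt(tf) * (1 + ln(numDocs / (df + 1))).
    public static double similarityStyle(int tf, int df, int numDocs) {
        double tfPart = Math.sqrt(tf);
        double idfPart = 1.0 + Math.log((double) numDocs / (df + 1));
        return tfPart * idfPart;
    }

    public static void main(String[] args) {
        int tf = 4, df = 9, numDocs = 1000; // made-up illustrative counts
        System.out.println("tf/df            = " + rawRatio(tf, df));
        System.out.println("similarity-style = " + similarityStyle(tf, df, numDocs));
    }
}
```

The plain ratio needs only the tf and df values already in the response, 
while the Similarity-style weight also needs the index's document count.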


> A SearchComponent for fetching TF-IDF values
> --------------------------------------------
>
>                 Key: SOLR-651
>                 URL: https://issues.apache.org/jira/browse/SOLR-651
>             Project: Solr
>          Issue Type: New Feature
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 1.4
>
>         Attachments: SOLR-651-fixes.patch, SOLR-651.patch, SOLR-651.patch, 
> SOLR-651.patch, SOLR-651.patch, SOLR-651.patch, SOLR-651.patch
>
>
> A SearchComponent that can return the TF-IDF vector for any given document in
> the Solr index.
> Query: a document number or a query identifying a document
> Response: a map of each term in the selected document to its TF-IDF value
> Why?
> Most machine learning algorithms work on a TF-IDF representation of
> documents, so adding a request handler providing the TF-IDF representation
> will pave the way for incorporating learning paradigms into the Solr framework.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
