Hi Rusty,
Thanks for the answer.
We indexed the following JSON object:
{
  "@class": "com.starsite.data.Answer",
  "answer_text": "momo is the best nepalese food",
  "keywords": null,
  "metaDescription": null,
  "post_date": null,
  "id": "202ba4ac-0fd3-4709-ba84-463e0caa413c",
  "version": 1,
  "scope": [
    "type|com.starsite.data.Answer"
  ]
}
We then issued the following query:
answer_text: "food"
and the data we got back in keydata was:
[{"p":[4,0],"score":[4.855199135883779,1.8398742574541822]}]
What does 0-indexed mean here? If scoring in Riak Search is based on a
vector-space model, as in Lucene, I expected the scores to be normalized
between 0 and 1.
As for the position information, I assume the words 'is' and 'the' are removed
as stopwords. If they are not removed, the position should be 5. If they are
removed, the position should be 3. The word "food" occurs only once, so
shouldn't we be getting just one position?
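To make the two expected positions concrete, here is a minimal sketch of the
counting described above. It assumes a plain whitespace tokenizer and a
two-word stopword list ('is' and 'the'); both are illustrative assumptions and
not a description of the analyzer Riak Search actually uses.

```python
# Hypothetical stopword list, for illustration only.
STOPWORDS = {"is", "the"}


def term_positions(text, term, remove_stopwords):
    """Return the 0-indexed token positions at which `term` occurs."""
    tokens = text.lower().split()
    if remove_stopwords:
        # Drop stopwords before numbering, so later tokens shift left.
        tokens = [t for t in tokens if t not in STOPWORDS]
    return [i for i, t in enumerate(tokens) if t == term]


text = "momo is the best nepalese food"
print(term_positions(text, "food", remove_stopwords=False))  # [5]
print(term_positions(text, "food", remove_stopwords=True))   # [3]
```

Under either assumption the sketch yields a single position, and neither value
matches the [4,0] returned in keydata.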
Thanks,
Archana
On Aug 5, 2011, at 11:08 AM, Rusty Klophaus wrote:
Hi Archana,
Yes, the 'p' attribute is positional information. That list indicates that
the term occurs at positions 0 and 43 in the document, where positions are
0-indexed. I'm not sure why you are getting two positions if the word only
occurred once. What was the original query?
The scoring information that you see is a bug. For now, as a workaround, you
can add the scores together. This will give you a *relative* score, allowing
you to rank results for the current query.
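As a rough sketch of that workaround, the following adds the partial scores of
each result and sorts by the sum. It assumes keydata has the list-of-one-object
shape quoted in this thread; the keys "doc_a" and "doc_b" are made up, and the
two samples are treated as hits of the same query purely for illustration.

```python
import json


def relative_score(keydata):
    """Sum every partial score found in one result's keydata."""
    return sum(s for entry in keydata for s in entry["score"])


# keydata values quoted in this thread; the document keys are hypothetical.
results = {
    "doc_a": json.loads('[{"p":[4,0],"score":[4.855199135883779,1.8398742574541822]}]'),
    "doc_b": json.loads('[{"p":[43,0],"score":[5.3669048584479,1.7201627119528418]}]'),
}

# Highest summed score first; only meaningful within a single query.
ranked = sorted(results, key=lambda k: relative_score(results[k]), reverse=True)
print(ranked)  # ['doc_b', 'doc_a']
```

The sums are only relative: they order results of one query but should not be
compared across different queries.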
To fix this issue, some processing needs to happen within Riak to combine and
normalize the scores into a final score that can also be used for correct
ranking against other queries. (This is being done for the Solr interface, but
not the Map/Reduce interface.) Riak Search models scoring after Lucene as much
as possible, so the Lucene documentation has more information about scoring,
especially the final normalization step:
http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html
This issue is tracked in https://issues.basho.com/show_bug.cgi?id=1154
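For orientation, the scoring formula on that Lucene page has roughly the
following shape. This is written from memory of the Lucene 2.4 Similarity
documentation, with boost(t) standing in for the term boost, so treat it as a
sketch and check the page for the exact definitions.

```latex
\mathrm{score}(q,d) =
  \mathrm{coord}(q,d)\cdot\mathrm{queryNorm}(q)\cdot
  \sum_{t \in q}\Big(\mathrm{tf}(t,d)\cdot\mathrm{idf}(t)^{2}\cdot
  \mathrm{boost}(t)\cdot\mathrm{norm}(t,d)\Big),
\qquad
\mathrm{queryNorm}(q) = \frac{1}{\sqrt{\mathrm{sumOfSquaredWeights}}}
```

The queryNorm factor is meant to make scores from different queries more
comparable. It does not by itself confine scores to the range 0 to 1.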
Best,
Rusty
On Thu, Aug 4, 2011 at 3:27 PM, Archana Bhattarai <[email protected]> wrote:
Hi Rusty,
Thanks a lot for the answer. We got the following data in keydata:
[{"p":[43,0],"score":[5.3669048584479,1.7201627119528418]}]
but couldn't work out exactly what it represents. I believe p gives
positional information, but why does it have two entries when the word we
searched for occurred only once in the document? Does the position ignore
stopwords and count only the other words? Also, why are there two scores?
Isn't the score normalized? Or am I doing something wrong to get these scores?
Thanks a lot in advance,
Archana
On Jul 22, 2011, at 11:09 AM, Rusty Klophaus wrote:
Hi Archana,
Yes. When you use a search query to initiate a map/reduce job, the scores are
fed into the first phase as keydata, along with other metadata about the search
result including positional information and any inline fields.
More information in the links below:
* http://wiki.basho.com/Riak-Search---Querying.html#Querying-Integrated-with-Map-Reduce
* http://wiki.basho.com/MapReduce.html (search for "keydata")
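As a hedged sketch of what such a job might look like, the snippet below builds
a MapReduce job body whose inputs come from a search query and whose single map
phase simply returns keydata. The "riak_search"/"mapred_search" input form is
recalled from the Riak Search documentation of this period and should be
checked against the links above; the bucket name "answers" and the helper
search_mapreduce_job are made up for illustration.

```python
import json


def search_mapreduce_job(bucket, query):
    """Build a MapReduce job body whose inputs come from a search query."""
    return {
        # Assumed input form for search-driven MapReduce; verify in the docs.
        "inputs": {
            "module": "riak_search",
            "function": "mapred_search",
            "arg": [bucket, query],
        },
        "query": [
            {
                "map": {
                    "language": "javascript",
                    # keydata is the second argument of a JavaScript map function.
                    "source": "function(value, keydata, arg) { return [keydata]; }",
                    "keep": True,
                }
            }
        ],
    }


# "answers" is a hypothetical bucket name.
job = search_mapreduce_job("answers", 'answer_text:"food"')
print(json.dumps(job, indent=2))
```

The resulting JSON would be sent as the body of a POST to Riak's /mapred HTTP
endpoint, and the map phase would return the keydata of each search result.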
Best,
Rusty
On Fri, Jul 22, 2011 at 10:53 AM, Archana Bhattarai <[email protected]> wrote:
Hi,
Is there a way to get back the score when querying via the Solr interface or,
ideally, via MapReduce over search? It looks like the Solr interface only
supports sorting.
Thanks in advance,
Archana
_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
--
Rusty Klophaus
Basho Technologies, Inc.
11921 Freedom Drive, Suite 550
Reston, VA 20190
www.basho.com<http://www.basho.com/>