On Mon, Dec 21, 2009 at 3:36 PM, Lance Norskog <goks...@gmail.com> wrote:
> Solr does have the ExternalFileField available. You could track
> existing clicks from the container search log and generate a file to
> be used with ExternalFileField.
>
> http://lucene.apache.org/solr/api/org/apache/solr/schema/ExternalFileField.html
>
> In the solr source, trunk/src/test/test-files/solr/conf/schema11.xml
> and schema-trie.xml show how to use it.

This approach is limited to applying a single "global" rank to every
document, which can have unintended consequences: the most popular
document in your index gets the same boost on every query, even
queries for which it was never clicked. We've been working on this
problem in our own implementation and implemented it with a
FunctionQuery (http://wiki.apache.org/solr/FunctionQuery). We create
a ValueSourceParser and hook it into our Solr config:

    <valueSourceParser name="qpop" class="QueryPopularity">
        <str name="popfile">/path/to/popularity_file.xml</str>
    </valueSourceParser>

Then we use the new function in our request handler(s):

    <requestHandler name="..." class="...">
        ...
        <str name="bf">
            qpop(id)
        </str>
    </requestHandler>

The QueryPopularity class takes the current (normalized) query and
looks it up in popularity_file.xml to find which document IDs are
popular for that query. It uses the "id" field because that's what we
passed as the argument to "qpop"; you could use any field you want.
Popular documents get a score greater than zero, proportional to
their popularity. We do offline processing every night to build the
query -> popular ID mappings and push that file to our machines.
QueryPopularity has a background thread that periodically refreshes
the in-memory copy of the XML file's contents.
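
The lookup and refresh behavior described above could be sketched
roughly as follows. This is plain Java with no Solr classes, and it is
not our actual QueryPopularity code: the class name, the normalization
rule (trim, lowercase, collapse whitespace) and the loader callback
that stands in for parsing the XML file are all assumptions made for
illustration.

```java
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical sketch; not the QueryPopularity class from this message. */
final class QueryPopularityTable {
    // normalized query -> (document id -> popularity score)
    private volatile Map<String, Map<String, Float>> table = Map.of();

    /** Assumed normalization: trim, lowercase, collapse runs of whitespace. */
    static String normalize(String query) {
        return query.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
    }

    /** Swap in a freshly loaded table; readers never see a half-built map. */
    void replace(Map<String, Map<String, Float>> fresh) {
        table = fresh;
    }

    /** Score for a document under a query; 0 when the pair is unknown. */
    float score(String query, String docId) {
        Map<String, Float> byId = table.get(normalize(query));
        if (byId == null) {
            return 0f;
        }
        return byId.getOrDefault(docId, 0f);
    }

    /** Reload periodically on a daemon thread; the loader stands in for file parsing. */
    ScheduledExecutorService startRefresh(
            Supplier<Map<String, Map<String, Float>>> loader, long periodSeconds) {
        ScheduledExecutorService executor =
                Executors.newSingleThreadScheduledExecutor(r -> {
                    Thread t = new Thread(r, "qpop-refresh");
                    t.setDaemon(true);
                    return t;
                });
        executor.scheduleAtFixedRate(() -> {
            try {
                replace(loader.get());
            } catch (RuntimeException e) {
                // Keep serving the previous table; an uncaught exception
                // would cancel all later runs of this scheduled task.
            }
        }, 0, periodSeconds, TimeUnit.SECONDS);
        return executor;
    }
}
```

Swapping the whole map through a volatile field means a request either
sees the old table or the new one, so the scoring path needs no locks.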

The main difference is that this is a two-level hash (query -> id ->
score), whereas the ExternalFileField appears to be a one-level hash
(id -> score).
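
To make the difference concrete, here is a small Java sketch of the
two shapes. The class and method names are hypothetical and it says
nothing about how ExternalFileField is implemented internally; it only
contrasts the two lookups.

```java
import java.util.Map;

/** Hypothetical illustration of one-level vs. two-level popularity lookups. */
final class RankShapes {
    /** One level (id -> score): every query sees the same score for a document. */
    static float globalScore(Map<String, Float> byId, String docId) {
        return byId.getOrDefault(docId, 0f);
    }

    /** Two levels (query -> id -> score): the score depends on the query as well. */
    static float perQueryScore(
            Map<String, Map<String, Float>> byQuery, String query, String docId) {
        // Unknown queries fall back to an empty inner map, so the score is 0.
        return byQuery.getOrDefault(query, Map.of()).getOrDefault(docId, 0f);
    }
}
```

With the one-level map a heavily clicked document is boosted for every
query; with the two-level map it is boosted only for the queries under
which it was actually clicked.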

Ryan
