We have an application where we index documents that can exist in many (at
least 2) languages.
We have one SolrCore per language, using the same field names in their
schemas (with different stopwords, synonyms & stemmers); for us, the
benefits for content maintenance outweigh the added complexity.
Using EN & FR as an example, a document always exists in EN as a reference,
and some of them - not all - are translated into FR; the same unique
document id is used for both the reference and the translation.
If a user performs a query in FR, both FR documents and EN documents are
searched.
FR docs are searched first; the same query is then run against EN, removing
from the EN searchable document set those documents returned by the FR
query. That is, if document id 'AZ123' is retrieved through the FR query,
it cannot be retrieved by the EN query. Removing the FR-returned document
ids from the EN searchable document set guarantees that the two result sets
are disjoint.
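As a toy illustration of that two-pass merge (plain Java collections and
hypothetical method names, not our actual handler code): FR hits take
precedence, and their ids are skipped when the EN hits are appended, so the
combined list can never contain the same id twice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class DisjointMerge {
    // Merge FR hits and EN hits; an id returned by the FR query is
    // never taken again from the EN results, so the sets stay disjoint.
    static List<String> merge(List<String> frHits, List<String> enHits) {
        Set<String> seen = new LinkedHashSet<String>(frHits);
        List<String> result = new ArrayList<String>(frHits);
        for (String id : enHits) {
            if (seen.add(id)) {      // only ids not already returned by FR
                result.add(id);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> fr = Arrays.asList("AZ123", "AZ200");
        List<String> en = Arrays.asList("AZ123", "AZ300");
        System.out.println(merge(fr, en)); // [AZ123, AZ200, AZ300]
    }
}
```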

1/ Does anyone have the same kind of functional requirements? Is using
multiple cores a bad idea for this need?

On the practical side, this led me to a handler that needs to restrict the
document set through an externally defined list of Solr unique ids (we also
need to deal with some upfront ACL management, to top it all).
However, I'm missing a small method that would nicely complete the
SolrIndexSearcher.getDocList* family.

  public DocList getDocList(Query query, DocSet filter, Sort lsort,
                            int offset, int len, int flags) throws IOException {
    DocListAndSet answer = new DocListAndSet();
    getDocListC(answer, query, null, filter, lsort, offset, len, flags);
    return answer.docList;
  }

I intend to use this after intersecting potential filter queries with the
restricted document set in the request handler; the Query filter version of
the method is already exposed, and this would be the DocSet version of it.
2/ Any reason not to do this? {Sh,C}ould this method be included, or should
I create an enhancement request?
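For completeness, here is the intersection step I have in mind, modeled
with java.util.BitSet standing in for Solr's DocSet (the real handler would
use DocSet.intersection and pass the result as the filter argument; the
class and method names below are just for illustration):

```java
import java.util.BitSet;

public class DocSetIntersect {
    // Toy model of the handler's filtering step: the externally supplied
    // id set is intersected with the filter-query doc set; the combined
    // set is what would be handed to getDocList(query, filter, ...).
    static BitSet intersect(BitSet restrictedIds, BitSet filterQueryDocs) {
        BitSet combined = (BitSet) restrictedIds.clone(); // keep inputs intact
        combined.and(filterQueryDocs);                    // docs present in both
        return combined;
    }

    public static void main(String[] args) {
        BitSet restricted = new BitSet();
        restricted.set(1); restricted.set(3); restricted.set(5);
        BitSet filter = new BitSet();
        filter.set(3); filter.set(4); filter.set(5);
        System.out.println(intersect(restricted, filter)); // {3, 5}
    }
}
```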

My current idea for creating the DocSet from the document ids is the
following:

import java.io.IOException;
import java.util.Iterator;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;
import org.apache.solr.search.BitDocSet;
import org.apache.solr.search.DocSet;
import org.apache.solr.util.OpenBitSet;

    DocSet keyFilter(IndexReader reader, String keyField,
                     Iterator<String> ikeys) throws IOException {
        OpenBitSet bits = new OpenBitSet(reader.maxDoc());
        if (ikeys.hasNext()) {
            Term term = new Term(keyField, ikeys.next());
            TermDocs termDocs = reader.termDocs(term);
            try {
                // keyField holds unique ids, so each term matches at most one doc
                if (termDocs.next())
                    bits.fastSet(termDocs.doc());
                while (ikeys.hasNext()) {
                    termDocs.seek(term.createTerm(ikeys.next()));
                    if (termDocs.next())
                        bits.fastSet(termDocs.doc());
                }
            } finally {
                termDocs.close();
            }
        }
        return new BitDocSet(bits);
    }

3/ Is there a better/faster way to create a DocSet from a list of unique
ids?

Comments & questions welcome.
Thanks


-- 
View this message in context: 
http://www.nabble.com/query-handling---multiple-languages---multiple-cores-tf4646246.html#a13272209
Sent from the Solr - Dev mailing list archive at Nabble.com.
