Re: processing documents in solr

Shawn Heisey Fri, 26 Jul 2013 23:19:42 -0700

On 7/26/2013 11:50 PM, Joe Zhang wrote:
> ==> Essentially we are doing pagination here, right? If performance is not
> the concern, given that the index is dynamic, does the order of
> entries remain stable over time?


Yes, it's pagination.  Just like the other method that I've described in
detail, you'd have to avoid updating the index while you were retrieving
documents.  Unless you can come up with a sort parameter that's
guaranteed to put new documents at the end, any changes to the index
during the retrieval process will make it impossible to be sure that
you've retrieved every document.

> ==> This approach seems to require that the id field is numerical, right?
> I have a text-based id that is unique.

StrField types work perfectly with range queries.  As long as it's not a
tokenized field, TextField works properly with range queries too.
KeywordTokenizer is OK, as long as you don't use filters that create
additional tokens.  Some examples that create additional tokens are
WordDelimiterFilter and EdgeNGramFilter.
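
One thing to keep in mind with a text-based id: as far as I know, range
queries and sorting on a string field compare the indexed terms
lexicographically rather than numerically.  Here is a small sketch in plain
Python (no Solr involved) of what that ordering looks like.  The sample ids
and the helper name are made up for illustration.

```python
# Hypothetical ids, just to show the ordering; not taken from any real index.
ids = ["doc9", "doc10", "doc2", "Doc1"]

# Compare by UTF-8 bytes, which is the assumption made here about how
# string terms are ordered in the index.
ordered = sorted(ids, key=lambda s: s.encode("utf-8"))


def after(value, lower):
    """True if value falls inside the exclusive range {lower TO *}."""
    return value.encode("utf-8") > lower.encode("utf-8")


# Ids that a query like id:{doc10 TO *} would be expected to match.
matches = [i for i in ordered if after(i, "doc10")]
```

The order is not what a numeric sort would give ("doc10" sorts before
"doc2"), but that doesn't matter for walking the index.  It only has to be
consistent from one request to the next.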

> ==> I'm not sure I understand the "q={XXX TO *}" part --> wouldn't query be
> matched against the default search field, which could be "content", for
> example? How would that do the job?

You are correct, I was too hasty in constructing the query.  That should be:
q=id:{XXX TO *}&rows=NNNNNN&sort=id asc

You could speed things up if you don't need to see all stored fields in
the response by using the fl parameter to only return the fields that
you need.
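
To make the whole loop concrete, here is a rough sketch in Python of how a
client might walk the index with that query.  Treat it as an illustration
under some assumptions, not a tested recipe: the function names are made up,
the unique key is assumed to be called "id", and the lower bound is wrapped
in double quotes so that ids containing spaces or special characters survive
the query parser.  The fetch argument is whatever function you write to send
the parameters to your /select handler and return the list of documents.

```python
from urllib.parse import urlencode


def page_params(last_id, rows, fields=("id",)):
    """Build the request parameters for the page that follows last_id.

    Pass last_id=None for the first page.
    """
    if last_id is None:
        q = "*:*"
    else:
        # Escape backslashes and double quotes, then quote the lower bound.
        escaped = last_id.replace("\\", "\\\\").replace('"', '\\"')
        q = 'id:{"%s" TO *}' % escaped
    return {"q": q, "rows": rows, "sort": "id asc", "fl": ",".join(fields)}


def iterate_all(fetch, rows=1000):
    """Yield every document, one page at a time.

    fetch(params) is a caller-supplied function that runs the query and
    returns that page's documents as a list of dicts.
    """
    last_id = None
    while True:
        docs = fetch(page_params(last_id, rows))
        if not docs:
            break
        yield from docs
        # The last id on this page is the exclusive lower bound for the next.
        last_id = docs[-1]["id"]


# The query string for the first page, ready to append to the /select URL.
first_page_query = urlencode(page_params(None, 1000))
```

Because each request starts strictly after the last id already seen, the
cost of a page doesn't grow the way it does with a large start value.  The
caveat about not updating the index during retrieval still applies.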

Responding to your additional message about an autoincrement field -
that would only be possible if you are importing from a data source that
supports autoincrement, like MySQL.  Solr itself has no support for
autoincrement.

Thanks,
Shawn
