Mingfeng, this issue gets tougher as the number of shards you have grows;
you can read Erick Erickson's post:
http://grokbase.com/t/lucene/solr-user/131p75p833/how-distributed-queries-works.
With 100M docs, I guess you are running into this issue.
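
To make the cost concrete, here is a rough sketch of the arithmetic, assuming classic start/rows paging in which every shard has to return its top (start + rows) entries to the node that merges them. The function name and the shard count are only illustrative; rows=2000 matches the config quoted below.

```python
def entries_merged_per_page(start: int, rows: int, shards: int) -> int:
    """Upper bound on entries the merging node collects for one page.

    Assumption: each shard returns its top (start + rows) entries,
    so the work grows with `start` even though only `rows` docs are kept.
    """
    return shards * (start + rows)


if __name__ == "__main__":
    # Illustrative values only: 4 shards is a hypothetical source layout.
    for start in (0, 1_000_000, 50_000_000):
        print(start, entries_merged_per_page(start, rows=2000, shards=4))
```

Even with a single shard the per-page work still grows linearly with start, which fits an import that slows down the further it gets.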

The common way to deal with this issue is to filter on a field that narrows
each query to a smaller result set, such as a creation_date field, and to
move the range on that field from one query to the next. For your data
import use case you might want to generate your data-import.xml with several
entities, each one covering a different creation_date range. That way there
is no need for deep paging.
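
A minimal sketch of generating such a config, assuming the source index has a date field named creation_date (the field name is just the example used above, and the URL in the usage part is a placeholder). The helper name build_data_config is made up for illustration, and the fl attribute is left out for brevity.

```python
import xml.etree.ElementTree as ET
from datetime import date


def build_data_config(url, boundaries, rows=2000, field="creation_date"):
    """Return a DIH config string with one SolrEntityProcessor entity
    per consecutive pair of boundary dates.

    Ranges are inclusive on both ends, so a document stamped exactly on a
    boundary matches two entities (assumed harmless if re-adding a document
    simply overwrites it by its unique key).
    """
    root = ET.Element("dataConfig")
    document = ET.SubElement(root, "document")
    for i, (lo, hi) in enumerate(zip(boundaries, boundaries[1:])):
        query = f"{field}:[{lo.isoformat()}T00:00:00Z TO {hi.isoformat()}T00:00:00Z]"
        ET.SubElement(document, "entity", {
            "name": f"sep_{i}",
            "processor": "SolrEntityProcessor",
            "url": url,
            "query": query,
            "rows": str(rows),
        })
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Placeholder URL and dates; substitute your own.
    months = [date(2013, 1, 1), date(2013, 2, 1), date(2013, 3, 1)]
    print(build_data_config("http://localhost:8983/solr/", months))
```

Each entity then only pages through its own slice, so start stays small within every range; how fine to make the ranges depends on how evenly the documents are spread over time.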

Another option is using
http://wiki.apache.org/solr/CommonQueryParameters#pageDoc_and_pageScore.
To my knowledge, though, it cannot be implemented in a multi-sharded
environment: all your scores are 1.0, so results are ranked per shard,
according to the internal [docId] of each shard.

Caching all the query results in each shard (by raising
queryResultWindowSize) should help, wouldn't it?


Best,

Manu


On Mon, Jun 10, 2013 at 8:56 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> SolrEntityProcessor is fine for small amounts of data but not useful for
> such a large index. The problem is that deep paging in search results is
> expensive. As the "start" value for a query increases so does the cost of
> the query. You are much better off just re-indexing the data.
>
>
> On Mon, Jun 10, 2013 at 11:19 PM, Mingfeng Yang <mfy...@wisewindow.com>
> wrote:
>
> > I am trying to migrate 100M documents from a Solr index (v3.6) to a
> > SolrCloud index (v4.1, 4 shards) by using SolrEntityProcessor. My
> > data-config.xml looks like this:
> >
> > <dataConfig> <document> <entity name="sep"
> > processor="SolrEntityProcessor"
> > url="http://10.64.35.117:8995/solr/" query="*:*" rows="2000"
> > fl="author_class,authorlink,author_location_text,author_text,author,category,date,dimension,entity,id,language,md5_text,op_dimension,opinion_text,query_id,search_source,sentiment,source_domain_text,source_domain,text,textshingle,title,topic,topic_text,url"
> > /> </document> </dataConfig>
> >
> > Initially, the data import rate is about 1K docs/second, but it
> > eventually decreases to 20 docs/second after running for tens of hours.
> >
> > Last time I tried a data import with SolrEntityProcessor, the transfer
> > rate could be as high as 3K docs/second.
> >
> > Does anyone have any clue what could cause the slowdown?
> >
> > Thanks,
> > Ming-
> >
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>
