On 02/09/2015 15:47, scott chu wrote:
> I posted a question on Stack Overflow:
> http://stackoverflow.com/questions/32343813/custom-sharding-or-auto-sharding-on-solrcloud
> However, since this is a mailing list, I am reposting the question below to
> ask for suggestions and a more detailed explanation of SolrCloud's
> document-routing behavior.
> I want to set up a SolrCloud cluster for over 10 million news articles.
> After reading the article "Shards and Indexing Data in SolrCloud" in the
> Apache Solr Reference Guide, I have a plan as follows:
> 1. Add the prefix ED2001! to the document ID, where ED identifies a
>    newspaper source and 2001 is the year of the article's publication
>    date, i.e. I want to put all articles from a specific newspaper source
>    published in a specific year into one shard.
> 2. Create the collection with router.name set to compositeId.
> 3. Add documents?
> 4. Query the collection?
> In practice, I have some questions:
> 1. How do I add documents based on this plan? Do I have to specify special
>    parameters when updating the collection/core?
> 2. Is this called "custom sharding"? If not, what is "custom sharding"?
> 3. Is auto sharding a better choice for my case, since auto sharding has a
>    shard-splitting feature for when a shard gets too big?
> 4. Can I query without the _route_ parameter?
> EDIT @ 2015/9/2:
> This is how I think SolrCloud will behave: "The number of news articles
> from a specific newspaper source in a specific year tends to be around a
> fixed number, e.g. every year ED has around 80,000 articles, so each
> shard's size won't increase dramatically. For next year's ED articles, I
> only have to add the prefix 'ED2016!' to the document ID; SolrCloud will
> create a new shard for me (containing all ED2016 articles), and later the
> leader will distribute replicas of this new shard to other nodes (one
> replica per node other than the leader?)". Am I right? If so, there seems
> to be no need for shard splitting.
> 

Think about your query pattern when you decide how to shard. If most of
your queries are for recent articles, then some shards will be loaded
far more than others. Here's a rather old blog post we wrote on the
subject (actually based on Xapian, another open source search engine,
but the concepts are the same for Solr):
http://www.flax.co.uk/blog/2009/04/
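
To make the point about uneven load concrete, here is a rough sketch in
Python. It is not Solr code and does not use Solr's routing. All the
numbers, the document ID format and the helper names are made up for
illustration, and zlib.crc32 only stands in for whatever hash the router
really uses. It compares the per-shard work when articles are placed by
year with the work when they are spread by a hash of the document ID,
assuming most queries ask for recent articles.

```python
import zlib
from collections import Counter

# Hypothetical figures, for illustration only.
YEARS = [2011, 2012, 2013, 2014, 2015]
DOCS_PER_YEAR = 2000
# Assumed query mix: most queries ask for recent articles.
QUERIES_PER_YEAR = {2011: 10, 2012: 20, 2013: 50, 2014: 200, 2015: 720}


def year_shard(year, doc_no, num_shards):
    """Place every article of a given year on the same shard."""
    return YEARS.index(year) % num_shards


def hash_shard(year, doc_no, num_shards):
    """Spread articles by a hash of a made-up document ID.

    crc32 is a stand-in here, not the hash Solr itself uses.
    """
    doc_id = f"ED-{year}-{doc_no}"
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def shard_work(assign, num_shards=len(YEARS)):
    """Count, per shard, how many (query, matching document) pairs it serves."""
    work = Counter({shard: 0 for shard in range(num_shards)})
    for year in YEARS:
        for doc_no in range(DOCS_PER_YEAR):
            work[assign(year, doc_no, num_shards)] += QUERIES_PER_YEAR[year]
    return work


def skew(work):
    """Busiest shard's work divided by the mean; 1.0 means perfectly even."""
    mean = sum(work.values()) / len(work)
    return max(work.values()) / mean


if __name__ == "__main__":
    print("by year:", round(skew(shard_work(year_shard)), 2))
    print("by hash:", round(skew(shard_work(hash_shard)), 2))
```

With this assumed query mix, the shard holding the newest year does
several times the average work under year-based placement, while the
hash-based placement stays close to even. Real clusters differ (caching,
replicas and query fan-out all matter), so treat it only as a way to
think about the trade-off.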

Cheers

Charlie

-- 
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk
