Thanks Shawn for the detailed instructions.

About the router: it is implicit.

About the replicas: I followed the example at
http://wiki.apache.org/solr/SolrCloud

I start the shards with the following (paths and ports simplified):

cd /.../solr/shard1/
/usr/bin/java -Djetty.port=1 -Dbootstrap_confdir=./solr/collection1/conf
-Dcollection.configName=myconf -DzkRun=localhost:0 -DnumShards=4 -jar
start.jar > /.../log/shard_1.log

cd /.../solr/shard2/
/usr/bin/java -Djetty.port=2 -DzkHost=localhost:0 -jar start.jar >
/.../log/shard_2.log

and same thing for the two other shards on their own ports.


To post a document (CSV file), I use:

curl http://localhost:shardport/solr/update --data-binary @file.csv \
  -H 'Content-type:text/csv; charset=ISO-8859-1'
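
To see where the posted documents actually land, one option is to ask each node for its local document count only. The sketch below is untested against my setup: local_count_url is a made-up helper, the ports are placeholders taken from the wiki examples, and the collection1 path is assumed from the log output. distrib=false is the standard Solr parameter that keeps a query from being distributed to the other nodes.

```shell
#!/bin/sh
# Hypothetical helper (not part of Solr): build a query URL that counts the
# documents held locally by the node listening on the given port.
# distrib=false asks the node to answer from its own index only.
local_count_url() {
  port=$1
  echo "http://localhost:${port}/solr/collection1/select?q=*:*&rows=0&wt=json&distrib=false"
}

# Placeholder ports; substitute the real jetty.port values.
for port in 7574 8900; do
  echo "port ${port}: $(local_count_url "$port")"
  # Against a running node, fetch the count with:
  #   curl -s "$(local_count_url "$port")"
done
```

If every node reports roughly the same numFound after posting to a single port, the nodes are probably replicas of one shard rather than separate shards.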


I just re-read the example page at http://wiki.apache.org/solr/SolrCloud
and I see that there is no difference between the command that starts a
shard and the one that starts a replica.  I must be missing something:

From Example A (two shards):

cd example2

java -Djetty.port=7574 -DzkHost=localhost:9983 -jar start.jar

From Example B (two shards with replicas):

cd exampleB

java -Djetty.port=8900 -DzkHost=localhost:9983 -jar start.jar
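
My reading of the wiki text is that the commands look the same because the role depends on join order rather than on a flag: with -DnumShards=N, the first N nodes each take a new shard and any later node becomes a replica. A rough model of that follows. It is only a sketch: role_for_node is a made-up helper, and the round-robin choice of which shard a replica joins is my assumption, not something taken from Solr's code.

```shell
#!/bin/sh
# Hypothetical model, not Solr code: report the role of the k-th node
# (1-based) to join a collection that was bootstrapped with -DnumShards=n.
role_for_node() {
  k=$1
  n=$2
  # Assumption: nodes are spread over the shards in round-robin order.
  shard=$(( (k - 1) % n + 1 ))
  if [ "$k" -le "$n" ]; then
    echo "shard${shard} (first node)"
  else
    echo "shard${shard} (replica)"
  fi
}

# Example B from the wiki: numShards=2 and four nodes started in sequence.
for k in 1 2 3 4; do
  echo "node ${k} -> $(role_for_node "$k" 2)"
done
```

Under this model, four nodes started with numShards=4 should each end up with their own shard, which is what I expected from my setup.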

Thanks.
Thierry

On Mon, Aug 12, 2013 at 5:04 PM, Shawn Heisey <s...@elyograg.org> wrote:

> On 8/12/2013 4:50 PM, Thierry Thelliez wrote:
>
>> Hello, I am trying to set up a four-shard system for the first time.  I do
>> not understand why the data on all the shards grows at about the same rate
>> when I push documents to only one shard.
>>
>> The four shards represent four calendar years.  And for now, on a
>> development machine, these four shards run on four different ports.
>>
>> The first shard is started with Zookeeper.
>>
>> The logs of the other shards are filled with something like:
>>
>> 7882051 [qtp1154079020-1245] INFO
>> org.apache.solr.update.processor.LogUpdateProcessor – [collection1]
>> webapp=/solr path=/update params={distrib.from=
>> http://x.y.z.4:50121/solr/collection1/&update.distrib=TOLEADER&wt=javabin&version=2}
>> {add=[14939-96467-304 (1443204912169091072), 14939-96467-308
>> (1443204912179576832), 14939-96467-310 (1443204912185868288),
>> 14939-96467-311 (1443204912192159744), 14939-96467-313
>> (1443204912204742656), 14939-96467-314 (1443204912220471296),
>> 14939-96467-318 (1443204912239345664), 14939-96467-319
>> (1443204912250880000), 14939-96467-322 (1443204912257171456),
>> 14939-96467-324 (1443204912263462912)]} 0 282
>>
>> What is getting written to the other shards? Is a separate index computed
>> on all four shards?  I thought that when pushing a document to one shard,
>> only that shard would update its index.
>>
>
> There are two possibilities.
>
> 1) You don't have four shards, you have four replicas of one shard.  If
> this is happening, then they all will receive all documents.
>
> 2) You are using a router like compositeId instead of implicit.  This will
> calculate the hash of the id field and evenly divide the documents among
> all the shards in the collection according to the hash value.  If you
> create the collection with the implicit router, then documents should be
> indexed by the shard that received them.
>
> To see what router you have, click on Cloud in the admin UI, then click on
> Tree.  Click the arrow to the left of '/collections' to open it. Click on
> collection1 (or whichever you are actually using) -- the actual name, not
> the arrow.  Underneath the table that appears to the right will be "router"
> and its value.
>
> Thanks,
> Shawn
>
>
