SolrCloud OR distributed Solr

2014-03-30 Thread Priti Solanki
Hello Member,

Is there any difference between distributed Solr and SolrCloud?

Consider that I have products for three countries. I have indexed one
country's data, and its index size is 160 GB+.

Now we have the other two countries to add, and I am confused!

My client asked me what the difference would be if we procured another Solr
server and indexed the data separately. I was thinking of SolrCloud. Can
someone explain these two approaches in simple words? If there are any
reading links, please share.

Thanks


Re: SolrCloud OR distributed Solr

2014-03-30 Thread Gora Mohanty
On 30 March 2014 23:12, Priti Solanki pritiatw...@gmail.com wrote:

> Hello Member,
>
> Is there any difference between distributed Solr and SolrCloud?

You might be confusing the older Solr distributed search with the newer SolrCloud:
* Older distributed search: https://wiki.apache.org/solr/DistributedSearch
* SolrCloud: https://cwiki.apache.org/confluence/display/solr/SolrCloud

> Consider that I have products for three countries. I have indexed one
> country's data, and its index size is 160 GB+.
>
> Now we have the other two countries to add, and I am confused!
>
> My client asked me what the difference would be if we procured another Solr
> server and indexed the data separately. I was thinking of SolrCloud. Can
> someone explain these two approaches in simple words? If there are any
> reading links, please share.

With 4.0+ versions of Solr, you probably want to go for SolrCloud.

Regards,
Gora


Re: SolrCloud OR distributed Solr

2014-03-30 Thread Erick Erickson
Distributed Solr is simply the ability for Solr to take the incoming
query and send it to multiple shards, then aggregate the response.
Here a shard is a physical partition of a single logical index. The
assumption is that you can't fit the entire index on a single machine
and still get the performance you need, so you use N smaller parts.

So there has to be some mechanism to send the request to each
sub-index, assemble the responses, and give the result back to the client.
That's distributed Solr.
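
As a rough illustration of that scatter-and-merge step, here is a toy sketch.
It is not Solr code: the shards are plain in-memory dicts and the names
SHARDS, query_shard and distributed_search are made up for the example. Each
shard returns its own top hits and the caller merges them by score.

```python
import heapq

# Toy stand-ins for shards: each maps a doc id to its score for one query.
# Real shards are separate Solr indexes reached over HTTP; these dicts are
# hypothetical and exist only to show the merge step.
SHARDS = [
    {"doc1": 0.9, "doc4": 0.3},
    {"doc2": 0.7, "doc5": 0.8},
    {"doc3": 0.5},
]


def query_shard(shard, rows):
    """Return the shard's own top `rows` hits as (score, doc_id) pairs."""
    return heapq.nlargest(rows, ((score, doc_id) for doc_id, score in shard.items()))


def distributed_search(shards, rows):
    """Send the query to every shard, then merge the partial results."""
    partial = []
    for shard in shards:
        partial.extend(query_shard(shard, rows))
    merged = heapq.nlargest(rows, partial)
    return [doc_id for _score, doc_id in merged]


top = distributed_search(SHARDS, rows=3)
print(top)  # ['doc1', 'doc5', 'doc2']
```

Each shard only has to return `rows` candidates, because no document outside
a shard's own top `rows` can appear in the merged top `rows`.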

Before 4.0, splitting the index up was entirely manual. _You_ decided
which document went to which shard. _You_ configured Solr to know
where the other shards were. _You_ handled the situation where a
node went down and you had to heal the network. But it was still
using distributed search.


As of 4.0, SolrCloud happens. The differences are:
1) You can have Solr automatically distribute the docs to the right shard.
2) When a node goes down, Solr can automatically compensate (assuming
   more than one replica per shard).
3) When the node comes back up, Solr will automatically re-synchronize
   the node before (automatically) bringing it back into service.

NOTE: you can still use old-style manual sharding if you choose; it's
still available in 4.x.
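
To make "distribute the docs to the right shard" concrete, here is a minimal
sketch of hash-based routing. It is a simplified stand-in and not Solr's
actual router, which differs in detail; shard_for and route are hypothetical
names, and CRC32 modulo the shard count is an assumption chosen for brevity.

```python
import zlib


def shard_for(doc_id, num_shards):
    """Pick a shard index from a stable hash of the document id."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def route(doc_ids, num_shards):
    """Group document ids by the shard they hash to."""
    shards = [[] for _ in range(num_shards)]
    for doc_id in doc_ids:
        shards[shard_for(doc_id, num_shards)].append(doc_id)
    return shards


# Hypothetical ids for one country's products.
docs = ["IN-%d" % i for i in range(1000)]
shards = route(docs, 3)
print([len(s) for s in shards])
```

The point is that the same id always lands on the same shard, so nobody has
to keep a manual list of which document lives where. With manual sharding,
the equivalent of shard_for is a rule you write and maintain yourself.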

But be careful here and draw a distinction between distributed
search and federated search.
Distributed search - what we've been talking about, the underlying
assumption is that the sub-indexes are all substantially similar.

Federated search - the sub-indexes (or, indeed, complete
self-contained indexes) may have no relation to each other and you're
somehow expected to search them all and return the results. In this
case you'll probably be firing off N separate queries (one to each of
N indexes) and assembling them at the app layer.
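
A toy sketch of that federated pattern, for contrast. Nothing here is a Solr
API: the two "indexes" are hypothetical in-memory lists with unrelated
fields, and the application decides how to present the combined results.

```python
# Two hypothetical, unrelated indexes with different schemas.
PRODUCTS = [
    {"sku": "A1", "title": "red lamp"},
    {"sku": "B2", "title": "blue lamp"},
]
MANUALS = [
    {"file": "lamp.pdf", "body": "how to assemble a lamp"},
]


def search_products(term):
    """Query the product index on its own field."""
    return [p["sku"] for p in PRODUCTS if term in p["title"]]


def search_manuals(term):
    """Query the manuals index on its own field."""
    return [m["file"] for m in MANUALS if term in m["body"]]


def federated_search(term):
    """Fire one query per index and assemble the results at the app layer."""
    sources = {"products": search_products, "manuals": search_manuals}
    return {name: search(term) for name, search in sources.items()}


results = federated_search("lamp")
print(results)
```

The results are kept grouped by source here because scores from unrelated
indexes are generally not comparable, so a single merged ranking is a
decision the application has to make for itself.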

Best,
Erick



SolrCloud vs Distributed Solr

2013-07-08 Thread Flavio Pompermaier
Hi to all,

I started following this mailing list about a month ago and have read many
threads about SolrCloud and distributed Solr. I just want to check whether I
understood correctly and, if so, ask about some architectural decisions I have
to make:

1) At the moment, in order to design a scalable Solr deployment, one can
choose whether to set up SolrCloud (where servers are transparent to the
client) or a SolrCloud-like setup (distributed mode) where the client has to
know which server to contact, right?

2) If so, I don't fully understand why clients should be made aware of the
Solr servers. Why should a client decide on its own where to index or
query? Is it because of backward compatibility, performance or similar
issues? From what I understood, SolrCloud does all the magic, hiding the
real deployment from the user (with all the consequent benefits of bringing
servers up and down, and so on), doesn't it?

3) When configuring SolrCloud I put in the solrconfig.xml the list of the
shards supporting my collection distribution. E.g.:

<str name="self">localhost:8983/solr</str>
<arr name="shards">
  <str>localhost:8983/solr</str>
  <str>someotherhost:7574/solr</str>
</arr>

How does the splitting work behind the scenes (a link to a detailed
explanation is sufficient..)?

4) If one day I decide to add one more server to distribute the load, what
is the correct procedure to deploy such a change? Does SolrCloud
automatically redistribute the index within all shards?


Best,
Flavio


Re: SolrCloud vs Distributed Solr

2013-07-08 Thread Erick Erickson
Flavio:

I think you're missing a critical bit about SolrCloud,
namely Zookeeper (ZK), see here on the SolrCloud page
for a start:
http://wiki.apache.org/solr/SolrCloud#ZooKeeper

You'll notice that each Solr node, when it is started,
requires the address of your ZK ensemble, NOT a
solr node. That allows ZK to know where all the
nodes are in your cluster.

So each of the nodes just knows where all the other
shards are, since that info is kept in ZK, so any request
to any node in the cluster does the right thing, whether
update or query. Updates are forwarded to the
correct leaders, queries are sent to a member of
each shard, etc., all automatically.
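
A toy model of that routing, purely for illustration. CLUSTER_STATE is a
hypothetical picture of the kind of information kept in ZK, not its real
format, and shard_of, forward_update and plan_query are made-up names. The
hash in shard_of is an assumption, not Solr's router.

```python
import itertools
import zlib

# Hypothetical cluster state: for each shard, which node is the leader and
# which nodes hold a copy of that shard.
CLUSTER_STATE = {
    "shard1": {"leader": "node1", "replicas": ["node1", "node3"]},
    "shard2": {"leader": "node2", "replicas": ["node2", "node4"]},
}

_round_robin = itertools.count()


def shard_of(doc_id, state):
    """Map a document id to a shard name (simplified stand-in hash)."""
    names = sorted(state)
    return names[zlib.crc32(doc_id.encode("utf-8")) % len(names)]


def forward_update(doc_id, state):
    """An update arriving at any node goes to the leader of the owning shard."""
    return state[shard_of(doc_id, state)]["leader"]


def plan_query(state):
    """A query is sent to one member of every shard, rotating between members."""
    turn = next(_round_robin)
    return {
        shard: info["replicas"][turn % len(info["replicas"])]
        for shard, info in state.items()
    }


print(forward_update("doc-42", CLUSTER_STATE))
print(plan_query(CLUSTER_STATE))
print(plan_query(CLUSTER_STATE))
```

Because every node reads the same state, it does not matter which node the
client happens to contact; the lookup gives the same answer everywhere.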

Now take a look at CloudSolrServer (assuming that
you're using SolrJ from your client). The constructor
takes the address of ZK too. Using this info the client
code has access to information about the state of the
entire cluster, so you don't have to do anything, the
client code will just know how to connect to Solr.

So for 1, 2 and 3 above, don't do anything <G>. Just
start up all the Solr nodes with the proper
zkHost (or zkRun) parameter and send requests
to any node. You do NOT have to configure shards
in solrconfig.xml or anything else.


For 4, I'm going to pass on the shard splitting details
since I haven't had time to dive into that yet. But increasing
capacity comes in two flavors. If you simply need to
get more query throughput, just add more nodes. Solr
will assign them to the right shard (although you can
control this), copy the index for that shard down and
start automatically routing new requests to that node too.

The second flavor is when your index is too big to fit
on your physical hardware and you need more shards (as
opposed to more replicas). Then you need to do the shard
splitting thing which I'm going to skip rather than
mislead you.

Final note: The other thing that's confusing you, I think, is
the distinction between SolrCloud and Solr Master/Slave.
SolrCloud is the new way of doing things. Master/Slave
is a situation where all the automatic stuff you can do with
SolrCloud must be done manually: things like assigning
documents to particular shards, configuring solrconfig.xml
with the addresses of all the other shards, all that stuff.

Best
Erick


Re: SolrCloud vs Distributed Solr

2013-07-08 Thread Flavio Pompermaier
Thanks for the detailed response, Erick. You helped me a lot in clarifying
many Solr concepts!

Best,
Flavio





