Right, NRT is not tied to cloud, but it is tied to the update log.

And you bring up an interesting issue when you talk about availability zones.
SolrCloud is fairly "chatty" in that all of the nodes need to talk to all the
other nodes in the cluster, and they will. If the nodes are separated by
an expensive connection (however you measure "expensive": latency,
cost to use, or something else), then this may well be a bottleneck. For instance,
the leader needs to talk to every one of its replicas for each update. Imagine
a leader in zone1 and all 15 replicas in zone2. Now the expensive pipe
will be used 15 times to send a single update.
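To make the arithmetic concrete, here is a minimal Python sketch. It assumes a simplified model in which the leader forwards each update once, point to point, to every replica in its shard. The second function models a hypothetical "zone-aware" relay: one copy is sent per remote zone, and a node inside that zone fans it out locally. That relay only illustrates what rack awareness might do. SolrCloud does not currently implement it, and both function names are made up.

```python
def cross_zone_sends_direct(leader_zone, replica_zones):
    """Cross-zone transfers when the leader sends to every replica directly."""
    return sum(1 for zone in replica_zones if zone != leader_zone)


def cross_zone_sends_relayed(leader_zone, replica_zones):
    """Hypothetical zone-aware relay: one cross-zone copy per remote zone,
    after which a node inside that zone forwards the update locally."""
    return len({zone for zone in replica_zones if zone != leader_zone})


replicas = ["zone2"] * 15  # leader in zone1, all 15 replicas in zone2
print(cross_zone_sends_direct("zone1", replicas))   # 15
print(cross_zone_sends_relayed("zone1", replicas))  # 1
```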

The same goes for queries: there's an internal software load balancer that sends
each query to one node in each shard, with no control over which zone that node
is in.
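You can roughly estimate the cost with a simplifying assumption: the balancer picks uniformly at random among the replicas of each shard. Under that assumption, the expected number of cross-zone sub-requests for one distributed query is a sum over shards. Each shard contributes the fraction of its replicas that sit outside the coordinating node's zone. `expected_cross_zone_hops` is a hypothetical helper for that estimate and is not part of Solr.

```python
from fractions import Fraction


def expected_cross_zone_hops(coordinator_zone, shards):
    """shards: one list of replica zones per shard.

    Assumes one replica per shard is chosen uniformly at random, with no
    zone preference. Returns the expected number of sub-requests per query
    that leave the coordinator's zone.
    """
    total = Fraction(0)
    for replica_zones in shards:
        remote = sum(1 for zone in replica_zones if zone != coordinator_zone)
        total += Fraction(remote, len(replica_zones))
    return total


# 4 shards, each with one replica in zone1 and one in zone2
shards = [["zone1", "zone2"]] * 4
print(expected_cross_zone_hops("zone1", shards))  # 2
```

A zone-preferring balancer could bring that figure to 0 whenever every shard has a replica in the local zone.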

The same argument applies to separate physical data centers FWIW.

We're largely speculating that this may lead to bottlenecks, but it's
something to keep in mind. There are thoughts about making SolrCloud
"rack aware" in a way that will ameliorate this, but nobody has had
time to work on this yet.

We'd _love_ to hear about any real-life experience in this area!

Best
Erick

On Tue, Jun 18, 2013 at 4:37 PM, Rishi Easwaran <rishi.easwa...@aol.com> wrote:
>
> Erick,
>
> We at AOL Mail have been using SOLR for quite a while, and our system is
> pretty write-heavy; disk I/O is one of our bottlenecks. At present we use
> regular SOLR in the LotsOfCores configuration, and I am in the process of
> benchmarking SOLR cloud for our use case. I don't have concrete data showing that
> tLogs are placing a lot of load on the system, but for a large-scale system
> like ours even minimal load gets magnified.
>
>
> From the Cloud design, for a properly set up cluster, you usually have
> replicas in different availability zones. The probability of losing more than one
> availability zone at any given time should be pretty low. Why have tLogs if
> all replicas get the update request anyway? In theory, at least one replica should be
> able to commit eventually.
>
> NRT is an optional feature and probably not tied to Cloud, correct?
>
>
> Thanks,
>
> Rishi.
>
> -----Original Message-----
> From: Erick Erickson <erickerick...@gmail.com>
> To: solr-user <solr-user@lucene.apache.org>
> Sent: Tue, Jun 18, 2013 4:07 pm
> Subject: Re: SOLR Cloud - Disable Transaction Logs
>
>
> bq: the replica can take over and maintain a durable
> state of my index
>
> This is not true. On an update, all the nodes in a slice
> have already written the data to the tlog, not just the
> leader. So if a leader goes down, the replicas have
> enough local info to ensure that data is not lost. Without
> tlogs this would not be true since documents are not
> durably saved until a hard commit.
>
> tlogs save data between hard commits. As Yonik
> explained to me once, "soft commits are about
> visibility, hard commits are about durability" and
> tlogs fill up the gap between hard commits.
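A toy model can make "soft commits are about visibility, hard commits are about durability" concrete. This Python sketch is deliberately simplified and is not Solr's implementation. The class and method names are invented, and a "crash" simply discards the in-memory buffer. The toy also discards its log on every hard commit, whereas real tlog handling is more involved.

```python
class ToyCore:
    """Toy model of how a transaction log bridges the gap between hard commits."""

    def __init__(self, use_tlog=True):
        self.use_tlog = use_tlog
        self.durable = {}   # on "disk": survives a crash, written by hard commits
        self.buffer = {}    # in-memory indexing buffer: lost on a crash
        self.visible = {}   # what searchers see: refreshed by soft commits
        self.tlog = []      # append-only record of updates since the last hard commit

    def add(self, doc_id, doc):
        if self.use_tlog:
            self.tlog.append((doc_id, doc))  # cheap sequential append
        self.buffer[doc_id] = doc

    def soft_commit(self):
        # Visibility: make buffered docs searchable without making them durable.
        self.visible = {**self.durable, **self.buffer}

    def hard_commit(self):
        # Durability: flush the buffer to disk; the logged updates are now redundant.
        self.durable.update(self.buffer)
        self.buffer.clear()
        self.tlog.clear()

    def crash_and_recover(self):
        # The buffer is gone; only durable data and the on-disk log remain.
        self.buffer.clear()
        for doc_id, doc in self.tlog:  # replay updates since the last hard commit
            self.buffer[doc_id] = doc
        self.soft_commit()


for use_tlog in (True, False):
    core = ToyCore(use_tlog)
    core.add("1", "a")
    core.hard_commit()
    core.add("2", "b")
    core.soft_commit()  # "2" is searchable but not yet durable
    core.crash_and_recover()
    print(use_tlog, sorted(core.visible))
# True ['1', '2']
# False ['1']
```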
>
> So to reinforce Shalin's comment: yes, you can disable tlogs if
> 1> you don't want any of SolrCloud's HA/DR capabilities
> 2> NRT is unimportant
>
> IOW if you're using 4.x just like you would 3.x in terms
> of replication, HA/DR, etc. This is perfectly reasonable,
> but don't get hung up on disabling tlogs.
>
> And you haven't told us _why_ you want to do this. They
> don't consume much memory or disk space unless you
> have configured your hard commit interval (with openSearcher
> true or false) to be quite long. Do you have any proof at
> all that the tlogs are placing enough load on the system
> to go down this road?
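Here is a back-of-the-envelope check. It assumes the tlog holds roughly one copy of every document indexed since the last hard commit. Under that assumption, tlog size scales with update rate × hard-commit interval × average document size. The rate and document size below are purely illustrative, not measurements. The estimate also ignores log framing overhead and any older logs still on disk.

```python
def approx_tlog_bytes(docs_per_sec, hard_commit_interval_s, avg_doc_bytes):
    """Rough estimate of tlog growth between two hard commits."""
    return docs_per_sec * hard_commit_interval_s * avg_doc_bytes


# Illustrative numbers only: 1,000 docs/s, 2 KB average document
for interval_s in (15, 60, 600):
    mb = approx_tlog_bytes(1_000, interval_s, 2_000) / 1_000_000
    print(f"hard commit every {interval_s:>3}s -> ~{mb:.0f} MB of tlog")
```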
>
> Best
> Erick
>
> On Tue, Jun 18, 2013 at 10:49 AM, Rishi Easwaran <rishi.easwa...@aol.com> wrote:
>> SolrJ already has access to the ZooKeeper cluster state. The network I/O bottleneck
>> can be avoided by sending requests in parallel.
>> You are only as slow as your slowest-responding server, which, with the current
>> setup, could be your single leader.
>>
>> Wouldn't this lessen the burden on the leader, since it would not have to
>> maintain transaction logs or distribute updates to replicas?
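Here is a sketch of the "slowest responding server" point. It assumes a hypothetical client that sends the same update to every replica in parallel and waits for all acknowledgements. `send_update` is a stand-in that only sleeps and is not a SolrJ call. The latencies are made up. The point is that wall-clock time tracks the slowest replica, not the sum of all latencies.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def send_update(replica, latency_s):
    # Stand-in for an HTTP update request; a real client would talk to Solr here.
    time.sleep(latency_s)
    return replica


def fan_out(latencies):
    """Send one update to every replica in parallel and wait for every ack.
    Returns the elapsed wall-clock time in seconds."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=len(latencies)) as pool:
        futures = [pool.submit(send_update, r, s) for r, s in latencies.items()]
        for future in futures:
            future.result()
    return time.monotonic() - start


latencies = {"replica1": 0.1, "replica2": 0.1, "replica3": 0.3}
elapsed = fan_out(latencies)
print(f"parallel: {elapsed:.2f}s, sequential would be ~{sum(latencies.values()):.2f}s")
```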
>>
>> -----Original Message-----
>> From: Shalin Shekhar Mangar <shalinman...@gmail.com>
>> To: solr-user <solr-user@lucene.apache.org>
>> Sent: Tue, Jun 18, 2013 2:05 am
>> Subject: Re: SOLR Cloud - Disable Transaction Logs
>>
>>
>> Yes, but at what cost? You are thinking of replacing disk IO with even
>> slower network IO. The transaction log is an append-only log -- it is
>> pretty cheap, especially if you compare it with the indexing process.
>> Plus, your write requests/sec will drop a lot once you start doing
>> synchronous replication.
>>
>>
>> On Tue, Jun 18, 2013 at 2:18 AM, Rishi Easwaran <rishi.easwa...@aol.com> wrote:
>>
>>> Shalin,
>>>
>>> Just some thoughts.
>>>
>>> Near real-time replication: don't we use SolrCmdDistributor, which sends
>>> requests immediately to replicas with a cloned request? As an option, can't
>>> we achieve something similar from CloudSolrServer in SolrJ instead of the
>>> leader doing it? As long as 2 nodes receive writes and acknowledge them,
>>> durability should be high.
>>> Peer sync and recovery: can we achieve that by merging indexes from the leader
>>> as needed, instead of replaying the transaction logs?
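The "as long as 2 nodes receive writes and acknowledge" idea amounts to a write quorum. Assume, as a simplification, that all requests leave at the same instant. Then a write counts as acknowledged after the k-th fastest replica's latency, where k is the number of acks required. `quorum_latency` is a hypothetical helper, and the latencies are illustrative.

```python
def quorum_latency(ack_latencies_ms, min_acks):
    """Time until `min_acks` replicas have acknowledged a write sent to all of
    them at t=0 (idealized: no retries, no failures)."""
    if not 1 <= min_acks <= len(ack_latencies_ms):
        raise ValueError("min_acks must be between 1 and the number of replicas")
    return sorted(ack_latencies_ms)[min_acks - 1]


acks_ms = [12, 15, 240]  # illustrative: one replica sits in a distant zone
print(quorum_latency(acks_ms, 2))  # 15
print(quorum_latency(acks_ms, 3))  # 240
```

Any replica that has not acknowledged yet still has to catch up later. In the existing design, that is the job of peer sync and recovery.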
>>>
>>> Rishi.
>>>
>>> -----Original Message-----
>>> From: Shalin Shekhar Mangar <shalinman...@gmail.com>
>>> To: solr-user <solr-user@lucene.apache.org>
>>> Sent: Mon, Jun 17, 2013 3:43 pm
>>> Subject: Re: SOLR Cloud - Disable Transaction Logs
>>>
>>>
>>> It is also necessary for near real-time replication, peer sync and
>>> recovery.
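This simplified Python sketch gives some intuition for why the log matters to peer sync. It assumes each node can list the version numbers of its most recent updates from its transaction log. A replica that is only slightly behind fetches just the versions it is missing. If the replica's newest update predates the leader's retained window, the logs cannot bridge the gap, and the replica needs a full index copy instead. The function name and the `window` parameter are illustrative and are not Solr's actual PeerSync API.

```python
def peer_sync(leader_recent, replica_recent, window=100):
    """Toy tlog-based peer sync.

    leader_recent / replica_recent: update versions from each node's
    transaction log, oldest first. Returns the versions the replica must
    fetch from the leader, or None if it must fall back to a full copy.
    """
    leader_window = leader_recent[-window:]
    if not leader_window:
        return []  # leader has nothing recent to offer
    if not replica_recent or replica_recent[-1] < leader_window[0]:
        return None  # too far behind: the logs alone cannot bridge the gap
    have = set(replica_recent)
    return [v for v in leader_window if v not in have]


leader = list(range(1, 201))                    # versions 1..200
print(peer_sync(leader, list(range(1, 196))))   # [196, 197, 198, 199, 200]
print(peer_sync(leader, list(range(1, 51))))    # None
```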
>>>
>>>
>>> On Tue, Jun 18, 2013 at 1:04 AM, Rishi Easwaran <rishi.easwa...@aol.com> wrote:
>>>
>>> > Hi,
>>> >
>>> > Is there a way to disable transaction logs in SOLR cloud? As far as I can
>>> > tell, no.
>>> > Just curious: why do we need transaction logs? It seems like an I/O-intensive
>>> > operation.
>>> > As long as I have replicationFactor > 1, if a node (leader) goes down, the
>>> > replica can take over and maintain a durable state of my index.
>>> >
>>> > I understand from previous discussions that it was intended for
>>> > update durability and realtime get.
>>> > But unless I am missing something, the ability to disable it in SOLR cloud
>>> > when it is not needed would be good.
>>> >
>>> > Thanks,
>>> >
>>> > Rishi.
>>> >
>>> >
>>>
>>>
>>> --
>>> Regards,
>>> Shalin Shekhar Mangar.
>>>
>>>
>>>
>>
>>
>> --
>> Regards,
>> Shalin Shekhar Mangar.
>>
>>
>
>
>
