Erick,

We at AOL mail have been using SOLR for quite a while. Our system is pretty
write heavy, and disk I/O is one of our bottlenecks. At present we use regular
SOLR in the LotsOfCores configuration, and I am in the process of benchmarking
SOLR cloud for our use case. I don't have concrete data showing that tLogs are
placing a lot of load on the system, but for a large-scale system like ours even
minimal load gets magnified.


>From the Cloud design, for a properly set up cluster, you usually have
>replicas in different availability zones. The probability of losing more than
>one availability zone at any given time should be pretty low. Why have tLogs if
>all replicas get the update request anyway? In theory, one replica should
>eventually be able to commit.
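To put a rough number on the "losing more than one zone" argument: if each of n zones fails independently with probability p during some window, the chance that at least k zones are down at once is the binomial tail sum over i = k..n of C(n, i) p^i (1 - p)^(n - i). The sketch below assumes one replica per zone and fully independent failures; both are assumptions, and correlated failures are exactly what this model leaves out. The inputs are hypothetical.

```python
from math import comb


def prob_lose_at_least(k: int, n: int, p: float) -> float:
    """P(at least k of n zones fail), assuming each fails independently with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical inputs: 3 zones, each with a 1% chance of being down in a given window.
print(f"{prob_lose_at_least(2, 3, 0.01):.6f}")  # 0.000298
```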

NRT is an optional feature and probably not tied to Cloud, correct?


Thanks,

Rishi.



 

 

-----Original Message-----
From: Erick Erickson <erickerick...@gmail.com>
To: solr-user <solr-user@lucene.apache.org>
Sent: Tue, Jun 18, 2013 4:07 pm
Subject: Re: SOLR Cloud - Disable Transaction Logs


bq: the replica can take over and maintain a durable
state of my index

This is not true. On an update, all the nodes in a slice
have already written the data to the tlog, not just the
leader. So if a leader goes down, the replicas have
enough local info to ensure that data is not lost. Without
tlogs this would not be true since documents are not
durably saved until a hard commit.
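A deliberately simplified toy model of that point (not Solr code; `ToyReplica` and its methods are hypothetical): every replica appends each update to its own local log before applying it, so a replica that restarts before the next hard commit can rebuild its uncommitted updates by replaying that log, even if the leader is gone for good.

```python
class ToyReplica:
    """Toy model of one replica; not Solr's implementation."""

    def __init__(self, name: str, use_tlog: bool = True):
        self.name = name
        self.use_tlog = use_tlog
        self.committed = {}  # flushed by a hard commit, survives restarts
        self.pending = {}    # in memory only, lost on restart
        self.tlog = []       # assumed to live on disk, survives restarts

    def add(self, doc_id: str, body: str) -> None:
        if self.use_tlog:
            self.tlog.append((doc_id, body))  # log first, then apply
        self.pending[doc_id] = body

    def hard_commit(self) -> None:
        self.committed.update(self.pending)
        self.pending.clear()
        self.tlog.clear()  # simplification: these entries are now durable in the index

    def restart(self) -> None:
        self.pending.clear()            # in-memory state is gone
        for doc_id, body in self.tlog:  # replay whatever the local log holds
            self.pending[doc_id] = body

    def has(self, doc_id: str) -> bool:
        return doc_id in self.pending or doc_id in self.committed


def survives_failover(use_tlog: bool) -> bool:
    leader = ToyReplica("leader", use_tlog)
    replica = ToyReplica("replica", use_tlog)
    for node in (leader, replica):  # the update reaches every replica in the slice
        node.add("doc1", "hello")
    # The leader is lost and the replica restarts before any hard commit.
    replica.restart()
    return replica.has("doc1")


print(survives_failover(use_tlog=True))   # True
print(survives_failover(use_tlog=False))  # False
```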

tlogs save data between hard commits. As Yonik
explained to me once, "soft commits are about
visibility, hard commits are about durability" and
tlogs fill up the gap between hard commits.
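The "visibility vs. durability" split can be sketched as a toy core with three states: indexed in RAM (covered only by the tlog), visible to a searcher (after a soft commit), and flushed to disk (after a hard commit). `ToyCore` and its method names are hypothetical; `open_searcher` loosely mirrors the `openSearcher` option Erick mentions. In this sketch the log is simply cleared on a hard commit; a real update log starts a new file instead and may keep older ones around.

```python
class ToyCore:
    """Toy model of commit semantics; names are hypothetical, not Solr's API."""

    def __init__(self):
        self.tlog = []     # updates since the last hard commit (assumed on disk)
        self.ram = {}      # indexed but not yet flushed to disk
        self.disk = {}     # flushed by a hard commit
        self.visible = {}  # what the currently open searcher sees

    def add(self, doc_id, body):
        self.tlog.append((doc_id, body))
        self.ram[doc_id] = body

    def soft_commit(self):
        # Visibility: open a new searcher over everything indexed so far.
        self.visible = {**self.disk, **self.ram}

    def hard_commit(self, open_searcher=False):
        # Durability: flush to disk; the tlog no longer needs to cover these docs.
        self.disk.update(self.ram)
        self.ram.clear()
        self.tlog.clear()
        if open_searcher:
            self.visible = dict(self.disk)

    def search(self, doc_id):
        return doc_id in self.visible


core = ToyCore()
core.add("a", "x")
print(core.search("a"), len(core.tlog))  # False 1  (not visible, only the tlog covers it)
core.soft_commit()
print(core.search("a"), "a" in core.disk)  # True False  (visible, still not durable)
core.hard_commit()
print("a" in core.disk, len(core.tlog))  # True 0  (durable, tlog no longer needed for it)
```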

So, to reinforce Shalin's comment: yes, you can disable tlogs
if
1> you don't want any of SolrCloud's HA/DR capabilities
2> NRT is unimportant

IOW, that's the case if you're using 4.x just as you would use 3.x
in terms of replication, HA/DR, etc. This is perfectly reasonable,
but don't get hung up on disabling tlogs.

And you haven't told us _why_ you want to do this. They
don't consume much memory or disk space unless you
have configured your hard commits (with openSearcher
true or false) to be quite long. Do you have any proof at
all that the tlogs are placing enough load on the system
to go down this road?
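Erick's point that tlog size depends on the hard-commit interval is simple arithmetic: the current log holds roughly (updates per second) x (seconds between hard commits) x (average update size). In solrconfig.xml that interval is typically the `autoCommit` `maxTime`. The function name and workload numbers below are hypothetical, and the estimate ignores serialization overhead, deletes, and any older logs kept for peer sync.

```python
def tlog_bytes_between_hard_commits(docs_per_sec: float,
                                    hard_commit_interval_s: float,
                                    avg_doc_bytes: float) -> float:
    """Rough size of the current tlog just before a hard commit rolls it over."""
    return docs_per_sec * hard_commit_interval_s * avg_doc_bytes


# Hypothetical workload: 500 updates/s, about 2 KiB per update.
for interval in (15, 60, 600):
    mib = tlog_bytes_between_hard_commits(500, interval, 2048) / 2**20
    print(f"hard commit every {interval:>3}s -> ~{mib:.1f} MiB in the current tlog")
```

The size grows linearly with the interval, which is why short hard commits with `openSearcher` set to false keep the logs small.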

Best
Erick

On Tue, Jun 18, 2013 at 10:49 AM, Rishi Easwaran <rishi.easwa...@aol.com> wrote:
> SolrJ already has access to the ZooKeeper cluster state. The network I/O
> bottleneck can be avoided by parallel requests.
> You are only as slow as your slowest responding server, which, with the
> current setup, could be your single leader.
>
> Wouldn't this lessen the burden on the leader, as it would not have to
> maintain transaction logs or distribute updates to replicas?
>
> -----Original Message-----
> From: Shalin Shekhar Mangar <shalinman...@gmail.com>
> To: solr-user <solr-user@lucene.apache.org>
> Sent: Tue, Jun 18, 2013 2:05 am
> Subject: Re: SOLR Cloud - Disable Transaction Logs
>
>
> Yes, but at what cost? You are thinking of replacing disk IO with even slower
> network IO. The transaction log is an append-only log -- it is pretty
> cheap, especially compared with the indexing process.
> Plus, your write requests/sec will drop a lot once you start doing
> synchronous replication.
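Why synchronous replication hurts write throughput, as a back-of-the-envelope sketch: a client that sends updates one at a time can never exceed 1 / (request latency) requests per second, and synchronous replication adds the replica round trip and the replica's own indexing time to every request. The timings are hypothetical, and adding them up assumes the replica hop happens after local indexing; if the two overlap the penalty is smaller, and more concurrent clients raise aggregate throughput.

```python
def max_sequential_updates_per_sec(local_ms: float,
                                   network_rtt_ms: float = 0.0,
                                   replica_ms: float = 0.0) -> float:
    """Upper bound for one client sending updates strictly one after another.

    Assumes each request waits for local indexing, then the network round trip
    to a replica, then the replica's indexing, before it returns.
    """
    return 1000.0 / (local_ms + network_rtt_ms + replica_ms)


# Hypothetical timings, purely illustrative.
print(round(max_sequential_updates_per_sec(5), 1))        # 200.0  (no replica wait)
print(round(max_sequential_updates_per_sec(5, 2, 5), 1))  # 83.3   (waits on one replica)
```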
>
>
> On Tue, Jun 18, 2013 at 2:18 AM, Rishi Easwaran <rishi.easwa...@aol.com> wrote:
>
>> Shalin,
>>
>> Just some thoughts.
>>
>> Near real-time replication: don't we use SolrCmdDistributor, which sends
>> requests immediately to replicas with a clonedRequest? As an option, can't
>> we achieve something similar from CloudSolrServer in SolrJ instead of the
>> leader doing it? As long as 2 nodes receive writes and acknowledge them,
>> durability should be high.
>> Peer sync and recovery: can we achieve that by merging indexes from the
>> leader as needed, instead of replaying the transaction logs?
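Rishi's client-side alternative can be sketched with plain Python threads: send each update to every replica in parallel and return once a chosen number have acknowledged. `send_to_replica` and the latency table are hypothetical stand-ins for real requests, not SolrJ calls. The sketch also shows the "slowest server" point: waiting for all three acks takes as long as the slowest replica, while waiting for two takes as long as the second fastest. It says nothing about how the replica that missed or lagged the write catches up, which is the peer sync and recovery question.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical per-replica latencies in seconds, standing in for real network calls.
REPLICA_LATENCY = {"replica-a": 0.01, "replica-b": 0.1, "replica-c": 0.6}


def send_to_replica(replica: str, doc: dict) -> str:
    """Hypothetical stand-in for one update request; returns the replica that acked."""
    time.sleep(REPLICA_LATENCY[replica])
    return replica


def fan_out(doc: dict, replicas: list[str], required_acks: int) -> list[str]:
    """Send doc to all replicas in parallel; return once `required_acks` have acked."""
    pool = ThreadPoolExecutor(max_workers=len(replicas))
    futures = [pool.submit(send_to_replica, r, doc) for r in replicas]
    acked = []
    for fut in as_completed(futures):
        acked.append(fut.result())
        if len(acked) >= required_acks:
            break
    pool.shutdown(wait=False)  # slower replicas keep going in the background
    return acked


start = time.perf_counter()
print(fan_out({"id": "1"}, list(REPLICA_LATENCY), required_acks=2))
print(f"returned after ~{time.perf_counter() - start:.2f}s")
```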
>>
>> Rishi.
>>
>> -----Original Message-----
>> From: Shalin Shekhar Mangar <shalinman...@gmail.com>
>> To: solr-user <solr-user@lucene.apache.org>
>> Sent: Mon, Jun 17, 2013 3:43 pm
>> Subject: Re: SOLR Cloud - Disable Transaction Logs
>>
>>
>> It is also necessary for near real-time replication, peer sync and
>> recovery.
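Shalin's peer-sync point can be illustrated with a toy version comparison: a recovering replica compares the versions in its recent log window with a peer's and fetches only the updates it is missing. If its newest version is older than anything in the peer's window, the toy gives up and the replica would have to copy the index from the leader instead. Function names and window size are hypothetical, and versions here are small consecutive integers for readability; this is a simplified illustration, not Solr's PeerSync algorithm.

```python
from typing import Optional


def peer_sync(mine: list[int], peer: list[int], window: int) -> Optional[list[int]]:
    """Versions to fetch from the peer, or None if its recent window doesn't overlap ours."""
    recent_peer = sorted(peer)[-window:]  # the peer only keeps a recent window
    if not recent_peer:
        return []
    my_latest = max(mine, default=None)
    if my_latest is None or my_latest < recent_peer[0]:
        return None  # gap too large: fall back to a full index copy
    have = set(mine)
    return [v for v in recent_peer if v not in have]


leader = list(range(1, 11))   # versions 1..10
replica = list(range(1, 8))   # missed 8, 9, 10
print(peer_sync(replica, leader, window=5))  # [8, 9, 10]
print(peer_sync([1, 2], leader, window=5))   # None
```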
>>
>>
>> On Tue, Jun 18, 2013 at 1:04 AM, Rishi Easwaran <rishi.easwa...@aol.com> wrote:
>>
>> > Hi,
>> >
>> > Is there a way to disable transaction logs in SOLR cloud? As far as I can
>> > tell, no.
>> > Just curious: why do we need transaction logs? It seems like an
>> > I/O-intensive operation.
>> > As long as I have replicationFactor > 1, if a node (leader) goes down, the
>> > replica can take over and maintain a durable state of my index.
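For reference, `replicationFactor` is normally set when a collection is created through the Collections API. The helper below only builds the CREATE request URL (it does not send anything); the base URL and collection name are hypothetical, and depending on the setup other parameters such as `collection.configName` may also be needed.

```python
from urllib.parse import urlencode


def create_collection_url(base_url: str, name: str, num_shards: int,
                          replication_factor: int) -> str:
    """Build a Collections API CREATE URL; does not issue the request."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
    }
    return f"{base_url.rstrip('/')}/admin/collections?{urlencode(params)}"


print(create_collection_url("http://localhost:8983/solr", "mail", 2, 3))
# http://localhost:8983/solr/admin/collections?action=CREATE&name=mail&numShards=2&replicationFactor=3
```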
>> >
>> > I understand from the previous discussions that it was intended for
>> > update durability and realtime get.
>> > But unless I am missing something, the ability to disable it in SOLR cloud
>> > when it isn't needed would be good.
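Realtime get, mentioned above, is one reason the log is hard to drop: Solr's `/get` handler can return the latest version of a document that has not been committed yet, by consulting the update log before the open searcher. A toy sketch of that lookup order (the `ToyRealtimeGet` class and its methods are hypothetical, and deletes are ignored):

```python
class ToyRealtimeGet:
    """Toy model: realtime lookups check the update log before the open searcher."""

    def __init__(self):
        self.searcher = {}  # snapshot as of the last commit that opened a searcher
        self.tlog_map = {}  # latest uncommitted version of each doc, keyed by id

    def add(self, doc_id, body):
        self.tlog_map[doc_id] = body

    def commit(self):
        self.searcher.update(self.tlog_map)
        self.tlog_map.clear()

    def search(self, doc_id):
        return self.searcher.get(doc_id)

    def realtime_get(self, doc_id):
        if doc_id in self.tlog_map:  # newest uncommitted update wins
            return self.tlog_map[doc_id]
        return self.searcher.get(doc_id)


core = ToyRealtimeGet()
core.add("msg-1", "draft")
print(core.search("msg-1"))        # None   (not committed, so not searchable)
print(core.realtime_get("msg-1"))  # draft  (found via the log)
```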
>> >
>> > Thanks,
>> >
>> > Rishi.
>> >
>> >
>>
>>
>> --
>> Regards,
>> Shalin Shekhar Mangar.
>>
>>
>>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>
>

 
