Re: Data replication and zero data loss

2015-05-02 Thread xiao li
Hi, Joong,

Please check the following two links:

- https://cwiki.apache.org/confluence/display/KAFKA/KIP-3+-+Mirror+Maker+Enhancement

- https://cwiki.apache.org/confluence/display/KAFKA/KIP-8+-+Add+a+flush+method+to+the+producer+API

They might help you understand the problem.

Cheers,

Xiao Li

2015-05-01 6:28 GMT-07:00 Joe Stein joe.st...@stealth.ly:

 If you want 0 data loss you should also look into the min.insync.replicas
 setting in 0.8.2.1, as it guarantees that acknowledged data is on multiple
 replicas (and so on multiple racks, if the replicas are placed that way).

 If you don't have that set, then the following scenario is possible.

 Let's say 1 topic, 1 partition, replication 3. You are producing with ACK=-1.

 b1, b2, b3 (where b=broker and b1 is leader, b2 and b3 are replicas).

 b1 and b2 die, b3 is leader. So far all is well.

 10 minutes go by and b3 dies.

 1 minute later b1 comes back online and becomes leader, effectively
 truncating the 10 minutes of data that upstream thought was saved.

 But now, with min.insync.replicas set, a produce with ACK=-1 gets a failure
 if you don't have enough replicas to meet your data loss guarantees, e.g.
 min.isr=2 or min.isr=3, depending on the data.

 Also take a look at
 https://github.com/stealthly/go_kafka_client/tree/master/mirrormaker it
 might be helpful for what you are looking for.

 ~ Joe Stein
 - - - - - - - - - - - - - - - - -

   http://www.stealth.ly
 - - - - - - - - - - - - - - - - -



Re: Data replication and zero data loss

2015-05-01 Thread Joong Lee
It is based on our understanding from reading the documents. 

We aren't concerned about data duplication, as that is going to be handled by
Elasticsearch.
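
One way duplicates can be absorbed on the Elasticsearch side is to index every message under an ID taken from the message itself, so a redelivered copy overwrites the first one instead of adding a second document. The sketch below is only an illustration: `FakeIndex` is a dict-backed stand-in for an index, and the `event_id` field is a hypothetical key assumed to be present in each message. The ID has to come from the payload, because a duplicate written by a mirroring process gets a new offset in the target cluster.

```python
class FakeIndex:
    """Dict-backed stand-in for an Elasticsearch index (hypothetical)."""

    def __init__(self):
        self.docs = {}

    def index(self, doc_id, body):
        # Indexing under an explicit ID replaces any document with that ID.
        self.docs[doc_id] = body


def sink(index, messages):
    """Write each message under its own ID, so redelivery is harmless."""
    for msg in messages:
        # "event_id" is an assumed field carried in the message payload.
        index.index(msg["event_id"], msg)


messages = [
    {"event_id": "e1", "text": "first"},
    {"event_id": "e2", "text": "second"},
    {"event_id": "e2", "text": "second"},  # duplicate delivered after a retry
]
idx = FakeIndex()
sink(idx, messages)
print(len(idx.docs))  # number of distinct documents stored
```

Three deliveries end up as two documents, which is the behaviour the zero-loss-with-duplicates trade-off relies on.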

 On May 1, 2015, at 12:15 AM, Daniel Compton daniel.compton.li...@gmail.com 
 wrote:
 
 When we evaluated MirrorMaker last year we didn't find any risk of data
 loss, only duplicate messages in the case of a network partition.
 
 Did you discover data loss in your tests, or were you just looking at the
 docs?


Re: Data replication and zero data loss

2015-05-01 Thread Joong Lee
0.8.2.1

 On Apr 30, 2015, at 11:28 PM, Jiangjie Qin j...@linkedin.com.INVALID wrote:
 
 Which mirror maker version did you look at? The MirrorMaker in trunk
 should not have data loss if you just use the default setting.
 


Re: Data replication and zero data loss

2015-05-01 Thread Joe Stein
If you want 0 data loss you should also look into the min.insync.replicas
setting in 0.8.2.1, as it guarantees that acknowledged data is on multiple
replicas (and so on multiple racks, if the replicas are placed that way).

If you don't have that set, then the following scenario is possible.

Let's say 1 topic, 1 partition, replication 3. You are producing with ACK=-1.

b1, b2, b3 (where b=broker and b1 is leader, b2 and b3 are replicas).

b1 and b2 die, b3 is leader. So far all is well.

10 minutes go by and b3 dies.

1 minute later b1 comes back online and becomes leader, effectively
truncating the 10 minutes of data that upstream thought was saved.

But now, with min.insync.replicas set, a produce with ACK=-1 gets a failure
if you don't have enough replicas to meet your data loss guarantees, e.g.
min.isr=2 or min.isr=3, depending on the data.
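
The scenario above can be replayed with a toy model. This is a sketch only, not Kafka's real replication protocol: the `Partition` class, its method names, and the `run` helper are all made up for illustration, and the model assumes that a stale broker is allowed to become leader when no in-sync replica is left (an unclean election).

```python
class NotEnoughReplicas(Exception):
    """Raised when the ISR is smaller than min_insync_replicas."""


class Partition:
    """Toy model of one partition; hypothetical, not Kafka's implementation."""

    def __init__(self, brokers, min_insync_replicas=1):
        self.logs = {b: [] for b in brokers}  # per-broker copy of the log
        self.isr = list(brokers)              # in-sync replicas, leader first
        self.min_isr = min_insync_replicas

    @property
    def leader(self):
        return self.isr[0] if self.isr else None

    def kill(self, broker):
        if broker in self.isr:
            self.isr.remove(broker)

    def restart_unclean(self, broker):
        # The broker returns while the ISR is empty and is elected leader
        # anyway, so its stale log becomes the source of truth.
        if not self.isr:
            self.isr = [broker]

    def produce(self, msg):
        # ACK=-1: the write is acknowledged once every ISR member has it.
        if len(self.isr) < self.min_isr:
            raise NotEnoughReplicas(msg)
        for b in self.isr:
            self.logs[b].append(msg)


def run(min_isr):
    p = Partition(["b1", "b2", "b3"], min_insync_replicas=min_isr)
    p.produce("m1")
    p.kill("b1")
    p.kill("b2")                    # b3 is now the only in-sync replica
    acked, rejected = ["m1"], []
    try:
        p.produce("m2")             # stands in for the 10 minutes of data
        acked.append("m2")
    except NotEnoughReplicas:
        rejected.append("m2")
    p.kill("b3")
    p.restart_unclean("b1")
    lost = [m for m in acked if m not in p.logs[p.leader]]
    return acked, rejected, lost


print(run(1))  # m2 is acknowledged, then missing from the new leader
print(run(2))  # m2 is rejected up front, so nothing acknowledged is lost
```

With a minimum of 1 the producer is told "saved" for data that later disappears. With a minimum of 2 the producer gets an error instead and can retry or alert, which is the trade-off: availability for writes is given up to keep the guarantee.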

Also take a look at
https://github.com/stealthly/go_kafka_client/tree/master/mirrormaker it
might be helpful for what you are looking for.

~ Joe Stein
- - - - - - - - - - - - - - - - -

  http://www.stealth.ly
- - - - - - - - - - - - - - - - -

On Fri, May 1, 2015 at 7:43 AM, Joong Lee jo...@me.com wrote:

 It is based on our understanding from reading the documents.

 We aren't concerned about data duplication, as that is going to be handled by
 Elasticsearch.




Re: Data replication and zero data loss

2015-04-30 Thread Jiangjie Qin
Which mirror maker version did you look at? The MirrorMaker in trunk
should not have data loss if you just use the default setting.

On 4/30/15, 7:53 PM, Joong Lee jo...@me.com wrote:

Hi,
We are exploring Kafka to keep two data centers (primary and DR), each
running a number of Elasticsearch nodes, in sync. One key requirement is that
we can't lose any data. We POC'd the use of MirrorMaker and felt it may not
meet our data loss requirement.

I would like to ask the community whether we should look for another solution,
or whether Kafka would be the right solution given the zero data loss
requirement.

Thanks



Re: Data replication and zero data loss

2015-04-30 Thread Daniel Compton
When we evaluated MirrorMaker last year we didn't find any risk of data
loss, only duplicate messages in the case of a network partition.
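
Why a network problem tends to produce duplicates rather than loss can be shown with a small at-least-once copy loop. This is a sketch under stated assumptions, not MirrorMaker's actual code: `FlakyTarget` and `mirror` are hypothetical names, and the target is modelled as a list whose acknowledgement is lost once. The key point is that the source position only advances after the target has acknowledged the write.

```python
class FlakyTarget:
    """Stand-in for the target cluster; loses one ack (hypothetical)."""

    def __init__(self, lose_ack_for):
        self.log = []
        self.lose_ack_for = lose_ack_for

    def send(self, msg):
        self.log.append(msg)              # the write itself lands
        if msg == self.lose_ack_for:
            self.lose_ack_for = None      # fail only the first time
            raise ConnectionError("ack lost")


def mirror(source, target):
    """Copy source to target, committing progress only after an ack."""
    committed = 0                         # next source offset to copy
    while committed < len(source):
        try:
            target.send(source[committed])
        except ConnectionError:
            continue                      # nothing committed: retry same offset
        committed += 1                    # commit only after the ack
    return committed


source = ["m0", "m1", "m2"]
target = FlakyTarget(lose_ack_for="m1")
mirror(source, target)
print(target.log)  # every source message is present; one appears twice
```

Committing before the acknowledgement would turn the same failure into a lost message, so the order of "send, confirm, then commit" is what matters here.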

Did you discover data loss in your tests, or were you just looking at the
docs?
On Fri, 1 May 2015 at 4:31 pm Jiangjie Qin j...@linkedin.com.invalid
wrote:

 Which mirror maker version did you look at? The MirrorMaker in trunk
 should not have data loss if you just use the default setting.





Data replication and zero data loss

2015-04-30 Thread Joong Lee
Hi,
We are exploring Kafka to keep two data centers (primary and DR), each running
a number of Elasticsearch nodes, in sync. One key requirement is that we can't
lose any data. We POC'd the use of MirrorMaker and felt it may not meet our
data loss requirement.

I would like to ask the community whether we should look for another solution,
or whether Kafka would be the right solution given the zero data loss
requirement.

Thanks