Re: New producer: metadata update problem on 2 Node cluster.

2015-05-07 Thread Rahul Jain
Creating a new consumer instance *does not* solve this problem.

Attaching the producer/consumer code that I used for testing.



On Wed, May 6, 2015 at 6:31 AM, Ewen Cheslack-Postava e...@confluent.io wrote:

 I'm not sure about the old producer behavior in this same failure scenario,
 but creating a new producer instance would resolve the issue since it would
 start with the list of bootstrap nodes and, assuming at least one of them
 was up, it would be able to fetch up-to-date metadata.

 On Tue, May 5, 2015 at 5:32 PM, Jason Rosenberg j...@squareup.com wrote:

  Can you clarify, is this issue specific to the new producer? With the old
  producer, we routinely construct a new producer, which makes a fresh
  metadata request (via a VIP connected to all nodes in the cluster). Would
  this approach work with the new producer?

  Jason

  On Tue, May 5, 2015 at 1:12 PM, Rahul Jain rahul...@gmail.com wrote:

   Mayuresh,
   I was testing this in a development environment and manually brought down
   a node to simulate this. So the dead node never came back up.

   My colleague and I were able to consistently see this behaviour several
   times during the testing.

   On 5 May 2015 20:32, Mayuresh Gharat gharatmayures...@gmail.com wrote:

    I agree that to find the least loaded node the producer should fall back
    to the bootstrap nodes if it's not able to connect to any nodes in the
    current metadata. That should resolve this.

    Rahul, I suppose the problem went away because the dead node in your case
    might have come back up and allowed for a metadata update. Can you
    confirm this?

    Thanks,

    Mayuresh

    On Tue, May 5, 2015 at 5:10 AM, Rahul Jain rahul...@gmail.com wrote:

     We observed the exact same error. Not very clear about the root cause,
     although it appears to be related to the leastLoadedNode implementation.
     Interestingly, the problem went away by increasing the value of
     reconnect.backoff.ms to 1000ms.

     On 29 Apr 2015 00:32, Ewen Cheslack-Postava e...@confluent.io wrote:

      Ok, all of that makes sense. The only way to possibly recover from that
      state is either for K2 to come back up, allowing the metadata refresh
      to eventually succeed, or to eventually try some other node in the
      cluster. Reusing the bootstrap nodes is one possibility. Another would
      be for the client to get more metadata than is required for the topics
      it needs, in order to ensure it has more nodes to use as options when
      looking for a node to fetch metadata from. I added your description to
      KAFKA-1843, although it might also make sense as a separate bug since
      fixing it could be considered incremental progress towards resolving
      1843.

      On Tue, Apr 28, 2015 at 9:18 AM, Manikumar Reddy ku...@nmsworks.co.in wrote:

       Hi Ewen,

       Thanks for the response. I agree with you, in some cases we should use
       bootstrap servers.

        If you have logs at debug level, are you seeing this message in
        between the connection attempts:

        Give up sending metadata request since no node is available

       Yes, this log came up a couple of times.

        Also, if you let it continue running, does it recover after the
        metadata.max.age.ms timeout?

       It does not reconnect. It is continuously trying to connect with the
       dead node.

       -Manikumar

      --
      Thanks,
      Ewen

    --
    -Regards,
    Mayuresh R. Gharat
    (862) 250-7125

 --
 Thanks,
 Ewen



Re: New producer: metadata update problem on 2 Node cluster.

2015-05-07 Thread Rahul Jain
Sorry, I meant creating a new producer, not consumer.

Here's the code.

Producer - http://pastebin.com/Kqq1ymCX
Consumer - http://pastebin.com/i2Z8PTYB
Callback - http://pastebin.com/x253z7bG

As you'll notice, I am creating a new producer for each message, so the metadata
should be refreshed from the bootstrap nodes every time.

I have a single topic (receive.queue) replicated across 3 nodes. I add all
3 nodes to the bootstrap list. On bringing one of the nodes down, some
messages start failing (metadata update timeout error).

As I mentioned earlier, the problem goes away simply by setting the
reconnect.backoff.ms property to 1000ms.
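
For reference, here is a minimal sketch of the producer configuration described
above. The broker addresses and serializer classes are placeholders, not taken
from the pastebin code; the relevant parts are putting all three brokers in the
bootstrap list and the reconnect.backoff.ms workaround.

import java.util.Properties
import org.apache.kafka.clients.producer.KafkaProducer

val props = new Properties()
// all three replicas go into the bootstrap list
props.put("bootstrap.servers", "node1:9092,node2:9092,node3:9092")
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
// the workaround mentioned above: back off 1000ms before retrying a failed connection
props.put("reconnect.backoff.ms", "1000")
val producer = new KafkaProducer[String, String](props)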





On 7 May 2015 23:18, Ewen Cheslack-Postava e...@confluent.io wrote:

 Rahul, the mailing list filters attachments, you'd have to post the code
 somewhere else for people to be able to see it.

 But I don't think anyone suggested that creating a new consumer would fix
 anything. Creating a new producer *and discarding the old one* basically
 just makes it start from scratch using the bootstrap nodes, which is why
 that would allow recovery from that condition.

 But that's just a workaround. The real issue is that the producer only
 maintains metadata for the nodes that are replicas for the partitions of
 the topics the producer sends data to. In some cases, this is a small set
 of servers and can get the producer stuck if a node goes offline and it
 doesn't have any other nodes that it can try to communicate with to get
 updated metadata (since the topic partitions should have a new leader).
 Falling back on the original bootstrap servers is one solution to this
 problem. Another would be to maintain metadata for additional servers so
 you always have extra bootstrap nodes in your current metadata set, even
 if they aren't replicas for any of the topics you're working with.

 -Ewen



RE: Significance of SimpleConsumer id string

2015-05-07 Thread Aditya Auradkar
A client id is used to logically identify an application. Ideally, multiple 
consumers belonging to the same application should use the same client id. 

More concretely, metrics can be gathered per client id, and quotas will in the
future be enforced per client id.
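
For illustration, a minimal sketch of sharing one client id across consumers of
the same application, assuming the 0.8.x SimpleConsumer constructor (host, port,
socket timeout, buffer size, client id); the host names here are placeholders:

import kafka.consumer.SimpleConsumer

// both instances report metrics under the same client id ("my-app"),
// which is also what future quotas would be keyed on
val consumerA = new SimpleConsumer("broker1", 9092, 30000, 64 * 1024, "my-app")
val consumerB = new SimpleConsumer("broker2", 9092, 30000, 64 * 1024, "my-app")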

Aditya


From: Magnus Vojbacke [magnus.vojba...@digitalroute.com]
Sent: Thursday, May 07, 2015 4:44 AM
To: users@kafka.apache.org
Subject: Significance of SimpleConsumer id string

Hi,

The kafka.consumer.SimpleConsumer takes an "id: String" constructor parameter.
What is the significance of this id? Are there any consequences or risks 
associated with using the exact same id for several consumers of the same 
topic/partition?

/Magnus



Re: New producer: metadata update problem on 2 Node cluster.

2015-05-07 Thread Ewen Cheslack-Postava
Rahul, the mailing list filters attachments, you'd have to post the code
somewhere else for people to be able to see it.

But I don't think anyone suggested that creating a new consumer would fix
anything. Creating a new producer *and discarding the old one* basically
just makes it start from scratch using the bootstrap nodes, which is why
that would allow recovery from that condition.

But that's just a workaround. The real issue is that the producer only
maintains metadata for the nodes that are replicas for the partitions of
the topics the producer sends data to. In some cases, this is a small set
of servers and can get the producer stuck if a node goes offline and it
doesn't have any other nodes that it can try to communicate with to get
updated metadata (since the topic partitions should have a new leader).
Falling back on the original bootstrap servers is one solution to this
problem. Another would be to maintain metadata for additional servers so
you always have extra bootstrap nodes in your current metadata set, even
if they aren't replicas for any of the topics you're working with.
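
As a rough sketch of that workaround (not a fix), assuming the new Java producer
and placeholder broker addresses and serializers, recovery amounts to closing the
stuck instance and building a fresh one that starts again from bootstrap.servers:

import java.util.Properties
import org.apache.kafka.clients.producer.KafkaProducer

def recreateProducer(old: KafkaProducer[String, String]): KafkaProducer[String, String] = {
  old.close() // discard the instance whose metadata only covers the dead node(s)
  val props = new Properties()
  props.put("bootstrap.servers", "broker1:9092,broker2:9092,broker3:9092")
  props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
  props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
  new KafkaProducer[String, String](props)
}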

-Ewen



On Thu, May 7, 2015 at 12:06 AM, Rahul Jain rahul...@gmail.com wrote:

 Creating a new consumer instance *does not* solve this problem.

 Attaching the producer/consumer code that I used for testing.




Auto-rebalance not triggering in 2.10-0.8.1.1

2015-05-07 Thread Stephen Armstrong
I'm running 2.10-0.8.1.1, and rebalance will not trigger on its own. From
http://grokbase.com/t/kafka/users/14bj5ps9hp/partition-auto-rebalance#20141118rf39q8cs4sjh6vzjgdw92e37cw
I think the leader imbalance means: For a single broker, add up all the
partitions it is leading (Y), and count the ones for which it's not the
preferred broker (X). The ratio of X:Y is the one being used.
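
(For reference, the broker settings that govern automatic preferred-leader
rebalancing are shown below; the values are only illustrative, not a
recommendation.)

auto.leader.rebalance.enable=true
leader.imbalance.per.broker.percentage=10
leader.imbalance.check.interval.seconds=300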

I have about 10 topics spread between the 3 brokers, each with 4 or 8
partitions. If I restart broker A, wait 5 min, then restart B, leadership
ends up entirely on C (even though A was in ISR when B went down). Nothing
triggers on its own. Triggering it manually works (with
bin/kafka-preferred-replica-election.sh).

Is there something I should be checking, or is there a downside to just
adding a cron job to trigger replica election once an hour?

Thanks
Steve


Custom Partition example in SCALA

2015-05-07 Thread Madabhattula Rajesh Kumar
Hi,

Could you please point me to an example Scala program for a custom partitioner?

Regards,
Rajesh


Getting NotLeaderForPartitionException in kafka broker

2015-05-07 Thread tao xiao
Hi team,

I have a 12-node cluster that has 800 topics, each of which has only 1
partition. I observed that one of the nodes keeps generating
NotLeaderForPartitionException, which causes it to be unresponsive to all
requests. Below is the exception:

[2015-05-07 04:16:01,014] ERROR [ReplicaFetcherThread-1-12], Error for
partition [topic1,0] to broker 12:class
kafka.common.NotLeaderForPartitionException
(kafka.server.ReplicaFetcherThread)

All other nodes in the cluster generate lots of replication errors too, as shown
below, due to the unresponsiveness of the above node.

[2015-05-07 04:17:34,917] WARN [Replica Manager on Broker 1]: Fetch request
with correlation id 3630911 from client ReplicaFetcherThread-0-1 on
partition [topic1,0] failed due to Leader not local for partition
[cg22_user.item_attr_info.lcr,0] on broker 1 (kafka.server.ReplicaManager)

Any suggestion as to why the node runs into this unstable state, and is there any
configuration I can set to prevent it?

I use kafka 0.8.2.1

And here is the server.properties


broker.id=5
port=9092
num.network.threads=3
num.io.threads=8
socket.send.buffer.bytes=1048576
socket.receive.buffer.bytes=1048576
socket.request.max.bytes=104857600
log.dirs=/mnt/kafka
num.partitions=1
num.recovery.threads.per.data.dir=1
log.retention.hours=1
log.segment.bytes=1073741824
log.retention.check.interval.ms=30
log.cleaner.enable=false
zookeeper.connect=ip:2181
zookeeper.connection.timeout.ms=6000
unclean.leader.election.enable=false
delete.topic.enable=true
default.replication.factor=3
num.replica.fetchers=3
delete.topic.enable=true
kafka.metrics.reporters=report.KafkaMetricsCollector
straas.hubble.conf.file=/etc/kafka/report.conf




-- 
Regards,
Tao


Significance of SimpleConsumer id string

2015-05-07 Thread Magnus Vojbacke
Hi,

The kafka.consumer.SimpleConsumer takes an "id: String" constructor parameter.
What is the significance of this id? Are there any consequences or risks 
associated with using the exact same id for several consumers of the same 
topic/partition?

/Magnus



Differences between new and legacy scala producer API

2015-05-07 Thread Rendy Bambang Junior
Hi

- The legacy Scala producer API has a keyed message carrying topic, key, partKey,
and message, while the new API has no partKey. What's the difference between key
and partKey? (See the sketch after this list.)
- In the javadoc, the new producer API's send method is always async; is the
producer.type property overridden/ignored?
- Will the legacy Scala API be deprecated any time soon?
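
A rough sketch of the key/partKey distinction, assuming the 0.8.x
kafka.producer.KeyedMessage definition (the topic and values are placeholders):

import kafka.producer.KeyedMessage

// key is serialized and stored with the message, and is also used for
// partitioning when no separate partKey is given
val m1 = new KeyedMessage[String, String]("my-topic", "user-42", "payload")
// partKey is used only to choose the partition and is not sent to the broker;
// the key ("user-42") still travels with the message
val m2 = KeyedMessage[String, String]("my-topic", "user-42", 7, "payload")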

Rendy


Re: Support https or ssl

2015-05-07 Thread Sriharsha Chintalapani
Hi Jamie,
        I am currently working on providing SSL support for Kafka. Here are the
jiras: https://issues.apache.org/jira/browse/KAFKA-1690 and
https://issues.apache.org/jira/browse/KAFKA-1684 . If you are using a REST API to
front the Kafka producer, then you can probably make that HTTP server use SSL.
-- 
Harsha


On May 7, 2015 at 7:07:58 PM, Jamie Wang (jamie.w...@actuate.com) wrote:

Hello,  

It's been a while since my team worked on a Kafka-related project. By the way,
the previous project using Kafka worked wonderfully for us. Now I have a
requirement to use HTTPS or SSL. I am wondering if the latest version has support
for SSL. If not, what is the timeline for this functionality to be supported, and
is there any suggestion on what I can do in the interim to provide similar
functionality using Kafka? Thank you in advance for your time and help.

Jamie  


Support https or ssl

2015-05-07 Thread Jamie Wang
Hello, 

It's been a while since my team worked on a Kafka-related project. By the way,
the previous project using Kafka worked wonderfully for us. Now I have a
requirement to use HTTPS or SSL. I am wondering if the latest version has support
for SSL. If not, what is the timeline for this functionality to be supported, and
is there any suggestion on what I can do in the interim to provide similar
functionality using Kafka? Thank you in advance for your time and help.

Jamie


Re: Custom Partition example in SCALA

2015-05-07 Thread Madabhattula Rajesh Kumar
Hi,

I have written the custom partitioner program below in Scala, but the producer is
not calling its partition method.

import kafka.producer.Partitioner
import kafka.utils.VerifiableProperties

class TestRoundRobinPartitioner(props: VerifiableProperties) extends Partitioner {
  def partition(key: Any, numPartitions: Int): Int = {
    println("key: " + key)
    println("partitions: " + numPartitions)
    Integer.parseInt(key.toString) % 4
  }
}

In the producer class I have configured this class:

props.put("partitioner.class", "TestRoundRobinPartitioner")
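
In case it helps, here is a minimal sketch of wiring up the old Scala producer,
assuming 0.8.x; the broker address, topic, and the partitioner's package name are
placeholders. Two things worth noting (if I remember the 0.8.x behaviour
correctly): partitioner.class must be the fully qualified class name, and the
partitioner is only consulted for messages that are sent with a key.

import java.util.Properties
import kafka.producer.{KeyedMessage, Producer, ProducerConfig}

val props = new Properties()
props.put("metadata.broker.list", "node1:9092")
props.put("serializer.class", "kafka.serializer.StringEncoder")
props.put("partitioner.class", "com.example.TestRoundRobinPartitioner") // fully qualified name
val producer = new Producer[String, String](new ProducerConfig(props))
// the key ("3") is what partition(key, numPartitions) receives
producer.send(new KeyedMessage[String, String]("my-topic", "3", "hello"))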

Could you please help me to resolve this issue

Regards,
Rajesh


On Thu, May 7, 2015 at 12:38 PM, Madabhattula Rajesh Kumar 
mrajaf...@gmail.com wrote:

 Hi,

 Could you please point me to an example Scala program for a custom partitioner?

 Regards,
 Rajesh