Hi Behrang,
I recommend checking out some docs that explain how partitions and
replication work (e.g.
https://sookocheff.com/post/kafka/kafka-in-a-nutshell/).

What I'd highlight is that the partition leader and the controller are two
different concepts. Each partition has its own leader, and it's the leader
and not the controller that's responsible for dealing with producers and
consumers.
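
To make that concrete, here is a toy sketch in plain Java. It is not the
Kafka client API: the class name, the broker ids and the partition layout
are all made up for illustration. It only models the rule that a partition
can be read or written while the broker leading it is up; which broker
happens to be the controller does not enter the check at all.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model only (hypothetical names); this is not the Kafka client API. */
public class LeaderVsController {

    /**
     * Returns the ids of the partitions whose leader broker is currently up.
     * The controller id is deliberately not a parameter: in this model,
     * availability depends only on each partition's own leader.
     */
    public static List<Integer> usablePartitions(Map<Integer, Integer> leaderByPartition,
                                                 Set<Integer> liveBrokers) {
        List<Integer> usable = new ArrayList<>();
        for (Map.Entry<Integer, Integer> entry : leaderByPartition.entrySet()) {
            if (liveBrokers.contains(entry.getValue())) {
                usable.add(entry.getKey());
            }
        }
        return usable;
    }

    public static void main(String[] args) {
        // Assumed layout: partition id -> id of the broker that leads it.
        Map<Integer, Integer> leaderByPartition = new LinkedHashMap<>();
        leaderByPartition.put(0, 1);
        leaderByPartition.put(1, 2);
        leaderByPartition.put(2, 3);

        // Assumed outage: broker 3 is down, brokers 1 and 2 are up.
        Set<Integer> liveBrokers = new HashSet<>(Arrays.asList(1, 2));

        // Partition 2 stays unusable until a new leader is elected for it.
        System.out.println("Usable partitions: "
                + usablePartitions(leaderByPartition, liveBrokers));
    }
}
```

With this made-up layout the program prints "Usable partitions: [0, 1]".
A consumer that only needs partitions 0 and 1 keeps working, while one that
needs partition 2 stalls until that partition gets a new leader.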

Cheers,
Sandor

On Tue, Feb 20, 2018 at 12:50 PM, Behrang <behran...@gmail.com> wrote:

> Hi Sandor,
>
> Thanks for your reply. I am not at work right now, but I still am a bit
> confused about what happened at work:
>
> 1- One thing that I confirmed was that one of the 3 nodes was definitely down.
> We were unable to telnet into its Kafka port from anywhere. The other two
> nodes were up and we could telnet into their Kafka port.
>
> 2- I modified my app a bit and implemented a means for sending
> DescribeCluster requests to the cluster, setting bootstrap-servers to all
> the 3 nodes. The result indicated that the controller node (
> https://kafka.apache.org/0110/javadoc/org/apache/kafka/clients/admin/
> DescribeClusterResult.html#controller())
> had an id that was not amongst the nodes (
> https://kafka.apache.org/0110/javadoc/org/apache/kafka/clients/admin/
> DescribeClusterResult.html#nodes()).
> It was the same node that was down (i.e. I could telnet into the other
> nodes but not the controller node). And this was always the same, even
> after a few minutes, the controller node's id was still the same.
>
> 3- Despite that, when running my app from my machine, I could get records
> from the topics I had subscribed to, but from another machine, no records
> were getting sent to the app. The app running on the other machine had a
> different consumer group though.
>
> 4- The cluster had three nodes and when the controller node was down, most
> of the time I was getting a message like this: *"Connection to node N
> could not be established. Broker may not be available."* where N was either
> -1, -2, or -3 but at one point in my app's logs I found a handful of
> entries in which N was a very large number (e.g. 2156987456).
>
> I assume our cluster was misbehaving, but still can't explain why my app
> was working like this.
>
>
> Best regards,
> Behrang Saeedzadeh
>
> On 20 February 2018 at 19:22, Sandor Murakozi <smurak...@gmail.com> wrote:
>
> > Hi Behrang,
> >
> > All reads and writes of a partition go through the leader of that
> > partition.
> > If the leader of a partition is down you will not be able to
> > produce/consume data in it until a new leader is elected. Typically this
> > happens within a few seconds, after which you should be able to use that
> > partition again. If your problem persists I recommend figuring out why
> > leader election does not happen.
> > You might be able to work with other partitions, at least those that have
> > leaders on brokers that are up.
> >
> > Cheers,
> > Sandor Murakozi
> >
> > On Tue, Feb 20, 2018 at 9:00 AM, Behrang <behran...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > I have a Kafka cluster with 3 nodes.
> > >
> > > I pass the nodes in the cluster to a consumer app I am building as
> > > bootstrap servers.
> > >
> > > When one of the nodes in the cluster is down, the consumer group
> > > sometimes CAN read records from the server but sometimes CAN NOT.
> > >
> > > In both cases, the same Kafka node is down.
> > >
> > > Is this behavior normal? Isn't it enough to only have one of the nodes
> > > in the Kafka cluster be up and running? I have not delved much into
> > > setup and administration of Kafka clusters, but I thought Kafka uses
> > > the nodes for HA and as long as one node is up and running, the cluster
> > > remains healthy and working.
> > >
> > > Best regards,
> > > Behrang Saeedzadeh
> > >
> >
>
