The REST proxy cannot guarantee that if there are messages in Kafka it will
definitely return them. There will always be some latency between the
request to the REST proxy and fetching data from Kafka, and because of the
way the Kafka protocol works, the response can be delayed by up to the
fetch timeout. The reason you're seeing a 200 on the first request is that
it *is* successful; it just doesn't have any data to return to you yet.
That doesn't mean no data is there, just that the proxy hasn't finished
fetching it yet.
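
If you need a workaround with your current setup, one option is to treat a
single empty response as inconclusive and only stop after several
consecutive empty reads, then commit and delete as you do now. A rough
sketch in Python follows; the v1 REST proxy endpoints and content types are
written from memory and the host/group/topic names are hypothetical, so
please double-check them against the docs for your release:

    import requests

    # Hypothetical values for illustration only.
    PROXY = "http://rest-proxy:8082"
    GROUP = "my_group"
    TOPIC = "my_topic"
    V1_JSON = "application/vnd.kafka.v1+json"
    V1_BINARY = "application/vnd.kafka.binary.v1+json"

    # 1. Create the consumer instance; base_uri points at the proxy node
    #    that owns it, so all later calls must go there, not the LB.
    resp = requests.post(
        "%s/consumers/%s" % (PROXY, GROUP),
        headers={"Content-Type": V1_JSON},
        json={"format": "binary", "auto.offset.reset": "smallest"},
    )
    base_uri = resp.json()["base_uri"]

    # 2. Consume, but only give up after a few consecutive empty responses,
    #    since a single empty 200 may just mean the fetch hadn't completed.
    messages = []
    empty_reads = 0
    while empty_reads < 3:
        r = requests.get("%s/topics/%s" % (base_uri, TOPIC),
                         headers={"Accept": V1_BINARY})
        records = r.json()
        if records:
            messages.extend(records)
            empty_reads = 0
        else:
            empty_reads += 1

    # 3. Commit the consumed offsets, then 4. delete the consumer instance.
    requests.post("%s/offsets" % base_uri, headers={"Content-Type": V1_JSON})
    requests.delete(base_uri, headers={"Content-Type": V1_JSON})

This only reduces the chance of missing data that was already in the topic;
it still can't reliably distinguish "no more data" from "data not fetched
yet".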

I don't think what you're trying to do is easily supported by the REST
proxy (either today or in the 1.0 version). For it to work well you'd need
to be able to create the consumer, look up the last offset when the
consumer started, consume up to that point, and then shut down. Currently
you're using an empty response as an indicator that there is no more data
available, but that's not actually a reliable indicator.

We're planning to add support for the new consumer to the REST proxy in a
future version, which may come with support for looking up the latest
offset (or at least seeking there and looking up the current offset).
However, this will require a newer version of Kafka (and CP) as this
functionality is only available in newer versions of the Kafka consumer.
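
To give a concrete idea of the pattern described above, here is a rough
sketch of what it could look like once the new consumer is available. I'm
using the confluent-kafka Python client purely for illustration; it needs
a newer broker than 0.8.2, and the broker/group/topic names are
hypothetical. The idea: snapshot the end of the partition at startup,
consume until you reach that point, then commit and shut down.

    from confluent_kafka import Consumer, TopicPartition

    TOPIC = "my_topic"      # hypothetical names for illustration
    PARTITION = 0           # the topic has a single partition

    consumer = Consumer({
        "bootstrap.servers": "broker:9092",
        "group.id": "my_group",
        "enable.auto.commit": False,
        "auto.offset.reset": "earliest",
    })
    tp = TopicPartition(TOPIC, PARTITION)
    consumer.assign([tp])

    # Snapshot the end of the partition when the consumer starts; this is
    # the point we consume up to, regardless of later production.
    low, high = consumer.get_watermark_offsets(tp)

    # Start from the committed offset if there is one, else the beginning.
    committed = consumer.committed([tp])[0].offset
    next_offset = committed if committed >= 0 else low

    batch = []
    while next_offset < high:
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue
        batch.append(msg.value())      # replace with your real processing
        next_offset = msg.offset() + 1

    if batch:
        consumer.commit(asynchronous=False)
    consumer.close()

With something like this, an empty run simply means the committed offset
already matched the high watermark at startup, which is a reliable signal,
unlike an empty REST response.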

-Ewen

On Mon, Nov 21, 2016 at 1:39 AM, marcel bichon <marcelbichon.ka...@gmail.com
> wrote:

> Hello !
>
> I'm contacting you because I'm having a problem with my architecture using
> Kafka.
>
> I'm using Confluent 1.0 with Kafka 0.8.2.
> I have the following architecture:
> - A job that calls the Kafka REST API (behind a load balancer) every hour.
> - The Kafka REST API servers point at a Kafka cluster of three brokers.
> - The ZooKeeper servers are located on the Kafka brokers.
>
> This problem concerns one topic with only one partition.
> It is consumed by only one consumer group.
>
> The process is the following:
> 0. A process runs very frequently and produces new messages in the topic.
> 1. Each hour, the job creates a consumer instance for the consumer group
> (POST request) through the load balancer. Then it gets the REST node where
> the consumer instance was created.
> 2. Then, the job consumes the messages through a GET request with this
> consumer instance on the right REST node.
> 3. Once that is done, it commits the offset.
> 4. And finally, once everything is done, it deletes the consumer instance.
>
> I noticed the following problem:
> One hour the job will work and get new messages.
> Then, for several hours, the request to retrieve new messages returns
> nothing new, even though there are new messages in the topic and even
> though the offset of the last consumed message is not the latest.
>
> Technically speaking, when the problem occurs, the REST query returns a
> 200 code with an empty body.
>
> I checked in ZooKeeper: all three ZooKeeper servers are well synchronized,
> and the offset for this consumer group and topic is the same on all three.
> I checked in Kafka and it says the replicas are in sync.
>
> I was wondering what could explain this behaviour?
>
> I checked the Kafka logs and unfortunately see nothing that could explain
> the errors.
> Any idea where I could look to find an explanation for this behaviour?
>
> Best regards.
>
> M.
>



-- 
Thanks,
Ewen
