Thanks for the detailed explanation. I was simply testing Kafka for the first 
time with a few "throw away" unit tests to learn it works and was curious why I 
was receiving that behavior.
________________________________________
From: Jay Kreps [jay.kr...@gmail.com]
Sent: Wednesday, May 20, 2015 10:29 AM
To: users@kafka.apache.org
Subject: Re: consumer poll returns no records unless called more than once, why?

Hey Ben,

The consumer actually doesn't promise to return records on any given poll()
call and even in trunk it won't return records on the first call likely.

Internally the reason is that it basically does one or two rounds of
non-blocking actions and then returns. This could include things like
communicating with the co-ordinator, establishing connections, sending
fetch requests, etc.

I guess the question is whether this behavior is confusing or not. In
general there is no guarantee that you will have data ready, or that if you
do you will be assigned a partition to consume from within your timeout. So
assuming that poll will always return data is wrong.

However with a little effort we could potentially wrap the poll call so
that rather than return it would always attempt to wait the full timeout
potentially doing multiple internal polls. This doesn't guarantee it would
return data but would reduce the likelihood when data was ready.

I'm not sure if that is actually a good idea vs just documenting this a
little better in the javadoc.

-Jay

On Wed, May 20, 2015 at 10:12 AM, Guozhang Wang <wangg...@gmail.com> wrote:

> Hello Ben,
>
> This Java consumer client was still not mature in 0.8.2.0 and lots of bug
> fixes have been checked in since then.
>
> I just test your code with trunk's consumer and it does not illustrate this
> problem. Could you try the same on your side and see if this issue goes
> away?
>
> Guozhang
>
> On Wed, May 20, 2015 at 9:49 AM, Padgett, Ben <bpadg...@illumina.com>
> wrote:
>
> > I am using Kafka v0.8.2.0
> > ________________________________________
> > From: Guozhang Wang [wangg...@gmail.com]
> > Sent: Wednesday, May 20, 2015 9:41 AM
> > To: users@kafka.apache.org
> > Subject: Re: consumer poll returns no records unless called more than
> > once, why?
> >
> > Hello Ben,
> >
> > Which version of Kafka are you using with this consumer client?
> >
> > Guozhang
> >
> > On Wed, May 20, 2015 at 9:03 AM, Padgett, Ben <bpadg...@illumina.com>
> > wrote:
> >
> > >         //this code
> > >
> > >         Properties consumerProps = new Properties();
> > >         consumerProps.put("bootstrap.servers", "localhost:9092");
> > >
> > >
> > >         //without deserializer it fails, which makes sense. the
> > > documentation however doesn't show this
> > >         consumerProps.put("key.deserializer",
> > > "org.apache.kafka.common.serialization.StringDeserializer");
> > >         consumerProps.put("value.deserializer",
> > > "org.apache.kafka.common.serialization.StringDeserializer");
> > >
> > >
> > >         //why is serializer required? without this it fails to return
> > > results when calling poll
> > >         consumerProps.put("key.serializer",
> > > "org.apache.kafka.common.serializers.StringSerializer");
> > >         consumerProps.put("value.serializer",
> > > "org.apache.kafka.common.serializers.StringSerializer");
> > >
> > >
> > >         consumerProps.put("group.id", "test");
> > >         consumerProps.put("enable.auto.commit", "true");
> > >         consumerProps.put("auto.commit.interval.ms", "1000");
> > >         consumerProps.put("session.timeout.ms", "30000");
> > >
> > >         org.apache.kafka.clients.consumer.KafkaConsumer<String, String>
> > > consumer = new org.apache.kafka.clients.consumer.KafkaConsumer<String,
> > > String>(consumerProps);
> > >
> > >         TopicPartition topicPartition = new
> > > TopicPartition("project-created", 0);
> > >         consumer.subscribe(topicPartition);
> > >
> > >         consumer.seeekToBeginning(topicPartition);
> > >
> > >         //each scenerio code goes here
> > >
> > >
> > >
> > >         I have a scenerio where it returns records and a scenerio where
> > no
> > > records are returned. Could anyone provide insight on why this happens?
> > >
> > >
> > >
> > >         //without a loop consumer.poll(100) returns no records
> > >         //after poll is called a second time it returns records
> > >         boolean run = true;
> > >         while (run) {
> > >             ConsumerRecords<String, String> records =
> consumer.poll(100);
> > >         }
> > >
> > >
> > >
> > >         //why would this return zero records?
> > >         ConsumerRecords<String, String> records = consumer.poll(100);
> > >
> > >
> > > //This is to show that there are records for topic "project-created"
> > >
> > > bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic
> > > project-created --from-beginning
> > > 3a543cf4-13e4-42b6-8c72-54b733228c75
> > > c298d41e-4ae5-4f4f-933e-f745969aaf98
> > > 2036ef69-9694-4ee8-a40c-50cddf982edb
> > > 08d90698-8c75-4c4b-8ce6-b0b12d3de0d6
> > > 249d0e8c-cd38-41b0-ae7f-bdfa7fa8f76e
> > > 84d87e45-cd10-4797-bfe3-d540e26e56cc
> > > a8c74f78-cb7a-4c74-873b-a0e587af646c
> > > 42eeff6e-22fd-40c4-96b4-91a3df356ec7
> > > bef622b6-3ac1-489b-9837-ab9c2b0399d1
> > > 03e9d567-fadf-46dc-8097-4bc327a0942e
> > > df5d93d6-30de-494a-8d82-45cd8b4fc785
> > > eb5a194f-083d-4a0f-bbb7-155056acd929
> > > 192dfbb7-4f4f-4550-a1b4-ab9bc83be825
> > > b7845e7e-b477-476d-b115-b0a9d076c52e
> > > f8ea2d0f-bc76-44a8-86ef-2b3e0adf7755
> > > 8ee28e0c-5a8e-47c8-a939-f5dc6d7be3f9
> > > fbe30ae3-c383-4e27-8f70-f3c63ed82fed
> > > 50fa9166-cc0d-4d12-8d86-c922062519cd
> > > 50ceb437-2556-4a9a-8a13-93bfc130e914
> > > f1f5ad0b-7739-47fd-be57-ac093f728004
> > > 0df1a923-92d9-416a-861a-d65590f7e4c2
> > > 849e5e2b-3803-4a6e-86fc-64fe80d81268
> > > 163d9c1c-6224-4dad-a23a-1c90c9791e1f
> > >
> >
> >
> >
> > --
> > -- Guozhang
> >
>
>
>
> --
> -- Guozhang
>

Reply via email to