Hi all, I'm something of a Kafka n00b.
I posted the following in the Google newsgroup but haven't had a reply
or even a single read, so I'll try here. My original msg, slightly
edited, was:

----

(Windows 2K8R2 fully patched, 16GB RAM, fairly modern dual-core Xeon
server, latest version of Java)

I've spent several days trying to sort out unexpected behaviour
involving Kafka and the Kafka console producer and consumer.

If I set the console producer and console consumer to look at the
same topic then I can type lines into the producer window and see them
appear in the consumer window, so it works.

If I try to pipe large amounts of data into the producer, some gets
lost and the producer reports errors, e.g.:

[2017-04-17 18:14:05,868] ERROR Error when sending message to topic big_ptns1_repl1_nozip with key: null, value: 55 bytes with error: (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback)
org.apache.kafka.common.errors.TimeoutException: Batch containing 8 record(s) expired due to timeout while requesting metadata from brokers for big_ptns1_repl1_nozip-0

As input I'm using either a file of Shakespeare's full works (about
5.4 meg ascii), or a much larger file of Shakespeare's full works
replicated 900 times to make it about 5GB. Lines are ascii and short,
and each line should be a single record when read in by the console
producer. I need to do some benchmarking on time and space and this
was my first try.
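
For anyone wanting to reproduce the larger input, a minimal Python
sketch of the replication step might look like this. The function name
is made up for illustration, and it assumes the small file fits in
memory; it is not necessarily how the 5GB file above was built.

```python
def replicate_file(src_path, dst_path, copies):
    """Write `copies` back-to-back copies of src_path into dst_path.

    Returns the number of bytes written.
    """
    with open(src_path, "rb") as src:
        data = src.read()
    # Make sure the last line ends with a newline, so that the end of
    # one copy and the start of the next do not merge into one record.
    if data and not data.endswith(b"\n"):
        data += b"\n"
    with open(dst_path, "wb") as dst:
        for _ in range(copies):
            dst.write(data)
    return len(data) * copies
```

Something like replicate_file("complete_works_no_bare_lines.txt",
"replicated.txt", 900) would then give the large file (the output
name here is hypothetical).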

As mentioned, data gets lost. I presume it is expected that any data
we pipe into the producer should arrive in the consumer, so if I do
this in one windows console:

kafka-console-consumer.bat --bootstrap-server localhost:9092 --topic big_ptns1_repl1_nozip --zookeeper localhost:2181 > F:\Users\me\Desktop\shakespear\single_all_shakespear_OUT.txt

and this in another:

kafka-console-producer.bat --broker-list localhost:9092 --topic big_ptns1_repl1_nozip < F:\Users\me\Desktop\shakespear\complete_works_no_bare_lines.txt

then the output file "single_all_shakespear_OUT.txt" should be
identical to the input file "complete_works_no_bare_lines.txt", except
it's not. For the complete works (about 5.4 meg uncompressed) I lost
about 130K in the output.
For the replicated Shakespeare, which is about 5GB, I lost about 150 meg.
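
To put the loss in terms of records rather than bytes, the two files
could be compared line by line. The following is only a sketch in
Python, assuming one record per line; the function names are made up
for illustration. It treats the lines as a multiset, because short
lines repeat and a plain set difference would hide lost duplicates.

```python
from collections import Counter

def count_lines(path):
    """Count how many times each distinct line occurs in a file."""
    counts = Counter()
    with open(path, "rb") as f:
        for line in f:
            # Strip the line ending so \n vs \r\n differences between
            # the input and the redirected output do not matter.
            counts[line.rstrip(b"\r\n")] += 1
    return counts

def missing_records(input_path, output_path):
    """Return (records sent, records received, records missing)."""
    sent = count_lines(input_path)
    received = count_lines(output_path)
    # Counter subtraction keeps only positive counts, i.e. lines that
    # occur more often in the input than in the output.
    missing = sent - received
    return sum(sent.values()), sum(received.values()), sum(missing.values())
```

Memory use depends on the number of distinct lines, not on file size,
so the 5GB replicated file should be no harder on memory than the
5.4 meg one.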

Surely this can't be right. It's repeatable, but the point in the
file where the errors start to appear seems to differ from run to run.

I've done this using all 3 versions of Kafka in the 0.10.x.y branch
and I get the same problem (the above commands were using the 0.10.0.0
branch so they look a little obsolete, but they are right for that
branch I think). It's cost me some days.
So, am I making a mistake, and if so, what?

thanks

jan
