Hi,

[2017-04-17 18:14:05,868] ERROR Error when sending message to topic big_ptns1_repl1_nozip with key: null, value: 55 bytes with error: (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback)
org.apache.kafka.common.errors.TimeoutException: Batch containing 8 record(s) expired due to timeout while requesting metadata from brokers for big_ptns1_repl1_nozip-0
The data never made it out of the producer (the batch expired before it could be sent), so why should it appear in the consumer?

The loss rate is roughly the same in both runs: about 2.4% (130 KB / 5,400 KB) vs. about 3% (150 MB / 5,000 MB). Not so bad.

2017-04-18 21:46 GMT+02:00 jan <rtm4...@googlemail.com>:
> Hi all, I'm something of a Kafka n00b.
> I posted the following in the Google newsgroup, but haven't had a reply
> or even a single read, so I'll try here. My original msg, slightly
> edited, was:
>
> ----
>
> (Windows 2K8R2 fully patched, 16GB RAM, fairly modern dual-core Xeon
> server, latest version of Java)
>
> I've spent several days trying to sort out unexpected behaviour
> involving Kafka and the Kafka console producer and consumer.
>
> If I set the console producer and console consumer to look at the
> same topic, then I can type lines into the producer window and see them
> appear in the consumer window, so it works.
>
> If I try to pipe in large amounts of data to the producer, some gets
> lost and the producer reports errors, e.g.
>
> [2017-04-17 18:14:05,868] ERROR Error when sending message to topic
> big_ptns1_repl1_nozip with key: null, value: 55 bytes with error:
> (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback)
> org.apache.kafka.common.errors.TimeoutException: Batch containing 8
> record(s) expired due to timeout while requesting metadata from
> brokers for big_ptns1_repl1_nozip-0
>
> I'm using as input either a file of Shakespeare's full works (about 5.4
> MB of ASCII), or a much larger file of Shakespeare's full works
> replicated 900 times to make it about 5GB. Lines are ASCII and short,
> and each line should be a single record when read in by the console
> producer. I need to do some benchmarking on time and space and this
> was my first try.
>
> As mentioned, data gets lost.
> I presume it is expected that any data we pipe into the producer should
> arrive in the consumer, so if I do this in one Windows console:
>
> kafka-console-consumer.bat --bootstrap-server localhost:9092 --topic big_ptns1_repl1_nozip --zookeeper localhost:2181 > F:\Users\me\Desktop\shakespear\single_all_shakespear_OUT.txt
>
> and this in another:
>
> kafka-console-producer.bat --broker-list localhost:9092 --topic big_ptns1_repl1_nozip < F:\Users\me\Desktop\shakespear\complete_works_no_bare_lines.txt
>
> then the output file "single_all_shakespear_OUT.txt" should be
> identical to the input file "complete_works_no_bare_lines.txt", except
> it's not. For the complete works (about 5.4 MB uncompressed) I lost
> about 130K in the output.
> For the replicated Shakespeare, which is about 5GB, I lost about 150MB.
>
> This can't be right, surely. It's repeatable, though the place in the
> file where errors start to be produced seems to differ between runs.
>
> I've done this using all 3 versions of Kafka in the 0.10.x.y branch
> and I get the same problem (the above commands were using the 0.10.0.0
> branch, so they look a little obsolete, but they are right for that
> branch, I think). It's cost me some days.
> So, am I making a mistake, and if so, what?
>
> thanks
>
> jan
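If you want a more precise loss figure than comparing file sizes, a small script can diff the two files line by line. Below is a rough Python sketch; the script and its `count_lines`/`loss_report` helpers are hypothetical, not part of Kafka. It assumes one record per input line, ignores line order, and ignores CRLF vs. LF differences, since the console tools may not preserve Windows line endings. Byte counts exclude line terminators.

```python
import sys
from collections import Counter


def count_lines(path):
    """Count each distinct line in a file, ignoring CRLF vs LF line endings.

    Streams the file, so memory grows with the number of distinct lines
    rather than the file size (the 900x replicated file repeats the same lines).
    """
    counts = Counter()
    with open(path, "rb") as f:
        for raw in f:
            counts[raw.rstrip(b"\r\n")] += 1
    return counts


def loss_report(sent, received):
    """Return (lost_lines, lost_bytes, sent_bytes) for two line Counters.

    Counter subtraction is a multiset difference that keeps only positive
    counts, so a line sent 900 times but received 899 times counts as one
    lost line. Line order is ignored; byte counts exclude line terminators.
    """
    lost = sent - received
    lost_lines = sum(lost.values())
    lost_bytes = sum(len(line) * n for line, n in lost.items())
    sent_bytes = sum(len(line) * n for line, n in sent.items())
    return lost_lines, lost_bytes, sent_bytes


# Usage: python compare.py SENT_FILE RECEIVED_FILE
if __name__ == "__main__" and len(sys.argv) == 3:
    lost_lines, lost_bytes, sent_bytes = loss_report(
        count_lines(sys.argv[1]), count_lines(sys.argv[2])
    )
    fraction = lost_bytes / sent_bytes if sent_bytes else 0.0
    print(f"lost {lost_lines} lines, {lost_bytes} of {sent_bytes} bytes ({fraction:.2%})")
```

For example: `python compare.py complete_works_no_bare_lines.txt single_all_shakespear_OUT.txt`. Computing `received - sent` instead would show lines that arrived more often than they were sent, which can happen if the producer retries a send.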