Very slow producer
Hi,

I’m writing my own producer to read from text files and send them line by line to a Kafka cluster. I notice that the producer is extremely slow. It's currently sending at ~57KB/node/s. That is roughly 50-100 times slower than using bin/kafka-console-producer.sh.

Here’s my producer:

    final File dir = new File(dataDir);
    List<File> files = new ArrayList<File>(Arrays.asList(dir.listFiles()));
    int key = 0;
    for (final File file : files) {
        try {
            BufferedReader br = new BufferedReader(new FileReader(file));
            for (String line = br.readLine(); line != null; line = br.readLine()) {
                KeyedMessage<String, String> data =
                    new KeyedMessage<String, String>(topic, Integer.toString(key++), line);
                producer.send(data);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

And the partitioner:

    public int partition(Object key, int numPartitions) {
        String stringKey = (String) key;
        return Integer.parseInt(stringKey) % numPartitions;
    }

The only difference between the kafka-console-producer.sh code and my code is that I use a custom partitioner. I have no idea why it’s so slow.

Best regards,
Huy, Le Van
Re: Very slow producer
Did you set producer.type to async when creating your producer? The console producer uses async by default, but the default producer config is sync.

-Ewen

On Thu, Dec 11, 2014 at 6:08 AM, Huy Le Van huy.le...@insight-centre.org wrote:
> I notice that the producer is extremely slow. It's currently sending at
> ~57KB/node/s. This is like 50-100 times slower than using
> bin/kafka-console-producer.sh
> [...]

--
Thanks,
Ewen
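For anyone finding this thread later, here is a minimal sketch of what setting producer.type to async might look like for the old Scala-client producer that KeyedMessage belongs to. The property keys are the ones that producer reads; the class name AsyncProducerProps, the broker addresses, the partitioner class name, and the batching values are placeholders chosen for illustration, not settings taken from the original poster's setup.

```java
import java.util.Properties;

public class AsyncProducerProps {

    // Builds the properties for an async producer.
    // brokers and partitionerClass are supplied by the caller; the values
    // used in main() below are hypothetical placeholders.
    public static Properties build(String brokers, String partitionerClass) {
        Properties props = new Properties();
        props.put("metadata.broker.list", brokers);
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        props.put("partitioner.class", partitionerClass);
        // The setting discussed above: send in the background in batches
        // instead of blocking on every single message.
        props.put("producer.type", "async");
        // Illustrative batching values (assumptions, tune for your workload).
        props.put("batch.num.messages", "500");
        props.put("queue.buffering.max.ms", "1000");
        return props;
    }

    public static void main(String[] args) {
        Properties props = build("broker1:9092,broker2:9092", "example.SimplePartitioner");
        // With the Kafka client on the classpath, these properties would be
        // passed on roughly like this:
        //   ProducerConfig config = new ProducerConfig(props);
        //   Producer<String, String> producer = new Producer<String, String>(config);
        System.out.println(props.getProperty("producer.type"));
    }
}
```

Note that with async, send() returns before the message has reached the broker, so messages still sitting in the queue can be lost if the process exits without calling producer.close().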
Re: Very slow producer
Hi Ewen,

Thank you for your response. It’s much faster after changing to async.

Cheers,
Huy, Le Van

On Thursday, Dec 11, 2014 at 7:08 p.m., Ewen Cheslack-Postava e...@confluent.io wrote:
> Did you set producer.type to async when creating your producer? The
> console producer uses async by default, but the default producer config
> is sync.
> [...]