Hi Tom, Thanks for your help.
- Do you see messages in all kafka partitions? Yes, i do see the messages in all kafka partitions. - How large are the messages in kb It's small only about 120 bytes per message. - Do you need exactly-once processing? If yes use Trident if not, use vanilla storm ( http://stackoverflow.com/questions/15520993/storm-vs-trident-when-not-to-use-trident ) Unfortunately i need exactly one processing. Do you have any immediate thoughts abou this? Sent from my HTC ----- Reply message ----- From: "Ziemer, Tom" <[email protected]> To: "[email protected]" <[email protected]> Subject: How to have multiple storm workers consume a kafka topic in parallel Date: Thu, Sep 3, 2015 3:12 PM Hi Kien, I don’t see any immediate issue with the setup – some thoughts: - Do you see messages in all kafka partitions? - How large are the messages in kb? - Do you need exactly-once processing? If yes use Trident if not, use vanilla storm ( http://stackoverflow.com/questions/15520993/storm-vs-trident-when-not-to-use-trident ) Since you do not specify the partitions explicitly but use ZK instead, the spout should be able to pick it up from there. Regards, Tom From: trung kien [mailto:[email protected]] Sent: Donnerstag, 3. September 2015 09:07 To: [email protected] Subject: Re: How to have multiple storm workers consume a kafka topic in parallel Hi Tom, Yes, i have my topic partitioned. I created the topic with --partitions 10 Here is how i create my KafkaSpout: BrokerHosts zk = ZkHosts("zkserver"); TridentKafkaConfig spoutConfig = new TridentKafkaConfig(zk, "my_queue"); spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme()); OpaqueTridentKafkaSpout kafkaSpout= new OpaqueTridentKafkaSpout(spoutConf); And im using this config like following: TridentTopology topology = new TridentTopology(); topology.newStream("myStream",kafkaSpout).shuffle().each(new Fields("str"), new JsonDecode(), new Fields("user_id","action")).parallelismHint(10); With this setting it only can handle arround 20k mesaages per sec. However, i want a lot more ( ~ 100k per sec). On the storm UI i only see 1 executors for the spout. Is there any config i can turn for greater performance here? Sent from my HTC ----- Reply message ----- From: "Ziemer, Tom" <[email protected]<mailto:[email protected] >> To: "[email protected]<mailto:[email protected]>" < [email protected]<mailto:[email protected]>> Subject: How to have multiple storm workers consume a kafka topic in parallel Date: Thu, Sep 3, 2015 12:59 PM Hi, is your kafka topic partitioned? See: http://stackoverflow.com/questions/17205561/data-modeling-with-kafka-topics-and-partitions How is KafkaSpout configured? Regards, Tom From: trung kien [mailto:[email protected]<mailto:[email protected]>] Sent: Mittwoch, 2. September 2015 09:05 To: [email protected]<mailto:[email protected]> Subject: How to have multiple storm workers consume a kafka topic in parallel Hi Storm Users, I am new with Storm and using Trident for my applications. My application needs to push large of message into Kafka (in Json format), do some calculations and save the result in Redis. It seems that storm always assign only 1 worker for consuming the Kafka topic (even I have .parallelismhint(5) and my Storm cluster have 10 workers) Is there any way to have more than one worker consume a Kafka queue in parallel? Here is my topology code: topology.newStream("msg",kafkaSpout) .shuffle() .each(new Fields("str"),new JsonDecode(), new Fields("user_id","user_name")) .parallelismHint(5); Could someone please help me on this? only one worker is causing high latency in my application. -- Thanks Kien
