What could explain the difference between these two results?

C. 4 agents, one topic with 4 partitions, one agent per partition: 1 min 12 s, 13888 msg/s
D. 4 agents, one topic with 8 partitions, one agent for every two partitions: 46 s, 21739 msg/s

If there's just one thread, why does the performance improve?

The reason for running more than one Flume agent was this: if each Flume agent is installed on the same machine as a Kafka broker (say, machine A), AgentA would be the one reading the partitions of KafkaA, so no data would cross the network in that step. I guess it isn't possible for Flume to know that. What I saw is that if I have 10 partitions, each time I start a new agent the partitions are redistributed among all the agents to balance the load.

2015-03-04 17:59 GMT+01:00 Hari Shreedharan <[email protected]>:
> Sinks are single threaded. If you have more threads your performance will
> improve. And you are right in the sense that if you want to test the Kafka
> components then you should use null sink.
>
> Also note that all your sinks can be on the same agent, you don't need
> several agents just to have multiple sinks. Just have them configured to use
> the same channel.
>
> Thanks,
> Hari
>
>
> On Wed, Mar 4, 2015 at 8:20 AM, Guillermo Ortiz <[email protected]>
> wrote:
>>
>> Hello,
>>
>> We're doing some tests with Kafka-Flume.
>>
>> We have four Kafka and Flume installations. There are 8 DataNodes
>> installed on other machines.
>> We have developed an injector for Kafka and want to read the messages with
>> Flume. We have been trying these configurations:
>>
>> Injector --> Kafka --> SourceFlume --> Memory Channel --> Sink HDFS
>> Injector --> Kafka Channel --> Sink HDFS
>>
>> We start Flume once our injector has finished injecting 1M messages
>> of 1024 bytes each, and measure how many messages are processed per second,
>> that is, the time from reading them from Kafka until writing them to HDFS.
>>
>> Kafka --> SourceFlume --> Memory Channel --> Sink HDFS
>> A. 1 agent, one topic with 4 partitions: 1 min 53 s, 8849 msg/s
>> B. 1 agent, one topic with 8 partitions: 1 min 47 s, 9345 msg/s
>> C. 4 agents, one topic with 4 partitions, one agent per partition: 1 min 12 s, 13888 msg/s
>> D. 4 agents, one topic with 8 partitions, one agent for every two partitions: 46 s, 21739 msg/s
>> E. 4 agents, one topic with 12 partitions, one agent for every three partitions: 50 s, 20000 msg/s
>>
>> Kafka Channel --> Sink HDFS
>> F. 1 agent, one topic with one partition: 2 min 50 s, 5882 msg/s
>> G. 1 agent, one topic with 4 partitions: 3 min, 5555 msg/s
>> H. 4 agents, 4 partitions, one agent per partition: 46 s, 21739 msg/s (Kafka channel, no source)
>> K. 4 agents, 8 partitions, one agent for every two partitions: 69 s, 14925 msg/s (Kafka channel, no source)
>>
>> I'm confused by H and K.
>> I guess that the sink is single-threaded, so you need at least as
>> many HDFS sinks as partitions in Kafka. That's why H is four times
>> better than G.
>> The difference between D and K is weird. Could someone tell me the
>> reason? Is the KafkaSource single-threaded?
>>
>> On the other hand, the number of messages per second seems
>> pretty low. We'll try to tune Flume with a bigger batchSize and
>> other parameters to improve the performance. Any advice? I also
>> thought of trying a Null Sink to isolate Flume from HDFS.
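
For reference, the msg/s figures in the results above appear to be the 1M injected messages divided by the elapsed time, truncated to an integer. A minimal Python sketch of that arithmetic follows; the `throughput` helper and the `runs` list are names made up for illustration, and the durations are simply copied from the thread.

```python
def throughput(messages: int, minutes: int = 0, seconds: int = 0) -> float:
    """Return messages per second for a run of the given duration."""
    elapsed = minutes * 60 + seconds
    if elapsed <= 0:
        raise ValueError("elapsed time must be positive")
    return messages / elapsed


TOTAL = 1_000_000  # messages injected per run, as stated in the thread

# (label, minutes, seconds) copied from the reported results
runs = [
    ("A", 1, 53), ("B", 1, 47), ("C", 1, 12), ("D", 0, 46), ("E", 0, 50),
    ("F", 2, 50), ("G", 3, 0), ("H", 0, 46), ("K", 0, 69),
]

rates = {label: throughput(TOTAL, m, s) for label, m, s in runs}

for label, rate in rates.items():
    # int() truncates, which matches how the reported figures seem to be rounded
    print(f"{label}: {int(rate)} msg/s")
```

Under that assumption, every run reproduces the reported figure except K: 69 s gives about 14492 msg/s, whereas 14925 msg/s would correspond to roughly 67 s, so one of the two numbers in that row may be a typo.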
