Published in our wiki: https://cwiki.apache.org/confluence/display/FLUME/BatchSize,+ChannelCapacity+and+ChannelTransactionCapacity+Properties
- Alex

On Jan 11, 2013, at 6:03 PM, Jeff Lord <[email protected]> wrote:

> Bhaskar,
>
> I have created the following jira for this:
> https://issues.apache.org/jira/browse/FLUME-1829
>
> -Jeff
>
>
> On Fri, Jan 11, 2013 at 6:48 AM, Bhaskar V. Karambelkar <[email protected]> wrote:
>
>> Thanks Jeff,
>> Clear and detailed explanations. These deserve to be on the wiki, as these
>> parameters have direct implications on the performance of flume nodes.
>>
>> thanks
>> Bhaskar
>>
>>
>> On Tue, Jan 8, 2013 at 9:40 PM, Jeff Lord <[email protected]> wrote:
>>
>>> Hi Bhaskar,
>>>
>>> 1) Batch Size
>>> 1.a) When configured by client code using the flume-core-sdk, to send
>>> events to flume avro source.
>>> The flume client sdk has an appendBatch method. This will take a list of
>>> events and send them to the source as a batch. The batch size is the
>>> number of events passed to the source at one time.
>>>
>>> 1.b) When set as a parameter on HDFS sink (or other sinks which support
>>> BatchSize parameter)
>>> This is the number of events written to file before it is flushed to HDFS.
>>>
>>> 2)
>>> 2.a) Channel Capacity
>>> This is the maximum number of events the channel can hold.
>>>
>>> 2.b) Channel Transaction Capacity.
>>> This is the max number of events stored in the channel per transaction.
>>>
>>> How will setting these parameters to different values, affect throughput,
>>> latency in event flow?
>>>
>>> In general you will see better throughput with the memory channel than
>>> with the file channel, at the cost of durability.
>>>
>>> The channel capacity needs to be large enough to hold as many events as
>>> will be added to it by upstream agents. Ideally, the sink drains events
>>> from the channel faster than its source adds them.
>>>
>>> The channel transaction capacity will need to be smaller than the channel
>>> capacity.
>>> e.g. If your Channel capacity is set to 10000 then Channel Transaction
>>> Capacity should be set to something like 100.
>>>
>>> Specifically if we have clients with varying frequency of event
>>> generation, i.e. some clients generating thousands of events/sec, while
>>> others at a much slower rate, what effect will different values of these
>>> params have on these clients ?
>>>
>>> Transaction Capacity is going to be what throttles or limits how many
>>> events the source can put into the channel. This is going to vary
>>> depending on how many tiers of agents/collectors you have set up.
>>> In general though this should probably be equal to whatever you have the
>>> batch size set to in your client.
>>>
>>> With regards to the hdfs batch size, the larger your batch size the
>>> better performance will be. However, keep in mind that if a transaction
>>> fails the entire transaction will be replayed, which could result in
>>> duplicate events downstream.
>>>
>>> -Jeff
>>>
>>>
>>> On Tue, Jan 8, 2013 at 10:46 AM, Bhaskar V. Karambelkar <[email protected]> wrote:
>>>
>>>> Can someone explain the importance of the following
>>>> 1) Batch Size
>>>> 1.a) When configured by client code using the flume-core-sdk, to send
>>>> events to flume avro source.
>>>> 1.b) When set as a parameter on HDFS sink (or other sinks which
>>>> support BatchSize parameter)
>>>> 2)
>>>> 2.a) Channel Capacity
>>>> 2.b) Channel Transaction Capacity.
>>>>
>>>>
>>>> Under which conditions should these params be set to high values, and
>>>> under which conditions should they be set to low values.
>>>>
>>>>
>>>> How will setting these parameters to different values, affect
>>>> throughput, latency in event flow.
>>>> Specifically if we have clients with varying frequency of event
>>>> generation, i.e. some clients generating thousands of events/sec, while
>>>> others at a much slower rate, what effect will different values of these
>>>> params have on these clients ?
>>>>
>>>> thanks
>>>> Bhaskar

--
Alexander Alten-Lorenz
http://mapredit.blogspot.com
German Hadoop LinkedIn Group: http://goo.gl/N8pCF
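
For reference, here is roughly how the settings discussed in this thread might look in a single agent's properties file. This is only a sketch, not something posted in the thread: the agent and component names (agent1, src1, ch1, sink1), the port and the HDFS path are all made up for illustration. The 10000 / 100 values are Jeff's example, and setting hdfs.batchSize to match the transaction capacity is an assumption rather than a recommendation from the thread.

```
# Hypothetical single-tier agent: avro source -> memory channel -> HDFS sink
agent1.sources = src1
agent1.channels = ch1
agent1.sinks = sink1

# Avro source that receives batches sent by SDK clients (port is illustrative)
agent1.sources.src1.type = avro
agent1.sources.src1.bind = 0.0.0.0
agent1.sources.src1.port = 41414
agent1.sources.src1.channels = ch1

# Memory channel: capacity is the max number of events held,
# transactionCapacity is the max number of events per transaction
agent1.channels.ch1.type = memory
agent1.channels.ch1.capacity = 10000
agent1.channels.ch1.transactionCapacity = 100

# HDFS sink: batchSize is the number of events written before a flush to HDFS
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.channel = ch1
agent1.sinks.sink1.hdfs.path = hdfs://namenode/flume/events
agent1.sinks.sink1.hdfs.batchSize = 100
```

With values like these, a client would presumably pass lists of up to 100 events to appendBatch so that its batch size lines up with the channel transaction capacity, as Jeff suggests above.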
