A couple of suggestions for improving performance with the file channel (FC):
* You have only one source; add more. This increases the number of concurrent writes to the file channel. You already have 4 sinks, so that's fine; in my experience you can expect improvement with up to 8 sinks.
* Use more dataDirs (even if using a single disk). In my experience, increasing to 6 or 8 dataDirs helps.
* Like Hari said, try larger batch sizes. For 500-byte events, in my setup, I have seen performance improve up to batch sizes of around 500k.
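The first two suggestions could be expressed in a flume.conf fragment roughly as follows. This is a minimal sketch under stated assumptions: the agent, source, channel and sink names (`a1`, `r1`, `r2`, `c1`, `k1`..`k4`), the second port 20516 and all directory paths are hypothetical placeholders, not taken from the attached configuration. Two TCP sources cannot share a port, so the syslog senders have to be split between them, for example by running a second loggen instance against the second port.

```properties
# Fragment to merge into an existing flume.conf.
# Hypothetical names: agent a1, sources r1/r2, channel c1. Sinks k1..k4
# keep their existing definitions, each with channel = c1.
a1.sources = r1 r2
a1.channels = c1

# Two syslog TCP sources on different ports, both writing to the same
# file channel, so more puts reach the channel concurrently.
a1.sources.r1.type = syslogtcp
a1.sources.r1.host = 0.0.0.0
a1.sources.r1.port = 20515
a1.sources.r1.channels = c1

a1.sources.r2.type = syslogtcp
a1.sources.r2.host = 0.0.0.0
a1.sources.r2.port = 20516
a1.sources.r2.channels = c1

# File channel spread over 8 data directories (two per disk here).
# All paths are placeholders.
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /disk1/flume/checkpoint
a1.channels.c1.dataDirs = /disk1/flume/data-a,/disk1/flume/data-b,/disk2/flume/data-a,/disk2/flume/data-b,/disk3/flume/data-a,/disk3/flume/data-b,/disk4/flume/data-a,/disk4/flume/data-b
```

Whether the extra sources and directories help on a particular disk layout is worth checking by repeating the same loggen run before and after the change.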
-roshan

From: Hari Shreedharan <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Thursday, March 5, 2015 2:42 PM
To: "[email protected]" <[email protected]>
Cc: "[email protected]" <[email protected]>
Subject: Re: Syslog TCP performances issue with filechannel

If you use the Multiport Syslog Source, you can specify a batch size, which is the size of a transaction; there is one fsync at the end of each transaction. Regarding the tests: those were done over 2 years ago, using the Memory Channel.

Thanks,
Hari

On Thu, Mar 5, 2015 at 1:11 AM, Smaine Kahlouch <[email protected]> wrote:

Actually, batchSize is configured at the sink level; I didn't find this option on the file channel. Furthermore, the source batch size can't be configured, because the sender is a syslog-ng tool which doesn't have this capability. I tried with the "netcat" source and I see the same behaviour. I guess you're right: there's an fsync for each event, which causes the heavy load on the disks. However, I've read this page: https://cwiki.apache.org/confluence/display/FLUME/Flume+NG+Performance+Measurements and they obviously didn't have the same problem.

Regards,
--
Smaine Kahlouch - Engineer, Research & Engineering
Arkena | T: +33 1 5868 6196
27 Blvd Hippolyte Marquès, 94200 Ivry-sur-Seine, France
arkena.com

On 03/04/15 20:08, Hari Shreedharan wrote:

You should probably increase the batch size, since each batch causes an fsync, which slows things down.

Thanks,
Hari

On Wed, Mar 4, 2015 at 6:28 AM, Smaine Kahlouch <[email protected]> wrote:

Hi all,
I'm currently benchmarking Flume. We're planning to use Flume with syslogtcp as the source and a file channel in order to avoid data loss.
Performance is quite good when a memory channel is used: ~100,000 events/sec (event size = 600 bytes). But as soon as I switch to the file channel, performance drops dramatically: ~300 events/sec. Despite this poor result, the disks are heavily used (all of them), near 100%, which is really strange.

I use loggen, a tool provided by syslog-ng, to generate syslog messages (http://www.balabit.com/sites/default/files/documents/syslog-ng-ose-latest-guides/en/syslog-ng-ose-guide-admin/html/loggen.1.html), for example:

loggen -i -I 3000000 --size 600 --active-connections 200 myflumehost 20515

Flume version: 1.5.2
Operating system: CentOS 6

Please find my Flume configuration attached. The file channel is spread over 5 disks in order to improve performance. Could you please help me properly configure the syslogtcp source with the file channel?

Regards,
--
Smaine Kahlouch

<flume.conf>
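For the single-source setup in the original post, Hari's suggestion could look roughly like the fragment below. It swaps `syslogtcp` for the Multiport Syslog TCP source (`multiport_syslogtcp`), so that events are committed to the file channel in batches rather than one at a time. The names `a1`, `r1`, `c1`, the bind host, the paths and the `batchSize` value are illustrative assumptions, not tuned settings. Within one transaction the file channel refuses puts or takes beyond its `transactionCapacity`, so that value has to be at least as large as the source `batchSize` and each sink's batch size.

```properties
# Hypothetical names: agent a1, source r1, channel c1; paths are placeholders.
a1.sources = r1
a1.channels = c1

# Multiport syslog TCP source: each channel transaction carries up to
# batchSize events, and the file channel does one fsync per commit.
a1.sources.r1.type = multiport_syslogtcp
a1.sources.r1.host = 0.0.0.0
a1.sources.r1.ports = 20515
a1.sources.r1.batchSize = 1000
a1.sources.r1.channels = c1

# File channel over the 5 disks, one data directory per disk here.
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /disk1/flume/checkpoint
a1.channels.c1.dataDirs = /disk1/flume/data,/disk2/flume/data,/disk3/flume/data,/disk4/flume/data,/disk5/flume/data
# Must be >= the source batchSize and every sink's batch size.
a1.channels.c1.transactionCapacity = 10000
```

As a rough estimate, assuming the per-commit fsync dominates and costs about the same each time: ~300 events/sec with one event per transaction corresponds to about 1/300 s ≈ 3.3 ms per commit, and a transaction of N events spreads that cost over N events. The actual gain also depends on how many messages the source has available per batch and on the sinks, so this is an estimate, not a prediction.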
