A couple of suggestions for improving perf with the file channel (FC):

  *   You have only one source; add more. This increases the number of 
concurrent writes to the file channel. You already have 4 sinks, so that's 
fine. In my experience you can expect improvement with up to 8 sinks.
  *   Use more dataDirs (even if using a single disk). In my experience 
increasing it to 6 or 8 dataDirs helps.
  *   Like Hari said, try larger batch sizes. For 500-byte events, in my setup, 
I have seen perf improve up to batch sizes of around 500k.
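
As a rough illustration of the first two points, the source and channel part of 
an agent config might look something like the fragment below. The agent name 
(a1), the component names, the second port and every path are made up for the 
example, and the numbers are placeholders to tune, not recommendations. Sinks 
are left out because the batch size property depends on the sink type.

```properties
# Hypothetical agent "a1"; names, ports and paths are assumptions.
a1.sources = r1 r2
a1.channels = c1

# Two sources writing to the same file channel
a1.sources.r1.type = syslogtcp
a1.sources.r1.host = 0.0.0.0
a1.sources.r1.port = 20515
a1.sources.r1.channels = c1

a1.sources.r2.type = syslogtcp
a1.sources.r2.host = 0.0.0.0
a1.sources.r2.port = 20516
a1.sources.r2.channels = c1

# File channel with six dataDirs (comma-separated list)
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /data1/flume/checkpoint
a1.channels.c1.dataDirs = /data1/flume/d1,/data1/flume/d2,/data2/flume/d1,/data2/flume/d2,/data3/flume/d1,/data3/flume/d2
a1.channels.c1.capacity = 1000000
# Must be at least as large as the biggest batch a source or sink uses
a1.channels.c1.transactionCapacity = 10000
```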

-roshan


From: Hari Shreedharan
Date: Thursday, March 5, 2015 2:42 PM
Subject: Re: Syslog TCP performances issue with filechannel

So if you use the Multiport Syslog Source, you can specify a batch size, which 
is the size of a transaction; there is one fsync at the end of each 
transaction.
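
For illustration only, a source section along these lines would set the batch 
size. The agent and component names (a1, r1, c1) and the value 1000 are 
assumptions for the sketch, not taken from the attached config.

```properties
# Hypothetical names; only the source section is shown.
a1.sources.r1.type = multiport_syslogtcp
a1.sources.r1.host = 0.0.0.0
# Space-separated list of ports to listen on
a1.sources.r1.ports = 20515
# Events per transaction (placeholder value to tune)
a1.sources.r1.batchSize = 1000
a1.sources.r1.channels = c1
```

The channel's transactionCapacity has to be at least as large as this 
batchSize.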

Regarding the tests - those were done over 2 years ago, using the Memory 
Channel.

Thanks,
Hari



On Thu, Mar 5, 2015 at 1:11 AM, Smaine Kahlouch wrote:

Actually, the batchSize is configured at the sink level.
I didn't find this option on the file channel.

Furthermore, the source batchSize can't be configured because the events come 
from a syslog-ng tool, which doesn't have this capability.
I tried with the "netcat" source and I see the same behaviour.

I guess you're right: for each event there's an fsync, which causes the heavy 
load on the disks.
However, I've read this page: 
https://cwiki.apache.org/confluence/display/FLUME/Flume+NG+Performance+Measurements

And they obviously didn't have the same problem.

Regards,

--
Smaine Kahlouch - Engineer, Research & Engineering
Arkena | T: +33 1 5868 6196
27 Blvd Hippolyte Marquès, 94200 Ivry-sur-Seine, France
arkena.com


On 03/04/15 20:08, Hari Shreedharan wrote:
You should probably increase the batch size, since each batch causes an fsync 
which slows things down.

Thanks,
Hari



On Wed, Mar 4, 2015 at 6:28 AM, Smaine Kahlouch wrote:

Hi all,

I'm currently running benchmarks on Flume.
We're planning to use Flume with syslogtcp as the source and a file channel in 
order to avoid data loss.

The performance is quite good when a memory channel is used:
~100,000 events/sec (event size = 600 bytes)

But as soon as I switch to the file channel, the performance drops dramatically:
~300 events/sec

On top of this poor result, the behaviour is really strange: disk usage is 
heavy on all the disks, near 100%.

I use a tool provided by syslog-ng to generate the syslog messages: 
loggen<http://www.balabit.com/sites/default/files/documents/syslog-ng-ose-latest-guides/en/syslog-ng-ose-guide-admin/html/loggen.1.html>

e.g.: loggen -i -I 3000000 --size 600 --active-connections 200 myflumehost 20515


Flume version: 1.5.2
Operating system: CentOS 6

Please find my Flume configuration enclosed. The file channel is spread over 5 
disks in order to improve performance.


Could you please help me properly configure the syslogtcp source with the 
file channel?

Regards,

--
Smaine Kahlouch - Engineer, Research & Engineering
Arkena | T: +33 1 5868 6196
27 Blvd Hippolyte Marquès, 94200 Ivry-sur-Seine, France
arkena.com

<flume.conf>


