Hi Brock

I am using flume 1.2.0.

About the batching: as per the user guide, the "exec source" does have a batch option in 1.2.0 (param name: batchSize, default value: 20) and I have tried it. Apparently it works fine. The file channel also has a parameter "transactionCapacity", set to 1000 by default. Is that the batch size of the file channel?

Anyway, even with increased batching I couldn't cross 110-150 KB/sec with the file channel. Could you please help me understand the questions I asked in the original mail of this thread about fsync lies? With the disk which "apparently does the fsync lie" I get 3 MB/sec in one flow. I don't know whether that disk actually lies about fsync, but there is a remarkable difference in fsync performance on the 2 machines, which have almost identical hardware.

Regards
Jagadish



On 10/22/2012 07:59 PM, Brock Noland wrote:
In this case, it's best to think about FileChannel as if it were a database. Let's pretend we are going to insert 1 million rows. If we committed on each row, would performance be "good"? No, everyone knows that when you are inserting rows into a database, you want to batch 100-1000 rows into a single commit if you want "good" performance. ("Good" is in quotes because it's subjective based on the scenario, but in this case we mean lots of MB/second.)

Part of the reason behind this logic is that when a database does a commit, it does an fsync operation to ensure that all data is written to disk and that you will not lose data due to a subsequent power loss.

FileChannel behaves *exactly* the same. If your "batch" is only a single event, file channel will:

write single event
fsync
write single event
fsync

As such, if you want "good" performance with FileChannel, you must increase your batch size, just like a database. If you have a batchSize of say 100, then FileChannel will:

write single event 0
write single event 1
...
write single event 99
fsync

Which will result in much "better" performance. It's worth noting that ExecSource in Flume 1.2 does not have a batchSize, and as such each event is written and then committed. ExecSource in Flume 1.3, which we will release soon, does have a configurable batchSize. If you want to try that out, you can build it from the flume-1.3.0 branch.
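
For example, with a build from the flume-1.3.0 branch, something along these lines (just a sketch, reusing your agent and source names; batchSize here is the number of events put into the channel per commit) should give you batched puts on the source side:

adServerAgent.sources.avro-collection-source.type = exec
adServerAgent.sources.avro-collection-source.command = cat /home/hadoop/file.tsf
adServerAgent.sources.avro-collection-source.batchSize = 100

The sink side batches independently (e.g. hdfs.batchSize on the HDFS sink, batch-size on the Avro sink), so you will want to raise those as well so that the channel commits, and therefore fsyncs, once per batch on both ends.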

Brock

On Mon, Oct 22, 2012 at 8:59 AM, Brock Noland <[email protected]> wrote:

    Which version? 1.2 or trunk?

    On Monday, October 22, 2012 at 8:18 AM, Jagadish Bihani wrote:

    Hi

    This is the simplistic configuration with which I am getting the
    lower performance. Even with a 2-tier architecture (cat source -
    avro sinks - avro source - HDFS sink) I get similar performance
    with the file channel.

    Configuration:
    =========
    adServerAgent.sources = avro-collection-source
    adServerAgent.channels = fileChannel
    adServerAgent.sinks = hdfsSink fileSink

    # For each one of the sources, the type is defined
    adServerAgent.sources.avro-collection-source.type=exec
    adServerAgent.sources.avro-collection-source.command= cat /home/hadoop/file.tsf

    # The channel can be defined as follows.
    adServerAgent.sources.avro-collection-source.channels = fileChannel

    #Define file sink
    adServerAgent.sinks.fileSink.type = file_roll
    adServerAgent.sinks.fileSink.sink.directory = /home/hadoop/flume_sink
    adServerAgent.sinks.fileSink.channel = fileChannel
    adServerAgent.channels.fileChannel.type=file
    adServerAgent.channels.fileChannel.dataDirs=/home/hadoop/flume/channel/dataDir5
    adServerAgent.channels.fileChannel.checkpointDir=/home/hadoop/flume/channel/checkpointDir5
    adServerAgent.channels.fileChannel.maxFileSize=4000000000

    And it is run with:
    JAVA_OPTS = -Xms500m -Xmx700m -Dcom.sun.management.jmxremote
    -XX:MaxDirectMemorySize=2g

    Regards,
    Jagadish

    On 10/22/2012 05:42 PM, Brock Noland wrote:
    Hi,

    I'll respond in more depth later, but it would help if you
    posted your configuration file and the version of flume you are
    using.

    Brock

    On Mon, Oct 22, 2012 at 6:48 AM, Jagadish Bihani
    <[email protected]> wrote:
    Hi

    I am writing this on top of another thread where there was a
    discussion on "fsync lies" and on the fact that only the file
    channel uses fsync, not the file sink:

    -- I tested the fsync performance on 2 machines (on one machine I
    was getting very good throughput using the file channel and on the
    other almost 100 times slower, with almost the same hardware
    configuration) using the following code:


    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/time.h>
    #include <sys/types.h>

    #define PAGESIZE 4096

    int main(int argc, char *argv[])
    {
        char my_read_str[PAGESIZE];
        char *read_filename = argv[1];
        int readfd, writefd;

        readfd = open(read_filename, O_RDONLY);
        writefd = open("written_file", O_WRONLY | O_CREAT, 0644);

        /* Determine the input file size, then rewind to the beginning. */
        off_t len = lseek(readfd, 0, SEEK_END);
        lseek(readfd, 0, SEEK_SET);
        int iterations = len / PAGESIZE;
        int i;
        struct timeval t0, t1;

        /* Copy the file one page at a time, timing only the fsync() call. */
        for (i = 0; i < iterations; i++)
        {
            read(readfd, my_read_str, PAGESIZE);
            write(writefd, my_read_str, PAGESIZE);
            gettimeofday(&t0, 0);
            fsync(writefd);
            gettimeofday(&t1, 0);
            long elapsed = (t1.tv_sec - t0.tv_sec) * 1000000 +
                           t1.tv_usec - t0.tv_usec;
            printf("Elapsed time is= %ld \n", elapsed);
        }
        close(readfd);
        close(writefd);
        return 0;
    }
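
    (It is just a minimal benchmark: compile it with gcc and run it
    with a large input file as the argument; it prints the elapsed
    microseconds for each fsync call.)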


    -- As expected, fsync typically takes about 50000 microseconds to
    complete on one machine, while on the other machine it takes about
    200-290 microseconds on average. So is the machine with the higher
    performance doing an 'fsync lie'?
    -- If I have understood it correctly, an "fsync lie" means the data
    is not actually written to disk and is instead sitting in some
    disk/controller buffer. I) If the disk loses power due to a shutdown
    or any other disaster, data will be lost. II) Can data be lost even
    without that? (e.g. if the disk is keeping data in some buffer and
    fsync is being invoked continuously, can that data also be lost?)
    If only part I is true, then it can be acceptable, because the
    probability of a shutdown is usually low in a production
    environment. But if II is also true, then there is a problem.

    -- But on the machine where the disk doesn't lie, the performance of
    flume using the file channel is very low (at most 100 KB/sec even
    with sufficient DirectMemory allocation). Does anybody have stats
    about the throughput of the file channel? Is anybody getting better
    performance with the file channel (without fsync lies)? What is the
    recommended usage of it for an average scenario (transferring files
    of a few MBs to an HDFS sink continuously on typical hardware:
    16-core processors, 16 GB RAM, etc.)?


    Regards,
    Jagadish

    On 10/10/2012 11:30 PM, Brock Noland wrote:
    Hi,

    On Wed, Oct 10, 2012 at 11:22 AM, Jagadish Bihani
    <[email protected]> wrote:
    Hi Brock

    I will surely look into 'fsync lies'.

    But as per my experiments, I think the "file channel" is causing the
    issue, because on those 2 machines (one with higher throughput and
    the other with lower) I did the following experiment:

    cat source - memory channel - file sink

    With this setup I got the same throughput on both machines (around
    3 MB/sec). Now, as I have used the "file sink", it should also do
    "fsync" at some point in time. The 'File Sink' and 'File Channel'
    both do disk writes. So if there is a difference in disk behaviour,
    it should be visible in the 'File Sink' too.

    Am I missing something here?
    File sink does not call fsync.

    Regards,
    Jagadish



    On 10/10/2012 09:35 PM, Brock Noland wrote:
    OK your disk that is giving you 40KB/second is telling you the truth
    and the faster disk is lying to you. Look up "fsync lies" to see what
    I am referring to.

    A spinning disk can do roughly 100 fsync operations per second (an
    fsync is done at the end of every batch). That is how I estimated
    your event size: at 40KB/second and about 100 commits/second, each
    event is roughly 40960 / 100 = 409 bytes.

    Once again, if you want increased performance, you should increase the
    batch size.

    Brock

    On Wed, Oct 10, 2012 at 11:00 AM, Jagadish Bihani
    <[email protected]> wrote:
    Hi

    Yes. It is around 480 - 500 bytes.


    On 10/10/2012 09:24 PM, Brock Noland wrote:
    How big are your events? Average about 400 bytes?

    Brock

    On Wed, Oct 10, 2012 at 5:11 AM, Jagadish Bihani
    <[email protected]> wrote:
    Hi

    Thanks for the inputs Brock. After doing several experiments, the
    problem eventually boiled down to the disks.

    -- But I had used the same configuration on all 3 machines (so all
    software components are the same on all 3 machines).
    -- The user guide says that if multiple file channel instances are
    active on the same agent then different disks are preferable. But
    in my case only one file channel is active per agent.
    -- The only pattern I observed is that the machines where I got
    better performance have multiple disks. But I don't understand how
    that helps if I have only 1 active file channel.
    -- What is the impact of the type of disk/disk device driver on
    performance? I don't understand how with one disk I get 40 KB/sec
    and with another 2 MB/sec.

    Could you please elaborate on the correlation between the file
    channel and disks?

    Regards,
    Jagadish


    On 10/09/2012 08:01 PM, Brock Noland wrote:

    Hi,

    When using the file channel, the number and type of disks is going
    to be much more predictive of performance than CPU or RAM. Note
    that consumer level drives/controllers will give you much
    "better" performance because they lie to you about when your data is
    actually written to the drive. If you search for "fsync lies" you'll
    find more information on this.

    You probably want to increase the batch size to get better performance.
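
    For example (just a sketch; sourceAgent/destAgent and the sink
    names are placeholders for your own, and the exact property names
    come from the respective sink docs), raising the batch sizes on
    both sinks would look something like:

    sourceAgent.sinks.avroSink.batch-size = 100
    destAgent.sinks.hdfsSink.hdfs.batchSize = 1000

    so that each sink takes a larger batch of events from the file
    channel per transaction, and the channel fsyncs once per batch
    rather than once per event.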

    Brock

    On Tue, Oct 9, 2012 at 2:46 AM, Jagadish Bihani
    <[email protected]> wrote:

    Hi

    My flume setup is:

    Source Agent : cat source - File Channel - Avro Sink
    Dest Agent :     avro source - File Channel - HDFS Sink.

    There is only 1 source agent and 1 destination agent.

    I measure throughput as the amount of data written to HDFS per
    second. (I have a rolling interval of 30 sec; so if a 60 MB file is
    generated in 30 sec, the throughput is 2 MB/sec.)

    I have run the source agent on various machines with different
    hardware configurations:
    (In all cases I run flume agent with JAVA OPTIONS as
    "-DJAVA_OPTS="-Xms500m -Xmx1g -Dcom.sun.management.jmxremote
    -XX:MaxDirectMemorySize=2g")

    JDK is 32 bit.

    Experiment 1:
    =====
    RAM : 16 GB
    Processor: Intel Xeon E5620 @ 2.40 GHz (16 cores).
    64 bit Processor with 64 bit Kernel.
    Throughput: 2 MB/sec

    Experiment 2:
    ======
    RAM : 4 GB
    Processor: Intel Xeon E5504 @ 2.00GHz (4 cores).
    64 bit Processor with 32 bit Kernel.
    Throughput : 30 KB/sec

    Experiment 3:
    ======
    RAM : 8 GB
    Processor: Intel Xeon E5520 @ 2.27 GHz (16 cores).
    64 bit Processor with 32 bit Kernel.
    Throughput : 80 KB/sec

    -- So, as can be seen, there is a huge difference in throughput
    with the same configuration but different hardware.
    -- In the first case, where throughput is higher, RES is around
    160 MB; in the other cases it is in the range of 40 MB - 50 MB.

    Can anybody please give insights into why there is such a huge
    difference in throughput? What is the correlation between RAM and
    file channel/HDFS sink performance, and also with a 32-bit/64-bit
    kernel?

    Regards,
    Jagadish







--
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/





--
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
