Hi

I have encountered a problem in my scenario with the netcat source. The setup is:
Host A: Netcat source - file channel - Avro sink
Host B: Avro source - file channel - HDFS sink
To simplify it, I have created a single agent with a netcat source and a file_roll sink:
Host A: Netcat source - file channel - file_roll sink

*Problem*:
1. To simulate our production scenario, I have created a script that runs for 15 seconds and, in a while loop, writes requests to the netcat source on a given port. With a large sleep value between writes, all events are delivered correctly to the destination. But as I reduce the delay, events are still written to the source yet not all of them reach the destination. For example, I wrote 9108 records within 15 seconds using the script and only 1708 were delivered. I don't get any exception. If this were a flow-control problem, I would have expected to see some exception in the agent logs. And with a file channel and plenty of disk space, should this be a problem at all?

*Machine Configuration:*
RAM : 8 GB
JVM : 200 MB
CPU: 2.0 GHz Quad core processor

*Flume Agent Configuration*
adServerAgent.sources = netcatSource
adServerAgent.channels = fileChannel memoryChannel
adServerAgent.sinks = fileSink

# For each one of the sources, the type is defined
adServerAgent.sources.netcatSource.type = netcat
adServerAgent.sources.netcatSource.bind = 10.0.17.231
adServerAgent.sources.netcatSource.port = 55355

# The channel can be defined as follows.
adServerAgent.sources.netcatSource.channels = fileChannel
#adServerAgent.sources.netcatSource.channels = memoryChannel

# Each sink's type must be defined
adServerAgent.sinks.fileSink.type = file_roll
adServerAgent.sinks.fileSink.sink.directory = /root/flume/flume_sink

#Specify the channel the sink should use
#adServerAgent.sinks.fileSink.channel = memoryChannel
adServerAgent.sinks.fileSink.channel = fileChannel

adServerAgent.channels.memoryChannel.type =memory
adServerAgent.channels.memoryChannel.capacity = 100000
adServerAgent.channels.memoryChannel.transactionCapacity = 10000

adServerAgent.channels.fileChannel.type=file
adServerAgent.channels.fileChannel.dataDirs=/root/jagadish/flume_channel1/dataDir3
adServerAgent.channels.fileChannel.checkpointDir=/root/jagadish/flume_channel1/checkpointDir3

*Script snippet being used:*
...
eval
{
        local $SIG{ALRM} = sub { die "alarm\n"; };
        alarm $TIMEOUT;
        my $i=0;
        my $str = "";
        my $counter=1;
        while(1)
        {
                $str = "";
                for($i=0; $i < $NO_ELE_PER_ROW; $i++)
                {
                        $str .= $counter."\t";
                        $counter++;
                }
                chop($str);
                #print $socket "$str\n";
                $socket->send($str."\n") or die "Didn't send";

                if($? != 0)
                {
                        print "Failed for $str \n";
                }
                print "$str\n";
                Time::HiRes::usleep($SLEEP_TIME);
        }
        alarm 0;
};
if ($@) {
......

- The script itself is working fine: with a very large delay, all events are transmitted correctly.
- The same problem occurs with the memory channel too, but at lower sleep values.

*Problem 2:*
-- With this setup I am getting very low throughput, i.e. I am able to transfer only ~1 KB/sec of data to the destination file sink. Similar performance was achieved using the HDFS sink.
-- I had tried increasing batch sizes in my original scenario without much gain in throughput.
-- Using 'tail -F' as the source, I had seen almost 10 times better throughput.
-- Is there any tunable parameter for the netcat source?
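
For reference, the only netcat source properties I know of in the Flume user guide beyond type, bind and port are max-line-length and ack-every-event. A minimal sketch of how they would be set, assuming the agent and source names from the configuration above and a Flume version that supports both properties. The values shown are the documented defaults, not settings that were tested here:

```properties
# Assumed: same agent and source names as in the agent configuration above.
# Values are the documented defaults, shown for illustration only.

# Maximum length of a single line (one event body), in bytes.
adServerAgent.sources.netcatSource.max-line-length = 512

# Whether the source replies "OK" to the client for every event it receives.
adServerAgent.sources.netcatSource.ack-every-event = true
```

The script snippet above only writes to the socket and never reads from it, so whether unread "OK" acknowledgements or the line-length limit play any role in the lost events is an open question.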

Please help me with the above two cases:
i) netcat source use cases
ii) Flume's typical expected throughput with a file channel and a file/HDFS sink on a single machine.

Regards,
Jagadish
