Hi
I have encountered a problem in my scenario with the netcat source. The setup is:
Host A: Netcat source - file channel - Avro sink
Host B: Avro source - file channel - HDFS sink
To simplify it, I have created a single agent with a netcat source
and a file_roll sink, i.e.:
Host A: Netcat source - file channel - file_roll sink
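For reference, here is a minimal sketch of the two-tier production layout described above, in the same properties format as the single-agent configuration. The agent names (hostAAgent, hostBAgent), the Host B address, the Avro port 4545 and the HDFS path are hypothetical placeholders. Only the netcat bind address and port come from the configuration further below.

```properties
# Host A: netcat source -> file channel -> Avro sink
# (agent names, Avro host/port and HDFS path are hypothetical placeholders)
hostAAgent.sources = netcatSource
hostAAgent.channels = fileChannel
hostAAgent.sinks = avroSink
hostAAgent.sources.netcatSource.type = netcat
hostAAgent.sources.netcatSource.bind = 10.0.17.231
hostAAgent.sources.netcatSource.port = 55355
hostAAgent.sources.netcatSource.channels = fileChannel
hostAAgent.channels.fileChannel.type = file
hostAAgent.sinks.avroSink.type = avro
hostAAgent.sinks.avroSink.hostname = hostB.example
hostAAgent.sinks.avroSink.port = 4545
hostAAgent.sinks.avroSink.channel = fileChannel

# Host B: Avro source -> file channel -> HDFS sink
hostBAgent.sources = avroSource
hostBAgent.channels = fileChannel
hostBAgent.sinks = hdfsSink
hostBAgent.sources.avroSource.type = avro
hostBAgent.sources.avroSource.bind = 0.0.0.0
hostBAgent.sources.avroSource.port = 4545
hostBAgent.sources.avroSource.channels = fileChannel
hostBAgent.channels.fileChannel.type = file
hostBAgent.sinks.hdfsSink.type = hdfs
hostBAgent.sinks.hdfsSink.hdfs.path = hdfs://namenode/flume/events
hostBAgent.sinks.hdfsSink.channel = fileChannel
```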
Problem 1:
1. To simulate our production scenario, I have created a script which
runs for 15 seconds and, in a while loop, writes requests to the netcat
source on a given port. With a large sleep value, events are delivered
correctly to the destination. But as I reduce the delay, events are
handed to the source but are not delivered to the destination. For
example, I write 9108 records within 15 seconds using the script and
only 1708 get delivered. And I don't get any exception. If it were a
flow-control problem, I would expect to see some exception in the agent
logs. But with a file channel and a huge amount of disk space, can this
still be a problem?
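To make the sent-vs-delivered comparison repeatable, a small Python sketch can count the events that reached the file_roll output directory. It assumes the sink writes one event per line (the default text serializer) and that the directory holds only sink output; the function names are hypothetical helpers, not part of Flume.

```python
from pathlib import Path


def count_delivered_events(sink_dir: str) -> int:
    """Count lines (one event per line) across all files in a file_roll sink directory."""
    total = 0
    for path in sorted(Path(sink_dir).iterdir()):
        if path.is_file():
            with path.open("rb") as handle:
                total += sum(1 for _ in handle)
    return total


def report_loss(sent: int, delivered: int) -> str:
    """Summarise how many events went missing between the script and the sink."""
    lost = sent - delivered
    pct = 100.0 * lost / sent if sent else 0.0
    return f"sent={sent} delivered={delivered} lost={lost} ({pct:.1f}%)"
```

For example, `report_loss(9108, count_delivered_events("/root/flume/flume_sink"))` compares the script's count with what the sink directory from the configuration below actually contains.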
Machine configuration:
RAM: 8 GB
JVM: 200 MB
CPU: 2.0 GHz quad-core processor
Flume agent configuration:
adServerAgent.sources = netcatSource
adServerAgent.channels = fileChannel memoryChannel
adServerAgent.sinks = fileSink
# For each one of the sources, the type is defined
adServerAgent.sources.netcatSource.type = netcat
adServerAgent.sources.netcatSource.bind = 10.0.17.231
adServerAgent.sources.netcatSource.port = 55355
# The channel can be defined as follows.
adServerAgent.sources.netcatSource.channels = fileChannel
#adServerAgent.sources.netcatSource.channels = memoryChannel
# Each sink's type must be defined
adServerAgent.sinks.fileSink.type = file_roll
adServerAgent.sinks.fileSink.sink.directory = /root/flume/flume_sink
# Specify the channel the sink should use
#adServerAgent.sinks.fileSink.channel = memoryChannel
adServerAgent.sinks.fileSink.channel = fileChannel
adServerAgent.channels.memoryChannel.type = memory
adServerAgent.channels.memoryChannel.capacity = 100000
adServerAgent.channels.memoryChannel.transactionCapacity = 10000
adServerAgent.channels.fileChannel.type = file
adServerAgent.channels.fileChannel.dataDirs = /root/jagadish/flume_channel1/dataDir3
adServerAgent.channels.fileChannel.checkpointDir = /root/jagadish/flume_channel1/checkpointDir3
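Script snippet being used (Perl):
...
eval
{
    # Abort the loop after $TIMEOUT seconds: SIGALRM dies out of the eval
    local $SIG{ALRM} = sub { die "alarm\n"; };
    alarm $TIMEOUT;
    my $i = 0;
    my $str = "";
    my $counter = 1;
    while (1)
    {
        # Build one row of $NO_ELE_PER_ROW consecutive counters, tab-separated
        $str = "";
        for ($i = 0; $i < $NO_ELE_PER_ROW; $i++)
        {
            $str .= $counter . "\t";
            $counter++;
        }
        chop($str);    # remove the trailing tab
        #print $socket "$str\n";
        # send() returns the number of bytes sent, or undef on error.
        # ($? is the status of the last child process, not of the socket.)
        my $sent = $socket->send($str . "\n");
        unless (defined $sent)
        {
            print "Failed for $str \n";
            die "Didn't send: $!\n";
        }
        print "$str\n";
        Time::HiRes::usleep($SLEEP_TIME);
    }
    alarm 0;
};
if ($@) {
......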
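For cross-checking the counts, here is a rough Python equivalent of the Perl loop. One relevant fact: the Flume User Guide documents that the netcat source, by default (ack-every-event = true), replies "OK" to every line it receives, and the Perl loop never reads those replies. This sketch can optionally read one reply per row, so the sender is paced by the source. Host, port, duration, sleep and row width are parameters; the function names are hypothetical.

```python
import socket
import time


def build_row(start: int, width: int) -> str:
    """Return `width` consecutive counters starting at `start`, tab-separated."""
    return "\t".join(str(n) for n in range(start, start + width))


def send_rows(host: str, port: int, duration_s: float, sleep_s: float,
              width: int, read_ack: bool = True) -> int:
    """Send rows for `duration_s` seconds and return how many rows were sent.

    With read_ack=True, one reply line is read after each row, assuming the
    netcat source is left at its default of acknowledging every event.
    """
    sent = 0
    counter = 1
    deadline = time.monotonic() + duration_s
    with socket.create_connection((host, port)) as sock, sock.makefile("rb") as replies:
        while time.monotonic() < deadline:
            row = build_row(counter, width)
            sock.sendall(row.encode("ascii") + b"\n")
            if read_ack:
                ack = replies.readline().strip()
                if ack != b"OK":
                    print(f"unexpected reply {ack!r} for row starting at {counter}")
            counter += width
            sent += 1
            time.sleep(sleep_s)
    return sent
```

Comparing the return value of `send_rows(...)` with the number of lines in the sink directory gives the same sent-vs-delivered check as the Perl script.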
- The script itself is working fine: with a very large delay, all events
get transmitted correctly.
- The same problem occurs with the memory channel too, but at lower
sleep values.
Problem 2:
-- With this setup I am getting very low throughput, i.e. I am able to
transfer only ~1 KB/sec of data to the destination file sink. Similar
performance was achieved using the HDFS sink.
-- I tried increasing batch sizes in my original scenario without much
gain in throughput.
-- Using 'tail -F' as the source, I saw almost 10 times better throughput.
-- Is there any tunable parameter for the netcat source?
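For reference, the Flume User Guide documents a few properties that bear on the netcat and batch-size questions above. This fragment is a sketch only: the netcat values shown are the documented defaults, the other values are illustrative rather than tested, availability of sink.batchSize for file_roll depends on the Flume version, and the Avro/HDFS sink names in the comments are hypothetical.

```properties
# Netcat source (documented defaults: max-line-length = 512 bytes per event,
# ack-every-event = true, i.e. reply "OK" to every received line)
adServerAgent.sources.netcatSource.max-line-length = 512
adServerAgent.sources.netcatSource.ack-every-event = true

# File channel: capacity = max events held,
# transactionCapacity = max events per put/take transaction
adServerAgent.channels.fileChannel.capacity = 1000000
adServerAgent.channels.fileChannel.transactionCapacity = 10000

# file_roll sink: events taken from the channel per transaction (version dependent)
adServerAgent.sinks.fileSink.sink.batchSize = 100

# Two-tier setup (hypothetical agent/sink names):
# hostAAgent.sinks.avroSink.batch-size = 100
# hostBAgent.sinks.hdfsSink.hdfs.batchSize = 100
```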
Please help me with the above two cases:
i) netcat source use cases
ii) typical expected Flume throughput with a file channel and a
file/HDFS sink on a single machine.
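To compare runs in the same units, a small Python helper can turn a run's numbers into events/sec and KB/sec. The post does not give $NO_ELE_PER_ROW, so bytes per row is a parameter; the helper names are hypothetical. Rows grow as the counters grow, so `row_bytes` at a mid-run counter value only approximates the average.

```python
from typing import Tuple


def row_bytes(start: int, width: int) -> int:
    """Bytes in one row: `width` tab-separated counters plus the trailing newline."""
    digits = sum(len(str(n)) for n in range(start, start + width))
    return digits + (width - 1) + 1  # (width - 1) tabs, 1 newline


def rates(events: int, seconds: float, bytes_per_event: float) -> Tuple[float, float]:
    """Return (events per second, KB per second) for one test run."""
    eps = events / seconds
    return eps, eps * bytes_per_event / 1024.0
```

For example, 1708 delivered rows in 15 seconds works out to about 114 rows per second, whatever the row size.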
Regards,
Jagadish