Yes Hari, that's exactly my point: by default, whatever the channel type (memory or file), one channel filling up will slow down the associated source, and therefore any other channel associated with it. (By extension it will also impact any client sending events to this source, as the source will acknowledge the events more slowly.) The higher the keep-alive (default 3 seconds), the bigger the global impact.
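For concreteness, the two mitigations discussed in this message (lowering the keep-alive and marking a channel optional) would look roughly like this on a replicating two-channel source. This is only a sketch with assumed agent and component names; keep-alive and selector.optional are the standard Flume properties:

a1.sources = r1
a1.channels = c1 c2
a1.sources.r1.channels = c1 c2
a1.sources.r1.selector.type = replicating
# a failed put to c2 is simply ignored instead of failing the whole transaction
a1.sources.r1.selector.optional = c2
a1.channels.c1.type = memory
# seconds a put waits for free space before failing (default 3)
a1.channels.c1.keep-alive = 1
a1.channels.c2.type = memory
a1.channels.c2.keep-alive = 1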
Vincentius might reduce this impact by lowering the keep-alive of his channels to 1 second (the lowest possible value). What do you think of a future evolution enabling an optional configuration of the channels to change the keep-alive time unit (defaulting to seconds for non-regression), so that users like Vincentius and me could set the keep-alive to something like 100 ms, with the channels marked "optional", for example? Also, could you elaborate on the consequences of having keep-alive = 0? I understand that in a channel-full situation the tryAcquire will fail immediately, without waiting, but I do not get the possibility of the semaphore dying.

Regards,
Jeff

From: Hari Shreedharan [mailto:[email protected]]
Sent: Friday, November 14, 2014 01:29
To: [email protected]
Cc: [email protected]
Subject: RE: All channels in an agent get slower after a channel is full

It is expected that if one channel is full, the whole batch is considered failed, and the source will retry. If even one required channel is full, the whole transaction fails. If you don't want this, mark the channels as optional. Also, all channels have a keep-alive, which is the period (in seconds) that a put waits for free space before failing. You can reduce this via configuration. If you reduce it to 0, it may cause major concurrency issues (since semaphores will start dying, etc.). Things slowing down could be because of this as well.

Thanks,
Hari

On Thu, Nov 13, 2014 at 4:22 PM, [email protected] wrote:
Hi Hari,

I'm jumping into this discussion as I'm facing similar behavior regarding the impact of a full channel. I was trying to optimize an HTTP sink that does not sustain the performance it should when I hit the same issue as described below, but with MemoryChannels: one source (let's say Avro), with a replicating selector duplicating the events into two MemoryChannels. When one MemoryChannel is full, the other one slows down, and, even worse, the source slows down as well. So initially I suspected my particular sink of having an effect on the other threads or on the JVM. So I removed it and tried a very simple config:

a1.sources = r1
a1.channels = c1
a1.sinks = k1
a1.sources.r1.type = avro
a1.sources.r1.channels = c1
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 1234
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.sinks.k1.type = avro
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = 127.0.0.1
a1.sinks.k1.port = 3456

I put another agent listening for the Avro events on port 3456, inject load into the main one, then stop the listener agent. => Channel c1 is of course filling up… but the source is impacted by the channel as well.
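What happens under the hood: at commit time the source's worker thread parks on the channel's free-space semaphore for up to keep-alive seconds. Below is a minimal, self-contained Java sketch of that pattern (illustrative names only, not Flume's actual code); the stack trace that follows shows the source's Netty I/O worker parked at exactly this kind of timed acquire.

import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class KeepAliveStall {
    // 0 permits stands for "channel full": no free slot to put into.
    private static final Semaphore freeSlots = new Semaphore(0);

    public static void main(String[] args) throws InterruptedException {
        long keepAliveSecs = 3; // Flume's default keep-alive
        long start = System.nanoTime();
        // Shape of the commit path: wait up to keep-alive for space.
        // With keepAliveSecs = 0 this returns false immediately, so a
        // full channel fails every put with no back-off at all.
        boolean gotSpace = freeSlots.tryAcquire(1, keepAliveSecs, TimeUnit.SECONDS);
        long waitedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("space acquired: " + gotSpace + " after " + waitedMs + " ms");
        // While parked here, this is the source's I/O worker thread,
        // so every channel fed by the same source stalls with it.
    }
}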
The thread dump is explicit:

"New I/O worker #15" prio=6 tid=0x000000000d252000 nid=0x2990 waiting on condition [0x0000000010cee000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x00000007818f9c00> (a java.util.concurrent.Semaphore$NonfairSync)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1033)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
    at java.util.concurrent.Semaphore.tryAcquire(Semaphore.java:588)
    at org.apache.flume.channel.MemoryChannel$MemoryTransaction.doCommit(MemoryChannel.java:128)
    at org.apache.flume.channel.BasicTransactionSemantics.commit(BasicTransactionSemantics.java:151)
    at org.apache.flume.channel.ChannelProcessor.processEvent(ChannelProcessor.java:267)
    at org.apache.flume.source.AvroSource.append(AvroSource.java:348)
    at sun.reflect.GeneratedMethodAccessor40.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.avro.ipc.specific.SpecificResponder.respond(SpecificResponder.java:88)
    at org.apache.avro.ipc.Responder.respond(Responder.java:149)
    at org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.messageReceived(NettyServer.java:188)

The source gets stuck on these commits until the "keep-alive" timeout expires. I cannot lower this keep-alive much, as the lowest value seems to be 1 second (the unit is seconds).

To put it in a nutshell, I don't know if this behavior is expected, but if one channel is filling up (at least a MemoryChannel), as per my understanding it will impact any other channel linked to the same source, and will impact the source itself. Do you see any way to prevent a source from being impacted by a channel filling up? In my specific scenario, I would prefer losing some events, or at least keeping the other channels working.

PS: I'm using Flume 1.5 for these tests.

Regards

From: Hari Shreedharan [mailto:[email protected]]
Sent: Thursday, November 13, 2014 22:04
To: [email protected]
Cc: [email protected]
Subject: Re: All channels in an agent get slower after a channel is full

Yeah, when you are sharing disks, that would cause one channel's behavior to affect the others, since your disk is your bottleneck.

Thanks,
Hari

On Thu, Nov 13, 2014 at 1:02 PM, Vincentius Martin <[email protected]> wrote:
Right now, I am using FileChannel. Thanks

Regards,
Vincentius Martin

On Fri, Nov 14, 2014 at 4:00 AM, Hari Shreedharan <[email protected]> wrote:
Are you using MemoryChannel or FileChannel?

Thanks,
Hari

On Thu, Nov 13, 2014 at 12:59 PM, Vincentius Martin <[email protected]> wrote:
Yes, they are sharing the same disk. I tried it with the memory channel too; it produced the same impact when a channel in an agent with many channels reached its capacity: it caused a ChannelException and made the other channels slower.

Regards,
Vincentius Martin
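Since the shared disk is the bottleneck identified up-thread, one way to decouple FileChannels is to give each channel its own checkpoint and data directories on separate physical disks. A sketch with hypothetical paths (checkpointDir and dataDirs are the standard FileChannel properties):

# each FileChannel writes its checkpoint and data to its own disk
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /disk1/flume/c1/checkpoint
a1.channels.c1.dataDirs = /disk1/flume/c1/data
a1.channels.c2.type = file
a1.channels.c2.checkpointDir = /disk2/flume/c2/checkpoint
a1.channels.c2.dataDirs = /disk2/flume/c2/data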
On Fri, Nov 14, 2014 at 3:47 AM, Hari Shreedharan <[email protected]> wrote:
Are all the channels sharing the same disk(s)?

Thanks,
Hari

On Thu, Nov 13, 2014 at 12:44 PM, Vincentius Martin <[email protected]> wrote:
It is between agents. I am using Avro sinks and file channels, and all of those channels write their checkpoints to one disk. For the rest, I am using the default configuration.

Regards,
Vincentius Martin

On Fri, Nov 14, 2014 at 1:39 AM, Hari Shreedharan <[email protected]> wrote:
What does your configuration look like? What sink are you using?

On Thu, Nov 13, 2014 at 8:23 AM, Vincentius Martin <[email protected]> wrote:
Hi,

In my cluster, I have an agent with one source connected to multiple channels. Each channel is connected to a different sink (one channel paired with one sink), and each sink sends events to a different agent (a one-to-many relation), just like the multiplexing flow example in the Flume user guide. However, when a channel reaches its capacity (is already full), I see that the agent's performance gets slower. What I mean by getting slower is that all the other channel-sink pairs in that agent also get slower when sending events to their destinations. I can understand the overfilled channel-sink pair getting slower, but why does it affect the other channel-sink pairs in that agent? From what I see here, the other pairs should be independent of the overfilled channel, except that they use the same source, right?

Thanks!

Regards,
Vincentius Martin
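For reference, the fan-out topology described in this first message corresponds to a configuration along these lines. This is an illustrative sketch only, with hypothetical names, hosts, ports, and paths:

a1.sources = r1
a1.channels = c1 c2
a1.sinks = k1 k2
a1.sources.r1.type = avro
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 4141
# one source fanning out into multiple channels (default replicating selector)
a1.sources.r1.channels = c1 c2
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /flume/c1/checkpoint
a1.channels.c1.dataDirs = /flume/c1/data
a1.channels.c2.type = file
a1.channels.c2.checkpointDir = /flume/c2/checkpoint
a1.channels.c2.dataDirs = /flume/c2/data
# each channel is drained by its own Avro sink toward a different agent
a1.sinks.k1.type = avro
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = agent2.example.com
a1.sinks.k1.port = 4141
a1.sinks.k2.type = avro
a1.sinks.k2.channel = c2
a1.sinks.k2.hostname = agent3.example.com
a1.sinks.k2.port = 4141

With this layout, one full required channel fails the shared source's put transaction, which is the coupling the rest of the thread discusses.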
