Re: ListenUDP: internal queue at maximum capacity, could not queue event

James Srinivasan Wed, 05 Jun 2019 09:07:22 -0700

In our case the stream is UDP broadcast, so it is available to all nodes anyway.
I've been meaning to test UDP multicast but haven't got round to it yet.



On Wed, 5 Jun 2019, 17:03 Bryan Bende, <[email protected]> wrote:

> That is probably a valid point, but how about putting a load balancer
> in front to handle that?
>
> On Wed, Jun 5, 2019 at 11:30 AM James Srinivasan
> <[email protected]> wrote:
> >
> > Presumably you'd want to mirror the stream to all nodes for when the
> > primary node changes?
> >
> > On Wed, 5 Jun 2019, 13:46 Bryan Bende, <[email protected]> wrote:
> >>
> >> The processor is started on all nodes, but the onTrigger method is
> >> only executed on the primary node.
> >>
> >> This is something we've discussed trying to improve before, but the
> >> real question is why are you sending data to the other nodes if you
> >> don't expect the processor to execute there?
> >>
> >> On Wed, Jun 5, 2019 at 7:04 AM Erik-Jan <[email protected]> wrote:
> >> >
> >> > I figured it out after further testing. The processor runs on all
> >> > nodes, despite the explicit "run on primary node only" option that I
> >> > selected, but the queue is processed only on the primary node. On the
> >> > other nodes the queue fills up until the maximum is reached, after
> >> > which the error message starts appearing. What I missed before is
> >> > that the message is coming from the other, non-primary nodes.
> >> > I'm not sure if this is intended behavior or if it is a bug, though!
> >> > For me it's a bug, since I really want this processor to run on the
> >> > primary node only.
> >> >
> >> > On Tue, 4 Jun 2019 at 16:34, Erik-Jan <[email protected]> wrote:
> >> >>
> >> >> Hi Bryan,
> >> >>
> >> >> Yes, I have considerably increased the numbers in the controller
> >> >> settings.
> >> >> I don't mind getting my hands dirty; increasing the timeout is worth
> >> >> a try.
> >> >>
> >> >> The errors seem to appear only after quite a while. Usually I see
> >> >> these messages the next morning, so testing and experimenting with
> >> >> this error takes a lot of time.
> >> >>
> >> >> Today I've been trying to reproduce this on a virtual machine with
> >> >> the same OS, NiFi and Java versions, but to no avail. The difference
> >> >> is that this VM is not a cluster and has limited memory and CPU, yet
> >> >> it is able to handle much more UDP data, with the error appearing
> >> >> only a few times so far after hours of running. This leads me to
> >> >> think there must be something in the configuration of the cluster
> >> >> that's causing this. I will also try a vanilla NiFi install on one
> >> >> of the nodes without clustering to see if my configuration and
> >> >> cluster setup is somehow the cause.
> >> >>
> >> >> On Tue, 4 Jun 2019 at 16:14, Bryan Bende <[email protected]> wrote:
> >> >>>
> >> >>> Hi Erik,
> >> >>>
> >> >>> It sounds like you have tried most of the common tuning options that
> >> >>> can be done. I would have expected batching + increasing concurrent
> >> >>> tasks from 1 to 3-5 to be the biggest improvement.
> >> >>>
> >> >>> Have you increased the number of threads in your overall thread pool
> >> >>> according to your hardware? (from the top right menu controller
> >> >>> settings)
> >> >>>
> >> >>> I would be curious what happens if you did some tests increasing the
> >> >>> timeout where it attempts to place the message in the queue from
> >> >>> 100ms to 200ms and then maybe 500ms if it still happens.
> >> >>>
> >> >>> I know this requires a code change since that timeout is hard-coded,
> >> >>> but it sounds like you already went down that path with trying a
> >> >>> different queue :)
> >> >>>
> >> >>> -Bryan
> >> >>>
> >> >>> On Tue, Jun 4, 2019 at 4:28 AM Erik-Jan <[email protected]> wrote:
> >> >>> >
> >> >>> > Hi,
> >> >>> >
> >> >>> > I'm experimenting with a locally installed 3-node NiFi cluster.
> >> >>> > This cluster receives UDP packets on the primary node.
> >> >>> > These nodes are pretty powerful, have a good network connection,
> >> >>> > and have lots of memory and SSD disks. I gave NiFi 24G of Java
> >> >>> > heap (both -Xms and -Xmx).
> >> >>> >
> >> >>> > I have configured a ListenUDP processor that listens on a UDP
> >> >>> > port and receives somewhere between 20000 and 50000 packets per 5
> >> >>> > minutes. Its "Max Size of Message Queue" is large enough (1M), I
> >> >>> > gave it 5 concurrent tasks, and it's running on the primary node
> >> >>> > only.
> >> >>> >
> >> >>> > The problem: after running for a while, I get the following
> >> >>> > error: "internal queue at maximum capacity, could not queue event."
> >> >>> >
> >> >>> > I have reviewed the source code and understand when this happens.
> >> >>> > It happens when the processor tries to store an event in a Java
> >> >>> > LinkedBlockingQueue and that queue has reached its maximum
> >> >>> > capacity. The offer() method has a 100ms timeout in which it waits
> >> >>> > for space to free up; then it fails and the event gets dropped. In
> >> >>> > the logs I see exactly 10 of these error messages per second (10 x
> >> >>> > 100ms is 1 second). Despite these errors, I still get a very good
> >> >>> > rate of events that get through to the next processors. Actually,
> >> >>> > it seems pretty much all of the other events get through, since
> >> >>> > the message rates in ListenUDP and the follow-up processor are
> >> >>> > very much alike. The follow-up processors can easily handle the
> >> >>> > load and there are no full queues, congestion or anything like
> >> >>> > that.
> >> >>> >
> >> >>> > What I have tried so far:
> >> >>> >
> >> >>> > Increasing the "Max Size of Message Queue" setting helps, but
> >> >>> > only delays the errors. They eventually return.
> >> >>> >
> >> >>> > Increasing heap space is a suggestion I read in a past post: I
> >> >>> > think 24G is more than enough actually? Perhaps even too much?
> >> >>> >
> >> >>> > Increasing parallelism: concurrent tasks set to 5 or 10 does not
> >> >>> > help.
> >> >>> >
> >> >>> > I modified the code to use an ArrayBlockingQueue instead of the
> >> >>> > LinkedBlockingQueue, thinking it was some kind of garbage
> >> >>> > collection issue. This didn't help.
> >> >>> >
> >> >>> > I increased "Receive Buffer Size" and "Max Size of Socket Buffer",
> >> >>> > but to no avail.
> >> >>> >
> >> >>> > I tried batching. This helps a bit but, like increasing the "Max
> >> >>> > Size of Message Queue", it only seems to delay the eventual error
> >> >>> > messages.
> >> >>> >
> >> >>> > I reproduced this on my local workstation. I installed NiFi, did
> >> >>> > no OS tuning at all, and set the heap size to 4GB. I generate 1.3M
> >> >>> > UDP packets per 5 minutes (the max I can reach with a simple
> >> >>> > Python script). With "Max Size of Message Queue" set to only 100,
> >> >>> > the error soon appears. In the ListenUDP processor I see 1.34M
> >> >>> > events out; on the follow-up processor I see 1.34M events
> >> >>> > incoming. The error is not as frequent as on the cluster though,
> >> >>> > only a few every couple of minutes, while the data rate is much
> >> >>> > higher and the queue much smaller. I'm a bit desperate and hope
> >> >>> > someone can help me out. Why am I getting this error on a
> >> >>> > relatively quiet cluster with not that much load?
> >> >>> >
> >> >>> > Best regards,
> >> >>> > Erik-Jan van Baaren
>
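
A note on the mechanism discussed in this thread: the original post
describes a bounded queue whose timed offer() gives up after 100ms and
drops the event, and the follow-up explains that on non-primary nodes
nothing drains that queue. The sketch below is only a rough Python analogue
of that behaviour, not NiFi's actual code (which is Java and, per the post,
uses a LinkedBlockingQueue). The names timed_offer and
simulate_undrained_node are made up for illustration. Assuming a single
receiving thread that blocks for the full timeout on each failed offer, at
most about 10 offers can fail per second with a 100ms timeout, which would
be consistent with the 10 errors per second reported.

```python
import queue


def timed_offer(q, event, timeout_s=0.1):
    """Try to enqueue an event, waiting up to timeout_s for free space.

    Returns True if the event was queued and False if the queue stayed
    full for the whole timeout (the event is then dropped).
    """
    try:
        q.put(event, timeout=timeout_s)
        return True
    except queue.Full:
        return False


def simulate_undrained_node(capacity, n_events, timeout_s=0.1):
    """Feed n_events into a bounded queue that no consumer ever drains.

    Returns (queued, dropped). This models a node that keeps receiving
    packets while nothing takes events off the queue.
    """
    q = queue.Queue(maxsize=capacity)
    dropped = 0
    for event in range(n_events):
        if not timed_offer(q, event, timeout_s):
            dropped += 1  # this is where an error would be logged
    return q.qsize(), dropped


if __name__ == "__main__":
    # A short timeout keeps the demo quick; the thread mentions 100ms.
    print(simulate_undrained_node(capacity=3, n_events=5, timeout_s=0.01))
```

Once the queue is full and undrained, every further event is dropped no
matter how large the capacity is, so raising the queue size would only
postpone the first error, as described in the thread.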

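The original post mentions generating test traffic with a simple Python
script, but the script itself was not shared. A minimal sketch of what such
a generator might look like follows. The function name send_udp_burst and
the default payload are assumptions, and the host and port must match
whatever the ListenUDP processor under test is configured to listen on.

```python
import socket
import time


def send_udp_burst(host, port, count, payload=b"test event\n"):
    """Send count UDP datagrams to (host, port) as fast as possible.

    Returns (datagrams_sent, elapsed_seconds). UDP gives no delivery
    guarantee, so this counts what was handed to the OS, not what the
    receiver actually got.
    """
    sent = 0
    start = time.monotonic()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for _ in range(count):
            sock.sendto(payload, (host, port))
            sent += 1
    return sent, time.monotonic() - start
```

Calling send_udp_burst in a loop and comparing the returned count with the
processor's "out" counter gives a rough idea of how many events were lost
along the way.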