RE: DeleteHDFS behavior when idle

2023-06-12 Thread Isha Lamboo
Thanks Bryan and Mark,

I've created a Jira ticket describing things as well as I can. I see Bryan 
already has a better idea of the cause than my guess in the ticket, so I will 
add that as a comment.

Regards,

Isha

-Original message-
From: Bryan Bende
Sent: Monday, June 12, 2023 16:10
To: dev@nifi.apache.org
Subject: Re: DeleteHDFS behavior when idle

The processor has @TriggerWhenEmpty so it is going to keep executing regardless 
of whether the incoming queue has data or not. I believe this was done early on 
for some processors that used Kerberos in order to allow the processor to have 
a chance to renew the Kerberos ticket. However, we have since moved away from 
needing to do this, so unless there is another reason for having that, I would 
think it can be removed.

On Mon, Jun 12, 2023 at 9:25 AM Mark Payne wrote:

> Isha,
>
> If you have an incoming connection, and you’re seeing this, then it’s 
> a bug. If there is no incoming connection and this processor is used 
> as a source processor, it’s normal. Either way, it has rather little 
> overhead, and you can further reduce the overhead by increasing the 
> Yield Duration in settings. This is how long it will wait between 
> invocations if there’s nothing for it to do.
>
> Either way, best to file a Jira, though, to address the behavior for 
> running unnecessarily when there’s an incoming Connection.
>
> Thanks
> -Mark
>
>
> > On Jun 12, 2023, at 8:36 AM, Isha Lamboo wrote:
> >
> > Hi all,
> >
> > I have a question about behavior I see on one of our NiFi 1.18 clusters
> > that has a lot of xHDFS processors. When I look at the number of tasks in
> > the summary, the DeleteHDFS processors have a very high number (800-1000+)
> > of tasks even if they have nothing in their incoming queues. The PutHDFS
> > and FetchHDFS in contrast have no tasks listed when they have no files in
> > the incoming queues. Even though the tasks take very little time (less than
> > 100 millis per 5 mins), I’m wondering whether this causes problems when the
> > cluster is heavily loaded during peak hours.
> >
> > Is this a bug or some feature related to deleting files? Should I submit
> > a ticket?
> >
> > Thanks,
> >
> > Isha
>
>


Re: DeleteHDFS behavior when idle

2023-06-12 Thread Bryan Bende
The processor has @TriggerWhenEmpty so it is going to keep executing
regardless of whether the incoming queue has data or not. I believe this
was done early on for some processors that used Kerberos in order to allow
the processor to have a chance to renew the Kerberos ticket. However, we
have since moved away from needing to do this, so unless there is another
reason for having that, I would think it can be removed.
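
To illustrate the effect described above, here is a minimal, self-contained
sketch of a scheduling rule. It is a toy model, not NiFi's actual scheduler:
the annotation is a locally defined stand-in for NiFi's @TriggerWhenEmpty, and
the class names, the shouldTrigger method and its parameters are all
hypothetical.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

class TriggerWhenEmptySketch {

    // Hypothetical stand-in for NiFi's @TriggerWhenEmpty, declared locally
    // so the sketch compiles without any NiFi jars on the classpath.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface TriggerWhenEmpty {}

    // Models a processor like DeleteHDFS as described in this thread.
    @TriggerWhenEmpty
    static class AlwaysTriggered {}

    // Models a processor like PutHDFS or FetchHDFS as described in this thread.
    static class TriggeredOnInput {}

    // Toy rule (an assumption, not NiFi code): decide whether a processor
    // gets a task on this scheduling pass.
    static boolean shouldTrigger(Class<?> processorClass,
                                 boolean hasIncomingConnection,
                                 int queuedFlowFiles) {
        if (!hasIncomingConnection) {
            return true; // used as a source processor: always eligible
        }
        if (processorClass.isAnnotationPresent(TriggerWhenEmpty.class)) {
            return true; // annotated: eligible even with an empty queue
        }
        return queuedFlowFiles > 0; // otherwise only when there is input
    }

    public static void main(String[] args) {
        System.out.println(shouldTrigger(AlwaysTriggered.class, true, 0));   // true
        System.out.println(shouldTrigger(TriggeredOnInput.class, true, 0));  // false
        System.out.println(shouldTrigger(TriggeredOnInput.class, true, 3));  // true
        System.out.println(shouldTrigger(TriggeredOnInput.class, false, 0)); // true
    }
}
```

Under this model, removing the annotation from the first class would make it
behave like the second one: no tasks while the incoming queue is empty.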

On Mon, Jun 12, 2023 at 9:25 AM Mark Payne wrote:

> Isha,
>
> If you have an incoming connection, and you’re seeing this, then it’s a
> bug. If there is no incoming connection and this processor is used as a
> source processor, it’s normal. Either way, it has rather little overhead,
> and you can further reduce the overhead by increasing the Yield Duration in
> settings. This is how long it will wait between invocations if there’s
> nothing for it to do.
>
> Either way, best to file a Jira, though, to address the behavior for
> running unnecessarily when there’s an incoming Connection.
>
> Thanks
> -Mark
>
>
> > On Jun 12, 2023, at 8:36 AM, Isha Lamboo wrote:
> >
> > Hi all,
> >
> > I have a question about behavior I see on one of our NiFi 1.18 clusters
> > that has a lot of xHDFS processors. When I look at the number of tasks in
> > the summary, the DeleteHDFS processors have a very high number (800-1000+)
> > of tasks even if they have nothing in their incoming queues. The PutHDFS
> > and FetchHDFS in contrast have no tasks listed when they have no files in
> > the incoming queues. Even though the tasks take very little time (less than
> > 100 millis per 5 mins), I’m wondering whether this causes problems when the
> > cluster is heavily loaded during peak hours.
> >
> > Is this a bug or some feature related to deleting files? Should I submit
> > a ticket?
> >
> > Thanks,
> >
> > Isha
>
>


Re: DeleteHDFS behavior when idle

2023-06-12 Thread Mark Payne
Isha,

If you have an incoming connection, and you’re seeing this, then it’s a bug. If 
there is no incoming connection and this processor is used as a source 
processor, it’s normal. Either way, it has rather little overhead, and you can 
further reduce the overhead by increasing the Yield Duration in settings. This 
is how long it will wait between invocations if there’s nothing for it to do.
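
As a rough illustration of why a longer Yield Duration reduces the task count,
the sketch below takes the description above at face value and assumes an idle
processor is invoked at most once per yield period. That assumption, the class
name and the method name are all hypothetical; this is back-of-the-envelope
arithmetic, not a model of NiFi's scheduler.

```java
import java.time.Duration;

class YieldEstimateSketch {

    // Upper bound on idle invocations in a window, ASSUMING one invocation
    // per yield period (an assumption made for illustration only).
    static long maxIdleInvocations(Duration window, Duration yieldDuration) {
        long yieldMillis = yieldDuration.toMillis();
        if (yieldMillis <= 0) {
            throw new IllegalArgumentException("yield duration must be at least 1 ms");
        }
        return window.toMillis() / yieldMillis;
    }

    public static void main(String[] args) {
        Duration fiveMinutes = Duration.ofMinutes(5);
        // With a 1 second yield, per processor instance:
        System.out.println(maxIdleInvocations(fiveMinutes, Duration.ofSeconds(1)));  // 300
        // With a 10 second yield, per processor instance:
        System.out.println(maxIdleInvocations(fiveMinutes, Duration.ofSeconds(10))); // 30
    }
}
```

The figures are per processor instance under the stated assumption; actual
counts in the summary depend on the cluster and on how the framework schedules
the processor.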

Either way, best to file a Jira, though, to address the behavior for running 
unnecessarily when there’s an incoming Connection.

Thanks
-Mark


> On Jun 12, 2023, at 8:36 AM, Isha Lamboo wrote:
> 
> Hi all,
> 
> I have a question about behavior I see on one of our NiFi 1.18 clusters that 
> has a lot of xHDFS processors. When I look at the number of tasks in the 
> summary, the DeleteHDFS processors have a very high number (800-1000+) of 
> tasks even if they have nothing in their incoming queues. The PutHDFS and 
> FetchHDFS in contrast have no tasks listed when they have no files in the 
> incoming queues. Even though the tasks take very little time (less than 100 
> millis per 5 mins), I’m wondering whether this causes problems when the 
> cluster is heavily loaded during peak hours.
> 
> Is this a bug or some feature related to deleting files? Should I submit a 
> ticket?
> 
> Thanks,
> 
> Isha



DeleteHDFS behavior when idle

2023-06-12 Thread Isha Lamboo
Hi all,

I have a question about behavior I see on one of our NiFi 1.18 clusters that 
has a lot of xHDFS processors. When I look at the number of tasks in the 
summary, the DeleteHDFS processors have a very high number (800-1000+) of tasks 
even if they have nothing in their incoming queues. The PutHDFS and FetchHDFS 
in contrast have no tasks listed when they have no files in the incoming 
queues. Even though the tasks take very little time (less than 100 millis per 5 
mins), I’m wondering whether this causes problems when the cluster is heavily 
loaded during peak hours.

Is this a bug or some feature related to deleting files? Should I submit a 
ticket?

Thanks,

Isha