Re: [DISCUSS] NIFI-1069 / PR1093 - Return code for a NiFi not responding to ping

2016-10-17 Thread Michal Klempa
I am the original contributor and I am OK with 4. The commit is here:
https://github.com/apache/nifi/pull/1093/commits/5f90cb714dc48264e0f863ce864ac41ddb93556c

And yes, we need at least some LSB compliance to manage NiFi using
Ansible :) Otherwise, I have to check the output of ps a | grep NiFi to see
whether NiFi is running or not. That's bad.

On Sat, Oct 15, 2016 at 3:40 AM, Edgardo Vega wrote:
> I would say go with 4.
>
> Ansible will see 1, 2, 3, 4, 69 as not running and do the correct thing.
> Puppet sees 0 vs. non-zero. I think that if the service is up, running, and
> responding to pings, it should return 0; anything else should return another
> code. This will allow these tools to restart the application to get it back
> into a good state.
>
> Not sure what would put NiFi into this state; maybe a full disk.
>
> Cheers,
>
> Edgardo
>
>
> On Friday, October 14, 2016, Andre wrote:
>
>> devs,
>>
>> I am reviewing PR#1093, which happens to be a great contribution towards an
>> LSB-compliant NiFi (something the overall community seems to be eager to
>> have).
>>
>> The PR basically changes RunNiFi.java so that it returns a numeric exit
>> code compatible with the LSB specifications.
>>
>> I am happy with the overall code but there's one sticking point:
>>
>> Should we return 0 (i.e. "healthy") when "Apache NiFi is running at PID {}
>> but is not responding to ping requests" ?
>>
>> The LSB defines:
>>
>> "
>> If the status action is requested, the init script will return the
>> following exit status codes.
>>
>> 0 program is running or service is OK
>> 1 program is dead and /var/run pid file exists
>> 2 program is dead and /var/lock lock file exists
>> 3 program is not running
>> 4 program or service status is unknown
>> 5-99 reserved for future LSB use
>> 100-149 reserved for distribution use
>> 150-199 reserved for application use
>> 200-254 reserved
>> "
>>
>> My reading is that we should return 4: the JVM PID is currently running,
>> but the absence of a ping response could signal that the NiFi program
>> running within the JVM is not healthy (the PR contribution returns 0).
>>
>> Would anyone have a view on what usually would cause a NiFi instance to be
>> "running" but unable to respond to pings? Whenever that happens should we
>> return 0 (running/service ok) or 4 (program/service status unknown)?
>>
>> I thank you in advance
>>


Re: [DISCUSS] NIFI-1069 / PR1093 - Return code for a NiFi not responding to ping

2016-10-14 Thread Mark Payne
Andre,

In that case, I agree with you that a 4 would be the proper response. Things
that I can think of that may cause it not to respond:

1) Long Garbage Collection pause
2) Stuck in some sort of infinite loop or just way overtaxed CPU
3) Too many open files prevents it from accepting the connection

Not sure what else may cause this...

Thanks
-Mark

> On Oct 14, 2016, at 9:08 PM, Andre wrote:
> 
> devs,
> 
> I am reviewing PR#1093, which happens to be a great contribution towards an
> LSB-compliant NiFi (something the overall community seems to be eager to
> have).
> 
> The PR basically changes RunNiFi.java so that it returns a numeric exit
> code compatible with the LSB specifications.
> 
> I am happy with the overall code but there's one sticking point:
> 
> Should we return 0 (i.e. "healthy") when "Apache NiFi is running at PID {}
> but is not responding to ping requests" ?
> 
> The LSB defines:
> 
> "
> If the status action is requested, the init script will return the
> following exit status codes.
> 
> 0 program is running or service is OK
> 1 program is dead and /var/run pid file exists
> 2 program is dead and /var/lock lock file exists
> 3 program is not running
> 4 program or service status is unknown
> 5-99 reserved for future LSB use
> 100-149 reserved for distribution use
> 150-199 reserved for application use
> 200-254 reserved
> "
> 
> My reading is that we should return 4: the JVM PID is currently running,
> but the absence of a ping response could signal that the NiFi program
> running within the JVM is not healthy (the PR contribution returns 0).
> 
> Would anyone have a view on what usually would cause a NiFi instance to be
> "running" but unable to respond to pings? Whenever that happens should we
> return 0 (running/service ok) or 4 (program/service status unknown)?
> 
> I thank you in advance
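
For illustration only, the mapping discussed in this thread could be sketched
roughly as below. This is not the code from PR#1093 or RunNiFi.java: the class
name, the method name, and the three boolean inputs are hypothetical, and how
the bootstrap actually detects a PID file or a ping response is assumed rather
than taken from the source. The only parts drawn from the thread are the LSB
status codes and the proposal to return 4 when the process is running but not
answering pings.

```java
/** Hypothetical sketch; names are illustrative and not taken from RunNiFi.java. */
final class LsbStatusSketch {

    // Exit codes for the "status" action, as quoted from the LSB in this thread.
    static final int RUNNING_OK = 0;
    static final int DEAD_PID_FILE_EXISTS = 1;
    static final int NOT_RUNNING = 3;
    static final int STATUS_UNKNOWN = 4;

    /**
     * Maps three observations to an LSB status exit code.
     *
     * @param pidFileExists whether a PID file was found (assumed input)
     * @param processAlive  whether the recorded PID is still running (assumed input)
     * @param pingResponded whether NiFi answered the ping request (assumed input)
     */
    static int statusExitCode(boolean pidFileExists, boolean processAlive, boolean pingResponded) {
        if (processAlive) {
            // Running but silent is reported as "unknown" (4), not "OK" (0).
            return pingResponded ? RUNNING_OK : STATUS_UNKNOWN;
        }
        // Not running: distinguish a stale PID file from a clean stop.
        return pidFileExists ? DEAD_PID_FILE_EXISTS : NOT_RUNNING;
    }

    public static void main(String[] args) {
        // PID is alive but NiFi does not answer the ping.
        System.out.println(statusExitCode(true, true, false)); // prints 4
    }
}
```

With a mapping like this, tools that only distinguish zero from non-zero (as
described for Puppet above) and tools that inspect the specific code (as
described for Ansible) would both treat an unresponsive NiFi as not healthy.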