Hello Witt,
first of all, thanks for your help.
Fortunately I only took down the NiFi cluster, otherwise I would already be
on vacation :)

After I posted this problem I kept torturing the staging NiFi cluster and
discovered that when the CPU load gets very high, nodes lose their
connection and everything starts going in the wrong direction. The web GUI
also becomes unresponsive, so there is no way to stop the workflows.
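
If the API is still reachable when this happens, a small script can
sometimes stop a process group even when the GUI cannot. Just a sketch of
what I mean (host, port and the group id are placeholders; it assumes an
unsecured NiFi 1.x API, so authentication would need to be added for our
LDAP setup):

    import requests

    # Placeholders: point this at the real node and process group UUID.
    NIFI = "http://nifi-staging:8080/nifi-api"
    GROUP_ID = "00000000-0000-0000-0000-000000000000"

    # Ask NiFi to stop every processor inside the process group; this is
    # the same call the GUI issues when you press stop on a group.
    resp = requests.put(
        NIFI + "/flow/process-groups/" + GROUP_ID,
        json={"id": GROUP_ID, "state": "STOPPED"},
        timeout=30,
    )
    resp.raise_for_status()
    print(resp.json())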

You can reproduce this issue by starting a few workflows composed of:
1) GenerateFlowFile (1 KB size, timer driven, 0 sec run schedule)
2) ReplaceText (just to force the use of a regexp)
3) HashContent (auto-terminate both relationships)

Currently my staging cluster is composed of 2 virtual hosts, each
configured with:
2-core CPU (Intel(R) Xeon(R) CPU E7-2870 @ 2.40GHz)
2 GB RAM
18 GB HD

The problem arises when the CPU load goes over 8, which basically means
when you start 8 of the above workflows (on a 2-core host that is roughly
a 4x oversubscription).
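
To see when a node gets close to that value I have been polling the system
diagnostics endpoint rather than the GUI. Roughly something like the sketch
below (host and port are placeholders, and the field names are as I
understand them from the 1.x REST API, so they may need checking):

    import time
    import requests

    NIFI = "http://nifi-staging:8080/nifi-api"  # placeholder host/port

    while True:
        # GET /system-diagnostics reports, among other things, the OS
        # load average as seen by the NiFi JVM on that node.
        diag = requests.get(NIFI + "/system-diagnostics", timeout=10).json()
        snapshot = diag["systemDiagnostics"]["aggregateSnapshot"]
        load = snapshot["processorLoadAverage"]
        print("load average:", load)
        if load > 8:
            print("WARNING: load above the critical value seen in staging")
        time.sleep(30)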

I noticed NiFi attempts to reduce the load, but this does not work very
well and does not prevent the general failure.

Here you can see the errors that started to appear under stress:
https://drive.google.com/drive/folders/0B7NTMIqrCjESN0JURnRtZWp5Tms?usp=sharing


The first question is: is there a way to keep the load under some critical
value? Is there a how-to that can help me configure NiFi for this?
Currently it is running with factory settings; the only customization has
been the LDAP login.
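
For completeness, the only tuning knob I have found mentioned so far is the
heap in conf/bootstrap.conf, which here is still at what I believe are the
stock values (please correct me if a different baseline makes sense on a
2 GB host):

    # conf/bootstrap.conf (heap arguments as shipped, if I read the defaults right)
    java.arg.2=-Xms512m
    java.arg.3=-Xmx512m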

AP



On 28/10/2016 13:24, Joe Witt wrote:
> Alessio
> 
> You have two clusters here potentially.  The NiFi cluster and the
> Hadoop cluster.  Which one went down?
> 
> If NiFi went down I'd suspect memory exhaustion issues because other
> resource exhaustion issues like full file system, exhausted file
> handles, pegged CPU, etc.. tend not to cause it to restart.  If memory
> related you'll probably see something in the nifi-app.log.  Try going
> with a larger heap as can be controlled in conf/bootstrap.conf.
> 
> Thanks
> Joe
> 
> On Fri, Oct 28, 2016 at 5:55 AM, Alessio Palma
> <alessio.pa...@buongiorno.com> wrote:
>> Hello all,
>> yesterday, for a mistake, basically I executed " ls -R / " using the
>> ListHDFS processor and the whole cluster gone down ( not just a node ).
>>
>> Something like this also happened when I was playing with some DO WHILE
>> / WHILE DO patterns. I have only the nifi logs and they show the
>> heartbeat has been lost. About the CPU LOAD, NETWORK TRAFFIC I have no
>> info. Any pointers about where do I have look for the problem's root ?
>>
>> Today I'm trying to repeat the problems I got with DO/WHILE, nothing bad
>> is happening although CPU LOAD is enough high and NETWORK  TRAFFIC
>> increased up to 282 Kb/sec.
>>
>> Of course I can redo the "ls -R /" on production, however I like to
>> avoid it since there are already some ingestion flows running.
>>
>> AP
> 
