Cluster Warnings

2018-10-14 Thread Karthik Kothareddy (karthikk) [CONT - Type 2]
Hello,

We're running a 4-node cluster on NiFi 1.7.1. The fourth node was added 
recently, and as soon as it joined we started seeing the below warnings:

Response time from NODE2 was slow for each of the last 3 requests made. To see 
more information about timing, enable DEBUG logging for 
org.apache.nifi.cluster.coordination.http.replication.ThreadPoolRequestReplicator

Initially we thought the problem was with the recently added node and 
cross-checked all the configs on that box, and everything seemed to be fine. 
After enabling DEBUG mode for cluster logging we noticed that the warning is not 
specific to any one node: every time we see a warning like the above, there is 
one slow node that takes much longer than the others to respond (in this case 
the slow node is NIFI04). Sometimes these will lead to node disconnects needing 
manual intervention.
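The DEBUG logging the warning refers to is enabled in NiFi's conf/logback.xml. A minimal sketch (the logger name is taken verbatim from the warning above; placement inside the existing configuration element is assumed):

```xml
<!-- Add inside the existing <configuration> element of conf/logback.xml -->
<logger name="org.apache.nifi.cluster.coordination.http.replication.ThreadPoolRequestReplicator"
        level="DEBUG"/>
```

The stock logback.xml ships with scanning enabled, so the change is usually picked up without a restart; verify against your release.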

DEBUG [Replicate Request Thread-50] o.a.n.c.c.h.r.ThreadPoolRequestReplicator 
Node Responses for GET /nifi-api/site-to-site (Request ID 
b2c6e983-5233-4007-bd54-13d21b7068d5):
NIFI04:8443: 1386 millis
NIFI02:8443: 3 millis
NIFI01:8443: 5 millis
NIFI03:8443: 3 millis
DEBUG [Replicate Request Thread-41] o.a.n.c.c.h.r.ThreadPoolRequestReplicator 
Node Responses for GET /nifi-api/site-to-site (Request ID 
d182fdab-f1d4-4ac9-97fd-e24c41dc4622):
NIFI04:8443: 1143 millis
NIFI02:8443: 22 millis
NIFI01:8443: 3 millis
NIFI03:8443: 2 millis
DEBUG [Replicate Request Thread-31] o.a.n.c.c.h.r.ThreadPoolRequestReplicator 
Node Responses for GET /nifi-api/site-to-site (Request ID 
e4726027-27c7-4bbb-8ab6-d02bb41f1920):
NIFI04:8443: 1053 millis
NIFI02:8443: 3 millis
NIFI01:8443: 3 millis
NIFI03:8443: 2 millis
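Once that DEBUG output is flowing, the per-node timing lines can be tallied mechanically to confirm whether the slowness really follows one node. A minimal sketch; the function name and threshold are arbitrary, and the line format is taken from the log excerpt above:

```python
import re
from collections import defaultdict

# Matches the per-node timing lines from ThreadPoolRequestReplicator DEBUG
# output, e.g. "NIFI04:8443: 1386 millis"
TIMING = re.compile(r"^(?P<node>[\w.-]+:\d+): (?P<millis>\d+) millis$")

def slow_nodes(log_lines, threshold_ms=500):
    """Return {node: worst_response_ms} for nodes whose worst response exceeds threshold_ms."""
    worst = defaultdict(int)
    for line in log_lines:
        m = TIMING.match(line.strip())
        if m:
            node = m.group("node")
            worst[node] = max(worst[node], int(m.group("millis")))
    return {n: ms for n, ms in worst.items() if ms > threshold_ms}

lines = ["NIFI04:8443: 1386 millis", "NIFI02:8443: 3 millis", "NIFI01:8443: 5 millis"]
print(slow_nodes(lines))  # {'NIFI04:8443': 1386}
```

Run over a few hours of nifi-app.log, this makes it easy to see whether one node consistently dominates the response times.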

We tried changing configurations in nifi.properties, like bumping up 
"nifi.cluster.node.protocol.max.threads", but none of them seems to help and 
we're still stuck with slow communication between the nodes. We use an external 
ZooKeeper as this is our production cluster.
Below are some of our configs

# cluster node properties (only configure for cluster nodes) #
nifi.cluster.is.node=true
nifi.cluster.node.address=fslhdppnifi01.imfs.micron.com
nifi.cluster.node.protocol.port=11443
nifi.cluster.node.protocol.threads=100
nifi.cluster.node.protocol.max.threads=120
nifi.cluster.node.event.history.size=25
nifi.cluster.node.connection.timeout=90 sec
nifi.cluster.node.read.timeout=90 sec
nifi.cluster.node.max.concurrent.requests=1000
nifi.cluster.firewall.file=
nifi.cluster.flow.election.max.wait.time=30 sec
nifi.cluster.flow.election.max.candidates=

Any thoughts on why this is happening?


-Karthik


RE: [EXT] Re: Add interrupt option for stopped processors with active threads

2018-03-22 Thread Karthik Kothareddy (karthikk) [CONT - Type 2]

Also, on the same note, is there a way to get the underlying PID (Unix) for a 
Processor? I know we can get the PID for NiFi itself and list its child 
processes, but is there a way to correlate them back to actual NiFi Processors?



-Original Message-
From: Matt Gilman [mailto:matt.c.gil...@gmail.com] 
Sent: Thursday, March 22, 2018 11:46 AM
To: dev@nifi.apache.org
Subject: [EXT] Re: Add interrupt option for stopped processors with active 
threads

There is a PR available for the backend work [1]. It is actively being 
reviewed. Following that, there is additional work to make the front-end 
changes [2]. It doesn't look like it's going to make it into 1.6.0.
However, it should be in scope for 1.7.0 assuming sufficient review traction 
for both efforts.

Matt

[1] https://github.com/apache/nifi/pull/2555
[2] https://issues.apache.org/jira/browse/NIFI-1295

On Thu, Mar 22, 2018 at 1:37 PM, Karthik Kothareddy (karthikk) [CONT - Type 2] 
<karth...@micron.com> wrote:

> Hello,
>
> Is there an ETA on the below Improvement? Or is it in the scope for 
> any future releases?
>
> https://issues.apache.org/jira/browse/NIFI-78
>
> Thanks
> Karthik
>


RE: [EXT] Re: Add interrupt option for stopped processors with active threads

2018-03-22 Thread Karthik Kothareddy (karthikk) [CONT - Type 2]
Thanks Matt, will keep an eye on it.



Add interrupt option for stopped processors with active threads

2018-03-22 Thread Karthik Kothareddy (karthikk) [CONT - Type 2]
Hello,

Is there an ETA on the below Improvement? Or is it in the scope for any future 
releases?

https://issues.apache.org/jira/browse/NIFI-78

Thanks
Karthik


RE: [EXT] Re: Double click for failure queues

2018-02-01 Thread Karthik Kothareddy (karthikk) [CONT - Type 2]
I mean on the connection label itself, not on the path that bends it. 
Specifically, I observed this behavior on relationships that are routed back to 
the same processor.

 
-Original Message-
From: Joe Witt [mailto:joe.w...@gmail.com] 
Sent: Thursday, February 01, 2018 12:46 PM
To: dev@nifi.apache.org
Subject: [EXT] Re: Double click for failure queues

There is no notion of a 'failure queue' versus any other queue.
Processors have named relationships that, once connected to another component, 
form a connection.  They have no special meaning.

So, are you saying that some connections bring up the dialog when you double 
click and others do not respond to a double click?

On Thu, Feb 1, 2018 at 2:02 PM, Karthik Kothareddy (karthikk) [CONT - Type 2] 
<karth...@micron.com> wrote:
> All,
>
> I am using 1.4.0 and I find the double-click to configure feature (for both 
> queues and processors) very useful and intuitive. However, if I double click 
> on a queue which has a failure relationship it doesn't work as expected and I 
> have to right-click to configure it. Is this by design? Or a bug in the UI?
>
> Just curious...
>
> Thanks
> Karthik


Double click for failure queues

2018-02-01 Thread Karthik Kothareddy (karthikk) [CONT - Type 2]
All,

I am using 1.4.0 and I find the double-click to configure feature (for both 
queues and processors) very useful and intuitive. However, if I double click on 
a queue which has a failure relationship it doesn't work as expected and I have 
to right-click to configure it. Is this by design? Or a bug in the UI?

Just curious...

Thanks
Karthik


RE: [EXT] Re: processorLoadAverage In SystemDiagnostics

2018-01-09 Thread Karthik Kothareddy (karthikk) [CONT - Type 2]
Mark,

Thanks for the quick response. Is there a way NiFi provides these metrics 
individually per processor rather than the average load? Or can we use any REST 
endpoint to grab the processor-wise metrics?

-Karthik

-Original Message-
From: Mark Payne [mailto:marka...@hotmail.com] 
Sent: Tuesday, January 09, 2018 9:22 AM
To: dev@nifi.apache.org
Subject: [EXT] Re: processorLoadAverage In SystemDiagnostics 

Hi Karthik,

This is the 1-minute CPU load average, as reported by the operating system.

Thanks
-Mark
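For context, a sketch of reading this value from the REST API and normalizing it by core count; the endpoint is /nifi-api/system-diagnostics, but the exact JSON nesting used below (systemDiagnostics.aggregateSnapshot) is an assumption to verify against your instance. A ratio near or above 1.0 suggests the box is CPU-saturated:

```python
import json
from urllib.request import urlopen

def load_ratio(diag):
    """Normalize the OS 1-minute load average by the number of cores."""
    snap = diag["systemDiagnostics"]["aggregateSnapshot"]
    return snap["processorLoadAverage"] / snap["availableProcessors"]

# Against a live instance (URL and auth are deployment-specific):
# diag = json.load(urlopen("http://localhost:8080/nifi-api/system-diagnostics"))

sample = {"systemDiagnostics": {"aggregateSnapshot":
          {"processorLoadAverage": 12.0, "availableProcessors": 48}}}
print(load_ratio(sample))  # 0.25
```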


> On Jan 9, 2018, at 11:18 AM, Karthik Kothareddy (karthikk) [CONT - Type 2] 
> <karth...@micron.com> wrote:
> 
> Hello All,
> 
> I was going through the SystemDiagnostics JSON to populate custom metrics for 
> our NiFi instances and came across the fields availableProcessors and 
> processorLoadAverage. From the documentation I understood that 
> availableProcessors is the underlying hardware's processor (core) count. But 
> for processorLoadAverage, I'm not yet clear on how this metric is calculated 
> by NiFi and what factors it considers. Can anyone please explain how this is 
> measured, and whether it is the average for just one core (one processor) or 
> an aggregation across all available cores?
> 
> 
> -Karthik



processorLoadAverage In SystemDiagnostics

2018-01-09 Thread Karthik Kothareddy (karthikk) [CONT - Type 2]
Hello All,

I was going through the SystemDiagnostics JSON to populate custom metrics for 
our NiFi instances and came across the fields availableProcessors and 
processorLoadAverage. From the documentation I understood that 
availableProcessors is the underlying hardware's processor (core) count. But for 
processorLoadAverage, I'm not yet clear on how this metric is calculated by NiFi 
and what factors it considers. Can anyone please explain how this is measured, 
and whether it is the average for just one core (one processor) or an 
aggregation across all available cores?


-Karthik


Enforcing order across multiple nodes

2018-01-04 Thread Karthik Kothareddy (karthikk) [CONT - Type 2]
Hello,

I have a use case that goes as follows,

InputPort (flat files) --> ProcessFlowfiles (a combination of ExecuteScripts to 
filter out a few fields) --> Load to Teradata (using a custom processor)

To do so I have to be sure that I process the flowfiles in a certain order 
across the nodes so that the individual transactions are correct (e.g. an update 
statement executing before there is an insert statement for that record). I 
receive all the flat files via a remote NiFi instance, meaning they will all be 
distributed across nodes and there is no way for me to know in what order the 
files are processed. Has anyone encountered a similar problem before and knows a 
way out of this scenario?

Also, I came across this JIRA ( https://issues.apache.org/jira/browse/NIFI-4155 
). I see that there is a patch available for this already but don't see a 
release version tied to it. Is there a plan to include this patch as part of 
1.5.0 or any future release?

Another question: apart from what Koji Kawamura mentioned in the comment section 
of the above JIRA, namely
EnforceOrder --> Wait to block so only 1 FlowFile can go through --> Processors 
required to run serially --> Notify to release the latch

is there any other way to enforce order across multiple nodes?

Thanks
Karthik


NiFi Clustering

2017-12-20 Thread Karthik Kothareddy (karthikk) [CONT - Type 2]
All,

We are trying to set up a new NiFi (1.4.0) cluster with 3 nodes and are trying 
to decide how we should proceed with the repository structure. Should we point 
all nodes at the same repositories (shared filers accessible by all 3 nodes), or 
give each of the three nodes separate local repositories and let each node have 
its own copy of the data? Is there any recommendation on this for better 
performance and throughput?

Thanks for your time.

-Karthik


NiFi Slowness after thousand processors

2017-11-03 Thread Karthik Kothareddy (karthikk) [CONT - Type 2]
Hello All,

We are currently running NiFi 1.3.0 on a Linux (RHEL) box (standalone instance) 
and are facing some strange issues with it. Whenever the total processor count 
exceeds 1500, the whole instance slows down (as far as I know there is no limit 
on the number of processors an instance can have). The UI becomes unresponsive 
and slows down to a point where navigating to a certain Process Group takes up 
to 10-15 secs. I cross-verified this behavior by making REST calls to rule out a 
UI-only issue and found the same behavior there. System diagnostics takes up to 
15-30 seconds and flowStatus also takes 20-30 seconds to return results. At this 
point all the feeds start to slow down as well.

Strangely, another machine with the same configuration and the same processor 
count (a mirror instance) is performing well. I checked all the metrics like 
system diagnostics, maximum thread count, repository usage etc., but everything 
is under normal usage. The hardware and the load on the underlying machine also 
check out; nothing suspicious there. Can anyone please suggest what might be the 
root cause of this? Am I missing anything basic in setting up an instance that 
can run tens of thousands of processors without any issue?
Below are the hardware specs for the machine.

Cores - 48
Memory - 800 GB
Disk Space - 2.8 TB one physical partition for logs, content, flowFile 
repositories (RAID5 ssd's)

Any help around this will be much appreciated, Thanks for your time on this.

-Karthik




RE: [EXT] Re: Correlate Processor ID in Logs

2017-08-22 Thread Karthik Kothareddy (karthikk) [CONT - Type 2]
Pierre and Kevin,

Thanks for your suggestions, based on your inputs maybe I can build a hybrid 
monitoring system which uses both SiteToSite Reporting Task and Bulletins 
through REST calls.

-Karthik

-Original Message-
From: Pierre Villard [mailto:pierre.villard...@gmail.com] 
Sent: Tuesday, August 22, 2017 2:48 PM
To: dev <dev@nifi.apache.org>
Subject: [EXT] Re: Correlate Processor ID in Logs

Hi,

I'd suggest to use the SiteToSite Bulletin Reporting Task as a way to monitor 
the bulletins generated by NiFi. If your reporting task is scheduled frequently 
enough, you shouldn't have any issue. Note that the "5 bulletins limit" is per 
processor.

Thanks!
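For the REST side of such a hybrid, the bulletin board endpoint (GET /nifi-api/flow/bulletin-board) accepts an `after` parameter so each poll only fetches bulletins newer than the last one seen. A minimal sketch; the helper name is mine, and the response shape assumed below (bulletinBoard.bulletins with numeric ids) should be verified against your NiFi version:

```python
import json
from urllib.request import urlopen

def next_after(board, last_seen):
    """Return the highest bulletin id seen, to pass as ?after= on the next poll."""
    ids = [b["id"] for b in board["bulletinBoard"]["bulletins"]]
    return max(ids + [last_seen])

# Hypothetical polling loop (URL and auth are deployment-specific):
# board = json.load(urlopen(f"{base}/nifi-api/flow/bulletin-board?after={last}"))

sample = {"bulletinBoard": {"bulletins": [{"id": 101}, {"id": 107}]}}
print(next_after(sample, 95))  # 107
```

Polling with `after` sidesteps the fixed 5-bulletin-per-component window as long as the poll interval is short enough.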

2017-08-22 22:43 GMT+02:00 Kevin Doran <kdoran.apa...@gmail.com>:

> Hi Karthik,
>
> A processor's metadata, including its name and parent process group 
> ID, is accessible via the NiFi REST API [1] at GET /processors/{id}, 
> which returns:
>
> {
>   ...
>   "component": {
>     "id": "value",
>     "parentGroupId": "value",
>     "name": "value",
>     "type": "value",
>     ...
>   }
> }
>
> Of course, hitting the API for every log line doesn't scale, so one 
> approach would be to build a local cache of processorId -> 
> processorMetadata in whatever log line processing tool you are using, 
> and use the cache in order to enrich each log line with the fields you 
> require.
> You could build the cache lazily, i.e., start with an empty lookup 
> table, and if the processor ID is not in the cache, hit the REST API to look 
> it up.
>
> Regards,
> Kevin
>
> [1] https://nifi.apache.org/docs/nifi-docs/rest-api/
>
> On 8/22/17, 15:56, "Karthik Kothareddy (karthikk) [CONT - Type 2]" < 
> karth...@micron.com> wrote:
>
> Hello All,
>
> I am trying to build a monitoring mechanism for our flows and I'm 
> considering using the "nifi-app.log" as a primary source and filter 
> them based on the messages. However, I see that a particular message 
> only has Processor name and ID for example,
>
> ERROR [Timer-Driven Process Thread-36] 
> o.a.nifi.processors.standard.ExecuteSQL
> ExecuteSQL[id=015a1007-548f-1bf5-1836-e4e53164d184] Unable to execute 
> SQL select query SELECT * FROM table WHERE comp_datetime <= 
> '2017-01-31 23:59:59.813' ORDER BY datetime OFFSET 32400 ROWS 
> FETCH NEXT 100 ROWS ONLY for 
> StandardFlowFileRecord[uuid=fc425c66-b83d-46d2-94bc-
> 332e43345960,claim=StandardContentClaim [resourceClaim= 
> StandardResourceClaim[id=1499803802779-112000, container=default, 
> section=384], offset=265042, length=114613],offset=53992, 
> name=16290968101533439,size=167]
>
> Given the above error message it is really hard to correlate the 
> processor name/ID to the actual name of the Processor or its parent 
> ProcessorGroup. Is there a way that I can correlate them easily?
>
> Also, I have considered using Bulletins as the source, which is more 
> fine-grained to the actual processor and ProcessorGroup it belongs to, 
> but the problem with this approach is that the REST call only returns 
> 5 bulletins each time. And according to this post 
> https://community.hortonworks.com/questions/72411/nifi-bulletinrepository-api-returns-maximum-5-bull.html
> it is a fixed value, and it is practically not feasible to capture all 
> of them if the flow has multiple failures every second.
>
>
> Any thoughts around this are much appreciated.
>
> Thanks
> Karthik
>
>
>
>
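Kevin's lazy-cache suggestion can be sketched as follows. The class and the injectable fetch function are my own illustration; the endpoint GET /nifi-api/processors/{id} and the component.name / component.parentGroupId fields come from the response shown above:

```python
import json
from urllib.request import urlopen

class ProcessorNameCache:
    """Lazy processorId -> (name, parentGroupId) lookup backed by the REST API."""

    def __init__(self, base_url, fetch=None):
        self.base_url = base_url
        self.cache = {}
        # fetch is injectable for testing; by default it hits the REST API
        self.fetch = fetch or self._fetch_rest

    def _fetch_rest(self, processor_id):
        with urlopen(f"{self.base_url}/nifi-api/processors/{processor_id}") as r:
            return json.load(r)

    def lookup(self, processor_id):
        if processor_id not in self.cache:
            c = self.fetch(processor_id)["component"]
            self.cache[processor_id] = (c["name"], c["parentGroupId"])
        return self.cache[processor_id]

# Stand-in for a live API call, so the sketch is runnable offline:
fake = lambda pid: {"component": {"name": "ExecuteSQL", "parentGroupId": "pg-1"}}
cache = ProcessorNameCache("https://nifi:8443", fetch=fake)
print(cache.lookup("015a1007-548f-1bf5-1836-e4e53164d184"))  # ('ExecuteSQL', 'pg-1')
```

Each unknown id costs one REST call; repeats are served from memory, which keeps per-log-line enrichment cheap.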


Correlate Processor ID in Logs

2017-08-22 Thread Karthik Kothareddy (karthikk) [CONT - Type 2]
Hello All,

I am trying to build a monitoring mechanism for our flows and I'm considering 
using the "nifi-app.log" as a primary source and filter them based on the 
messages. However, I see that a particular message only has Processor name and 
ID for example,

ERROR [Timer-Driven Process Thread-36] o.a.nifi.processors.standard.ExecuteSQL 
ExecuteSQL[id=015a1007-548f-1bf5-1836-e4e53164d184] Unable to execute SQL 
select query SELECT * FROM table WHERE comp_datetime <= '2017-01-31 
23:59:59.813' ORDER BY datetime OFFSET 32400 ROWS FETCH NEXT 100 ROWS 
ONLY for 
StandardFlowFileRecord[uuid=fc425c66-b83d-46d2-94bc-332e43345960,claim=StandardContentClaim
 [resourceClaim=StandardResourceClaim[id=1499803802779-112000, 
container=default, section=384], offset=265042, 
length=114613],offset=53992,name=16290968101533439,size=167]

Given the above error message it is really hard to correlate the processor 
name/ID to the actual name of the Processor or its parent ProcessorGroup. Is 
there a way that I can correlate them easily?

Also, I have considered using Bulletins as the source, which is more 
fine-grained to the actual processor and ProcessorGroup it belongs to, but the 
problem with this approach is that the REST call only returns 5 bulletins each 
time. And according to this post 
https://community.hortonworks.com/questions/72411/nifi-bulletinrepository-api-returns-maximum-5-bull.html
it is a fixed value, and it is practically not feasible to capture all of them 
if the flow has multiple failures every second.


Any thoughts around this are much appreciated.

Thanks
Karthik


RE: [EXT] Re: Updating users through Rest API

2017-08-09 Thread Karthik Kothareddy (karthikk) [CONT - Type 2]
Matt,

Sorry, I forgot to update the community on this. I tried what you suggested and 
it worked like magic. So the right way to do it is:

1. Create the user first without specifying a UID, with revision version 0 
(POST), and capture the UID from the response.
2. Get the UserGroup JSON. (GET)
3. Add the new user to the UserGroup JSON returned from step 2, with the UID and 
the group's current revision version and permissions. (PUT)

I hope the procedure is similar for the access policies.
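The steps above can be sketched as a small helper that builds the PUT body for /tenants/user-groups/{id} from the GET response. The entity shape here (revision plus component.users as a list of {"id": ...} entries) is my assumption based on this thread; verify against your NiFi version:

```python
def group_put_payload(group, user_id):
    """Build a PUT body that keeps the group's revision and adds one user id."""
    users = list(group["component"].get("users", [])) + [{"id": user_id}]
    return {
        "revision": group["revision"],  # must echo the current version back
        "component": {"id": group["component"]["id"], "users": users},
    }

# GET /tenants/user-groups/{id} might return something like:
got = {"revision": {"version": 3},
       "component": {"id": "group-1", "users": [{"id": "existing-user"}]}}
print(group_put_payload(got, "new-user-uid")["component"]["users"])
# [{'id': 'existing-user'}, {'id': 'new-user-uid'}]
```

Echoing the revision back is what lets the server accept the update; a stale version is rejected.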

Thanks
Karthik

-Original Message-
From: Matt Gilman [mailto:matt.c.gil...@gmail.com] 
Sent: Friday, August 04, 2017 10:31 AM
To: dev@nifi.apache.org
Subject: [EXT] Re: Updating users through Rest API

Karthik,

Group membership is managed through the group. So you would need to update the 
group by adding the user identifier to the users list and 'PUT' that.
To see these requests in action, I would suggest opening the Developer Tools of 
your browser as the UI uses the REST API exclusively.

Please let me know if you have any follow-up questions.

Thanks

Matt

On Fri, Aug 4, 2017 at 11:50 AM, Karthik Kothareddy (karthikk) [CONT - Type 2] 
<karth...@micron.com> wrote:

> Hello All,
>
> I am trying to add/update users through the REST API using InvokeHTTP. I 
> tried a simple addition of a user with the below JSON and it worked 
> perfectly.
>
> {
>   "revision" : {
>     "version" : 0
>   },
>   "permissions" : {
>     "canRead" : true,
>     "canWrite" : false
>   },
>   "component" : {
>     "identity" : "testuser"
>   }
> }
>
> However, once the user is added I am trying to add him to the user 
> groups that I have for my instance. I'm using the JSON I got by 
> querying a different user via GET /tenants/users/{id}. I updated all 
> the UIDs in the returned JSON to match the new user and used PUT 
> /tenants/users/{id} for the update. It doesn't seem to have any effect 
> on "testuser"; it still says he does not belong to any group. Can 
> anyone help me with some examples of how to effectively add/update users?
>
> Thanks for your time,
>
> -Karthik
>


Updating users through Rest API

2017-08-04 Thread Karthik Kothareddy (karthikk) [CONT - Type 2]
Hello All,

I am trying to add/update users through the REST API using InvokeHTTP. I tried a 
simple addition of a user with the below JSON and it worked perfectly.

{
  "revision" : {
    "version" : 0
  },
  "permissions" : {
    "canRead" : true,
    "canWrite" : false
  },
  "component" : {
    "identity" : "testuser"
  }
}

However, once the user is added I am trying to add him to the user groups that I 
have for my instance. I'm using the JSON I got by querying a different user via 
GET /tenants/users/{id}. I updated all the UIDs in the returned JSON to match 
the new user and used PUT /tenants/users/{id} for the update. It doesn't seem to 
have any effect on "testuser"; it still says he does not belong to any group. 
Can anyone help me with some examples of how to effectively add/update users?

Thanks for your time,

-Karthik


UI Not Responsive

2017-07-05 Thread Karthik Kothareddy (karthikk) [CONT - Type 2]
All,

I am currently running NiFi 1.2.0 on a Linux (RHEL) machine. Everything was 
running fine until yesterday, when it started behaving strangely. The UI is not 
at all responsive and sometimes the page won't even load. I bumped up the Java 
heap space just to make sure I'm not overloading the system to a point where I 
cannot load the UI. I looked at nifi-bootstrap.log and it says the instance is 
running:

INFO [main] org.apache.nifi.bootstrap.Command Apache NiFi is currently running, 
listening to Bootstrap on port 54224, PID=41602

However, in nifi-app.log I found something interesting; the trace is as below, 
and I keep getting this warning almost every 5-10 seconds in the log:

2017-07-05 20:00:02,032 WARN [NiFi Web 
Server-16-acceptor-0@6fecd17b-ServerConnector@a852{SSL,[ssl, 
http/1.1]}{server-name:8443}] o.eclipse.jetty.server.AbstractConnector
java.nio.channels.ClosedSelectorException: null
    at sun.nio.ch.SelectorImpl.keys(SelectorImpl.java:68) ~[na:1.8.0_101]
    at org.eclipse.jetty.io.ManagedSelector.size(ManagedSelector.java:104) ~[jetty-io-9.3.9.v20160517.jar:9.3.9.v20160517]
    at org.eclipse.jetty.io.SelectorManager.chooseSelector(SelectorManager.java:190) ~[jetty-io-9.3.9.v20160517.jar:9.3.9.v20160517]
    at org.eclipse.jetty.io.SelectorManager.accept(SelectorManager.java:232) ~[jetty-io-9.3.9.v20160517.jar:9.3.9.v20160517]
    at org.eclipse.jetty.io.SelectorManager.accept(SelectorManager.java:217) ~[jetty-io-9.3.9.v20160517.jar:9.3.9.v20160517]
    at org.eclipse.jetty.server.ServerConnector.accepted(ServerConnector.java:383) ~[jetty-server-9.3.9.v20160517.jar:9.3.9.v20160517]
    at org.eclipse.jetty.server.ServerConnector.accept(ServerConnector.java:374) ~[jetty-server-9.3.9.v20160517.jar:9.3.9.v20160517]
    at org.eclipse.jetty.server.AbstractConnector$Acceptor.run(AbstractConnector.java:593) ~[jetty-server-9.3.9.v20160517.jar:9.3.9.v20160517]
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671) [jetty-util-9.3.9.v20160517.jar:9.3.9.v20160517]
    at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589) [jetty-util-9.3.9.v20160517.jar:9.3.9.v20160517]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_101]

Does anyone have similar problems in launching the UI? any help around this is 
appreciated.


Thanks
Karthik






RE: [EXT] Re: NiFi Throughput and Slowness

2017-07-03 Thread Karthik Kothareddy (karthikk) [CONT - Type 2]
Joe,

Thanks for your inputs, I will for sure try all these before I start clustering 
the instance. I will keep the community updated on my results.

Thanks
Karthik

-Original Message-
From: Joe Witt [mailto:joe.w...@gmail.com] 
Sent: Monday, July 03, 2017 12:30 PM
To: dev@nifi.apache.org
Subject: RE: [EXT] Re: NiFi Throughput and Slowness

Karthik

It is really important to follow best practice configuration for placement of 
repos to your underlying storage.  Configured well you can have hundreds of 
MB/s sustained throughput per node.

Also be sure to take advantage of the record reader/writer capability if 
appropriate for your flow. Configured well, you can achieve hundreds of 
thousands of records per second through a series of enrichments, SQL-based 
queries, and transformations, all with schema and format awareness, while many 
other flows happen at once.

Also, while the RAM you have is awesome, the JVM might not be able to take 
advantage of it well in a garbage-collection-friendly manner.
Consider dialing that way down to, say, 8GB.

If you have SplitText in there, be sure it isn't splitting tens of thousands or 
more records at once. You can do two-phase splits and see much better behavior.

With that hardware, sustained performance can be extremely high. I'd say do a 
few RAID1 arrays instead of RAID5. You can then partition the various 
repositories of NiFi to minimize use of the same physical device and maximize 
throughput and response time. You'll also probably need 10Gb NICs/network.

And clustering is a powerful feature. I'd avoid doing that until you have the 
right fundamentals at play in a single node and see both the sustained 
throughput and transaction rate you'd expect.
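As a hedged illustration of the repository-separation point, nifi.properties supports a distinct directory per repository; the mount points below are hypothetical examples, one per physical device (or RAID1 pair):

```properties
# Each repository on its own device to avoid contending for the same disk
nifi.flowfile.repository.directory=/data1/nifi/flowfile_repository
nifi.content.repository.directory.default=/data2/nifi/content_repository
nifi.provenance.repository.directory.default=/data3/nifi/provenance_repository
```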

Thanks
Joe





On Jul 3, 2017 1:18 PM, "Karthik Kothareddy (karthikk) [CONT - Type 2]" < 
karth...@micron.com> wrote:

Rick,

Thanks a lot for the suggestion; clustering is something that even I was 
thinking of for a long time. I just wanted to see if anyone in the community has 
had similar problems and what solutions they found.

-Karthik

-Original Message-
From: Richard St. John [mailto:rstjoh...@gmail.com]
Sent: Monday, July 03, 2017 10:54 AM
To: dev@nifi.apache.org; dev@nifi.apache.org
Subject: [EXT] Re: NiFi Throughput and Slowness

Hi there,

In the beginning of our NiFi adoption, we faced similar issues. For us, we 
clustered NiFi, limited the number of concurrent tasks for each processor and 
added more logical partitions for content and provenance repositories.
Now, we easily process millions of flow files per minute on a 5-node cluster 
with hundreds of processors in the data flow pipeline. When we need to ingest 
more data or process it faster, we simply add more nodes.

First and foremost, clustering NiFi allows horizontal scaling: a must. It seems 
counterintuitive, but limiting the number of concurrent tasks was a major 
performance improvement. Doing so keeps the flow "balanced", preventing 
hotspots within the flow pipeline.

I hope this helps

Rick.

--
Richard St. John, PhD
Asymmetrik
141 National Business Pkwy, Suite 110
Annapolis Junction, MD 20701

On Jul 3, 2017, 12:53 PM -0400, Karthik Kothareddy (karthikk) [CONT - Type 2] 
<karth...@micron.com>, wrote:
> All,
>
> I am currently using NiFi 1.2.0 on a Linux (RHEL) machine. I am using 
> a
single instance without any clustering. My machine has ~800GB of RAM and
2.5 TB of disk space (SSD’s with RAID5). I have set my Java heap space values 
to below in “bootstrap.conf” file
>
> # JVM memory settings
> java.arg.2=-Xms40960m
> java.arg.3=-Xmx81920m
>
> # Some custom Configurations
> java.arg.7=-XX:ReservedCodeCacheSize=1024m
> java.arg.8=-XX:CodeCacheMinimumFreeSpace=10m
> java.arg.9=-XX:+UseCodeCacheFlushing
>
> Now, the problem that I am facing when I am stress testing this 
> instance
is whenever the Read/Write of Data feeds reach the limit of 5GB (at least 
that’s what I observed) the whole instance is running super slow meaning the 
flowfiles are moving very slow in the queues. It is heavily affecting the other 
Processor groups as well, which are very simple flows. I tried to read the system 
diagnostics at that point and saw that all the usage is below 20%, including 
heap usage, flowfile and content repository usage. I tried to capture the 
status history of the Process Group at that particular point and below are some 
results.
>
>
>
>
>
>
>
>
>
> From the above images it is obvious that the process group is working on a 
lot of I/O at that point. Is there a way to increase the throughput of the 
instance given my requirement, which has tons of reads/writes every hour?
Also to add all my repositories (flowfile , content and provenance) are on the 
same disk. I tried to increase all the memory settings I possibly can in both 
bootstrap.conf and nifi.properties , but no use the whole instance is running 
very s

RE: [EXT] Re: NiFi Throughput and Slowness

2017-07-03 Thread Karthik Kothareddy (karthikk) [CONT - Type 2]
Rick,

Thanks a lot for the suggestion; clustering is something that even I was 
thinking of for a long time. I just wanted to see if anyone in the community has 
had similar problems and what solutions they found.

-Karthik

-Original Message-
From: Richard St. John [mailto:rstjoh...@gmail.com] 
Sent: Monday, July 03, 2017 10:54 AM
To: dev@nifi.apache.org; dev@nifi.apache.org
Subject: [EXT] Re: NiFi Throughput and Slowness

Hi there,

In the beginning of our NiFi adoption, we faced similar issues. For us, we 
clustered NiFi, limited the number of concurrent tasks for each processor and 
added more logical partitions for content and provenance repositories. Now, we 
easily process millions of flow files per minute on a 5-node cluster with 
hundreds of processors in the data flow pipeline. When we need to ingest more 
data or process it faster, we simply add more nodes.

First and foremost, clustering NiFi allows horizontal scaling: a must. It seems 
counterintuitive, but limiting the number of concurrent tasks was a major 
performance improvement. Doing so keeps the flow "balanced", preventing 
hotspots within the flow pipeline.

I hope this helps

Rick.

--
Richard St. John, PhD
Asymmetrik
141 National Business Pkwy, Suite 110
Annapolis Junction, MD 20701

On Jul 3, 2017, 12:53 PM -0400, Karthik Kothareddy (karthikk) [CONT - Type 2] 
<karth...@micron.com>, wrote:
> All,
>
> I am currently using NiFi 1.2.0 on a Linux (RHEL) machine. I am using a 
> single instance without any clustering. My machine has ~800GB of RAM and 2.5 
> TB of disk space (SSD’s with RAID5). I have set my Java heap space values to 
> below in “bootstrap.conf” file
>
> # JVM memory settings
> java.arg.2=-Xms40960m
> java.arg.3=-Xmx81920m
>
> # Some custom Configurations
> java.arg.7=-XX:ReservedCodeCacheSize=1024m
> java.arg.8=-XX:CodeCacheMinimumFreeSpace=10m
> java.arg.9=-XX:+UseCodeCacheFlushing
>
> Now, the problem that I am facing when I am stress testing this instance is 
> whenever the Read/Write of Data feeds reach the limit of 5GB (at least that’s 
> what I observed) the whole instance is running super slow meaning the 
> flowfiles are moving very slow in the queues. It is heavily affecting the 
> other Processor groups as well, which are very simple flows. I tried to read 
> the system diagnostics at that point and saw that all the usage is below 20%, 
> including heap usage, flowfile and content repository usage. I tried to 
> capture the status history of the Process Group at that particular point and 
> below are some results.
>
>
>
>
>
>
>
>
>
> From the above images it is obvious that the process group is working on a 
> lot of I/O at that point. Is there a way to increase the throughput of the 
> instance given my requirement, which has tons of reads/writes every hour? 
> Also, to add, all my repositories (flowfile, content and provenance) are on 
> the same disk. I tried to increase all the memory settings I possibly could 
> in both bootstrap.conf and nifi.properties, but to no use; the whole instance 
> is running very slow and processing a minimal amount of flowfiles. Just to 
> make sure, I created a GenerateFlowFile processor when the system is slow, 
> and to my surprise the rate of flow files generated is less than one per 
> minute (which should fill the queue in less than 5 secs under normal 
> circumstances). Any help on this would be much appreciated.
>
>
> Thanks
> Karthik
>
>
>
>
>
>
>
>
>
>
>


NiFi Throughput and Slowness

2017-07-03 Thread Karthik Kothareddy (karthikk) [CONT - Type 2]
All,

I am currently using NiFi 1.2.0 on a Linux (RHEL) machine. I am using a single 
instance without any clustering. My machine has ~800GB of RAM and 2.5 TB of 
disk space (SSDs with RAID5). I have set my Java heap values as below in the 
"bootstrap.conf" file:

# JVM memory settings
java.arg.2=-Xms40960m
java.arg.3=-Xmx81920m

# Some custom Configurations
java.arg.7=-XX:ReservedCodeCacheSize=1024m
java.arg.8=-XX:CodeCacheMinimumFreeSpace=10m
java.arg.9=-XX:+UseCodeCacheFlushing
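A quick sanity check on those heap values may be worth doing (a sketch; the ~32 GB compressed-oops threshold is general HotSpot behavior, not anything NiFi-specific):

```shell
#!/bin/sh
# Heap sizes taken from the bootstrap.conf lines above (values in MB)
xms=40960
xmx=81920
echo "Xms = $((xms / 1024)) GB, Xmx = $((xmx / 1024)) GB"
# A heap above ~32 GB disables compressed ordinary object pointers
# (-XX:+UseCompressedOops), so every object reference doubles to 8 bytes;
# a sub-32 GB heap is often both smaller and faster in practice.
if [ "$xmx" -gt 32768 ]; then
  echo "Xmx exceeds 32 GB: compressed oops will be disabled"
fi
```

An oversized heap can also lengthen GC pauses, which shows up as exactly this kind of intermittent slowness.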

Now, the problem I am facing while stress testing this instance is that 
whenever the read/write of data feeds reaches about 5 GB (at least that's what 
I observed), the whole instance runs very slowly, meaning the flowfiles move 
very slowly through the queues. It heavily affects the other process groups as 
well, which are very simple flows. I tried to read the system diagnostics at 
that point and saw that all usage is below 20%, including heap usage and the 
flowfile and content repository usage. I also captured the status history of 
the process group at that point; below are some results.


[cid:image001.png@01D2F3EA.80145A40]



[cid:image002.png@01D2F3EA.80145A40]




From the above images it is obvious that the process group is performing a lot 
of I/O at that point. Is there a way to increase the throughput of the 
instance, given my requirement involves tons of reads/writes every hour? Also, 
to add, all my repositories (flowfile, content, and provenance) are on the same 
disk. I tried to increase every memory setting I possibly could in both 
bootstrap.conf and nifi.properties, but to no avail; the whole instance runs 
very slowly and processes a minimal number of flowfiles. Just to make sure, I 
created a GenerateFlowFile processor while the system was slow, and to my 
surprise the rate of flowfiles generated was less than one per minute (it 
should fill the queue in less than 5 seconds under normal circumstances). Any 
help on this would be much appreciated.
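Since all three repositories share one disk, a common mitigation is to point each repository at its own physical volume in nifi.properties. A sketch (the /mnt/diskN mount paths are hypothetical; the property names are the standard repository-directory properties):

```
nifi.flowfile.repository.directory=/mnt/disk1/flowfile_repository
nifi.content.repository.directory.default=/mnt/disk2/content_repository
nifi.provenance.repository.directory.default=/mnt/disk3/provenance_repository
```

Separating the content and provenance repositories in particular tends to help, because under load they compete for the same write bandwidth. Note also that RAID5 pays a write penalty (parity must be recomputed on every write), which can dominate on a write-heavy flow even on SSDs.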


Thanks
Karthik

RE: [EXT] Re: OverlappingFileLockException while restarting

2017-05-25 Thread Karthik Kothareddy (karthikk) [CONT - Type 2]
.jar:1.2.0-SNAPSHOT]
at 
org.apache.nifi.controller.repository.StandardProcessSession.importFrom(StandardProcessSession.java:2680)
 ~[nifi-framework-core-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
... 14 common frames omitted
2017-05-25 16:28:52,651 WARN [NiFi Web Server-17] 
org.eclipse.jetty.servlet.ServletHandler 
/nifi-api/data-transfer/input-ports/401a3083-015c-1000-695b-fa269fc7432f/transactions/68e7b4ac-6b81-4dca-a530-6cbd2b314365/flow-files
java.nio.channels.ClosedChannelException: null
at 
org.eclipse.jetty.util.IteratingCallback.close(IteratingCallback.java:427) 
~[jetty-util-9.3.9.v20160517.jar:9.3.9.v20160517]
at 
org.eclipse.jetty.server.HttpConnection.onClose(HttpConnection.java:491) 
~[jetty-server-9.3.9.v20160517.jar:9.3.9.v20160517]
at 
org.eclipse.jetty.io.ssl.SslConnection.onClose(SslConnection.java:152) 
~[jetty-io-9.3.9.v20160517.jar:9.3.9.v20160517]
at 
org.eclipse.jetty.io.SelectorManager.connectionClosed(SelectorManager.java:345) 
~[jetty-io-9.3.9.v20160517.jar:9.3.9.v20160517]
at org.eclipse.jetty.io.ManagedSelector$2.run(ManagedSelector.java:442) 
~[jetty-io-9.3.9.v20160517.jar:9.3.9.v20160517]
at 
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
 [jetty-util-9.3.9.v20160517.jar:9.3.9.v20160517]
at 
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
 [jetty-util-9.3.9.v20160517.jar:9.3.9.v20160517]
at 
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
 [jetty-util-9.3.9.v20160517.jar:9.3.9.v20160517]
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
 [jetty-util-9.3.9.v20160517.jar:9.3.9.v20160517]
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589) 
[jetty-util-9.3.9.v20160517.jar:9.3.9.v20160517]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_101]
2017-05-25 16:28:52,651 WARN [NiFi Web Server-17] 
org.eclipse.jetty.server.HttpChannel 
https://domain:8443/nifi-api/data-transfer/input-ports/401a3083-015c-1000-695b-fa269fc7432f/transactions/68e7b4ac-6b81-4dca-a530-6cbd2b314365/flow-files
java.nio.channels.ClosedChannelException: null
at 
org.eclipse.jetty.util.IteratingCallback.close(IteratingCallback.java:427) 
~[jetty-util-9.3.9.v20160517.jar:9.3.9.v20160517]
at 
org.eclipse.jetty.server.HttpConnection.onClose(HttpConnection.java:491) 
~[jetty-server-9.3.9.v20160517.jar:9.3.9.v20160517]
at 
org.eclipse.jetty.io.ssl.SslConnection.onClose(SslConnection.java:152) 
~[jetty-io-9.3.9.v20160517.jar:9.3.9.v20160517]
at 
org.eclipse.jetty.io.SelectorManager.connectionClosed(SelectorManager.java:345) 
~[jetty-io-9.3.9.v20160517.jar:9.3.9.v20160517]
at org.eclipse.jetty.io.ManagedSelector$2.run(ManagedSelector.java:442) 
~[jetty-io-9.3.9.v20160517.jar:9.3.9.v20160517]
at 
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
 [jetty-util-9.3.9.v20160517.jar:9.3.9.v20160517]
at 
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
 [jetty-util-9.3.9.v20160517.jar:9.3.9.v20160517]
at 
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
 [jetty-util-9.3.9.v20160517.jar:9.3.9.v20160517]
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
 [jetty-util-9.3.9.v20160517.jar:9.3.9.v20160517]
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589) 
[jetty-util-9.3.9.v20160517.jar:9.3.9.v20160517]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_101]

Below are the settings I've been using:

# Site to Site properties
nifi.remote.input.host=domain
nifi.remote.input.secure=true
nifi.remote.input.socket.port=10443
nifi.remote.input.http.enabled=true
nifi.remote.input.http.transaction.ttl=30 sec

# web properties #
nifi.web.war.directory=./lib
nifi.web.http.host=
nifi.web.http.port=
nifi.web.https.host=domain
nifi.web.https.port=8443
nifi.web.jetty.working.directory=./work/jetty
nifi.web.jetty.threads=200
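With nifi.remote.input.http.enabled=true, one rough way to check the HTTP(S) site-to-site endpoint from the shell is to time a request to it. A sketch (probe and report are hypothetical helpers wrapping curl; the hostname and port come from the properties above):

```shell
#!/bin/sh
# probe HOST prints the total wall time, in seconds, of one request to the
# site-to-site endpoint on that host (-k skips cert verification for a quick test).
probe() {
  curl -sk -o /dev/null -w '%{time_total}\n' "https://$1:8443/nifi-api/site-to-site"
}

# report HOST... prints one "host: seconds s" line per host.
report() {
  for node in "$@"; do
    printf '%s: %s s\n' "$node" "$(probe "$node")"
  done
}

# Example (uncomment with your real hostname):
# report domain
```

A request that hangs for roughly nifi.remote.input.http.transaction.ttl (30 sec here) before failing would point at the transaction side rather than the network.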


Thanks
Karthik



-Original Message-
From: Joe Witt [mailto:joe.w...@gmail.com] 
Sent: Thursday, May 25, 2017 10:14 AM
To: dev@nifi.apache.org
Subject: [EXT] Re: OverlappingFileLockException while restarting

Most likely there is another instance of NiFi still running. Check ps -ef | 
grep nifi. Kill that and try again.
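That check can be scripted. A sketch (the pgrep pattern matches the NiFi main class, which assumes a standard start script launched the JVM):

```shell
#!/bin/sh
# Report whether a (possibly stale) NiFi JVM is still running.  A stale
# process still holds the repository file locks, which is what produces
# the OverlappingFileLockException on restart.
check_nifi() {
  # $1: pid list, normally "$(pgrep -f org.apache.nifi.NiFi || true)"
  if [ -n "$1" ]; then
    echo "NiFi still running: $1 -- kill it before restarting"
  else
    echo "no NiFi process found; safe to start"
  fi
}

check_nifi "$(pgrep -f 'org.apache.nifi.NiFi' || true)"
```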

Thanks

On Thu, May 25, 2017 at 12:10 PM, Karthik Kothareddy (karthikk) [CONT
- Type 2] <karth...@micron.com> wrote:
> Hello,
>
> I am running 1.2.0 Snapshot on a Linux instance with some custom 
> processors. I was trying to update a certificate this morning and 
> restart the instance and I'm stuck with the 
> "Over

OverlappingFileLockException while restarting

2017-05-25 Thread Karthik Kothareddy (karthikk) [CONT - Type 2]
Hello,

I am running a 1.2.0 snapshot on a Linux instance with some custom processors. 
I was trying to update a certificate this morning and restart the instance, and 
I'm stuck with an "OverlappingFileLockException". The instance will not start; 
soon after I start it, it shuts down with the following logs in bootstrap.log:

2017-05-25 15:37:12,935 ERROR [NiFi logging handler] org.apache.nifi.StdErr 
Failed to start web server: Unable to start Flow Controller.
2017-05-25 15:37:12,936 ERROR [NiFi logging handler] org.apache.nifi.StdErr 
Shutting down...
2017-05-25 15:37:13,884 INFO [main] org.apache.nifi.bootstrap.RunNiFi NiFi 
never started. Will not restart NiFi

However, while going through the app log to make sure everything was okay, I 
found the trace below:


WARN [Thread-1] org.apache.nifi.web.server.JettyServer Failed to stop web server
org.springframework.beans.factory.BeanCreationException: Error creating bean 
with name 'flowService': FactoryBean threw exception on object creation; nested 
exception is org.springframework.beans.factory.BeanCreationException: Error 
creating bean with name 'flowController': FactoryBean threw exception on object 
creation; nested exception is java.lang.RuntimeException: 
java.nio.channels.OverlappingFileLockException
at 
org.springframework.beans.factory.support.FactoryBeanRegistrySupport.doGetObjectFromFactoryBean(FactoryBeanRegistrySupport.java:175)
 ~[na:na]
at 
org.springframework.beans.factory.support.FactoryBeanRegistrySupport.getObjectFromFactoryBean(FactoryBeanRegistrySupport.java:103)
 ~[na:na]
at 
org.springframework.beans.factory.support.AbstractBeanFactory.getObjectForBeanInstance(AbstractBeanFactory.java:1585)
 ~[na:na]
at 
org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:254)
 ~[na:na]
at 
org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:202)
 ~[na:na]
at 
org.springframework.context.support.AbstractApplicationContext.getBean(AbstractApplicationContext.java:1060)
 ~[na:na]
at 
org.apache.nifi.web.contextlistener.ApplicationStartupContextListener.contextDestroyed(ApplicationStartupContextListener.java:103)
 ~[na:na]
at 
org.eclipse.jetty.server.handler.ContextHandler.callContextDestroyed(ContextHandler.java:845)
 ~[na:na]
at 
org.eclipse.jetty.servlet.ServletContextHandler.callContextDestroyed(ServletContextHandler.java:546)
 ~[na:na]
at 
org.eclipse.jetty.server.handler.ContextHandler.stopContext(ContextHandler.java:826)
 ~[na:na]
at 
org.eclipse.jetty.servlet.ServletContextHandler.stopContext(ServletContextHandler.java:356)
 ~[na:na]
at 
org.eclipse.jetty.webapp.WebAppContext.stopWebapp(WebAppContext.java:1410) 
~[na:na]
at 
org.eclipse.jetty.webapp.WebAppContext.stopContext(WebAppContext.java:1374) 
~[na:na]
at 
org.eclipse.jetty.server.handler.ContextHandler.doStop(ContextHandler.java:874) 
~[na:na]
at 
org.eclipse.jetty.servlet.ServletContextHandler.doStop(ServletContextHandler.java:272)
 ~[na:na]
at 
org.eclipse.jetty.webapp.WebAppContext.doStop(WebAppContext.java:544) ~[na:na]
at 
org.eclipse.jetty.util.component.AbstractLifeCycle.stop(AbstractLifeCycle.java:89)
 ~[na:na]
at 
org.eclipse.jetty.util.component.ContainerLifeCycle.stop(ContainerLifeCycle.java:143)
 ~[na:na]
at 
org.eclipse.jetty.util.component.ContainerLifeCycle.doStop(ContainerLifeCycle.java:161)
 ~[na:na]
at 
org.eclipse.jetty.server.handler.AbstractHandler.doStop(AbstractHandler.java:73)
 ~[na:na]
at 
org.eclipse.jetty.util.component.AbstractLifeCycle.stop(AbstractLifeCycle.java:89)
 ~[na:na]
at 
org.eclipse.jetty.util.component.ContainerLifeCycle.stop(ContainerLifeCycle.java:143)
 ~[na:na]
at 
org.eclipse.jetty.util.component.ContainerLifeCycle.doStop(ContainerLifeCycle.java:161)
 ~[na:na]
at 
org.eclipse.jetty.server.handler.AbstractHandler.doStop(AbstractHandler.java:73)
 ~[na:na]
at 
org.eclipse.jetty.util.component.AbstractLifeCycle.stop(AbstractLifeCycle.java:89)
 ~[na:na]
at 
org.eclipse.jetty.util.component.ContainerLifeCycle.stop(ContainerLifeCycle.java:143)
 ~[na:na]
at 
org.eclipse.jetty.util.component.ContainerLifeCycle.doStop(ContainerLifeCycle.java:161)
 ~[na:na]
at 
org.eclipse.jetty.server.handler.AbstractHandler.doStop(AbstractHandler.java:73)
 ~[na:na]
at org.eclipse.jetty.server.Server.doStop(Server.java:482) ~[na:na]
at 
org.eclipse.jetty.util.component.AbstractLifeCycle.stop(AbstractLifeCycle.java:89)
 ~[na:na]
at org.apache.nifi.web.server.JettyServer.stop(JettyServer.java:854) 
~[na:na]
at org.apache.nifi.NiFi.shutdownHook(NiFi.java:188) 
[nifi-runtime-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
at org.apache.nifi.NiFi$2.run(NiFi.java:89) 
[nifi-runtime-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
at