Re: Setting basic NiFi network problem

2017-05-25 Thread Andy LoPresto
Yes, you need to ensure that each node in the cluster (each instance of the 
application) is running its web server on a unique port if they are all running 
on the same machine. You’ll also need to pay attention to the S2S and cluster 
protocol ports.
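
For example, on a single machine the three instances' nifi.properties could diverge along
these lines (the port numbers are only an illustration; any free, unique ports will do):

# Node 1
nifi.web.http.port=8080
nifi.remote.input.socket.port=8081
nifi.cluster.node.protocol.port=8082

# Node 2
nifi.web.http.port=8090
nifi.remote.input.socket.port=8091
nifi.cluster.node.protocol.port=8092

# Node 3
nifi.web.http.port=8100
nifi.remote.input.socket.port=8101
nifi.cluster.node.protocol.port=8102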

Pierre Villard has written an excellent article with prescriptive steps for 
setting up a 3 node cluster (unsecured [1] and secured [2]) and I encourage you 
to follow those steps.

[1] https://pierrevillard.com/2016/08/13/apache-nifi-1-0-0-cluster-setup/ 

[2] 
https://pierrevillard.com/2016/11/29/apache-nifi-1-1-0-secured-cluster-setup/ 



Andy LoPresto
alopre...@apache.org
alopresto.apa...@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

> On May 25, 2017, at 1:54 PM, Xphos  wrote:
> 
> Okay, so I know this is almost certainly a duplicate post, but I really need
> help setting this up.
> I have installed and compiled NiFi, but I am getting stuck trying to set up
> the basic network.
> 
> I am probably misconfiguring the ports, because I am trying to build a
> 3-node cluster entirely on localhost. Every time I boot, all the nodes start
> up, but then they disconnect one by one until a perpetual election cycle
> begins (all my flows are blank).
> 
> I am including the nifi.properties configuration for the 3 nodes at the
> bottom. I am only including the relevant sections, because I haven't touched
> anything else except setting the embedded ZooKeeper property to true on all
> nodes.
> 
> Areas of concern:
> 1. The documentation for setting up the ZooKeeper conf file shows
> server.1=hostname:2888:3888, so I just followed that example for every
> server. I didn't really want to do that, because I feel each server should
> use different ports since they all run on localhost, but I couldn't tell
> where those ports get decided, so I did not think I could change them.
> 
> 2. web.http.port is set to 8080 on every node, but that may be me
> misreading the docs.
> 
> 
> 
> *Node 1:*
> # Site to Site properties
> nifi.remote.input.host=localhost
> nifi.remote.input.secure=false
> nifi.remote.input.socket.port=8130
> nifi.remote.input.http.enabled=true
> nifi.remote.input.http.transaction.ttl=30 sec
> 
> # web properties #
> nifi.web.war.directory=./lib
> nifi.web.http.host=
> nifi.web.http.port=8080
> nifi.web.http.network.interface.default=
> nifi.web.https.host=
> nifi.web.https.port=
> nifi.web.https.network.interface.default=
> nifi.web.jetty.working.directory=./work/jetty
> nifi.web.jetty.threads=200
> 
> # cluster common properties (all nodes must have same values) #
> nifi.cluster.protocol.heartbeat.interval=5 sec
> nifi.cluster.protocol.is.secure=false
> 
> # cluster node properties (only configure for cluster nodes) #
> nifi.cluster.is.node=true
> nifi.cluster.node.address=
> nifi.cluster.node.protocol.port=8100
> nifi.cluster.node.protocol.threads=10
> nifi.cluster.node.event.history.size=25
> nifi.cluster.node.connection.timeout=5 sec
> nifi.cluster.node.read.timeout=5 sec
> nifi.cluster.firewall.file=
> nifi.cluster.flow.election.max.wait.time=1 mins
> nifi.cluster.flow.election.max.candidates=3
> 
> # zookeeper properties, used for cluster management #
> nifi.zookeeper.connect.string=localhost:2180,localhost:2190,localhost:2200
> nifi.zookeeper.connect.timeout=3 secs
> nifi.zookeeper.session.timeout=3 secs
> nifi.zookeeper.root.node=/nifi
> 
> 
> 
> 
> *Node 2:*
> # Site to Site properties
> nifi.remote.input.host=localhost
> nifi.remote.input.secure=false
> nifi.remote.input.socket.port=8120
> nifi.remote.input.http.enabled=true
> nifi.remote.input.http.transaction.ttl=30 sec
> 
> # web properties #
> nifi.web.war.directory=./lib
> nifi.web.http.host=
> nifi.web.http.port=8080
> nifi.web.http.network.interface.default=
> nifi.web.https.host=
> nifi.web.https.port=
> nifi.web.https.network.interface.default=
> nifi.web.jetty.working.directory=./work/jetty
> nifi.web.jetty.threads=200
> 
> # cluster common properties (all nodes must have same values) #
> nifi.cluster.protocol.heartbeat.interval=5 sec
> nifi.cluster.protocol.is.secure=false
> 
> # cluster node properties (only configure for cluster nodes) #
> nifi.cluster.is.node=true
> nifi.cluster.node.address=
> nifi.cluster.node.protocol.port=8090
> nifi.cluster.node.protocol.threads=10
> nifi.cluster.node.event.history.size=25
> nifi.cluster.node.connection.timeout=5 sec
> nifi.cluster.node.read.timeout=5 sec
> nifi.cluster.firewall.file=
> nifi.cluster.flow.election.max.wait.time=1 mins
> nifi.cluster.flow.election.max.candidates= 3
> 
> # zookeeper properties, used for cluster management #
> nifi.zookeeper.connect.string=localhost:2180,localhost:2190,localhost:2200
> nifi.zookeeper.connect.timeout=3 secs
> nifi.zookeeper.session.timeout=3 secs
> 

Setting basic NiFi network problem

2017-05-25 Thread Xphos
Okay, so I know this is almost certainly a duplicate post, but I really need help
setting this up.
I have installed and compiled NiFi, but I am getting stuck trying to set up
the basic network.

I am probably misconfiguring the ports, because I am trying to build a 3-node
cluster entirely on localhost. Every time I boot, all the nodes start up, but
then they disconnect one by one until a perpetual election cycle begins (all my
flows are blank).

I am including the nifi.properties configuration for the 3 nodes at the bottom.
I am only including the relevant sections, because I haven't touched anything
else except setting the embedded ZooKeeper property to true on all nodes.

Areas of concern:
1. The documentation for setting up the ZooKeeper conf file shows
server.1=hostname:2888:3888, so I just followed that example for every server
(a rough sketch of what I was thinking is below, after these two items). I
didn't really want to do that, because I feel each server should use different
ports since they all run on localhost, but I couldn't tell where those ports
get decided, so I did not think I could change them.

2. web.http.port is set to 8080 on every node, but that may be me misreading
the docs.
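
For what it's worth, here is roughly what I was thinking the conf file should look
like for three servers sharing one host (the ports and paths are just my guess,
which is exactly the part I'm unsure about):

# zookeeper.properties (same server list on every node)
server.1=localhost:2888:3888
server.2=localhost:2889:3889
server.3=localhost:2890:3890

# per node: clientPort matching that node's entry in nifi.zookeeper.connect.string
# node 1: clientPort=2180, node 2: clientPort=2190, node 3: clientPort=2200
# per node: a myid file in that node's ZooKeeper data directory containing 1, 2 or 3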



*Node 1:*
# Site to Site properties
nifi.remote.input.host=localhost
nifi.remote.input.secure=false
nifi.remote.input.socket.port=8130
nifi.remote.input.http.enabled=true
nifi.remote.input.http.transaction.ttl=30 sec

# web properties #
nifi.web.war.directory=./lib
nifi.web.http.host=
nifi.web.http.port=8080
nifi.web.http.network.interface.default=
nifi.web.https.host=
nifi.web.https.port=
nifi.web.https.network.interface.default=
nifi.web.jetty.working.directory=./work/jetty
nifi.web.jetty.threads=200

# cluster common properties (all nodes must have same values) #
nifi.cluster.protocol.heartbeat.interval=5 sec
nifi.cluster.protocol.is.secure=false

# cluster node properties (only configure for cluster nodes) #
nifi.cluster.is.node=true
nifi.cluster.node.address=
nifi.cluster.node.protocol.port=8100
nifi.cluster.node.protocol.threads=10
nifi.cluster.node.event.history.size=25
nifi.cluster.node.connection.timeout=5 sec
nifi.cluster.node.read.timeout=5 sec
nifi.cluster.firewall.file=
nifi.cluster.flow.election.max.wait.time=1 mins
nifi.cluster.flow.election.max.candidates=3

# zookeeper properties, used for cluster management #
nifi.zookeeper.connect.string=localhost:2180,localhost:2190,localhost:2200
nifi.zookeeper.connect.timeout=3 secs
nifi.zookeeper.session.timeout=3 secs
nifi.zookeeper.root.node=/nifi




*Node 2:*
# Site to Site properties
nifi.remote.input.host=localhost
nifi.remote.input.secure=false
nifi.remote.input.socket.port=8120
nifi.remote.input.http.enabled=true
nifi.remote.input.http.transaction.ttl=30 sec

# web properties #
nifi.web.war.directory=./lib
nifi.web.http.host=
nifi.web.http.port=8080
nifi.web.http.network.interface.default=
nifi.web.https.host=
nifi.web.https.port=
nifi.web.https.network.interface.default=
nifi.web.jetty.working.directory=./work/jetty
nifi.web.jetty.threads=200

# cluster common properties (all nodes must have same values) #
nifi.cluster.protocol.heartbeat.interval=5 sec
nifi.cluster.protocol.is.secure=false

# cluster node properties (only configure for cluster nodes) #
nifi.cluster.is.node=true
nifi.cluster.node.address=
nifi.cluster.node.protocol.port=8090
nifi.cluster.node.protocol.threads=10
nifi.cluster.node.event.history.size=25
nifi.cluster.node.connection.timeout=5 sec
nifi.cluster.node.read.timeout=5 sec
nifi.cluster.firewall.file=
nifi.cluster.flow.election.max.wait.time=1 mins
nifi.cluster.flow.election.max.candidates= 3

# zookeeper properties, used for cluster management #
nifi.zookeeper.connect.string=localhost:2180,localhost:2190,localhost:2200
nifi.zookeeper.connect.timeout=3 secs
nifi.zookeeper.session.timeout=3 secs
nifi.zookeeper.root.node=/nifi



*Node 3:*
# Site to Site properties
nifi.remote.input.host=localhost
nifi.remote.input.secure=false
nifi.remote.input.socket.port=8140
nifi.remote.input.http.enabled=true
nifi.remote.input.http.transaction.ttl=30 sec

# web properties #
nifi.web.war.directory=./lib
nifi.web.http.host=
nifi.web.http.port=8080
nifi.web.http.network.interface.default=
nifi.web.https.host=
nifi.web.https.port=
nifi.web.https.network.interface.default=
nifi.web.jetty.working.directory=./work/jetty
nifi.web.jetty.threads=200

# cluster common properties (all nodes must have same values) #
nifi.cluster.protocol.heartbeat.interval=5 sec
nifi.cluster.protocol.is.secure=false

# cluster node properties (only configure for cluster nodes) #
nifi.cluster.is.node=true
nifi.cluster.node.address=
nifi.cluster.node.protocol.port=8110
nifi.cluster.node.protocol.threads=10
nifi.cluster.node.event.history.size=25
nifi.cluster.node.connection.timeout=5 sec
nifi.cluster.node.read.timeout=5 sec
nifi.cluster.firewall.file=
nifi.cluster.flow.election.max.wait.time=1 mins
nifi.cluster.flow.election.max.candidates=3

# zookeeper properties, used for cluster management #

RE: [EXT] Re: OverlappingFileLockException while restarting

2017-05-25 Thread Karthik Kothareddy (karthikk) [CONT - Type 2]
Joe,

Looks like there was no other process running; the actual culprit was the property 
"nifi.remote.input.secure=false". I set the flag back to true and everything started 
working again, though I am really confused by this vague error.
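
(For reference, the site-to-site block on the secured instance now looks something
like the following; the host and port here are placeholders rather than my actual
values:)

nifi.remote.input.host=<hostname>
nifi.remote.input.secure=true
nifi.remote.input.socket.port=<s2s-port>
nifi.remote.input.http.enabled=true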
Right now I am facing a different issue when I try to send data via site-to-site to 
the same instance; I keep getting the trace below.


ERROR [NiFi Web Server-646] o.a.nifi.web.api.ApplicationResource Unexpected 
exception occurred. portId=401a3083-015c-1000-695b-fa269fc7432f
2017-05-25 16:28:47,349 ERROR [NiFi Web Server-646] 
o.a.nifi.web.api.ApplicationResource Exception detail:
org.apache.nifi.processor.exception.ProcessException: 
org.apache.nifi.processor.exception.FlowFileAccessException: Failed to import 
data from org.apache.nifi.stream.io.MinimumLengthInputStream@1a1c7b29 for 
StandardFlowFileRecord[uuid=aebf1e86-bb15-445e-b4fa-4c46e752a761,claim=,offset=0,name=134240548094011,size=0]
 due to org.apache.nifi.processor.exception.FlowFileAccessException: Unable to 
create ContentClaim due to org.eclipse.jetty.io.EofException: Early EOF
at 
org.apache.nifi.remote.StandardRootGroupPort.receiveFlowFiles(StandardRootGroupPort.java:543)
 ~[nifi-site-to-site-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
at 
org.apache.nifi.web.api.DataTransferResource.receiveFlowFiles(DataTransferResource.java:277)
 ~[classes/:na]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
~[na:1.8.0_101]
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
~[na:1.8.0_101]
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 ~[na:1.8.0_101]
at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_101]
at 
com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
 [jersey-server-1.19.jar:1.19]
at 
com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:205)
 [jersey-server-1.19.jar:1.19]
at 
com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
 [jersey-server-1.19.jar:1.19]
at 
com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302)
 [jersey-server-1.19.jar:1.19]
at 
com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
 [jersey-server-1.19.jar:1.19]
at 
com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
 [jersey-server-1.19.jar:1.19]
at 
com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
 [jersey-server-1.19.jar:1.19]
at 
com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
 [jersey-server-1.19.jar:1.19]
at 
com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1542)
 [jersey-server-1.19.jar:1.19]
at 
com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1473)
 [jersey-server-1.19.jar:1.19]
at 
com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1419)
 [jersey-server-1.19.jar:1.19]
at 
com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1409)
 [jersey-server-1.19.jar:1.19]
at 
com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:409)
 [jersey-servlet-1.19.jar:1.19]
at 
com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:558)
 [jersey-servlet-1.19.jar:1.19]
at 
com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:733)
 [jersey-servlet-1.19.jar:1.19]
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) 
[javax.servlet-api-3.1.0.jar:3.1.0]
at 
org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:845) 
[jetty-servlet-9.3.9.v20160517.jar:9.3.9.v20160517]
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1689)
 [jetty-servlet-9.3.9.v20160517.jar:9.3.9.v20160517]
at 
org.apache.nifi.web.filter.RequestLogger.doFilter(RequestLogger.java:66) 
[classes/:na]
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1676)
 [jetty-servlet-9.3.9.v20160517.jar:9.3.9.v20160517]
at 
org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:316)
 [spring-security-web-4.0.3.RELEASE.jar:4.0.3.RELEASE]
at 
org.springframework.security.web.access.intercept.FilterSecurityInterceptor.invoke(FilterSecurityInterceptor.java:126)
 [spring-security-web-4.0.3.RELEASE.jar:4.0.3.RELEASE]
at 

Re: OverlappingFileLockException while restarting

2017-05-25 Thread Joe Witt
Most likely there is another instance of NiFi still running.  Check ps
-ef | grep nifi.  Kill that and try again.
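
Roughly (assuming a default install layout, so paths may differ on your box):

ps -ef | grep -i nifi     # look for a leftover NiFi java/bootstrap process
kill <pid>                # stop it, using the PID from the listing above
./bin/nifi.sh start       # then start NiFi again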

Thanks

On Thu, May 25, 2017 at 12:10 PM, Karthik Kothareddy (karthikk) [CONT
- Type 2]  wrote:
> Hello,
>
> I am running a 1.2.0 snapshot on a Linux instance with some custom processors.
> I was trying to update a certificate this morning and restart the instance,
> and I'm stuck with an "OverlappingFileLockException". The instance will not
> start; soon after I start it, it shuts down with the following logs in
> bootstrap.log:
>
> 2017-05-25 15:37:12,935 ERROR [NiFi logging handler] org.apache.nifi.StdErr 
> Failed to start web server: Unable to start Flow Controller.
> 2017-05-25 15:37:12,936 ERROR [NiFi logging handler] org.apache.nifi.StdErr 
> Shutting down...
> 2017-05-25 15:37:13,884 INFO [main] org.apache.nifi.bootstrap.RunNiFi NiFi 
> never started. Will not restart NiFi
>
> However, when I went through the app log to make sure everything was okay, I
> found the trace below:
>
>
> WARN [Thread-1] org.apache.nifi.web.server.JettyServer Failed to stop web 
> server
> org.springframework.beans.factory.BeanCreationException: Error creating bean 
> with name 'flowService': FactoryBean threw exception on object creation; 
> nested exception is org.springframework.beans.factory.BeanCreationException: 
> Error creating bean with name 'flowController': FactoryBean threw exception 
> on object creation; nested exception is java.lang.RuntimeException: 
> java.nio.channels.OverlappingFileLockException
> at 
> org.springframework.beans.factory.support.FactoryBeanRegistrySupport.doGetObjectFromFactoryBean(FactoryBeanRegistrySupport.java:175)
>  ~[na:na]
> at 
> org.springframework.beans.factory.support.FactoryBeanRegistrySupport.getObjectFromFactoryBean(FactoryBeanRegistrySupport.java:103)
>  ~[na:na]
> at 
> org.springframework.beans.factory.support.AbstractBeanFactory.getObjectForBeanInstance(AbstractBeanFactory.java:1585)
>  ~[na:na]
> at 
> org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:254)
>  ~[na:na]
> at 
> org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:202)
>  ~[na:na]
> at 
> org.springframework.context.support.AbstractApplicationContext.getBean(AbstractApplicationContext.java:1060)
>  ~[na:na]
> at 
> org.apache.nifi.web.contextlistener.ApplicationStartupContextListener.contextDestroyed(ApplicationStartupContextListener.java:103)
>  ~[na:na]
> at 
> org.eclipse.jetty.server.handler.ContextHandler.callContextDestroyed(ContextHandler.java:845)
>  ~[na:na]
> at 
> org.eclipse.jetty.servlet.ServletContextHandler.callContextDestroyed(ServletContextHandler.java:546)
>  ~[na:na]
> at 
> org.eclipse.jetty.server.handler.ContextHandler.stopContext(ContextHandler.java:826)
>  ~[na:na]
> at 
> org.eclipse.jetty.servlet.ServletContextHandler.stopContext(ServletContextHandler.java:356)
>  ~[na:na]
> at 
> org.eclipse.jetty.webapp.WebAppContext.stopWebapp(WebAppContext.java:1410) 
> ~[na:na]
> at 
> org.eclipse.jetty.webapp.WebAppContext.stopContext(WebAppContext.java:1374) 
> ~[na:na]
> at 
> org.eclipse.jetty.server.handler.ContextHandler.doStop(ContextHandler.java:874)
>  ~[na:na]
> at 
> org.eclipse.jetty.servlet.ServletContextHandler.doStop(ServletContextHandler.java:272)
>  ~[na:na]
> at 
> org.eclipse.jetty.webapp.WebAppContext.doStop(WebAppContext.java:544) ~[na:na]
> at 
> org.eclipse.jetty.util.component.AbstractLifeCycle.stop(AbstractLifeCycle.java:89)
>  ~[na:na]
> at 
> org.eclipse.jetty.util.component.ContainerLifeCycle.stop(ContainerLifeCycle.java:143)
>  ~[na:na]
> at 
> org.eclipse.jetty.util.component.ContainerLifeCycle.doStop(ContainerLifeCycle.java:161)
>  ~[na:na]
> at 
> org.eclipse.jetty.server.handler.AbstractHandler.doStop(AbstractHandler.java:73)
>  ~[na:na]
> at 
> org.eclipse.jetty.util.component.AbstractLifeCycle.stop(AbstractLifeCycle.java:89)
>  ~[na:na]
> at 
> org.eclipse.jetty.util.component.ContainerLifeCycle.stop(ContainerLifeCycle.java:143)
>  ~[na:na]
> at 
> org.eclipse.jetty.util.component.ContainerLifeCycle.doStop(ContainerLifeCycle.java:161)
>  ~[na:na]
> at 
> org.eclipse.jetty.server.handler.AbstractHandler.doStop(AbstractHandler.java:73)
>  ~[na:na]
> at 
> org.eclipse.jetty.util.component.AbstractLifeCycle.stop(AbstractLifeCycle.java:89)
>  ~[na:na]
> at 
> org.eclipse.jetty.util.component.ContainerLifeCycle.stop(ContainerLifeCycle.java:143)
>  ~[na:na]
> at 
> org.eclipse.jetty.util.component.ContainerLifeCycle.doStop(ContainerLifeCycle.java:161)
>  ~[na:na]
> at 
> org.eclipse.jetty.server.handler.AbstractHandler.doStop(AbstractHandler.java:73)
>  ~[na:na]
> at 

OverlappingFileLockException while restarting

2017-05-25 Thread Karthik Kothareddy (karthikk) [CONT - Type 2]
Hello,

I am running a 1.2.0 snapshot on a Linux instance with some custom processors. I 
was trying to update a certificate this morning and restart the instance, and I'm 
stuck with an "OverlappingFileLockException". The instance will not start; soon 
after I start it, it shuts down with the following logs in 
bootstrap.log:

2017-05-25 15:37:12,935 ERROR [NiFi logging handler] org.apache.nifi.StdErr 
Failed to start web server: Unable to start Flow Controller.
2017-05-25 15:37:12,936 ERROR [NiFi logging handler] org.apache.nifi.StdErr 
Shutting down...
2017-05-25 15:37:13,884 INFO [main] org.apache.nifi.bootstrap.RunNiFi NiFi 
never started. Will not restart NiFi

However, when I went through the app log to make sure everything was okay, I 
found the trace below:


WARN [Thread-1] org.apache.nifi.web.server.JettyServer Failed to stop web server
org.springframework.beans.factory.BeanCreationException: Error creating bean 
with name 'flowService': FactoryBean threw exception on object creation; nested 
exception is org.springframework.beans.factory.BeanCreationException: Error 
creating bean with name 'flowController': FactoryBean threw exception on object 
creation; nested exception is java.lang.RuntimeException: 
java.nio.channels.OverlappingFileLockException
at 
org.springframework.beans.factory.support.FactoryBeanRegistrySupport.doGetObjectFromFactoryBean(FactoryBeanRegistrySupport.java:175)
 ~[na:na]
at 
org.springframework.beans.factory.support.FactoryBeanRegistrySupport.getObjectFromFactoryBean(FactoryBeanRegistrySupport.java:103)
 ~[na:na]
at 
org.springframework.beans.factory.support.AbstractBeanFactory.getObjectForBeanInstance(AbstractBeanFactory.java:1585)
 ~[na:na]
at 
org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:254)
 ~[na:na]
at 
org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:202)
 ~[na:na]
at 
org.springframework.context.support.AbstractApplicationContext.getBean(AbstractApplicationContext.java:1060)
 ~[na:na]
at 
org.apache.nifi.web.contextlistener.ApplicationStartupContextListener.contextDestroyed(ApplicationStartupContextListener.java:103)
 ~[na:na]
at 
org.eclipse.jetty.server.handler.ContextHandler.callContextDestroyed(ContextHandler.java:845)
 ~[na:na]
at 
org.eclipse.jetty.servlet.ServletContextHandler.callContextDestroyed(ServletContextHandler.java:546)
 ~[na:na]
at 
org.eclipse.jetty.server.handler.ContextHandler.stopContext(ContextHandler.java:826)
 ~[na:na]
at 
org.eclipse.jetty.servlet.ServletContextHandler.stopContext(ServletContextHandler.java:356)
 ~[na:na]
at 
org.eclipse.jetty.webapp.WebAppContext.stopWebapp(WebAppContext.java:1410) 
~[na:na]
at 
org.eclipse.jetty.webapp.WebAppContext.stopContext(WebAppContext.java:1374) 
~[na:na]
at 
org.eclipse.jetty.server.handler.ContextHandler.doStop(ContextHandler.java:874) 
~[na:na]
at 
org.eclipse.jetty.servlet.ServletContextHandler.doStop(ServletContextHandler.java:272)
 ~[na:na]
at 
org.eclipse.jetty.webapp.WebAppContext.doStop(WebAppContext.java:544) ~[na:na]
at 
org.eclipse.jetty.util.component.AbstractLifeCycle.stop(AbstractLifeCycle.java:89)
 ~[na:na]
at 
org.eclipse.jetty.util.component.ContainerLifeCycle.stop(ContainerLifeCycle.java:143)
 ~[na:na]
at 
org.eclipse.jetty.util.component.ContainerLifeCycle.doStop(ContainerLifeCycle.java:161)
 ~[na:na]
at 
org.eclipse.jetty.server.handler.AbstractHandler.doStop(AbstractHandler.java:73)
 ~[na:na]
at 
org.eclipse.jetty.util.component.AbstractLifeCycle.stop(AbstractLifeCycle.java:89)
 ~[na:na]
at 
org.eclipse.jetty.util.component.ContainerLifeCycle.stop(ContainerLifeCycle.java:143)
 ~[na:na]
at 
org.eclipse.jetty.util.component.ContainerLifeCycle.doStop(ContainerLifeCycle.java:161)
 ~[na:na]
at 
org.eclipse.jetty.server.handler.AbstractHandler.doStop(AbstractHandler.java:73)
 ~[na:na]
at 
org.eclipse.jetty.util.component.AbstractLifeCycle.stop(AbstractLifeCycle.java:89)
 ~[na:na]
at 
org.eclipse.jetty.util.component.ContainerLifeCycle.stop(ContainerLifeCycle.java:143)
 ~[na:na]
at 
org.eclipse.jetty.util.component.ContainerLifeCycle.doStop(ContainerLifeCycle.java:161)
 ~[na:na]
at 
org.eclipse.jetty.server.handler.AbstractHandler.doStop(AbstractHandler.java:73)
 ~[na:na]
at org.eclipse.jetty.server.Server.doStop(Server.java:482) ~[na:na]
at 
org.eclipse.jetty.util.component.AbstractLifeCycle.stop(AbstractLifeCycle.java:89)
 ~[na:na]
at org.apache.nifi.web.server.JettyServer.stop(JettyServer.java:854) 
~[na:na]
at org.apache.nifi.NiFi.shutdownHook(NiFi.java:188) 
[nifi-runtime-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
at org.apache.nifi.NiFi$2.run(NiFi.java:89) 
[nifi-runtime-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
at 

Re: unstable cluster

2017-05-25 Thread Joe Witt
I looked at a secured cluster and the send times are routinely around 100 ms,
similar to yours.  I think what I was flagging as potentially interesting is
not interesting at all.

On Thu, May 25, 2017 at 11:34 AM, Joe Witt  wrote:
> Ok.  Well as a point of comparison i'm looking at heartbeat logs from
> another cluster and the times are consistently 1-3 millis for the
> send.  Yours above show 100+ms typical with one north of 900ms.  Not
> sure how relevant that is but something i noticed.
>
> On Thu, May 25, 2017 at 11:29 AM, Mark Bean  wrote:
>> ping shows acceptably fast response time between servers, approximately
>> 0.100-0.150 ms
>>
>>
>> On Thu, May 25, 2017 at 11:13 AM, Joe Witt  wrote:
>>
>>> have you evaluated latency across the machines in your cluster?  I ask
>>> because 122ms is pretty long and 917ms is very long.  Are these nodes
>>> across a WAN link?
>>>
>>> On Thu, May 25, 2017 at 11:08 AM, Mark Bean  wrote:
>>> > Update: now all 5 nodes, regardless of ZK server, are indicating
>>> SUSPENDED
>>> > -> RECONNECTED.
>>> >
>>> > On Thu, May 25, 2017 at 10:23 AM, Mark Bean 
>>> wrote:
>>> >
>>> >> I reduced the number of embedded ZooKeeper servers on the 5-Node NiFi
>>> >> Cluster from 5 to 3. This has improved the situation. I do not see any
>>> of
>>> >> the three Nodes which are also ZK servers disconnecting/reconnecting to
>>> the
>>> >> cluster as before. However, the two Nodes which are not running ZK
>>> continue
>>> >> to disconnect and reconnect. The following is taken from one of the
>>> non-ZK
>>> >> Nodes. It's curious that some messages are issued twice from the same
>>> >> thread, but reference a different object
>>> >>
>>> >> nifi-app.log
>>> >> 2017-05-25 13:40:01,628 INFO [main-EventTrhead] o.a.c.f.state.
>>> ConnectionStateManager
>>> >> State change: SUSPENDED
>>> >> 2017-05-25 13:39:45,627 INFO [Clustering Tasks Thread-1] o.a.n.c.c.
>>> ClusterProtocolHeaertbeater
>>> >> Heartbeat create at 2017-05-25 13:39:45,504 and sent to FQDN:PORT at
>>> >> 2017-05-25 13:39:45,627; send took 122 millis
>>> >> 2017-05-25 13:39:50,862 INFO [Clustering Tasks Thread-1] o.a.n.c.c.
>>> ClusterProtocolHeaertbeater
>>> >> Heartbeat create at 2017-05-25 13:39:50,732 and sent to FQDN:PORT at
>>> >> 2017-05-25 13:39:50,862; send took 122 millis
>>> >> 2017-05-25 13:39:56,089 INFO [Clustering Tasks Thread-1] o.a.n.c.c.
>>> ClusterProtocolHeaertbeater
>>> >> Heartbeat create at 2017-05-25 13:39:55,966 and sent to FQDN:PORT at
>>> >> 2017-05-25 13:39:56,089; send took 129 millis
>>> >> 2017-05-25 13:40:01,629 INFO [Curator-ConnectionStateManager-0]
>>> >> o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.
>>> >> leader.election.CuratorLeaderElectionManager$ElectionListener@68f8b6a2
>>> >> Connection State changed to SUSPENDED
>>> >> 2017-05-25 13:40:01,629 INFO [Curator-ConnectionStateManager-0]
>>> >> o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.
>>> >> leader.election.CuratorLeaderElectionManager$ElectionListener@663f55cd
>>> >> Connection State changed to SUSPENDED
>>> >> 2017-05-25 13:40:02,412 INFO [main-EventThread] o.a.c.f.state.
>>> ConnectinoStateManager
>>> >> State change: RECONNECTED
>>> >> 2017-05-25 13:40:02,413 INFO [Curator-ConnectionStateManager-0]
>>> >> o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.
>>> >> leader.election.CuratorLeaderElectionManager$ElectionListener@68f8b6a2
>>> >> Connection State changed to RECONNECTED
>>> >> 2017-05-25 13:40:02,413 INFO [Curator-ConnectionStateManager-0]
>>> >> o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.
>>> >> leader.election.CuratorLeaderElectionManager$ElectionListener@663f55cd
>>> >> Connection State changed to RECONNECTED
>>> >> 2017-05-25 13:40:02,550 INFO [Clustering Tasks Thread-1] o.a.n.c.c.
>>> ClusterProtocolHeaertbeater
>>> >> Heartbeat create at 2017-05-25 13:40:01,632 and sent to FQDN:PORT at
>>> >> 2017-05-25 13:40:02,550; send took 917 millis
>>> >> 2017-05-25 13:40:07,787 INFO [Clustering Tasks Thread-1] o.a.n.c.c.
>>> ClusterProtocolHeaertbeater
>>> >> Heartbeat create at 2017-05-25 13:40:07,657 and sent to FQDN:PORT at
>>> >> 2017-05-25 13:40:07,787; send took 129 millis
>>> >>
>>> >> I will work on setting up an external ZK next, but would still like some
>>> >> insight to what is being observed with the embedded ZK.
>>> >>
>>> >> Thanks,
>>> >> Mark
>>> >>
>>> >>
>>> >>
>>> >>
>>> >> On Wed, May 24, 2017 at 3:57 PM, Mark Bean 
>>> wrote:
>>> >>
>>> >>> Yes, we are using the embedded ZK. We will try instantiating and
>>> external
>>> >>> ZK and see if that resolves the problem.
>>> >>>
>>> >>> The load on the system is extremely small. Currently (as Nodes are
>>> >>> disconnecting/reconnecting) all input ports to the flow are turned
>>> off. The
>>> >>> only data in the flow is from a single GenerateFlow generating 

Re: unstable cluster

2017-05-25 Thread Joe Witt
Ok.  Well, as a point of comparison, I'm looking at heartbeat logs from
another cluster and the times are consistently 1-3 millis for the send.
Yours above show 100+ ms typically, with one north of 900 ms.  Not sure how
relevant that is, but it's something I noticed.

On Thu, May 25, 2017 at 11:29 AM, Mark Bean  wrote:
> ping shows acceptably fast response time between servers, approximately
> 0.100-0.150 ms
>
>
> On Thu, May 25, 2017 at 11:13 AM, Joe Witt  wrote:
>
>> have you evaluated latency across the machines in your cluster?  I ask
>> because 122ms is pretty long and 917ms is very long.  Are these nodes
>> across a WAN link?
>>
>> On Thu, May 25, 2017 at 11:08 AM, Mark Bean  wrote:
>> > Update: now all 5 nodes, regardless of ZK server, are indicating
>> SUSPENDED
>> > -> RECONNECTED.
>> >
>> > On Thu, May 25, 2017 at 10:23 AM, Mark Bean 
>> wrote:
>> >
>> >> I reduced the number of embedded ZooKeeper servers on the 5-Node NiFi
>> >> Cluster from 5 to 3. This has improved the situation. I do not see any
>> of
>> >> the three Nodes which are also ZK servers disconnecting/reconnecting to
>> the
>> >> cluster as before. However, the two Nodes which are not running ZK
>> continue
>> >> to disconnect and reconnect. The following is taken from one of the
>> non-ZK
>> >> Nodes. It's curious that some messages are issued twice from the same
>> >> thread, but reference a different object
>> >>
>> >> nifi-app.log
>> >> 2017-05-25 13:40:01,628 INFO [main-EventTrhead] o.a.c.f.state.
>> ConnectionStateManager
>> >> State change: SUSPENDED
>> >> 2017-05-25 13:39:45,627 INFO [Clustering Tasks Thread-1] o.a.n.c.c.
>> ClusterProtocolHeaertbeater
>> >> Heartbeat create at 2017-05-25 13:39:45,504 and sent to FQDN:PORT at
>> >> 2017-05-25 13:39:45,627; send took 122 millis
>> >> 2017-05-25 13:39:50,862 INFO [Clustering Tasks Thread-1] o.a.n.c.c.
>> ClusterProtocolHeaertbeater
>> >> Heartbeat create at 2017-05-25 13:39:50,732 and sent to FQDN:PORT at
>> >> 2017-05-25 13:39:50,862; send took 122 millis
>> >> 2017-05-25 13:39:56,089 INFO [Clustering Tasks Thread-1] o.a.n.c.c.
>> ClusterProtocolHeaertbeater
>> >> Heartbeat create at 2017-05-25 13:39:55,966 and sent to FQDN:PORT at
>> >> 2017-05-25 13:39:56,089; send took 129 millis
>> >> 2017-05-25 13:40:01,629 INFO [Curator-ConnectionStateManager-0]
>> >> o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.
>> >> leader.election.CuratorLeaderElectionManager$ElectionListener@68f8b6a2
>> >> Connection State changed to SUSPENDED
>> >> 2017-05-25 13:40:01,629 INFO [Curator-ConnectionStateManager-0]
>> >> o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.
>> >> leader.election.CuratorLeaderElectionManager$ElectionListener@663f55cd
>> >> Connection State changed to SUSPENDED
>> >> 2017-05-25 13:40:02,412 INFO [main-EventThread] o.a.c.f.state.
>> ConnectinoStateManager
>> >> State change: RECONNECTED
>> >> 2017-05-25 13:40:02,413 INFO [Curator-ConnectionStateManager-0]
>> >> o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.
>> >> leader.election.CuratorLeaderElectionManager$ElectionListener@68f8b6a2
>> >> Connection State changed to RECONNECTED
>> >> 2017-05-25 13:40:02,413 INFO [Curator-ConnectionStateManager-0]
>> >> o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.
>> >> leader.election.CuratorLeaderElectionManager$ElectionListener@663f55cd
>> >> Connection State changed to RECONNECTED
>> >> 2017-05-25 13:40:02,550 INFO [Clustering Tasks Thread-1] o.a.n.c.c.
>> ClusterProtocolHeaertbeater
>> >> Heartbeat create at 2017-05-25 13:40:01,632 and sent to FQDN:PORT at
>> >> 2017-05-25 13:40:02,550; send took 917 millis
>> >> 2017-05-25 13:40:07,787 INFO [Clustering Tasks Thread-1] o.a.n.c.c.
>> ClusterProtocolHeaertbeater
>> >> Heartbeat create at 2017-05-25 13:40:07,657 and sent to FQDN:PORT at
>> >> 2017-05-25 13:40:07,787; send took 129 millis
>> >>
>> >> I will work on setting up an external ZK next, but would still like some
>> >> insight to what is being observed with the embedded ZK.
>> >>
>> >> Thanks,
>> >> Mark
>> >>
>> >>
>> >>
>> >>
>> >> On Wed, May 24, 2017 at 3:57 PM, Mark Bean 
>> wrote:
>> >>
>> >>> Yes, we are using the embedded ZK. We will try instantiating and
>> external
>> >>> ZK and see if that resolves the problem.
>> >>>
>> >>> The load on the system is extremely small. Currently (as Nodes are
>> >>> disconnecting/reconnecting) all input ports to the flow are turned
>> off. The
>> >>> only data in the flow is from a single GenerateFlow generating 5B
>> every 30
>> >>> secs.
>> >>>
>> >>> Also, it is a 5-node cluster with embedded ZK on each node. First, I
>> will
>> >>> try reducing ZK to only 3 nodes. Then, I will try a 3-node external ZK.
>> >>>
>> >>> Thanks,
>> >>> Mark
>> >>>
>> >>> On Wed, May 24, 2017 at 11:49 AM, Joe Witt  wrote:
>> >>>
>>  Are you using 

Re: unstable cluster

2017-05-25 Thread Mark Bean
ping shows acceptably fast response time between servers, approximately
0.100-0.150 ms


On Thu, May 25, 2017 at 11:13 AM, Joe Witt  wrote:

> have you evaluated latency across the machines in your cluster?  I ask
> because 122ms is pretty long and 917ms is very long.  Are these nodes
> across a WAN link?
>
> On Thu, May 25, 2017 at 11:08 AM, Mark Bean  wrote:
> > Update: now all 5 nodes, regardless of ZK server, are indicating
> SUSPENDED
> > -> RECONNECTED.
> >
> > On Thu, May 25, 2017 at 10:23 AM, Mark Bean 
> wrote:
> >
> >> I reduced the number of embedded ZooKeeper servers on the 5-Node NiFi
> >> Cluster from 5 to 3. This has improved the situation. I do not see any
> of
> >> the three Nodes which are also ZK servers disconnecting/reconnecting to
> the
> >> cluster as before. However, the two Nodes which are not running ZK
> continue
> >> to disconnect and reconnect. The following is taken from one of the
> non-ZK
> >> Nodes. It's curious that some messages are issued twice from the same
> >> thread, but reference a different object
> >>
> >> nifi-app.log
> >> 2017-05-25 13:40:01,628 INFO [main-EventTrhead] o.a.c.f.state.
> ConnectionStateManager
> >> State change: SUSPENDED
> >> 2017-05-25 13:39:45,627 INFO [Clustering Tasks Thread-1] o.a.n.c.c.
> ClusterProtocolHeaertbeater
> >> Heartbeat create at 2017-05-25 13:39:45,504 and sent to FQDN:PORT at
> >> 2017-05-25 13:39:45,627; send took 122 millis
> >> 2017-05-25 13:39:50,862 INFO [Clustering Tasks Thread-1] o.a.n.c.c.
> ClusterProtocolHeaertbeater
> >> Heartbeat create at 2017-05-25 13:39:50,732 and sent to FQDN:PORT at
> >> 2017-05-25 13:39:50,862; send took 122 millis
> >> 2017-05-25 13:39:56,089 INFO [Clustering Tasks Thread-1] o.a.n.c.c.
> ClusterProtocolHeaertbeater
> >> Heartbeat create at 2017-05-25 13:39:55,966 and sent to FQDN:PORT at
> >> 2017-05-25 13:39:56,089; send took 129 millis
> >> 2017-05-25 13:40:01,629 INFO [Curator-ConnectionStateManager-0]
> >> o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.
> >> leader.election.CuratorLeaderElectionManager$ElectionListener@68f8b6a2
> >> Connection State changed to SUSPENDED
> >> 2017-05-25 13:40:01,629 INFO [Curator-ConnectionStateManager-0]
> >> o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.
> >> leader.election.CuratorLeaderElectionManager$ElectionListener@663f55cd
> >> Connection State changed to SUSPENDED
> >> 2017-05-25 13:40:02,412 INFO [main-EventThread] o.a.c.f.state.
> ConnectinoStateManager
> >> State change: RECONNECTED
> >> 2017-05-25 13:40:02,413 INFO [Curator-ConnectionStateManager-0]
> >> o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.
> >> leader.election.CuratorLeaderElectionManager$ElectionListener@68f8b6a2
> >> Connection State changed to RECONNECTED
> >> 2017-05-25 13:40:02,413 INFO [Curator-ConnectionStateManager-0]
> >> o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.
> >> leader.election.CuratorLeaderElectionManager$ElectionListener@663f55cd
> >> Connection State changed to RECONNECTED
> >> 2017-05-25 13:40:02,550 INFO [Clustering Tasks Thread-1] o.a.n.c.c.
> ClusterProtocolHeaertbeater
> >> Heartbeat create at 2017-05-25 13:40:01,632 and sent to FQDN:PORT at
> >> 2017-05-25 13:40:02,550; send took 917 millis
> >> 2017-05-25 13:40:07,787 INFO [Clustering Tasks Thread-1] o.a.n.c.c.
> ClusterProtocolHeaertbeater
> >> Heartbeat create at 2017-05-25 13:40:07,657 and sent to FQDN:PORT at
> >> 2017-05-25 13:40:07,787; send took 129 millis
> >>
> >> I will work on setting up an external ZK next, but would still like some
> >> insight to what is being observed with the embedded ZK.
> >>
> >> Thanks,
> >> Mark
> >>
> >>
> >>
> >>
> >> On Wed, May 24, 2017 at 3:57 PM, Mark Bean 
> wrote:
> >>
> >>> Yes, we are using the embedded ZK. We will try instantiating and
> external
> >>> ZK and see if that resolves the problem.
> >>>
> >>> The load on the system is extremely small. Currently (as Nodes are
> >>> disconnecting/reconnecting) all input ports to the flow are turned
> off. The
> >>> only data in the flow is from a single GenerateFlow generating 5B
> every 30
> >>> secs.
> >>>
> >>> Also, it is a 5-node cluster with embedded ZK on each node. First, I
> will
> >>> try reducing ZK to only 3 nodes. Then, I will try a 3-node external ZK.
> >>>
> >>> Thanks,
> >>> Mark
> >>>
> >>> On Wed, May 24, 2017 at 11:49 AM, Joe Witt  wrote:
> >>>
>  Are you using the embedded Zookeeper?  If yes we recommend using an
>  external zookeeper.
> 
>  What type of load are the systems under when this occurs (cpu,
>  network, memory, disk io)? Under high load the default timeouts for
>  clustering are too aggressive.  You can relax these for higher load
>  clusters and should see good behavior.  Even if the system overall is
>  not under all that high of load if you're seeing 

Re: unstable cluster

2017-05-25 Thread Joe Witt
Have you evaluated latency across the machines in your cluster?  I ask
because 122 ms is pretty long and 917 ms is very long.  Are these nodes
across a WAN link?

On Thu, May 25, 2017 at 11:08 AM, Mark Bean  wrote:
> Update: now all 5 nodes, regardless of ZK server, are indicating SUSPENDED
> -> RECONNECTED.
>
> On Thu, May 25, 2017 at 10:23 AM, Mark Bean  wrote:
>
>> I reduced the number of embedded ZooKeeper servers on the 5-Node NiFi
>> Cluster from 5 to 3. This has improved the situation. I do not see any of
>> the three Nodes which are also ZK servers disconnecting/reconnecting to the
>> cluster as before. However, the two Nodes which are not running ZK continue
>> to disconnect and reconnect. The following is taken from one of the non-ZK
>> Nodes. It's curious that some messages are issued twice from the same
>> thread, but reference a different object
>>
>> nifi-app.log
>> 2017-05-25 13:40:01,628 INFO [main-EventTrhead] 
>> o.a.c.f.state.ConnectionStateManager
>> State change: SUSPENDED
>> 2017-05-25 13:39:45,627 INFO [Clustering Tasks Thread-1] 
>> o.a.n.c.c.ClusterProtocolHeaertbeater
>> Heartbeat create at 2017-05-25 13:39:45,504 and sent to FQDN:PORT at
>> 2017-05-25 13:39:45,627; send took 122 millis
>> 2017-05-25 13:39:50,862 INFO [Clustering Tasks Thread-1] 
>> o.a.n.c.c.ClusterProtocolHeaertbeater
>> Heartbeat create at 2017-05-25 13:39:50,732 and sent to FQDN:PORT at
>> 2017-05-25 13:39:50,862; send took 122 millis
>> 2017-05-25 13:39:56,089 INFO [Clustering Tasks Thread-1] 
>> o.a.n.c.c.ClusterProtocolHeaertbeater
>> Heartbeat create at 2017-05-25 13:39:55,966 and sent to FQDN:PORT at
>> 2017-05-25 13:39:56,089; send took 129 millis
>> 2017-05-25 13:40:01,629 INFO [Curator-ConnectionStateManager-0]
>> o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.
>> leader.election.CuratorLeaderElectionManager$ElectionListener@68f8b6a2
>> Connection State changed to SUSPENDED
>> 2017-05-25 13:40:01,629 INFO [Curator-ConnectionStateManager-0]
>> o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.
>> leader.election.CuratorLeaderElectionManager$ElectionListener@663f55cd
>> Connection State changed to SUSPENDED
>> 2017-05-25 13:40:02,412 INFO [main-EventThread] 
>> o.a.c.f.state.ConnectinoStateManager
>> State change: RECONNECTED
>> 2017-05-25 13:40:02,413 INFO [Curator-ConnectionStateManager-0]
>> o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.
>> leader.election.CuratorLeaderElectionManager$ElectionListener@68f8b6a2
>> Connection State changed to RECONNECTED
>> 2017-05-25 13:40:02,413 INFO [Curator-ConnectionStateManager-0]
>> o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.
>> leader.election.CuratorLeaderElectionManager$ElectionListener@663f55cd
>> Connection State changed to RECONNECTED
>> 2017-05-25 13:40:02,550 INFO [Clustering Tasks Thread-1] 
>> o.a.n.c.c.ClusterProtocolHeaertbeater
>> Heartbeat create at 2017-05-25 13:40:01,632 and sent to FQDN:PORT at
>> 2017-05-25 13:40:02,550; send took 917 millis
>> 2017-05-25 13:40:07,787 INFO [Clustering Tasks Thread-1] 
>> o.a.n.c.c.ClusterProtocolHeaertbeater
>> Heartbeat create at 2017-05-25 13:40:07,657 and sent to FQDN:PORT at
>> 2017-05-25 13:40:07,787; send took 129 millis
>>
>> I will work on setting up an external ZK next, but would still like some
>> insight to what is being observed with the embedded ZK.
>>
>> Thanks,
>> Mark
>>
>>
>>
>>
>> On Wed, May 24, 2017 at 3:57 PM, Mark Bean  wrote:
>>
>>> Yes, we are using the embedded ZK. We will try instantiating and external
>>> ZK and see if that resolves the problem.
>>>
>>> The load on the system is extremely small. Currently (as Nodes are
>>> disconnecting/reconnecting) all input ports to the flow are turned off. The
>>> only data in the flow is from a single GenerateFlow generating 5B every 30
>>> secs.
>>>
>>> Also, it is a 5-node cluster with embedded ZK on each node. First, I will
>>> try reducing ZK to only 3 nodes. Then, I will try a 3-node external ZK.
>>>
>>> Thanks,
>>> Mark
>>>
>>> On Wed, May 24, 2017 at 11:49 AM, Joe Witt  wrote:
>>>
 Are you using the embedded Zookeeper?  If yes we recommend using an
 external zookeeper.

 What type of load are the systems under when this occurs (cpu,
 network, memory, disk io)? Under high load the default timeouts for
 clustering are too aggressive.  You can relax these for higher load
 clusters and should see good behavior.  Even if the system overall is
 not under all that high of load if you're seeing garbage collection
 pauses that are lengthy and/or frequent it can cause the same high
 load effect as far as the JVM is concerned.

 Thanks
 Joe

 On Wed, May 24, 2017 at 9:11 AM, Mark Bean 
 wrote:
 > We have a cluster which is showing signs of instability. The Primary
 Node
 > and 

Re: unstable cluster

2017-05-25 Thread Mark Bean
Update: now all 5 nodes, regardless of ZK server, are indicating SUSPENDED
-> RECONNECTED.

On Thu, May 25, 2017 at 10:23 AM, Mark Bean  wrote:

> I reduced the number of embedded ZooKeeper servers on the 5-Node NiFi
> Cluster from 5 to 3. This has improved the situation. I do not see any of
> the three Nodes which are also ZK servers disconnecting/reconnecting to the
> cluster as before. However, the two Nodes which are not running ZK continue
> to disconnect and reconnect. The following is taken from one of the non-ZK
> Nodes. It's curious that some messages are issued twice from the same
> thread, but reference a different object
>
> nifi-app.log
> 2017-05-25 13:40:01,628 INFO [main-EventTrhead] 
> o.a.c.f.state.ConnectionStateManager
> State change: SUSPENDED
> 2017-05-25 13:39:45,627 INFO [Clustering Tasks Thread-1] 
> o.a.n.c.c.ClusterProtocolHeaertbeater
> Heartbeat create at 2017-05-25 13:39:45,504 and sent to FQDN:PORT at
> 2017-05-25 13:39:45,627; send took 122 millis
> 2017-05-25 13:39:50,862 INFO [Clustering Tasks Thread-1] 
> o.a.n.c.c.ClusterProtocolHeaertbeater
> Heartbeat create at 2017-05-25 13:39:50,732 and sent to FQDN:PORT at
> 2017-05-25 13:39:50,862; send took 122 millis
> 2017-05-25 13:39:56,089 INFO [Clustering Tasks Thread-1] 
> o.a.n.c.c.ClusterProtocolHeaertbeater
> Heartbeat create at 2017-05-25 13:39:55,966 and sent to FQDN:PORT at
> 2017-05-25 13:39:56,089; send took 129 millis
> 2017-05-25 13:40:01,629 INFO [Curator-ConnectionStateManager-0]
> o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.
> leader.election.CuratorLeaderElectionManager$ElectionListener@68f8b6a2
> Connection State changed to SUSPENDED
> 2017-05-25 13:40:01,629 INFO [Curator-ConnectionStateManager-0]
> o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.
> leader.election.CuratorLeaderElectionManager$ElectionListener@663f55cd
> Connection State changed to SUSPENDED
> 2017-05-25 13:40:02,412 INFO [main-EventThread] 
> o.a.c.f.state.ConnectinoStateManager
> State change: RECONNECTED
> 2017-05-25 13:40:02,413 INFO [Curator-ConnectionStateManager-0]
> o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.
> leader.election.CuratorLeaderElectionManager$ElectionListener@68f8b6a2
> Connection State changed to RECONNECTED
> 2017-05-25 13:40:02,413 INFO [Curator-ConnectionStateManager-0]
> o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.
> leader.election.CuratorLeaderElectionManager$ElectionListener@663f55cd
> Connection State changed to RECONNECTED
> 2017-05-25 13:40:02,550 INFO [Clustering Tasks Thread-1] 
> o.a.n.c.c.ClusterProtocolHeaertbeater
> Heartbeat create at 2017-05-25 13:40:01,632 and sent to FQDN:PORT at
> 2017-05-25 13:40:02,550; send took 917 millis
> 2017-05-25 13:40:07,787 INFO [Clustering Tasks Thread-1] 
> o.a.n.c.c.ClusterProtocolHeaertbeater
> Heartbeat create at 2017-05-25 13:40:07,657 and sent to FQDN:PORT at
> 2017-05-25 13:40:07,787; send took 129 millis
>
> I will work on setting up an external ZK next, but would still like some
> insight to what is being observed with the embedded ZK.
>
> Thanks,
> Mark
>
>
>
>
> On Wed, May 24, 2017 at 3:57 PM, Mark Bean  wrote:
>
>> Yes, we are using the embedded ZK. We will try instantiating and external
>> ZK and see if that resolves the problem.
>>
>> The load on the system is extremely small. Currently (as Nodes are
>> disconnecting/reconnecting) all input ports to the flow are turned off. The
>> only data in the flow is from a single GenerateFlow generating 5B every 30
>> secs.
>>
>> Also, it is a 5-node cluster with embedded ZK on each node. First, I will
>> try reducing ZK to only 3 nodes. Then, I will try a 3-node external ZK.
>>
>> Thanks,
>> Mark
>>
>> On Wed, May 24, 2017 at 11:49 AM, Joe Witt  wrote:
>>
>>> Are you using the embedded Zookeeper?  If yes we recommend using an
>>> external zookeeper.
>>>
>>> What type of load are the systems under when this occurs (cpu,
>>> network, memory, disk io)? Under high load the default timeouts for
>>> clustering are too aggressive.  You can relax these for higher load
>>> clusters and should see good behavior.  Even if the system overall is
>>> not under all that high of load if you're seeing garbage collection
>>> pauses that are lengthy and/or frequent it can cause the same high
>>> load effect as far as the JVM is concerned.
>>>
>>> Thanks
>>> Joe
>>>
>>> On Wed, May 24, 2017 at 9:11 AM, Mark Bean 
>>> wrote:
>>> > We have a cluster which is showing signs of instability. The Primary
>>> Node
>>> > and Coordinator are reassigned to different nodes every several
>>> minutes. I
>>> > believe this is due to lack of heartbeat or other coordination. The
>>> > following error occurs periodically in the nifi-app.log
>>> >
>>> > ERROR [CommitProcessor:1] o.apache.zookeeper.server.NIOServerCnxn
>>> > Unexpected Exception:
>>> > 

Re: unstable cluster

2017-05-25 Thread Mark Bean
I reduced the number of embedded ZooKeeper servers on the 5-node NiFi
cluster from 5 to 3. This has improved the situation: I no longer see any of
the three nodes which are also ZK servers disconnecting/reconnecting to the
cluster as before. However, the two nodes which are not running ZK continue
to disconnect and reconnect. The following is taken from one of the non-ZK
nodes. It's curious that some messages are issued twice from the same
thread, but reference a different object.

nifi-app.log
2017-05-25 13:40:01,628 INFO [main-EventThread]
o.a.c.f.state.ConnectionStateManager State change: SUSPENDED
2017-05-25 13:39:45,627 INFO [Clustering Tasks Thread-1]
o.a.n.c.c.ClusterProtocolHeartbeater Heartbeat create at 2017-05-25
13:39:45,504 and sent to FQDN:PORT at 2017-05-25 13:39:45,627; send took
122 millis
2017-05-25 13:39:50,862 INFO [Clustering Tasks Thread-1]
o.a.n.c.c.ClusterProtocolHeartbeater Heartbeat create at 2017-05-25
13:39:50,732 and sent to FQDN:PORT at 2017-05-25 13:39:50,862; send took
122 millis
2017-05-25 13:39:56,089 INFO [Clustering Tasks Thread-1]
o.a.n.c.c.ClusterProtocolHeartbeater Heartbeat create at 2017-05-25
13:39:55,966 and sent to FQDN:PORT at 2017-05-25 13:39:56,089; send took
129 millis
2017-05-25 13:40:01,629 INFO [Curator-ConnectionStateManager-0]
o.a.n.c.l.e.CuratorLeaderElectionManager
org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@68f8b6a2
Connection State changed to SUSPENDED
2017-05-25 13:40:01,629 INFO [Curator-ConnectionStateManager-0]
o.a.n.c.l.e.CuratorLeaderElectionManager
org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@663f55cd
Connection State changed to SUSPENDED
2017-05-25 13:40:02,412 INFO [main-EventThread]
o.a.c.f.state.ConnectionStateManager State change: RECONNECTED
2017-05-25 13:40:02,413 INFO [Curator-ConnectionStateManager-0]
o.a.n.c.l.e.CuratorLeaderElectionManager
org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@68f8b6a2
Connection State changed to RECONNECTED
2017-05-25 13:40:02,413 INFO [Curator-ConnectionStateManager-0]
o.a.n.c.l.e.CuratorLeaderElectionManager
org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@663f55cd
Connection State changed to RECONNECTED
2017-05-25 13:40:02,550 INFO [Clustering Tasks Thread-1]
o.a.n.c.c.ClusterProtocolHeartbeater Heartbeat create at 2017-05-25
13:40:01,632 and sent to FQDN:PORT at 2017-05-25 13:40:02,550; send took
917 millis
2017-05-25 13:40:07,787 INFO [Clustering Tasks Thread-1]
o.a.n.c.c.ClusterProtocolHeartbeater Heartbeat create at 2017-05-25
13:40:07,657 and sent to FQDN:PORT at 2017-05-25 13:40:07,787; send took
129 millis

I will work on setting up an external ZK next, but would still like some
insight to what is being observed with the embedded ZK.
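
The rough plan for that next step looks like this (property names as in
nifi.properties; the ZK host names and the relaxed timeout values are placeholders,
not settled choices):

nifi.state.management.embedded.zookeeper.start=false
nifi.zookeeper.connect.string=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181
nifi.zookeeper.connect.timeout=10 secs
nifi.zookeeper.session.timeout=10 secs
# and, along the lines Joe suggested, relaxing the cluster timeouts, e.g.
nifi.cluster.node.connection.timeout=30 secs
nifi.cluster.node.read.timeout=30 secs
nifi.cluster.protocol.heartbeat.interval=15 secs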

Thanks,
Mark




On Wed, May 24, 2017 at 3:57 PM, Mark Bean  wrote:

> Yes, we are using the embedded ZK. We will try instantiating and external
> ZK and see if that resolves the problem.
>
> The load on the system is extremely small. Currently (as Nodes are
> disconnecting/reconnecting) all input ports to the flow are turned off. The
> only data in the flow is from a single GenerateFlow generating 5B every 30
> secs.
>
> Also, it is a 5-node cluster with embedded ZK on each node. First, I will
> try reducing ZK to only 3 nodes. Then, I will try a 3-node external ZK.
>
> Thanks,
> Mark
>
> On Wed, May 24, 2017 at 11:49 AM, Joe Witt  wrote:
>
>> Are you using the embedded Zookeeper?  If yes we recommend using an
>> external zookeeper.
>>
>> What type of load are the systems under when this occurs (cpu,
>> network, memory, disk io)? Under high load the default timeouts for
>> clustering are too aggressive.  You can relax these for higher load
>> clusters and should see good behavior.  Even if the system overall is
>> not under all that high of load if you're seeing garbage collection
>> pauses that are lengthy and/or frequent it can cause the same high
>> load effect as far as the JVM is concerned.
>>
>> Thanks
>> Joe
>>
>> On Wed, May 24, 2017 at 9:11 AM, Mark Bean  wrote:
>> > We have a cluster which is showing signs of instability. The Primary
>> Node
>> > and Coordinator are reassigned to different nodes every several
>> minutes. I
>> > believe this is due to lack of heartbeat or other coordination. The
>> > following error occurs periodically in the nifi-app.log
>> >
>> > ERROR [CommitProcessor:1] o.apache.zookeeper.server.NIOServerCnxn
>> > Unexpected Exception:
>> > java.nio.channels.CancelledKeyException: null
>> > at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:
>> 73)
>> > at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java
>> :77)
>> > at
>> > org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServ
>> erCnxn.java:151)
>> > at
>> > 

Re: Not able to see the uploaded file in S3

2017-05-25 Thread Andrew Grande
Looks like you have auto-terminated the failure relationship, so the upload
fails silently. Make sure you expose the failure relationship from
PutS3Object by sending it, e.g., to a funnel.

On Thu, May 25, 2017, 7:00 AM suman@cuddle.ai 
wrote:

> Hi All,
>
> I have a simple flow consisting of the following processors:
>
> GetFile-->PutS3Object
>
> The flow runs successfully, but I am not able to see the file in S3.
> When I checked the NiFi data provenance for PutS3Object, the details section
> shows the following message:
>
> Details
> Auto-Terminated by failure Relationship
>
> In PutS3Object I have given the bucket name and used an
> AWSCredentialsProviderControllerService, and the user has proper rights for
> the S3 bucket.
>
>
> Please let me know if I am doing anything wrong.
>
>
>
> --
> View this message in context:
> http://apache-nifi-developer-list.39713.n7.nabble.com/Not-able-to-see-the-uploaded-file-in-S3-tp15978.html
> Sent from the Apache NiFi Developer List mailing list archive at
> Nabble.com.
>


Not able to see the uploaded file in S3

2017-05-25 Thread suman....@cuddle.ai
Hi All,

I have a simple flow consisting of the following processors:

GetFile-->PutS3Object

The flow runs successfully, but I am not able to see the file in S3.
When I checked the NiFi data provenance for PutS3Object, the details section
shows the following message:

Details
Auto-Terminated by failure Relationship

In PutS3Object I have given the bucket name and used an
AWSCredentialsProviderControllerService, and the user has proper rights for
the S3 bucket.


Please let me know if I am doing anything wrong.



--
View this message in context: 
http://apache-nifi-developer-list.39713.n7.nabble.com/Not-able-to-see-the-uploaded-file-in-S3-tp15978.html
Sent from the Apache NiFi Developer List mailing list archive at Nabble.com.