Hi Joe/Koji,

I can't seem to figure out a way to reduce the back pressure or to find the
root cause of these errors:

1. Unable to communicate with remote instance Peer [xxxx] due to
java.io.EOFException; closing connection
2. indicates that port 37e64bd0-5326-3c3f-80f4-42a828dea1d5's destination is
full; penalizing peer

I have tried increasing the rate of delivery of the data by increasing the
concurrent tasks, raising the back pressure thresholds, replacing the
PutHBaseJSON processor with PutHBaseRecord (the slowest part of our data
flow), etc. While I have seen some improvement, I can't seem to get rid of
the above errors. I also changed various settings in the NiFi config:

    nifi.cluster.node.protocol.threads=50
    nifi.cluster.node.max.concurrent.requests=400
    nifi.web.jetty.threads=400
    JVM = 4096

Would it be safe to ignore these errors, as they fill up the API logs, or do
I need to investigate further? If we can ignore them, is there any way to
stop them from appearing in the log file?
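In case it helps, this is what I had in mind for suppressing them: as far as
I can tell the messages come from two loggers, so demoting just those in
conf/logback.xml should hide them without affecting anything else. The fully
qualified names below are expanded from the abbreviated ones in the log
output, so please correct me if I got them wrong:

    <!-- conf/logback.xml (sketch): silence only the two site-to-site
         loggers that emit these messages. OFF drops everything they log,
         so it would hide real failures from them too. -->
    <logger name="org.apache.nifi.remote.SocketRemoteSiteListener" level="OFF"/>
    <logger name="org.apache.nifi.remote.client.socket.EndpointConnectionPool" level="OFF"/>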
On Fri, Jul 13, 2018 at 10:42 AM Joe Witt <joe.w...@gmail.com> wrote:

> you can allow for larger backlogs by increasing the backpressure
> thresholds OR you can add additional nodes OR you can expire data.
>
> The whole point of the backpressure and pressure release features is to
> let you be in control of how many resources are dedicated to buffering
> data. However, in the most basic sense, if the rate of data arrival always
> exceeds the rate of delivery, then delivery must be made faster or data
> must be expired at some threshold age.
>
> thanks
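To make the threshold/expiration options concrete, this is roughly what I am
planning to try on the connection feeding PutHBaseRecord. The values are
only my starting guesses, not recommendations (if I read the docs right, the
defaults are 10000 objects, 1 GB, and no expiration):

    Connection > Configure > Settings (sketch, illustrative values):
      Back Pressure Object Threshold:    50000     # allow a larger backlog
      Back Pressure Data Size Threshold: 5 GB
      FlowFile Expiration:               12 hours  # pressure release: drop data older than this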
> On Thu, Jul 12, 2018, 9:34 PM Faisal Durrani <te04.0...@gmail.com> wrote:
>
>> Hi Koji,
>>
>> I moved onto another cluster of NiFi nodes, did the same configuration
>> for S2S there, and boom... the same error messages all over the logs
>> (nothing on the bulletin board).
>>
>> Could it be because of the back pressure? I also get the error
>> (indicates that port 8c77c1b0-0164-1000-0000-0000052fa54c's destination
>> is full; penalizing peer) at the same time I see the closing-connection
>> error. I don't see a way to resolve the back pressure, as we get a
>> continuous stream of data from Kafka which is then inserted into HBase
>> (the slowest part of the data flow), which eventually causes the back
>> pressure.
>>
>> On Fri, Jul 6, 2018 at 4:55 PM Koji Kawamura <ijokaruma...@gmail.com>
>> wrote:
>>
>>> Hi Faisal,
>>>
>>> I think both error messages indicate the same thing: the network
>>> connection was closed in the middle of a Site-to-Site transaction.
>>> That can happen for many reasons, such as a flaky network, or manually
>>> stopping the port or RPG while a transaction is being processed. I
>>> don't think it is a configuration issue, because NiFi was able to
>>> initiate the S2S communication.
>>>
>>> Thanks,
>>> Koji
>>>
>>> On Fri, Jul 6, 2018 at 4:16 PM, Faisal Durrani <te04.0...@gmail.com>
>>> wrote:
>>> >
>>> > Hi Koji,
>>> >
>>> > In the subsequent tests the above error did not appear, but now we
>>> > are getting errors on the RPG:
>>> >
>>> > RemoteGroupPort[name=1_pk_ip,targets=http://xxxxxx.prod.xx.local:9090/nifi/]
>>> > failed to communicate with remote NiFi instance due to
>>> > java.io.IOException: Failed to confirm transaction with
>>> > Peer[url=nifi://xxx-xxxxx.prod.xx.local:5001] due to
>>> > java.io.IOException: Connection reset by peer
>>> >
>>> > The transport protocol is RAW, and the URL entered when setting up
>>> > the RPG is one of the nodes of the four-node cluster.
>>> >
>>> >     nifi.remote.input.socket.port=5001
>>> >     nifi.remote.input.secure=false
>>> >     nifi.remote.input.http.transaction.ttl=60 sec
>>> >     nifi.remote.input.host=
>>> >
>>> > Please let me know if there are any configuration changes that we
>>> > need to make.
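For reference, this is how I currently understand those RAW site-to-site
properties on each node (the annotations are my own reading of the admin
guide, so please correct me if any of it is off):

    # nifi.properties, per node (sketch)
    nifi.remote.input.host=                        # blank: advertise the node's canonical hostname
    nifi.remote.input.secure=false                 # plain-socket S2S, no TLS
    nifi.remote.input.socket.port=5001             # RAW port; must be reachable from every peer
    nifi.remote.input.http.transaction.ttl=60 sec  # only applies to the HTTP transport, not RAW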
>>> > On Fri, Jul 6, 2018 at 9:48 AM Faisal Durrani <te04.0...@gmail.com>
>>> > wrote:
>>> >>
>>> >> Hi Koji,
>>> >>
>>> >> Thank you for your reply. I updated the logback.xml and ran the test
>>> >> again. I can see an additional error in the app log, which is as
>>> >> below.
>>> >>
>>> >> o.a.nifi.remote.SocketRemoteSiteListener
>>> >> java.io.EOFException: null
>>> >>     at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:340)
>>> >>     at java.io.DataInputStream.readUTF(DataInputStream.java:589)
>>> >>     at java.io.DataInputStream.readUTF(DataInputStream.java:564)
>>> >>     at org.apache.nifi.remote.protocol.RequestType.readRequestType(RequestType.java:36)
>>> >>     at org.apache.nifi.remote.protocol.socket.SocketFlowFileServerProtocol.getRequestType(SocketFlowFileServerProtocol.java:147)
>>> >>     at org.apache.nifi.remote.SocketRemoteSiteListener$1$1.run(SocketRemoteSiteListener.java:253)
>>> >>     at java.lang.Thread.run(Thread.java:745)
>>> >>
>>> >> I notice this error is reported against not just one node but
>>> >> different nodes in the cluster. Would you be able to infer the root
>>> >> cause of the issue from this information?
>>> >>
>>> >> Thanks.
>>> >>
>>> >> On Thu, Jul 5, 2018 at 3:34 PM Koji Kawamura <ijokaruma...@gmail.com>
>>> >> wrote:
>>> >>>
>>> >>> Hello,
>>> >>>
>>> >>> 1. The error message sounds like the client disconnects in the
>>> >>> middle of Site-to-Site communication. Enabling debug logging would
>>> >>> show more information; add <logger name="org.apache.nifi.remote"
>>> >>> level="DEBUG"/> to conf/logback.xml.
>>> >>>
>>> >>> 2. I'd suggest checking whether your 4 nodes receive data evenly
>>> >>> (well distributed). Connection status history, 'Queued Count' per
>>> >>> node, may be useful to check. If data is not evenly distributed,
>>> >>> I'd lower the Remote Port batch settings on the sending side.
>>> >>> Then try to find a bottleneck in the downstream flow. Increasing
>>> >>> concurrent tasks on such a bottleneck processor can help increase
>>> >>> throughput in some cases. Adding more nodes will also help.
>>> >>>
>>> >>> Thanks,
>>> >>> Koji
>>> >>>
>>> >>> On Thu, Jul 5, 2018 at 11:12 AM, Faisal Durrani
>>> >>> <te04.0...@gmail.com> wrote:
>>> >>> > Hi, I've got two questions.
>>> >>> >
>>> >>> > 1. We are using a Remote Process Group with the RAW transport
>>> >>> > protocol to distribute data across a four-node cluster. I see
>>> >>> > the nifi-app log has a lot of instances of the below error:
>>> >>> >
>>> >>> > o.a.nifi.remote.SocketRemoteSiteListener Unable to communicate
>>> >>> > with remote instance Peer[url=nifi://xxx-xxxxxx.prod.xx.:59528]
>>> >>> > (SocketFlowFileServerProtocol[CommsID=0bf887ed-acb3-4eea-94ac-5abf53ad0bf1])
>>> >>> > due to java.io.EOFException; closing connection
>>> >>> >
>>> >>> > These errors do not show on the bulletin board, nor do I see any
>>> >>> > data loss. I was curious to know if there is some bad
>>> >>> > configuration that is causing this to happen.
>>> >>> >
>>> >>> > 2. The app log also has the below error:
>>> >>> >
>>> >>> > o.a.n.r.c.socket.EndpointConnectionPool EndpointConnectionPool[Cluster
>>> >>> > URL=[http://xxx-xxxxxx.prod.xx.local:9090/nifi-api]]
>>> >>> > Peer[url=nifi://ins-btrananifi107z.prod.jp.local:5001] indicates
>>> >>> > that port 417e3d23-5b1a-1616-9728-9d9d1a462646's destination is
>>> >>> > full; penalizing peer
>>> >>> >
>>> >>> > The data flow consumes a high volume of data and there is back
>>> >>> > pressure on almost all the connections, so that is probably what
>>> >>> > is causing it. I guess there isn't much we can do here, and once
>>> >>> > the back pressure resolves, the error goes away on its own.
>>> >>> > Please let me know your view.
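Finally, on Koji's earlier suggestion to lower the Remote Port batch
settings on the sending side, this is where I am adjusting them (right-click
the RPG > Manage remote ports; the values below are only illustrative, not
recommendations):

    Remote port "1_pk_ip" batch settings (sketch):
      Concurrent Tasks: 2
      Compressed:       false
      Batch Count:      100      # cap on flow files per transaction
      Batch Size:       1 MB     # cap on bytes per transaction
      Batch Duration:   500 ms   # cap on time per transaction

My understanding is that smaller batches let the RAW peer selection switch
nodes more often, which should even out the distribution across the four
nodes at some cost in throughput.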