Hi Joe/Koji,

I can't seem to figure out a way to reduce the back pressure or to find the
root cause of these errors:

1. Unable to communicate with remote instance Peer [xxxx] due to
java.io.EOFException; closing connection
2. indicates that port 37e64bd0-5326-3c3f-80f4-42a828dea1d5's destination is
full; penalizing peer

I have tried increasing the rate of delivery of the data by increasing the
concurrent tasks, raising the back pressure thresholds, replacing the
PutHBaseJSON processor with PutHBaseRecord (the slowest part of our data
flow), etc. While I have seen some improvement, I can't seem to get rid of
the above errors. I also changed various settings in the NiFi config:

    nifi.cluster.node.protocol.threads=50
    nifi.cluster.node.max.concurrent.requests=400
    nifi.web.jetty.threads=400
    JVM = 4096

Would it be safe to ignore these errors, as they fill up the API logs, or do
I need to investigate further? If we can ignore them, is there any way to
stop them from appearing in the log file?
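In case it helps, this is what I had in mind for suppressing them: as far as
I can tell the messages come from two loggers, so demoting just those in
conf/logback.xml should hide them without affecting anything else. The fully
qualified names below are expanded from the abbreviated ones in the log
output, so please correct me if I got them wrong:

    <!-- conf/logback.xml (sketch): silence only the two site-to-site
         loggers that emit these messages. OFF drops everything they log,
         so it would hide real failures from them too. -->
    <logger name="org.apache.nifi.remote.SocketRemoteSiteListener" level="OFF"/>
    <logger name="org.apache.nifi.remote.client.socket.EndpointConnectionPool" level="OFF"/>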
On Fri, Jul 13, 2018 at 10:42 AM Joe Witt <joe.w...@gmail.com> wrote:

> you can allow for larger backlogs by increasing the backpressure
> thresholds OR you can add additional nodes OR you can expire data.
>
> The whole point of the backpressure and pressure release features is to
> let you be in control of how many resources are dedicated to buffering
> data. However, in the most basic sense, if the rate of data arrival always
> exceeds the rate of delivery, then delivery must be made faster or data
> must be expired at some threshold age.
>
> thanks
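To make the threshold/expiration options concrete, this is roughly what I am
planning to try on the connection feeding PutHBaseRecord. The values are
only my starting guesses, not recommendations (if I read the docs right, the
defaults are 10000 objects, 1 GB, and no expiration):

    Connection > Configure > Settings (sketch, illustrative values):
      Back Pressure Object Threshold:    50000     # allow a larger backlog
      Back Pressure Data Size Threshold: 5 GB
      FlowFile Expiration:               12 hours  # pressure release: drop data older than this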
> On Thu, Jul 12, 2018, 9:34 PM Faisal Durrani <te04.0...@gmail.com> wrote:
>
>> Hi Koji,
>>
>> I moved onto another cluster of NiFi nodes, did the same configuration
>> for S2S there, and boom... the same error messages all over the logs
>> (nothing on the bulletin board).
>>
>> Could it be because of the back pressure? I also get the error
>> (indicates that port 8c77c1b0-0164-1000-0000-0000052fa54c's destination
>> is full; penalizing peer) at the same time I see the closing-connection
>> error. I don't see a way to resolve the back pressure, as we get a
>> continuous stream of data from Kafka which is then inserted into HBase
>> (the slowest part of the data flow), which eventually causes the back
>> pressure.
>>
>> On Fri, Jul 6, 2018 at 4:55 PM Koji Kawamura <ijokaruma...@gmail.com>
>> wrote:
>>
>>> Hi Faisal,
>>>
>>> I think both error messages indicate the same thing: the network
>>> connection was closed in the middle of a Site-to-Site transaction.
>>> That can happen for many reasons, such as a flaky network, or manually
>>> stopping the port or RPG while a transaction is being processed. I
>>> don't think it is a configuration issue, because NiFi was able to
>>> initiate the S2S communication.
>>>
>>> Thanks,
>>> Koji
>>>
>>> On Fri, Jul 6, 2018 at 4:16 PM, Faisal Durrani <te04.0...@gmail.com>
>>> wrote:
>>> >
>>> > Hi Koji,
>>> >
>>> > In the subsequent tests the above error did not appear, but now we
>>> > are getting errors on the RPG:
>>> >
>>> > RemoteGroupPort[name=1_pk_ip,targets=http://xxxxxx.prod.xx.local:9090/nifi/]
>>> > failed to communicate with remote NiFi instance due to
>>> > java.io.IOException: Failed to confirm transaction with
>>> > Peer[url=nifi://xxx-xxxxx.prod.xx.local:5001] due to
>>> > java.io.IOException: Connection reset by peer
>>> >
>>> > The transport protocol is RAW, and the URL entered when setting up
>>> > the RPG is one of the nodes of the four-node cluster.
>>> >
>>> >     nifi.remote.input.socket.port=5001
>>> >     nifi.remote.input.secure=false
>>> >     nifi.remote.input.http.transaction.ttl=60 sec
>>> >     nifi.remote.input.host=
>>> >
>>> > Please let me know if there are any configuration changes that we
>>> > need to make.
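For reference, this is how I currently understand those RAW site-to-site
properties on each node (the annotations are my own reading of the admin
guide, so please correct me if any of it is off):

    # nifi.properties, per node (sketch)
    nifi.remote.input.host=                        # blank: advertise the node's canonical hostname
    nifi.remote.input.secure=false                 # plain-socket S2S, no TLS
    nifi.remote.input.socket.port=5001             # RAW port; must be reachable from every peer
    nifi.remote.input.http.transaction.ttl=60 sec  # only applies to the HTTP transport, not RAW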
>>> > On Fri, Jul 6, 2018 at 9:48 AM Faisal Durrani <te04.0...@gmail.com>
>>> > wrote:
>>> >>
>>> >> Hi Koji,
>>> >>
>>> >> Thank you for your reply. I updated the logback.xml and ran the test
>>> >> again. I can see an additional error in the app log, which is as
>>> >> below.
>>> >>
>>> >> o.a.nifi.remote.SocketRemoteSiteListener
>>> >> java.io.EOFException: null
>>> >>     at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:340)
>>> >>     at java.io.DataInputStream.readUTF(DataInputStream.java:589)
>>> >>     at java.io.DataInputStream.readUTF(DataInputStream.java:564)
>>> >>     at org.apache.nifi.remote.protocol.RequestType.readRequestType(RequestType.java:36)
>>> >>     at org.apache.nifi.remote.protocol.socket.SocketFlowFileServerProtocol.getRequestType(SocketFlowFileServerProtocol.java:147)
>>> >>     at org.apache.nifi.remote.SocketRemoteSiteListener$1$1.run(SocketRemoteSiteListener.java:253)
>>> >>     at java.lang.Thread.run(Thread.java:745)
>>> >>
>>> >> I notice this error is reported against not just one node but
>>> >> different nodes in the cluster. Would you be able to infer the root
>>> >> cause of the issue from this information?
>>> >>
>>> >> Thanks.
>>> >>
>>> >> On Thu, Jul 5, 2018 at 3:34 PM Koji Kawamura <ijokaruma...@gmail.com>
>>> >> wrote:
>>> >>>
>>> >>> Hello,
>>> >>>
>>> >>> 1. The error message sounds like the client disconnects in the
>>> >>> middle of Site-to-Site communication. Enabling debug logging would
>>> >>> show more information; add <logger name="org.apache.nifi.remote"
>>> >>> level="DEBUG"/> to conf/logback.xml.
>>> >>>
>>> >>> 2. I'd suggest checking whether your 4 nodes receive data evenly
>>> >>> (well distributed). Connection status history, 'Queued Count' per
>>> >>> node, may be useful to check. If data is not evenly distributed,
>>> >>> I'd lower the Remote Port batch settings on the sending side.
>>> >>> Then try to find a bottleneck in the downstream flow. Increasing
>>> >>> concurrent tasks on such a bottleneck processor can help increase
>>> >>> throughput in some cases. Adding more nodes will also help.
>>> >>>
>>> >>> Thanks,
>>> >>> Koji
>>> >>>
>>> >>> On Thu, Jul 5, 2018 at 11:12 AM, Faisal Durrani
>>> >>> <te04.0...@gmail.com> wrote:
>>> >>> > Hi, I've got two questions.
>>> >>> >
>>> >>> > 1. We are using a Remote Process Group with the RAW transport
>>> >>> > protocol to distribute data across a four-node cluster. I see
>>> >>> > the nifi-app log has a lot of instances of the below error:
>>> >>> >
>>> >>> > o.a.nifi.remote.SocketRemoteSiteListener Unable to communicate
>>> >>> > with remote instance Peer[url=nifi://xxx-xxxxxx.prod.xx.:59528]
>>> >>> > (SocketFlowFileServerProtocol[CommsID=0bf887ed-acb3-4eea-94ac-5abf53ad0bf1])
>>> >>> > due to java.io.EOFException; closing connection
>>> >>> >
>>> >>> > These errors do not show on the bulletin board, nor do I see any
>>> >>> > data loss. I was curious to know if there is some bad
>>> >>> > configuration that is causing this to happen.
>>> >>> >
>>> >>> > 2. The app log also has the below error:
>>> >>> >
>>> >>> > o.a.n.r.c.socket.EndpointConnectionPool EndpointConnectionPool[Cluster
>>> >>> > URL=[http://xxx-xxxxxx.prod.xx.local:9090/nifi-api]]
>>> >>> > Peer[url=nifi://ins-btrananifi107z.prod.jp.local:5001] indicates
>>> >>> > that port 417e3d23-5b1a-1616-9728-9d9d1a462646's destination is
>>> >>> > full; penalizing peer
>>> >>> >
>>> >>> > The data flow consumes a high volume of data and there is back
>>> >>> > pressure on almost all the connections, so that is probably what
>>> >>> > is causing it. I guess there isn't much we can do here, and once
>>> >>> > the back pressure resolves, the error goes away on its own.
>>> >>> > Please let me know your view.
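Finally, on Koji's earlier suggestion to lower the Remote Port batch
settings on the sending side, this is where I am adjusting them (right-click
the RPG > Manage remote ports; the values below are only illustrative, not
recommendations):

    Remote port "1_pk_ip" batch settings (sketch):
      Concurrent Tasks: 2
      Compressed:       false
      Batch Count:      100      # cap on flow files per transaction
      Batch Size:       1 MB     # cap on bytes per transaction
      Batch Duration:   500 ms   # cap on time per transaction

My understanding is that smaller batches let the RAW peer selection switch
nodes more often, which should even out the distribution across the four
nodes at some cost in throughput.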