Hi Josef,

The FetchSFTP processor implements some connection reuse, but GetSFTP, ListSFTP, and PutSFTP create a new connection for every invocation of the processor. I have considered a new approach to the SFTP components using a Controller Service with connection pooling, but it still requires some design work prior to implementation.
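To make the pooling idea concrete, here is a minimal sketch, assuming the sshj client that the SFTP processors use internally; the class and its methods are hypothetical, not NiFi's actual API, and leave out key verification, eviction, and thread-safety concerns that a real Controller Service would need to address:

    import java.io.IOException;
    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    import net.schmizz.sshj.SSHClient;
    import net.schmizz.sshj.transport.verification.PromiscuousVerifier;

    // Hypothetical pool: not part of NiFi today; sketches how a Controller
    // Service could reuse SSH connections instead of opening one per invocation.
    public class SftpClientPool {

        private final BlockingQueue<SSHClient> idle;
        private final String host;
        private final String username;
        private final String password;

        public SftpClientPool(String host, String username, String password, int maxIdle) {
            this.host = host;
            this.username = username;
            this.password = password;
            this.idle = new ArrayBlockingQueue<>(maxIdle);
        }

        // Borrow a connected client, opening a new one only when none are idle.
        public SSHClient borrow() throws IOException {
            final SSHClient pooled = idle.poll();
            if (pooled != null && pooled.isConnected()) {
                return pooled;
            }
            final SSHClient client = new SSHClient();
            client.addHostKeyVerifier(new PromiscuousVerifier()); // illustration only: verify host keys in practice
            client.setConnectTimeout(30_000); // corresponds to the Connection Timeout property
            client.connect(host);
            client.authPassword(username, password);
            return client;
        }

        // Return the client for reuse rather than disconnecting after each transfer.
        public void release(final SSHClient client) throws IOException {
            if (!idle.offer(client)) {
                client.disconnect(); // pool is full: close the surplus connection
            }
        }
    }

A processor could then call borrow() before a transfer and release() afterward, rather than connecting and disconnecting on every invocation.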
Based on the current SFTP processor behavior, it is possible to have connection timeouts when the SFTP server is not accepting new connections. Every SFTP server is different, but OpenSSH uses the MaxSessions setting in sshd_config [1] to limit the number of simultaneous open sessions.

Using the example of 50 PutSFTP processors connecting to the same server, it is quite possible to encounter a connection timeout if NiFi triggers all of them in rapid succession. The Connection Timeout property in PutSFTP controls how long the processor will wait before throwing the ClientConnectException. The number of Concurrent Tasks on each PutSFTP processor also influences the number of simultaneous connections.

Increasing the Connection Timeout in PutSFTP may hide the problem, but if the destination SFTP server can handle the load, it may be more helpful to increase the maximum number of sessions on the server (see the sshd_config sketch at the end of this message).

On the other hand, is there a reason for having 50 separate instances of PutSFTP sending to the same server? It should be possible to redesign the flow and parameterize the destination using FlowFile attributes (also sketched at the end of this message). Depending on the number of CPU cores and available threads, opening many simultaneous SFTP connections can degrade performance, so a smaller number of connections is often better.

Regards,
David Handermann

[1] https://linux.die.net/man/5/sshd_config

On Thu, Feb 16, 2023 at 1:33 AM <[email protected]> wrote:
> Hi guys
>
> It was upgrade time again on our side; we just upgraded from 1.19.1 to
> 1.20.0. Since 1.20.0 we see significantly more SSH connection timeout
> errors on the PutSFTP processor…
>
> PutSFTP Processor ERROR:
>
> 2023-02-16 07:44:07,905 ERROR [Timer-Driven Process Thread-50]
> o.a.nifi.processors.standard.PutSFTP PutSFTP[id=12563431-c40a-1af7-b09b-16de27d887b7]
> Unable to transfer StandardFlowFileRecord[uuid=a1adadb1-61f9-414f-99e5-aad4331165ef,
> claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1676529847583-196507,
> container=default, section=923], offset=0,
> length=6582249],offset=0,name=myfile.avro.gz,size=6582249] to remote host
> nas.local.com due to
> org.apache.nifi.processors.standard.socket.ClientConnectException: SSH
> Client connection failed [nas.local.com:22]:
> net.schmizz.sshj.transport.TransportException: Timeout expired: 30000
> MILLISECONDS; routing to failure
> net.schmizz.sshj.transport.TransportException: Timeout expired: 30000
> MILLISECONDS
>
> I know that David Handermann implemented a fix
> (https://issues.apache.org/jira/browse/NIFI-9989) for SSHJ, but I don't
> know if it really is related. Maybe it's just a configuration issue
> (number of allowed concurrent connections?) on the SFTP server side.
>
> Let's make an example: say we have 50 PutSFTP processors sending to the
> same destination and all of them receive data at an interval of 15 minutes.
> Does NiFi keep the SSH connection established for these 50 processors, or
> is it closed after each flow has been transferred? If it isn't closed after
> each flow, how can we influence the timeout? I see only the two timeout
> properties, which leads me to assume that the connection is closed after
> each flow… But maybe one of you can shed some light on this.
>
> Cheers
> Josef
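As a rough illustration of the server side, assuming an OpenSSH server, the sshd_config settings that govern concurrent sessions look like the following; the values shown are the documented defaults from the man page [1], not recommendations:

    # /etc/ssh/sshd_config on the destination server
    MaxSessions 10        # maximum open sessions per network connection
    MaxStartups 10:30:100 # concurrent unauthenticated connections (start:rate:full)

And a sketch of collapsing the 50 processors into a single PutSFTP instance by parameterizing its properties with NiFi Expression Language; the attribute names here are hypothetical and would be set by upstream processors such as UpdateAttribute:

    Hostname:    ${sftp.hostname}
    Port:        ${sftp.port}
    Remote Path: ${sftp.remote.path}

Each FlowFile then carries its own destination, so a single processor with a modest number of Concurrent Tasks can serve all 50 targets.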
