Thanks Gino for confirming that. I've submitted a JIRA and PR. https://issues.apache.org/jira/browse/NIFI-4375
Tried to find something that can improve PutSFTP, but to no avail so far.
NIFI-4375 only addresses the PutFTP processor.

On Sat, Sep 9, 2017 at 7:54 AM, Gino Lisignoli <[email protected]> wrote:
> Just built 1.4.0-SNAPSHOT and added in client.setBufferSize(16 * 1024);
> This fixed my problem straight away! Hope it makes it into 1.4.0.
>
> On Sat, Sep 9, 2017 at 12:07 AM, Joe Witt <[email protected]> wrote:
>>
>> Nope. That would be specific to those using commons net.
>>
>> Nice work Koji and Gino!
>>
>>
>> On Sep 8, 2017 6:54 AM, "Gino Lisignoli" <[email protected]> wrote:
>>
>> Wow, that sounds promising! Would that also be the same for any other
>> get/put processors?
>>
>> On Fri, Sep 8, 2017 at 7:47 PM, Koji Kawamura <[email protected]>
>> wrote:
>>>
>>> Hi,
>>>
>>> Just a quick update. I've tested
>>> commons-net-3.3::org.apache.commons.net.ftp.FTPClient without NiFi
>>> code.
>>> Here is the test code I used.
>>> https://gist.github.com/ijokarumawak/f5a329e53901bf2be7c19aa531094abd
>>>
>>> NiFi doesn't set its BufferSize currently, and the default is only 1KB.
>>> To send a 10MB file:
>>>
>>> # BufferSize = 1KB (default)
>>> about 8 sec
>>>
>>> # BufferSize = 16KB
>>> about 300 ms
>>>
>>> I'm going to create a JIRA to add a processor property to specify
>>> buffer size.
>>> Also, will test SFTP.
>>> Thanks again for highlighting the issue!
>>>
>>> Koji
>>>
>>> On Fri, Sep 8, 2017 at 8:48 AM, Koji Kawamura <[email protected]>
>>> wrote:
>>> > Hi,
>>> >
>>> > Thanks for clarifying that the number of files is not significant.
>>> > I looked at the PutFTP and FTPTransfer source code, and found that it
>>> > makes a few calls to the FTP server in addition to sending a file:
>>> >
>>> > 1. Send the file as a temporary file
>>> > 2. Update modification time, if 'Last Modified Time' is set
>>> > 3. chmod if 'Permissions' is set
>>> > 4. Rename the temporary file
>>> >
>>> > https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/util/FTPTransfer.java#L379
>>> >
>>> > PutSFTP and SFTPTransfer additionally do the following:
>>> > 5. chown if 'Remote Owner' is set
>>> > 6. chgrp if 'Remote Group' is set
>>> >
>>> > I wonder if those additional invocations add more latency.
>>> >
>>> > Also, it'd be helpful if you could write simple Java code using the
>>> > underlying (S)FTP client libraries without the NiFi layer to
>>> > investigate whether the NiFi implementation can be improved, or
>>> > whether the performance difference comes from the library
>>> > implementation.
>>> >
>>> > commons-net-3.3::org.apache.commons.net.ftp.FTPClient for FTP
>>> > and
>>> > jsch-0.1.54::com.jcraft.jsch.ChannelSftp for SFTP
>>> >
>>> >
>>> > I will try to do that at my end when I have time, but it'd be very
>>> > helpful if you can do that since you already have a testing
>>> > environment and base metrics.
>>> >
>>> > Thanks!
>>> > Koji
>>> >
>>> >
>>> > On Thu, Sep 7, 2017 at 6:30 PM, Gino Lisignoli <[email protected]>
>>> > wrote:
>>> >> Hi
>>> >>
>>> >> I monitor the send rates using collectd and grafana. It doesn't seem
>>> >> to matter if I send 10,000 10MB files or 100 1GB files; the maximum
>>> >> throughput rates of nifi PutFTP and PutSFTP remain the same:
>>> >> 300Mbps and 1Gbps.
>>> >>
>>> >> As mentioned above, the weird thing is that when I send files
>>> >> through ftp and sftp (without nifi) the rates are much better.
>>> >>
>>> >> It's really odd that the rates are significantly slower in NiFi.
>>> >>
>>> >> On Thu, Sep 7, 2017 at 5:45 PM, Koji Kawamura <[email protected]>
>>> >> wrote:
>>> >>>
>>> >>> Hello Gino,
>>> >>>
>>> >>> Thanks for sharing your findings on FTP performance.
>>> >>>
>>> >>> How did you measure send rate from NiFi to your FTP server?
>>> >>>
>>> >>> Sending multiple FlowFiles would provide less throughput compared to
>>> >>> sending one big FlowFile, as PutFTP and PutSFTP make a connection for
>>> >>> each incoming FlowFile. The overhead of establishing a connection
>>> >>> each time might be the performance difference you see with the mput
>>> >>> command.
>>> >>>
>>> >>> Those processors can decide which FTP server to use based on the
>>> >>> incoming FlowFile's attributes when NiFi Expression Language is used.
>>> >>>
>>> >>> If that's the case, there is some room for performance improvement
>>> >>> by keeping the underlying FTP(S) client instance so that it can be
>>> >>> reused across multiple onTrigger() calls.
>>> >>>
>>> >>> A possible work-around would be using MergeContent beforehand and
>>> >>> sending the result as a single file, if your use-case allows that.
>>> >>>
>>> >>> Thanks,
>>> >>> Koji
>>> >>>
>>> >>> On Thu, Sep 7, 2017 at 12:15 PM, Gino Lisignoli
>>> >>> <[email protected]>
>>> >>> wrote:
>>> >>> > I have this weird issue with PutFTP and PutSFTP transfer rates.
>>> >>> >
>>> >>> > What I am seeing is that no matter what files I transfer from one
>>> >>> > server to another over a single connection, the maximum rates I
>>> >>> > can send are 300Mbps for PutFTP and 1Gbps for PutSFTP.
>>> >>> >
>>> >>> > The sending nifi is installed on CentOS 7, running on a Dell R730,
>>> >>> > 190GB RAM, 16 cores @ 2.4GHz and 4x10Gb NICs bonded. The sending
>>> >>> > nifi has its content repository on a ramdisk, and the receiving
>>> >>> > server is receiving to a ramdisk (for testing, to remove disk IO
>>> >>> > from the equation).
>>> >>> >
>>> >>> > When I do an ftp send manually (without nifi) with mput I get ftp
>>> >>> > rates of ~8Gbps and sftp rates of 2.2Gbps (which seems slow
>>> >>> > anyway).
>>> >>> >
>>> >>> > I would have expected similar transfer rates with nifi.
>>> >>> >
>>> >>> > Is there any way to work out why these rates are so much slower,
>>> >>> > but also so consistent? I'm using Nifi-1.30

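The buffer-size effect reported earlier in the thread can be illustrated
without an FTP server. The sketch below is not the commons-net or NiFi
code; it is a stand-in that uses only java.io, and the names
BufferSizeSketch, CountingSink and countWrites are hypothetical. It copies
an in-memory payload through a buffer of a given size and counts the write
calls, on the assumption that each filled buffer becomes one write on the
underlying stream.

```java
import java.io.ByteArrayInputStream;
import java.io.OutputStream;

final class BufferSizeSketch {

    /** Discards everything written to it, but counts the write calls. */
    static final class CountingSink extends OutputStream {
        long writes = 0;

        @Override
        public void write(int b) {
            writes++;
        }

        @Override
        public void write(byte[] buf, int off, int len) {
            writes++;
        }
    }

    /** Copies payloadBytes zero bytes through a buffer and returns the number of writes. */
    static long countWrites(int payloadBytes, int bufferSize) {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        ByteArrayInputStream in = new ByteArrayInputStream(new byte[payloadBytes]);
        CountingSink sink = new CountingSink();
        byte[] buffer = new byte[bufferSize];
        int n;
        // Plain read/write copy loop: one write per buffer-full of data.
        while ((n = in.read(buffer, 0, buffer.length)) != -1) {
            sink.write(buffer, 0, n);
        }
        return sink.writes;
    }

    public static void main(String[] args) {
        int tenMb = 10 * 1024 * 1024;
        System.out.println("1 KB buffer:  " + countWrites(tenMb, 1024) + " writes");      // 10240
        System.out.println("16 KB buffer: " + countWrites(tenMb, 16 * 1024) + " writes"); // 640
    }
}
```

With a real client the corresponding setting is the
client.setBufferSize(16 * 1024) call quoted above. How much wall-clock time
each write costs depends on the socket and the server, which this sketch
does not model.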