Hello,

When NiFi receives this listing from the SFTP server, it creates a FlowFile for each remote file. Each FlowFile contains a map of attributes, and the processor also creates a provenance RECEIVE event. All of this is then held in an internal data structure in the session object. So, all told, you are probably looking at about 1-2 KB of Java heap used for each FlowFile. That means that for 10 million FlowFiles you would need something on the order of 10-20 GB of heap space.
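As a back-of-envelope check, the arithmetic looks like this (the 1,500-byte figure is just an assumed midpoint of the 1-2 KB range above, not a measured value):

```java
public class FlowFileHeapEstimate {
    // Rough estimate of heap needed to hold a full listing in memory.
    // bytesPerFlowFile is an assumption (midpoint of the 1-2 KB range).
    public static long estimateHeapBytes(long flowFileCount, long bytesPerFlowFile) {
        return flowFileCount * bytesPerFlowFile;
    }

    public static void main(String[] args) {
        long bytes = estimateHeapBytes(10_000_000L, 1_500L);
        System.out.println(bytes / 1_000_000_000L + " GB"); // 15 GB, within the 10-20 GB range
    }
}
```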
Splitting the data directories up into smaller directories would certainly help, but then you would unfortunately also need N processors, I believe. If you use a recursive directory structure and configure the processor to recurse, I don't think you'll see an improvement.

If the ListSFTP processor doesn't already have a mechanism for a batch size (so that you could set it to 100,000), then that would probably be a very useful feature to add. I think we can do this safely, as long as we emit the oldest data first and update the cluster's state with each batch of FlowFiles. Do you mind creating a JIRA for that improvement? Also, if you are so inclined to delve into implementing the feature, those of us on the mailing list would be more than happy to help you get it across the finish line.

Thanks!
-Mark

Sent from my iPhone

On May 5, 2018, at 5:51 PM, B <[email protected]> wrote:

Hi,

I have a directory where some other system has been sending me tons of files on an SFTP server. ListSFTP works GREAT on every other folder on this server, but this one folder has, I think, around 10 million files (maybe too many). It takes a long time to run "ls" on this directory, around 5-6 minutes for SSH to reply back. It freezes the ListSFTP processor on the primary node, and I end up having to restart the Coordinator node after 30 minutes to an hour of waiting.

Why does "ls" come back but ListSFTP struggles more and freezes the thread? Is there a way to limit the number of files ListSFTP should pull in? I'd like it if I could divide it up into chunks of 100,000. Maybe I have to run some Linux commands to split it apart into 100,000-file folders on the SFTP server?

I'm using NiFi 1.3.0.

Thanks,
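For the question at the end of the quoted message, about splitting the directory into fixed-size folders: if you can run code on the SFTP server itself, the idea can be sketched roughly as below. The `chunk-N` naming and the chunk size are just illustrative choices, nothing NiFi-specific:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class SplitIntoChunks {
    // Move the regular files in srcDir into subdirectories chunk-0, chunk-1, ...
    // each holding at most chunkSize files.
    public static void split(Path srcDir, int chunkSize) throws IOException {
        int count = 0;
        int chunk = 0;
        Path dest = srcDir.resolve("chunk-0");
        Files.createDirectories(dest);
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(srcDir)) {
            for (Path file : stream) {
                // Skip the chunk directories themselves (and anything else
                // that isn't a plain file).
                if (!Files.isRegularFile(file)) continue;
                if (count == chunkSize) {
                    chunk++;
                    count = 0;
                    dest = srcDir.resolve("chunk-" + chunk);
                    Files.createDirectories(dest);
                }
                Files.move(file, dest.resolve(file.getFileName()));
                count++;
            }
        }
    }
}
```

Note that a directory stream's iteration order is unspecified, so this chunks files in whatever order the filesystem returns them, not by age; and as mentioned above, each resulting subdirectory would still need its own ListSFTP processor (or a recursive listing).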
