Hello,

When NiFi receives this listing from the SFTP server, it creates a FlowFile 
for each remote file. Each FlowFile contains a map of attributes, and NiFi 
also creates a provenance RECEIVE event for it. All of this is then stored in 
an internal data structure in the session object. So, all told, you are 
probably looking at about 1-2 KB of Java heap used per FlowFile. That means 
that for 10 million FlowFiles you would need something on the order of 
10-20 GB of heap space.
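To make that arithmetic concrete, here is the back-of-the-envelope calculation (the 1-2 KB per FlowFile figure is the rough estimate from above, not a measured value):

```python
# Rough heap estimate for a ListSFTP listing: files x bytes-per-FlowFile.
# The ~1-2 KB per FlowFile figure is an approximation, not a measurement.

def listing_heap_estimate_gb(num_files, bytes_per_flowfile):
    return num_files * bytes_per_flowfile / (1024 ** 3)

low = listing_heap_estimate_gb(10_000_000, 1024)   # ~1 KB each
high = listing_heap_estimate_gb(10_000_000, 2048)  # ~2 KB each
print(f"~{low:.1f} GB to ~{high:.1f} GB of heap")  # roughly 9.5-19.1 GB
```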

Splitting the data directories up into smaller directories would certainly 
help. But then, unfortunately, I believe you would also need N processors, 
one per directory. If you use a recursive directory structure and configure 
the processor to recurse, I don’t think you’ll see an improvement.

If the ListSFTP processor doesn’t already have a mechanism for batch size (so 
that you could set it to 100,000) then that would probably be a very useful 
feature to add. I think we can do this safely, as long as we emit the oldest 
data first and update the cluster’s state with each batch of FlowFiles.
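As a rough illustration of that idea (plain Python, not NiFi's actual processor or state-management API; the function and field names here are made up): sort the pending listing oldest-first, emit at most one batch per run, and checkpoint the newest timestamp emitted so the next run resumes from there.

```python
# Hypothetical sketch of a batched listing strategy (NOT NiFi's real API):
# emit the oldest entries first and checkpoint after each batch so a
# restart resumes where the previous run left off.

def list_batch(remote_listing, state, batch_size=100_000):
    """remote_listing: iterable of (path, modified_timestamp) tuples.
    state: dict standing in for the cluster's persisted state."""
    last_ts = state.get("last_timestamp", 0)
    # Only consider files newer than the checkpoint, oldest first.
    pending = sorted(
        (entry for entry in remote_listing if entry[1] > last_ts),
        key=lambda e: e[1],
    )
    batch = pending[:batch_size]
    if batch:
        # Persist the newest timestamp we emitted; in NiFi this would be
        # the cluster state update mentioned above.
        state["last_timestamp"] = batch[-1][1]
    return batch
```

Each trigger of the processor would then handle only `batch_size` FlowFiles' worth of heap, regardless of how large the remote directory is.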

Do you mind creating a JIRA for that improvement? Also, if you are so inclined 
to delve into implementing the feature, those of us on the mailing list would 
be more than happy to help you get it across the finish line.

Thanks!
-Mark

Sent from my iPhone

On May 5, 2018, at 5:51 PM, B <[email protected]> wrote:

Hi,

I have a directory on an SFTP server where another system has been sending 
me tons of files.

Now ListSFTP works GREAT on every other folder on this server. But this one 
folder has, I think, around 10 million files (maybe too many).

It takes a long time to run "ls" on this directory: around 5-6 minutes 
before SSH replies.

It freezes the ListSFTP processor on the primary node. I end up having to 
restart the Coordinator node after 30 minutes to an hour of waiting.

Why does "ls" eventually come back while ListSFTP struggles and freezes the 
thread? Is there a way to limit the number of files ListSFTP pulls in? I'd 
like to divide it up into chunks of 100,000.

Maybe I have to run some Linux commands to split it apart into 100,000-file 
folders on the SFTP server?

I'm using NiFi 1.3.0.

Thanks,



