Re: FetchSFTP vs GetSFTP

2017-11-01 Thread Bryan Bende
The list-fetch approach sounds correct, and the micro acquisition cluster (if necessary) also sounds like a good idea. Regarding multiple hosts, the connection pooling in FetchSFTP does account for that. It's basically a map from the hostname string to a holder of connections for that hostname. -B
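
As a rough illustration of the "map from hostname to a holder of connections" idea described above, here is a minimal sketch. It is not FetchSFTP's actual code: the class name `HostConnectionPool`, its methods, and the `connector` function are all hypothetical, and a generic type stands in for a real SFTP connection.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Function;

/**
 * Hypothetical sketch (not NiFi's real implementation): idle connections
 * are kept in a map keyed by hostname, so flow files that point at
 * different SFTP hosts never share a connection.
 */
class HostConnectionPool<C> {
    // hostname -> holder of idle connections for that hostname
    private final Map<String, BlockingQueue<C>> idleByHost = new ConcurrentHashMap<>();
    // Assumed factory that opens a new connection to the given host
    private final Function<String, C> connector;

    HostConnectionPool(Function<String, C> connector) {
        this.connector = connector;
    }

    /** Reuse an idle connection for this host if one exists, else open a new one. */
    C borrow(String host) {
        BlockingQueue<C> idle =
                idleByHost.computeIfAbsent(host, h -> new LinkedBlockingQueue<>());
        C conn = idle.poll(); // non-blocking; null when nothing is idle
        return conn != null ? conn : connector.apply(host);
    }

    /** Hand a connection back so a later fetch for the same host can reuse it. */
    void release(String host, C conn) {
        idleByHost.computeIfAbsent(host, h -> new LinkedBlockingQueue<>()).offer(conn);
    }

    /** Number of idle connections currently held for this host. */
    int idleCount(String host) {
        BlockingQueue<C> idle = idleByHost.get(host);
        return idle == null ? 0 : idle.size();
    }
}
```

With a structure like this, a task that borrows a connection for one host, releases it, and then borrows again for the same host gets the same connection back, while a borrow for a different host opens a separate one. A real pool would also need to close stale or idle connections, which this sketch leaves out.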

Re: FetchSFTP vs GetSFTP

2017-10-31 Thread Ryan Ward
Yep, that's exactly how I have it set up, with a push to RPG. Is that preferred? I just started playing with it, to be honest. I can see how it could be tricky if you have to pull from multiple servers, since each flow file could potentially have a different SFTP host address in the queues. All together we

Re: FetchSFTP vs GetSFTP

2017-10-31 Thread Bryan Bende
Ryan, The 10 seconds appears to be a hard-coded rule in the processor, although it seems like it could be turned into a configurable property. It would require a code change to make it grab a batch of flow files during a single execution. In theory it shouldn't make that much of a difference, b

Re: FetchSFTP vs GetSFTP

2017-10-31 Thread Ryan Ward
Joe/Bryan Thanks! I believe the one specific file per concurrent task/connection (and too many threads) is the issue I have: we have a lot of small files and are often backed up. I'm going to drop the task count to take advantage of the pooling. Is it possible to have Fetch do batches vs a singl

Re: FetchSFTP vs GetSFTP

2017-10-31 Thread Bryan Bende
Ryan, Personally I don't have experience running these processors at scale, but from a code perspective they are fundamentally different... GetSFTP is a source processor, meaning it is not being fed by an upstream connection, so when it executes it can create a connection and retrieve up to max-sele

Re: FetchSFTP vs GetSFTP

2017-10-31 Thread Joe Witt
Ryan - don't know the code specifics behind FetchSFTP off-hand but I can confirm there are users at that range for it. Thanks On Tue, Oct 31, 2017 at 11:38 AM, Ryan Ward wrote: > I've found that on a single node getSFTP is able to pull more files off a > remote server than Fetch in a cluster. I n

FetchSFTP vs GetSFTP

2017-10-31 Thread Ryan Ward
I've found that on a single node GetSFTP is able to pull more files off a remote server than Fetch in a cluster. I noticed Fetch doesn't have a max selects setting, so it requires far more connections (one per file?) and concurrent threads to keep up. Was wondering if anyone is using List/Fetch at scal