Rick,

Yes, please do open a bug report for the GetFile recursion finding.

And yes, you can certainly have processors that operate at the edges of the flow (Get* and Put*) which are streaming oriented. It is always about how you demarcate objects within those streams, as those objects/things/events/data/whatever are what NiFi operates on.

Happy to talk through these things as you progress, and if there are features you think make sense for a variety of use cases, please do let us know.

Thanks
Joe

On Mon, Aug 17, 2015 at 6:27 PM, Rick Braddy <[email protected]> wrote:
> That makes perfect sense, Joe. We figured out that we like the content
> repository and queuing. For huge files, we can develop our own processor
> that breaks big files into smaller chunks (or streams, if required), so
> should not actually be an issue - it's just a limitation of the
> out-of-the-box GetFile processor today (by the way, ran into some issues with
> recursive subdirectories not being picked up and transferred - should I open
> a bug report on that one?)
>
> Rick
>
> -----Original Message-----
> From: Joe Witt [mailto:[email protected]]
> Sent: Sunday, August 16, 2015 8:02 PM
> To: [email protected]
> Subject: Re: New to NiFi - Remote Process Group failing to connect due to
> "magic header" not present
>
> Rick
>
> The content repository is bounded by the disk space available to it.
> Data is not held resident in memory (RAM/heap) unless a
> processor brings it into memory and the vast majority of them would only ever
> have some small buffer size held. Thus you can truly handle objects that are
> extremely large but they certainly cannot be larger than the disk space you
> have available.
>
> Streaming performance in NiFi would compare favorably with other systems that
> are transactional and durable or which delegate to some remotely accessible
> messaging bus. It would likely not compare favorably with systems that are
> non-transactional and in-memory.
>
> Thanks
> Joe
>
> On Sun, Aug 16, 2015 at 5:56 PM, Rick Braddy <[email protected]> wrote:
>> Yeah. There's a trade off between pure transfer speed of a specialized
>> utility vs. the flexibility and power of NiFi.
>>
>> Also concerned over very large files that won't fit in content
>> repository memory.
>>
>> How does streaming performance compare thru NiFi?
>>
>> On Aug 16, 2015, at 12:32 PM, Joe Witt <[email protected]> wrote:
>>
>> Rick,
>>
>> "much slower than basic "scp" across nodes, which doesn't incur the
>> extra data copying"
>>
>> That is certainly true. If what you need is precisely what scp does
>> then scp is the perfect tool.
>>
>> Thanks
>> Joe
>>
>> On Sun, Aug 16, 2015 at 1:19 PM, Rick Braddy <[email protected]> wrote:
>>>
>>> Indeed. I did increase the concurrent tasks which helped for sure.
>>> Still much slower than basic "scp" across nodes, which doesn't incur
>>> the extra data copying.
>>>
>>> On Aug 16, 2015, at 11:49 AM, Aldrin Piri <[email protected]> wrote:
>>>
>>> Rick,
>>>
>>> Thanks for the logs. I did not see anything particularly out of the
>>> ordinary and would be inclined to believe there may have been some
>>> network hiccups in the process.
>>>
>>> NiFi has flowfiles queued on connections until the file is
>>> transferred to another relationship. You would be correct in that
>>> they are enqueued for the duration of transfer until the successful
>>> transmission occurs. To help throughput, and allow multiple files to
>>> traverse the network, you can allocate additional concurrent tasks to
>>> the input port receiving these files.
>>>
>>> On Sat, Aug 15, 2015 at 18:20 Rick Braddy <[email protected]> wrote:
>>>>
>>>> Sure. Attached zip file contains log files on the target node.
>>>>
>>>> I have also observed some occasional PuTTY disconnects from the
>>>> sending side’s terminal connection (Remote Process Group’s VM),
>>>> which makes me wonder if there may be a networking issue with it, so
>>>> may not be a problem with NiFi at all.
>>>>
>>>> One other question I have. When the network connection stayed up,
>>>> it was able to transfer the two 1 GB files and one 10 GB file from
>>>> source node to target node; however, it appears these files get
>>>> “queued” for a long period of time (showing up in the connection
>>>> between the Input Connector processor and the PutFile processor). As
>>>> these are not “streamed”, I assume it’s just taking time to copy all that
>>>> data around and is to be expected.
>>>>
>>>> Is there a better way to “stream” from GetFile -> Remote Process
>>>> Group -> Input Connector -> PutFile (or something equivalent)?
>>>>
>>>> From: Aldrin Piri [mailto:[email protected]]
>>>> Sent: Saturday, August 15, 2015 4:37 PM
>>>> To: [email protected]
>>>> Subject: Re: New to NiFi - Remote Process Group failing to connect
>>>> due to "magic header" not present
>>>>
>>>> Rick,
>>>>
>>>> Timeouts certainly aren't an expected behavior. Might you have some
>>>> logs from your remote receiver that is receiving the files that we
>>>> could take a look at?
>>>>
>>>> It looks like the connection is functional in part, as one item did
>>>> at least make the transfer.
>>>>
>>>> Thanks!
>>>>
>>>> On Sat, Aug 15, 2015 at 5:21 PM, Rick Braddy <[email protected]> wrote:
>>>>
>>>> That appeared at first to resolve the connection problem, as I could
>>>> then see and connect to the remote input connector via the Remote
>>>> Process Group and my basic file transfer flow worked.
>>>>
>>>> However, now there are timeout warnings – assume this is not normal.
>>>>
>>>> <image001.png>
>>>>
>>>> From: Aldrin Piri [mailto:[email protected]]
>>>> Sent: Saturday, August 15, 2015 4:06 PM
>>>> To: [email protected]
>>>> Subject: Re: New to NiFi - Remote Process Group failing to connect
>>>> due to "magic header" not present
>>>>
>>>> Rick,
>>>>
>>>> Site to Site works by talking to the port that the NiFi web tier is
>>>> running on and not the configured "nifi.remote.input.socket.port"
>>>> which is used after the initial handshaking and connection. This is
>>>> likely why you are receiving the error messages about the
>>>> magic header. It is receiving something from that socket, but not the
>>>> desired input.
>>>>
>>>> Make a Remote Process Group that points to port 8080 (assuming
>>>> this was left as the default) of your other instance and you should
>>>> be good to go.
>>>>
>>>> Please let us know if that is not the case.
>>>>
>>>> On Sat, Aug 15, 2015 at 4:57 PM, Rick Braddy <[email protected]> wrote:
>>>>
>>>> Hi Aldrin,
>>>>
>>>> Here are the property settings:
>>>>
>>>> # Site to Site properties
>>>> nifi.remote.input.socket.port=8081
>>>> nifi.remote.input.secure=false
>>>>
>>>> Referencing remote node via http://<IP>:8081/nifi (not the UI
>>>> address, but the separate site-to-site listener, which I see via
>>>> netstat on target node)
>>>>
>>>> Rick
>>>>
>>>> From: Aldrin Piri [mailto:[email protected]]
>>>> Sent: Saturday, August 15, 2015 3:48 PM
>>>> To: [email protected]
>>>> Subject: Re: New to NiFi - Remote Process Group failing to connect
>>>> due to "magic header" not present
>>>>
>>>> Rick,
>>>>
>>>> Welcome to the community!
>>>>
>>>> We seem to be a little short in the documentation department for how
>>>> to make use of Remote Process Groups, but will look to remedy that.
>>>>
>>>> Just to confirm a few settings, both your nodes have a
>>>> nifi.remote.input.socket.port set and each has
>>>> nifi.remote.input.secure set to false within your nifi.properties.
>>>>
>>>> From here, are you referencing the remote node via its UI address?
>>>> Out of the box, this would be <server FQDN/IP>:8080/nifi
>>>>
>>>> Thanks,
>>>> Aldrin
>>>>
>>>> On Sat, Aug 15, 2015 at 4:11 PM, Rick Braddy <[email protected]> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I’m new to NiFi, trying to get my first Remote Process Group
>>>> configured and working between two CentOS nodes. For expediency, I
>>>> have configured site-to-site port to 8081 and set secure to false
>>>> (to avoid dealing with SSL certificate setup for now – will get to that
>>>> later).
>>>>
>>>> Trying to get two nodes to communicate using Remote Process Group.
>>>> Google is not finding any useful examples of how to set this up and
>>>> get it to work, so kind of fumbling through it today (learning a
>>>> lot, but slow going).
>>>>
>>>> I found the nifi-app.log file and why the Remote Process Group is
>>>> failing to connect to the second node. Not sure why the “Magic Header”
>>>> isn’t right, but connections are being closed and getting Read
>>>> Timeouts on the RPG sending node – the receiving NiFi node is
>>>> closing the connection because it thinks the sender isn’t a valid
>>>> NiFi node due to missing/incorrect magic header.
>>>>
>>>> 2015-08-15 13:28:08,939 ERROR [Site-to-Site Worker Thread-27]
>>>> o.a.nifi.remote.SocketRemoteSiteListener Unable to communicate with
>>>> remote instance null due to
>>>> org.apache.nifi.remote.exception.HandshakeException:
>>>> Handshake with nifi://SoftNAS-RGB1:57336 failed because the Magic
>>>> Header was not present; closing connection
>>>>
>>>> Not sure where to go from here to resolve this.
>>>>
>>>> Rick
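
For anyone reading this thread later, here is a minimal sketch of why the listener on the site-to-site socket port rejects a connection that does not speak its protocol. It is not NiFi's actual code. The 4-byte value b"NiFi" and the helper name has_magic_header are assumptions made for illustration and do not come from the messages above. The idea is only that the receiver reads the first few bytes and closes the connection if they are not the header it expects, which is what would happen if, for example, an HTTP request arrived on port 8081 instead of the web tier port.

```python
import io

# Assumption for illustration: the raw socket listener expects the
# connection to open with a fixed 4-byte magic header.
MAGIC_HEADER = b"NiFi"


def has_magic_header(stream):
    """Return True if the stream starts with the expected magic header."""
    first_bytes = stream.read(len(MAGIC_HEADER))
    return first_bytes == MAGIC_HEADER


# A peer speaking the raw socket protocol leads with the magic header.
print(has_magic_header(io.BytesIO(b"NiFi" + b"\x00\x05")))

# An HTTP request sent to the same port leads with "GET " instead,
# so a listener doing this check would give up on the handshake.
print(has_magic_header(io.BytesIO(b"GET /nifi HTTP/1.1\r\n")))
```

Under these assumptions the first call reports a match and the second does not, which lines up with the advice in the thread to point the Remote Process Group at the web tier port (8080 by default) rather than at nifi.remote.input.socket.port.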
