RE: Generate flowfiles from flowfile content
ExtractText did the job! Thank you very much! :-)

> Date: Wed, 23 Sep 2015 16:05:44 -0700
> Subject: Re: Generate flowfiles from flowfile content
> From: joe.w...@gmail.com
> To: users@nifi.apache.org
>
> Bryan - you may be right that ExtractText will be the right play once
> SplitJson is done doing its thing. Perhaps either will work; maybe we
> can show either or. If the schema is fairly well known, I'm thinking
> extract json (EvaluateJsonPath) would be the winner.
>
> thanks
> Joe
>
> On Wed, Sep 23, 2015 at 4:04 PM, Bryan Bende <bbe...@gmail.com> wrote:
> > Sorry I missed Joe's email while sending mine... I can put together a
> > template showing this.
> >
> > On Wednesday, September 23, 2015, Bryan Bende <bbe...@gmail.com> wrote:
> >> David,
> >>
> >> Take a look at ExtractText; it is for pulling FlowFile content into
> >> attributes. I think that will do what you are looking for.
> >>
> >> -Bryan
> >>
> >> On Wednesday, September 23, 2015, David Klim <davidkl...@hotmail.com>
> >> wrote:
> >>> Hello Bryan,
> >>>
> >>> I should have been more specific. What I am trying to do is to fetch
> >>> files from S3. I am using the GetSQS processor to get new object (files)
> >>> events, and each event is a JSON document containing the list of new
> >>> objects (files) in the bucket. The output of GetSQS is processed by
> >>> SplitJson, and I get FlowFiles containing one object key (filename)
> >>> each. I need to feed these into FetchS3Object to retrieve the actual
> >>> file, but FetchS3Object expects the FlowFile's filename attribute (or
> >>> some other attribute) to hold the filename. So I guess the problem is
> >>> moving the filename string from the FlowFile content to some
> >>> attribute.
> >>>
> >>> If there is no other alternative, I will implement this processor.
> >>>
> >>> Thanks!
> >>>
> >>> From: rbra...@softnas.com
> >>> To: users@nifi.apache.org
> >>> Subject: RE: Generate flowfiles from flowfile content
> >>> Date: Wed, 23 Sep 2015 19:59:21 +
> >>>
> >>> Good idea, Adam.
> >>>
> >>> I will post a separate review thread on the dev@ list to track comments.
> >>>
> >>> Here’s the repository link: https://github.com/rickbraddy/nifishare
> >>>
> >>> Thanks
> >>> Rick
> >>>
> >>> From: Adam Taft [mailto:a...@adamtaft.com]
> >>> Sent: Wednesday, September 23, 2015 1:48 PM
> >>> To: users@nifi.apache.org
> >>> Subject: Re: Generate flowfiles from flowfile content
> >>>
> >>> Not speaking for the entire community, but I am sure that such a
> >>> contribution would (at minimum) be appreciated for review,
> >>> consideration and potential inclusion. Ideally, the best thing would
> >>> be hosting the source code somewhere that the rest of the community
> >>> could go to for review. Maybe you could host the GetFileData and
> >>> PutFileData processors in a GitHub repository somewhere?
> >>>
> >>> I think the idea you proposed is good, but it might need to be aligned
> >>> with the work (if any) for the referenced ListFile and FetchFile
> >>> implementation. And the differences in your PutFileData vs. PutFile
> >>> would ideally be well vetted as well.
> >>>
> >>> Adam
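For readers landing on this thread, here is a minimal sketch of what the ExtractText step does for David's flow, written as plain Java with no NiFi dependency so it runs standalone. It is a model, not NiFi's implementation: a regular expression with one capture group is matched against the FlowFile content, and the captured text is copied into a named attribute, roughly what ExtractText does for a dynamic property whose value is the regex. The class name, the attribute name `s3.key` and the `KEY_REGEX` pattern are assumptions for illustration; the regex assumes each SplitJson output holds exactly one object key, optionally wrapped in double quotes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical model of the ExtractText step; NOT NiFi's ExtractText code.
final class ExtractTextSketch {
    private ExtractTextSketch() {}

    // Assumed pattern: one key per FlowFile, optional surrounding whitespace
    // and double quotes; capture group 1 is the key itself.
    static final String KEY_REGEX = "^\\s*\"?(\\S.*?)\"?\\s*$";

    /**
     * Matches regex against the content. On a match, returns a copy of the
     * attributes with attributeName set to capture group 1; on no match,
     * returns empty (ExtractText would route such a FlowFile to "unmatched").
     */
    static Optional<Map<String, String>> extract(String content,
                                                 Map<String, String> attributes,
                                                 String attributeName,
                                                 String regex) {
        Matcher m = Pattern.compile(regex).matcher(content);
        if (!m.find() || m.groupCount() < 1) {
            return Optional.empty();
        }
        Map<String, String> result = new HashMap<>(attributes);
        result.put(attributeName, m.group(1));
        return Optional.of(result);
    }
}
```

In the flow itself, the assumed equivalent is an ExtractText dynamic property such as `s3.key` with the value `^\s*"?(\S.*?)"?\s*$`, and FetchS3Object's Object Key set to `${s3.key}`. Naming the property `filename` instead should overwrite the filename attribute, which is what FetchS3Object's Object Key reads by default (`${filename}`).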
RE: Generate flowfiles from flowfile content
We have already developed a modified GetFile called GetFileData that takes an incoming FlowFile containing the path to the file/directory that needs to be transferred. There is a corresponding PutFileData on the other side that accepts the incoming file/directory, creates the directory/tree as needed or writes the file, then sets the permissions and ownership. GetFileData also receives a file.rootdir attribute that gets passed along to PutFileData, so it can rebase the original file’s location relative to the configured target directory. Unlike GetFile/PutFile, these processors work with entire directory trees and are triggered by incoming FlowFiles to GetFileData.

Eventually, we want to further enhance these two processors so they can break large files into “chunks” and send them as multi-part files that get reassembled by PutFileData, resolving the limitations associated with huge files and content repository size; e.g., there are chunk threshold (default 100MB) and chunk size (default 10MB) properties that will control the chunking, if enabled.

If the community is interested in and would benefit from these processors, we’re happy to consider further generalizing and contributing them, along with any further refinements based upon community review and feedback.

I believe these processors would address both the Jira and David’s original inquiry.

Rick

From: Adam Taft [mailto:a...@adamtaft.com]
Sent: Wednesday, September 23, 2015 1:09 PM
To: users@nifi.apache.org
Subject: Re: Generate flowfiles from flowfile content

Right. This would be the use case that FetchFile [1] would help solve.

[1] https://issues.apache.org/jira/browse/NIFI-631

On Wed, Sep 23, 2015 at 1:11 PM, Bryan Bende <bbe...@gmail.com> wrote:

Hi David,

When you say "files I need to retrieve", are you referring to files on the local filesystem where NiFi is running? If so, I am not aware of an existing processor that does that. Currently we have GetFile, which polls a directory, but that is not what you want here. It would be fairly straightforward to implement with a custom processor, though... You would read the incoming FlowFile content to get the filename, then create a new FlowFile with your desired name, and write the content of the local file to the new FlowFile.

-Bryan

On Wed, Sep 23, 2015 at 11:16 AM, David Klim <davidkl...@hotmail.com> wrote:

Hello,

In a flow I am defining, I receive a FlowFile containing a JSON string. Using the SplitJson processor I can extract some JSON paths pointing to some files I need to retrieve, but the filename is the content of the generated FlowFile. So I would need to be able to read the content and generate a FlowFile with that name instead. How could I do that?

Thanks!
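Bryan's custom-processor outline (read the filename from the incoming content, create a new FlowFile with that name, write the local file into it) can be sketched as follows. This is plain Java for illustration only: `SimpleFlowFile` and `FetchLocalFileSketch` are hypothetical stand-ins, whereas a real NiFi processor would go through the `ProcessSession` API (for example `session.read`, `session.create` and `session.write`) rather than passing byte arrays around. The check that rejects paths escaping the configured base directory is an added assumption, not something proposed in the thread.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a FlowFile: content bytes plus string attributes.
// This is NOT NiFi's FlowFile interface.
final class SimpleFlowFile {
    final byte[] content;
    final Map<String, String> attributes;

    SimpleFlowFile(byte[] content, Map<String, String> attributes) {
        this.content = content;
        this.attributes = Map.copyOf(attributes);
    }
}

// Hypothetical helper modeling the custom processor described above.
final class FetchLocalFileSketch {
    private FetchLocalFileSketch() {}

    static SimpleFlowFile fetch(SimpleFlowFile incoming, Path baseDir) {
        // The incoming content is the filename; trim stray whitespace/newlines.
        String name = new String(incoming.content, StandardCharsets.UTF_8).trim();

        // Resolve against the configured directory and refuse "../" escapes,
        // absolute paths outside it, and an empty name (assumed safety check).
        Path root = baseDir.toAbsolutePath().normalize();
        Path file = root.resolve(name).normalize();
        if (!file.startsWith(root) || file.equals(root)) {
            throw new IllegalArgumentException("refusing path outside " + root + ": " + name);
        }

        try {
            // The local file's bytes become the new FlowFile's content...
            byte[] bytes = Files.readAllBytes(file);
            // ...and the new FlowFile carries the desired name, keeping the
            // incoming attributes otherwise.
            Map<String, String> attrs = new HashMap<>(incoming.attributes);
            attrs.put("filename", file.getFileName().toString());
            return new SimpleFlowFile(bytes, attrs);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Resolving against a fixed base directory keeps a FlowFile whose content is `../etc/passwd` from pulling arbitrary files off the NiFi host, which matters once filenames come from upstream data.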
Re: Generate flowfiles from flowfile content
David,

If I read your case correctly, this should be supported really well. The flow would be something like:

GetSQS -> SplitJson -> EvaluateJsonPath -> FetchS3Object

In SplitJson you'll break apart the original object into smaller valid JSON objects. In EvaluateJsonPath you'll promote the filename/URL you need from the JSON content to FlowFile attributes. In FetchS3Object you'll go grab the item based on the name/URL you pulled in EvaluateJsonPath.

Bryan: any chance you could put together a quick template for David to check out?

Thanks
Joe
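Joe's GetSQS -> SplitJson -> EvaluateJsonPath -> FetchS3Object chain can be modeled in plain Java to show what each hop does to the data. The sketch assumes the SQS messages are standard S3 event notifications, i.e. a `Records` array whose entries carry `s3.object.key` (David's actual message layout may differ), and represents parsed JSON as nested `Map`/`List` values because the JDK has no JSON parser. In NiFi terms this corresponds roughly to a SplitJson JsonPath Expression of `$.Records`, an EvaluateJsonPath dynamic property `filename` = `$.s3.object.key` with Destination set to `flowfile-attribute`, and FetchS3Object's Object Key left at `${filename}`. The class and method names are hypothetical, and `resolve` handles only bare `${attribute}` references, not the full NiFi Expression Language.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, dependency-free model of GetSQS -> SplitJson -> EvaluateJsonPath
// -> FetchS3Object. Parsed JSON is represented as nested Map/List values.
final class SqsToS3Sketch {
    private SqsToS3Sketch() {}

    /** SplitJson-like step for "$.Records": one element per record. */
    static List<Map<String, Object>> splitRecords(Map<String, Object> event) {
        List<Map<String, Object>> out = new ArrayList<>();
        Object records = event.get("Records");
        if (records instanceof List) {
            for (Object r : (List<?>) records) {
                if (r instanceof Map) {
                    @SuppressWarnings("unchecked")
                    Map<String, Object> rec = (Map<String, Object>) r;
                    out.add(rec);
                }
            }
        }
        return out;
    }

    /** EvaluateJsonPath-like step: follow a key path (e.g. s3 -> object -> key) into an attribute. */
    static Map<String, String> promote(Map<String, Object> record, String attributeName, String... path) {
        Map<String, String> attrs = new HashMap<>();
        Object node = record;
        for (String step : path) {
            if (!(node instanceof Map)) {
                return attrs; // path not present: nothing to promote
            }
            node = ((Map<?, ?>) node).get(step);
        }
        if (node != null) {
            attrs.put(attributeName, node.toString());
        }
        return attrs;
    }

    private static final Pattern ATTR_REF = Pattern.compile("\\$\\{([^}]+)\\}");

    /** Substitutes ${name} references from attributes, as an Object Key of ${filename} would. */
    static String resolve(String expression, Map<String, String> attrs) {
        Matcher m = ATTR_REF.matcher(expression);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            m.appendReplacement(sb, Matcher.quoteReplacement(attrs.getOrDefault(m.group(1), "")));
        }
        m.appendTail(sb);
        return sb.toString();
    }
}
```

Each record from `splitRecords` plays the role of one SplitJson output FlowFile; `promote` then moves the key out of the content and into an attribute, which is the step David was missing before FetchS3Object.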
Re: Generate flowfiles from flowfile content
Bryan - you may be right that ExtractText will be the right play once SplitJson is done doing its thing. Perhaps either will work; maybe we can show either or. If the schema is fairly well known, I'm thinking extract json (EvaluateJsonPath) would be the winner.

thanks
Joe
Re: Generate flowfiles from flowfile content
Sorry I missed Joe's email while sending mine... I can put together a template showing this.