RE: Generate flowfiles from flowfile content

2015-09-24 Thread David Klim
ExtractText did the job! Thank you very much! :-)
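
For readers following the thread, the idea behind the ExtractText approach (copying a piece of flowfile content into an attribute with a regular expression) can be sketched outside NiFi. This is only an illustration in plain Python, not NiFi code: the function name, the `s3.key` attribute name, and the pattern are assumptions made for the example.

```python
import re


def extract_to_attribute(content, attributes, name, pattern):
    """Return a copy of `attributes` with the first capture group of
    `pattern` (searched in `content`) stored under `name`.

    Hypothetical helper: it only mimics the idea of promoting content
    to an attribute and does not use any NiFi API.
    """
    updated = dict(attributes)
    match = re.search(pattern, content)
    if match:
        updated[name] = match.group(1)
    return updated


# Content of one split flowfile: a single object key plus a trailing newline.
attrs = extract_to_attribute(
    "incoming/data-001.csv\n",
    {"filename": "12345"},
    "s3.key",          # assumed attribute name
    r"^(.+?)\s*$",     # capture everything except trailing whitespace
)
print(attrs["s3.key"])
```

A downstream fetch step could then read the key from the attribute instead of from the content.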

> Date: Wed, 23 Sep 2015 16:05:44 -0700
> Subject: Re: Generate flowfiles from flowfile content
> From: joe.w...@gmail.com
> To: users@nifi.apache.org
> 
> Bryan - you may be right that ExtractText will be the right play once
> SplitJson is done doing its thing.  Perhaps either will work.  Maybe
> we can show both.  If the schema is fairly well known, I'm
> thinking EvaluateJsonPath would be the winner.
> 
> thanks
> Joe
> 
> On Wed, Sep 23, 2015 at 4:04 PM, Bryan Bende <bbe...@gmail.com> wrote:
> > Sorry I missed Joe's email while sending mine... I can put together a
> > template showing this.
> >
> >
> > On Wednesday, September 23, 2015, Bryan Bende <bbe...@gmail.com> wrote:
> >>
> >> David,
> >>
> >> Take a look at ExtractText, it is for pulling FlowFile content into
> >> attributes. I think that will do what you are looking for.
> >>
> >> -Bryan
> >>
> >> On Wednesday, September 23, 2015, David Klim <davidkl...@hotmail.com>
> >> wrote:
> >>>
> >>> Hello Bryan,
> >>>
> >>> I should have been more specific. What I am trying to do is to fetch
> >>> files from S3. I am using the GetSQS processor to get new object (file)
> >>> events, and each event is a JSON document containing the list of new
> >>> objects (files) in the bucket. The output of GetSQS is processed by
> >>> SplitJson, and I get flowfiles containing one object key (filename) each.
> >>> I need to feed this into FetchS3Object to retrieve the actual file, but
> >>> FetchS3Object expects the flowfile filename attribute (or any other) to
> >>> be the filename. So I guess the problem is moving the filename string
> >>> from the flowfile content to some attribute.
> >>>
> >>> If there is no other alternative, I will implement this processor.
> >>>
> >>> Thanks!
> >>>
> >>> 
> >>> From: rbra...@softnas.com
> >>> To: users@nifi.apache.org
> >>> Subject: RE: Generate flowfiles from flowfile content
> >>> Date: Wed, 23 Sep 2015 19:59:21 +
> >>>
> >>> Good idea, Adam.
> >>>
> >>>
> >>>
> >>> I will post a separate review thread on the dev@ list to track comments.
> >>>
> >>>
> >>>
> >>> Here’s the repository link:  https://github.com/rickbraddy/nifishare
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> Thanks
> >>>
> >>> Rick
> >>>
> >>>
> >>>
> >>> From: Adam Taft [mailto:a...@adamtaft.com]
> >>> Sent: Wednesday, September 23, 2015 1:48 PM
> >>> To: users@nifi.apache.org
> >>> Subject: Re: Generate flowfiles from flowfile content
> >>>
> >>>
> >>>
> >>> Not speaking for the entire community, but I am sure that such a
> >>> contribution would (at minimum) be appreciated for review, consideration
> >>> and potential inclusion.  The best thing would be ideally hosting the
> >>> source code somewhere that the rest of the community could go to for
> >>> review.  Maybe you could host the GetFileData and PutFileData processors
> >>> on a GitHub repository somewhere?
> >>>
> >>> I think the idea you proposed is good, but might need to be aligned with
> >>> the work (if any) for the referenced ListFile and FetchFile
> >>> implementation.  And the differences in your PutFileData vs. PutFile
> >>> would ideally be well vetted as well.
> >>>
> >>> Adam
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> On Wed, Sep 23, 2015 at 2:23 PM, Rick Braddy <rbra...@softnas.com> wrote:
> >>>
> >>> We have already developed a modified GetFile called GetFileData
> >>> that takes an incoming FlowFile containing the path to the file/directory
> >>> that needs to be transferred.  There is a corresponding PutFileData on the
> >>> other side that accepts the incoming file/directory that creates the
> >>> directory/tree as needed or writes the file, then sets the permissions and

RE: Generate flowfiles from flowfile content

2015-09-23 Thread Rick Braddy
We have already developed a modified GetFile called GetFileData that
takes an incoming FlowFile containing the path to the file/directory that needs
to be transferred.  There is a corresponding PutFileData on the other side that
accepts the incoming file/directory, creates the directory/tree as needed
or writes the file, and then sets the permissions and ownership.  GetFileData also
receives a file.rootdir attribute that gets passed along to PutFileData, so it
can rebase the original file’s location relative to the configured target
directory.  Unlike GetFile/PutFile, these processors work with entire directory
trees and are triggered by incoming FlowFiles to GetFileData.
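
The rebasing step described above (using a root directory to relocate a file under a configured target directory) can be sketched as follows. This is a minimal illustration in plain Python and is not taken from the GetFileData/PutFileData source: the function name and the example paths are hypothetical, and POSIX-style paths are assumed.

```python
from pathlib import PurePosixPath


def rebase_path(source_path, root_dir, target_dir):
    """Re-root `source_path` from `root_dir` onto `target_dir`.

    Raises ValueError if `source_path` is not located under `root_dir`.
    """
    relative = PurePosixPath(source_path).relative_to(PurePosixPath(root_dir))
    return str(PurePosixPath(target_dir) / relative)


# root_dir plays the role of the file.rootdir attribute; target_dir stands in
# for the directory configured on the receiving side (both values are made up).
rebased = rebase_path(
    "/data/export/2015/09/report.csv",
    "/data/export",
    "/mnt/archive",
)
print(rebased)
```

Keeping only the path relative to the root is what lets the receiving side rebuild the same directory tree under a different base.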

Eventually, we want to further enhance these two processors so they can break 
large files into “chunks” and send as multi-part files that get reassembled by 
PutFileData, resolving the limitations associated with huge files and content 
repository size; e.g., there are default 100MB chunk threshold and 10MB chunk 
size properties that will control the chunking, if enabled.
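
As a rough illustration of the chunking idea, the sketch below plans byte ranges for a file using the 100MB threshold and 10MB chunk size mentioned above. It is an assumption-laden example in plain Python, not the planned implementation: the function name is hypothetical, and treating files at or below the threshold as a single piece is a guess about the intended behavior.

```python
MB = 1024 * 1024


def plan_chunks(file_size, threshold=100 * MB, chunk_size=10 * MB):
    """Return a list of (offset, length) byte ranges covering the file.

    Files no larger than `threshold` are sent as one piece (assumed rule);
    larger files are cut into `chunk_size` pieces, the last possibly shorter.
    """
    if file_size <= threshold:
        return [(0, file_size)]
    return [
        (offset, min(chunk_size, file_size - offset))
        for offset in range(0, file_size, chunk_size)
    ]


small = plan_chunks(50 * MB)    # below the threshold: one range
large = plan_chunks(105 * MB)   # above the threshold: several ranges
print(len(small), len(large))
```

The receiving side would write each range at its offset, so chunks could arrive in any order and still reassemble into the original file.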

If the community is interested and would benefit from these processors, we’re happy
to consider further generalizing and contributing these processors, along with
any further refinements based upon community review and feedback.

I believe these processors would address both the Jira and David’s original 
inquiry.

Rick

From: Adam Taft [mailto:a...@adamtaft.com]
Sent: Wednesday, September 23, 2015 1:09 PM
To: users@nifi.apache.org
Subject: Re: Generate flowfiles from flowfile content

Right.  This would be the use case that FetchFile [1] would help solve.

[1] https://issues.apache.org/jira/browse/NIFI-631

On Wed, Sep 23, 2015 at 1:11 PM, Bryan Bende <bbe...@gmail.com> wrote:
Hi David,

When you say "files I need to retrieve", are you referring to files on the 
local filesystem where NiFi is running?

If so, I am not aware of an existing processor that does that. Currently we 
have GetFile which polls a directory, but that is not what you want here.

It would be fairly straightforward to implement with a custom processor
though... You would read the incoming FlowFile content to get the filename, 
then create a new FlowFile with your desired name, and write the content of the 
local file to the new FlowFile.
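
The steps described above can be sketched in plain Python to show the logic only. This is not NiFi processor code: the function name is hypothetical, the flowfile is modeled as a pair of an attribute dictionary and a byte string, and the incoming content is assumed to be a single UTF-8 path.

```python
from pathlib import Path


def fetch_local_file(incoming_content: bytes):
    """Read a path from the incoming content and return a new
    (attributes, content) pair holding the local file's bytes.
    """
    # 1. Read the incoming content to get the file name.
    path = Path(incoming_content.decode("utf-8").strip())
    # 2. Name the new flowfile after the file.
    attributes = {"filename": path.name}
    # 3. Write the content of the local file to the new flowfile.
    return attributes, path.read_bytes()
```

A real processor would also need to decide what to do when the path is missing or unreadable, for example by routing the incoming flowfile to a failure relationship.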

-Bryan


On Wed, Sep 23, 2015 at 11:16 AM, David Klim <davidkl...@hotmail.com> wrote:
Hello,

In a flow I am defining, I receive a flowfile containing a JSON string. Using the
SplitJson processor I can extract some JSON paths pointing to some files I need
to retrieve, but the filename is the content of the generated flowfile. So I
would need to be able to read the content and generate a flowfile with that
name instead. How could I do that?

Thanks!





Re: Generate flowfiles from flowfile content

2015-09-23 Thread Joe Witt
David,

I think if I read your case correctly, this should be supported really
well.  The flow would be something like:

GetSQS -> SplitJson -> EvaluateJsonPath -> FetchS3Object

In SplitJson you'll break apart the original object into smaller valid
JSON objects.

In EvaluateJsonPath you'll promote the filename/URL you need from the
JSON content to flowfile attributes.

In FetchS3Object you'll go grab the item based on the name/URL you pulled in
EvaluateJsonPath.
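
The split and evaluate steps of that flow can be approximated in plain Python to show what each stage produces. This sketch does not use NiFi: the function names and the `s3.bucket` attribute name are made up for illustration, and the event is assumed to follow the usual S3 notification layout with a top-level `Records` array, which may differ from the messages actually arriving on the queue.

```python
import json


def split_records(event_json):
    """SplitJson stage: one JSON document per entry of the Records array."""
    event = json.loads(event_json)
    return [json.dumps(record) for record in event.get("Records", [])]


def evaluate_json_path(record_json):
    """EvaluateJsonPath stage: promote bucket and key to attributes."""
    record = json.loads(record_json)
    return {
        "s3.bucket": record["s3"]["bucket"]["name"],   # assumed attribute name
        "filename": record["s3"]["object"]["key"],
    }


# Made-up event with two new objects in one bucket.
event = json.dumps({
    "Records": [
        {"s3": {"bucket": {"name": "my-bucket"}, "object": {"key": "in/a.csv"}}},
        {"s3": {"bucket": {"name": "my-bucket"}, "object": {"key": "in/b.csv"}}},
    ]
})

# Each dictionary is what a fetch step would need to retrieve one object.
fetch_requests = [evaluate_json_path(r) for r in split_records(event)]
print(fetch_requests)
```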

Bryan: Any chance you could put together a quick template for David to
check out?

Thanks
Joe

On Wed, Sep 23, 2015 at 3:41 PM, David Klim <davidkl...@hotmail.com> wrote:
> Hello Bryan,
>
> I should have been more specific. What I am trying to do is to fetch files
> from S3. I am using the GetSQS processor to get new object (files) events,
> and each event is a json containing the list of new objects (files) in the
> bucket. The output of the GetSQS is processed by SplitJson and I get
> flowfiles containing one object key (filename) each. I need to feed this
> into FetchS3Object to retrieve the actual file, but FetchS3Object expects the
> flowfile filename attribute (or any other) to be the filename. So I guess
> the problem is moving the filename string from the flowfile content to some
> attribute.
>
> If there is no other alternative, I will implement this processor.
>
> Thanks!
>
> 
> From: rbra...@softnas.com
> To: users@nifi.apache.org
> Subject: RE: Generate flowfiles from flowfile content
> Date: Wed, 23 Sep 2015 19:59:21 +
>
>
> Good idea, Adam.
>
>
>
> I will post a separate review thread on the dev@ list to track comments.
>
>
>
> Here’s the repository link:  https://github.com/rickbraddy/nifishare
>
>
>
>
>
> Thanks
>
> Rick
>
>
>
> From: Adam Taft [mailto:a...@adamtaft.com]
> Sent: Wednesday, September 23, 2015 1:48 PM
> To: users@nifi.apache.org
> Subject: Re: Generate flowfiles from flowfile content
>
>
>
> Not speaking for the entire community, but I am sure that such a
> contribution would (at minimum) be appreciated for review, consideration and
> potential inclusion.  The best thing would be ideally hosting the source
> code somewhere that the rest of the community could go to for review.  Maybe
> you could host the GetFileData and PutFileData processors on a GitHub
> repository somewhere?
>
> I think the idea you proposed is good, but might need to be aligned with the
> work (if any) for the referenced ListFile and FetchFile implementation.  And
> the differences in your PutFileData vs. PutFile would ideally be well vetted
> as well.
>
> Adam
>
>
>
>
>
>
>
> On Wed, Sep 23, 2015 at 2:23 PM, Rick Braddy <rbra...@softnas.com> wrote:
>
> We have already developed a modified GetFile called GetFileData
> that takes an incoming FlowFile containing the path to the file/directory
> that needs to be transferred.  There is a corresponding PutFileData on the
> other side that accepts the incoming file/directory that creates the
> directory/tree as needed or writes the file, then sets the permissions and
> ownership.  GetFileData also receives a file.rootdir attribute that gets
> passed along to PutFileData, so it can rebase the original file’s location
> relative to the configured target directory.  Unlike GetFile/PutFile, these
> processors work with entire directory trees and are triggered by incoming
> FlowFiles to GetFileData.
>
>
>
> Eventually, we want to further enhance these two processors so they can
> break large files into “chunks” and send as multi-part files that get
> reassembled by PutFileData, resolving the limitations associated with huge
> files and content repository size; e.g., there are default 100MB chunk
> threshold and 10MB chunk size properties that will control the chunking, if
> enabled.
>
>
>
> If the community is interested and would benefit from these processors, we’re
> happy to consider further generalizing and contributing these processors,
> along with any further refinements based upon community review and feedback.
>
>
>
> I believe these processors would address both the Jira and David’s original
> inquiry.
>
>
>
> Rick
>
>
>
> From: Adam Taft [mailto:a...@adamtaft.com]
> Sent: Wednesday, September 23, 2015 1:09 PM
> To: users@nifi.apache.org
> Subject: Re: Generate flowfiles from flowfile content
>
>
>
> Right.  This would be the use case that FetchFile [1] would help solve.
>
> [1] https://issues.apache.org/jira/browse/NIFI-631
>
>
>
> On Wed, Sep 23, 2015 at 1:11 PM, Bryan Bende <bbe...@gmail.com> wrote:
>
> Hi David,
>
>
>
> When you say "f

Re: Generate flowfiles from flowfile content

2015-09-23 Thread Joe Witt
Bryan - you may be right that ExtractText will be the right play once
SplitJson is done doing its thing.  Perhaps either will work.  Maybe
we can show both.  If the schema is fairly well known, I'm
thinking EvaluateJsonPath would be the winner.

thanks
Joe

On Wed, Sep 23, 2015 at 4:04 PM, Bryan Bende <bbe...@gmail.com> wrote:
> Sorry I missed Joe's email while sending mine... I can put together a
> template showing this.
>
>
> On Wednesday, September 23, 2015, Bryan Bende <bbe...@gmail.com> wrote:
>>
>> David,
>>
>> Take a look at ExtractText, it is for pulling FlowFile content into
>> attributes. I think that will do what you are looking for.
>>
>> -Bryan
>>
>> On Wednesday, September 23, 2015, David Klim <davidkl...@hotmail.com>
>> wrote:
>>>
>>> Hello Bryan,
>>>
>>> I should have been more specific. What I am trying to do is to fetch
>>> files from S3. I am using the GetSQS processor to get new object (files)
>>> events, and each event is a json containing the list of new objects (files)
>>> in the bucket. The output of the GetSQS is processed by SplitJson and I get
>>> flowfiles containing one object key (filename) each. I need to feed this
>>> into FetchS3Object to retrieve the actual file, but FetchS3Object expects the
>>> flowfile filename attribute (or any other) to be the filename. So I guess
>>> the problem is moving the filename string from the flowfile content to some
>>> attribute.
>>>
>>> If there is no other alternative, I will implement this processor.
>>>
>>> Thanks!
>>>
>>> 
>>> From: rbra...@softnas.com
>>> To: users@nifi.apache.org
>>> Subject: RE: Generate flowfiles from flowfile content
>>> Date: Wed, 23 Sep 2015 19:59:21 +
>>>
>>> Good idea, Adam.
>>>
>>>
>>>
>>> I will post a separate review thread on the dev@ list to track comments.
>>>
>>>
>>>
>>> Here’s the repository link:  https://github.com/rickbraddy/nifishare
>>>
>>>
>>>
>>>
>>>
>>> Thanks
>>>
>>> Rick
>>>
>>>
>>>
>>> From: Adam Taft [mailto:a...@adamtaft.com]
>>> Sent: Wednesday, September 23, 2015 1:48 PM
>>> To: users@nifi.apache.org
>>> Subject: Re: Generate flowfiles from flowfile content
>>>
>>>
>>>
>>> Not speaking for the entire community, but I am sure that such a
>>> contribution would (at minimum) be appreciated for review, consideration and
>>> potential inclusion.  The best thing would be ideally hosting the source
>>> code somewhere that the rest of the community could go to for review.  Maybe
>>> you could host the GetFileData and PutFileData processors on a GitHub
>>> repository somewhere?
>>>
>>> I think the idea you proposed is good, but might need to be aligned with
>>> the work (if any) for the referenced ListFile and FetchFile implementation.
>>> And the differences in your PutFileData vs. PutFile would ideally be well
>>> vetted as well.
>>>
>>> Adam
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Wed, Sep 23, 2015 at 2:23 PM, Rick Braddy <rbra...@softnas.com> wrote:
>>>
>>> We have already developed a modified GetFile called GetFileData
>>> that takes an incoming FlowFile containing the path to the file/directory
>>> that needs to be transferred.  There is a corresponding PutFileData on the
>>> other side that accepts the incoming file/directory that creates the
>>> directory/tree as needed or writes the file, then sets the permissions and
>>> ownership.  GetFileData also receives a file.rootdir attribute that gets
>>> passed along to PutFileData, so it can rebase the original file’s location
>>> relative to the configured target directory.  Unlike GetFile/PutFile, these
>>> processors work with entire directory trees and are triggered by incoming
>>> FlowFiles to GetFileData.
>>>
>>>
>>>
>>> Eventually, we want to further enhance these two processors so they can
>>> break large files into “chunks” and send as multi-part files that get
>>> reassembled by PutFileData, resolving the limitations associated with huge
>>> files and content repository size; e.g., there are default 100MB chunk
>>> threshold and 10MB chunk size properties that will control the chunking, if

Re: Generate flowfiles from flowfile content

2015-09-23 Thread Bryan Bende
Sorry I missed Joe's email while sending mine... I can put together a
template showing this.

On Wednesday, September 23, 2015, Bryan Bende <bbe...@gmail.com> wrote:

> David,
>
> Take a look at ExtractText, it is for pulling FlowFile content into
> attributes. I think that will do what you are looking for.
>
> -Bryan
>
> On Wednesday, September 23, 2015, David Klim <davidkl...@hotmail.com> wrote:
>
>> Hello Bryan,
>>
>> I should have been more specific. What I am trying to do is to fetch
>> files from S3. I am using the GetSQS processor to get new object (files)
>> events, and each event is a json containing the list of new objects (files)
>> in the bucket. The output of the GetSQS is processed by SplitJson and I get
>> flowfiles containing one object key (filename) each. I need to feed this
>> into FetchS3Object to retrieve the actual file, but FetchS3Object expects
>> the flowfile filename attribute (or any other) to be the filename. So I
>> guess the problem is moving the filename string from the flowfile content
>> to some attribute.
>>
>> If there is no other alternative, I will implement this processor.
>>
>> Thanks!
>>
>> --
>> From: rbra...@softnas.com
>> To: users@nifi.apache.org
>> Subject: RE: Generate flowfiles from flowfile content
>> Date: Wed, 23 Sep 2015 19:59:21 +
>>
>> Good idea, Adam.
>>
>>
>>
>> I will post a separate review thread on the dev@ list to track comments.
>>
>>
>>
>> Here’s the repository link:  https://github.com/rickbraddy/nifishare
>>
>>
>>
>>
>>
>> Thanks
>>
>> Rick
>>
>>
>>
>> *From:* Adam Taft [mailto:a...@adamtaft.com]
>> *Sent:* Wednesday, September 23, 2015 1:48 PM
>> *To:* users@nifi.apache.org
>> *Subject:* Re: Generate flowfiles from flowfile content
>>
>>
>>
>> Not speaking for the entire community, but I am sure that such a
>> contribution would (at minimum) be appreciated for review, consideration
>> and potential inclusion.  The best thing would be ideally hosting the
>> source code somewhere that the rest of the community could go to for
>> review.  Maybe you could host the GetFileData and PutFileData processors on
>> a GitHub repository somewhere?
>>
>> I think the idea you proposed is good, but might need to be aligned with
>> the work (if any) for the referenced ListFile and FetchFile
>> implementation.  And the differences in your PutFileData vs. PutFile would
>> ideally be well vetted as well.
>>
>> Adam
>>
>>
>>
>>
>>
>>
>>
>> On Wed, Sep 23, 2015 at 2:23 PM, Rick Braddy <rbra...@softnas.com> wrote:
>>
>> We have already developed a modified GetFile called GetFileData
>> that takes an incoming FlowFile containing the path to the file/directory
>> that needs to be transferred.  There is a corresponding PutFileData on the
>> other side that accepts the incoming file/directory that creates the
>> directory/tree as needed or writes the file, then sets the permissions and
>> ownership.  GetFileData also receives a file.rootdir attribute that gets
>> passed along to PutFileData, so it can rebase the original file’s location
>> relative to the configured target directory.  Unlike GetFile/PutFile, these
>> processors work with entire directory trees and are triggered by incoming
>> FlowFiles to GetFileData.
>>
>>
>>
>> Eventually, we want to further enhance these two processors so they can
>> break large files into “chunks” and send as multi-part files that get
>> reassembled by PutFileData, resolving the limitations associated with huge
>> files and content repository size; e.g., there are default 100MB chunk
>> threshold and 10MB chunk size properties that will control the chunking, if
>> enabled.
>>
>>
>>
>> If the community is interested and would benefit from these processors, we’re
>> happy to consider further generalizing and contributing these processors,
>> along with any further refinements based upon community review and feedback.
>>
>>
>>
>> I believe these processors would address both the Jira and David’s
>> original inquiry.
>>
>>
>>
>> Rick
>>
>>
>>
>> *From:* Adam Taft [mailto:a...@adamtaft.com]
>> *Sent:* Wednesday, September 23, 2015 1:09 PM
>> *To:* users@nifi.apache.org
>> *Subject:* Re: Generate flowfiles from flowfile content