Re: ExecuteScript Processor - Control Flow

2016-08-29 Thread James Wing
Koustav,

subprocess.call will indeed wait for the shell script to exit, but I
believe you need to clarify whether the "run_sqoop_job.sh" script itself has
logic to wait for the completion of the Sqoop job.
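For illustration, the blocking behaviour of subprocess.call is easy to demonstrate (a generic sketch; the actual script from the thread is not shown, so a trivial child process stands in for it):

```python
import subprocess
import sys

# subprocess.call blocks until the child process exits and returns its
# exit code, so a following statement will not run until the child is done.
rc = subprocess.call([sys.executable, "-c", "pass"])

# By contrast, subprocess.Popen returns immediately; the child may still
# be running unless you explicitly wait() for it.
p = subprocess.Popen([sys.executable, "-c", "pass"])
p.wait()

print(rc, p.returncode)  # 0 0
```

Note that even though call() blocks, if run_sqoop_job.sh merely submits the Sqoop job to the other server and exits, call() returning does not mean the Sqoop job has finished.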

Thanks,

James


On Mon, Aug 29, 2016 at 12:01 PM, koustav choudhuri 
wrote:

> [image: Inline image 2]
>
>
> This is the code that I am using to call the shell script.
>
> On Mon, Aug 29, 2016 at 4:15 AM, Nathamuni, Ramanujam  > wrote:
>
>> I have a similar question – I have an ExecuteScript processor using Python
>> to run code that produces an output file (/tmp/test.xml), but I'm not sure
>> how to pass that file to the next processor without using an additional
>> GetFile processor to pick up the file produced by the Python script.  I am
>> very new to NiFi.
>>
>>
>>
>> Following is the need:
>>
>>
>>
>> 1.   READ CSV file from HDFS
>>
>> 2.   Execute python script – reads CSV file and produces XML  output
>> file – example /tmp/test.xml .
>>
>> 3.  I need to process the /tmp/test.xml file using  SplitXML
>> processor
>>
>> 4.  Put these into HDFS
>>
>>
>>
>>
>>
>> Thanks,
>>
>> Ram
>>
>> *From:* James Wing [mailto:jvw...@gmail.com]
>> *Sent:* Monday, August 29, 2016 12:47 AM
>> *To:* users@nifi.apache.org
>> *Subject:* Re: ExecuteScript Processor - Control Flow
>>
>>
>>
>> Koustav,
>>
>> How are you running the Sqoop job?  Can you share some code?  Python is
>> sequential by default, but your Sqoop job might run asynchronously.  I
>> believe the answer depends on your code (or library) not only starting the
>> Sqoop job, but also polling for its status until it is complete.
>>
>> Thanks,
>>
>> James
>>
>>
>>
>> On Sun, Aug 28, 2016 at 8:24 PM, koustav choudhuri 
>> wrote:
>>
>> Hi All
>>
>>
>>
>> I have a Python script running on a NiFi server, which in turn calls a
>> Sqoop job on a different server. The next step in the script is to use the
>> flow file from the previous processor to continue to the next processor.
>>
>>
>>
>> So the Python script is like:
>>
>>
>>
>> 1. call the sqoop job on server 2
>>
>> 2. get the flow file from the session and continue
>>
>>
>>
>>
>>
>> Question:
>>
>> Will Step 2 wait until Step 1 completes?
>>
>> Or,
>>
>> As soon as the Sqoop job gets initiated through Step 1, does Step 2
>> execute irrespective of whether Step 1 completes or not?
>>
>>
>>
>> Could be a dumb question, still asking.
>>
>>
>>
>>
>>
>>
>>
>>
>> *
>> This e-mail may contain confidential or privileged information.
>> If you are not the intended recipient, please notify the sender
>> immediately and then delete it.
>>
>> TIAA
>> *
>>
>
>


Re: NiFi reference to process group

2016-08-29 Thread Joe Witt
Gunjan

We have a feature proposal for this.  Definitely a good idea that will help
simplify flow development.  It effectively makes process groups act as
functions.  No clear timetable for when it might happen, but check out the
feature proposal, comment on it, and join the discussion.

Thanks
Joe

On Aug 29, 2016 10:02 PM, "Gunjan Dave"  wrote:

Hi Team, does NiFi currently have the ability to reference a process group
instead of making an actual connection?
This would simplify the visual aspects of complex flows.


NiFi reference to process group

2016-08-29 Thread Gunjan Dave
Hi Team, does NiFi currently have the ability to reference a process group
instead of making an actual connection?
This would simplify the visual aspects of complex flows.


Re: Request for enhancement

2016-08-29 Thread Gunjan Dave
Hi Joe, I don't seem to have access to the NiFi JIRA to create one. Can
access be given? If not, could someone help raise it?

On Tue, Aug 30, 2016, 7:49 AM Joe Percivall  wrote:

> - Moving users list to BCC
>
> Hello Gunjan,
>
> This seems like a good potential idea. The proper place to submit the
> suggestion is through the Apache NiFi Jira[1]. It can more easily be
> discussed and worked on there.
>
> [1] https://issues.apache.org/jira/browse/NIFI
>
>
> Suggestions/ideas from users are always welcome!
>
> Joe
> - - - - - -
> Joseph Percivall
> linkedin.com/in/Percivall
> e: joeperciv...@yahoo.com
>
>
>
> On Tuesday, August 30, 2016 12:06 PM, Gunjan Dave <
> gunjanpiyushd...@gmail.com> wrote:
>
>
>
> Seems like the message below did not get delivered.
>
>
> On Mon, Aug 29, 2016, 12:30 PM Gunjan Dave 
> wrote:
>
> > Hi Team,
> > I would like to propose the following enhancement, if seen feasible, to
> > be incorporated in the provenance graph.
> >
> > The current graph only shows the processor type; rather, I would like
> > to suggest that we actually put in the component name along with the
> > processor type. That would make the graph more unique to each flow and
> > more visually intuitive.
> >
> > Just a suggestion, not mandatory.
> >
>


Re: Request for enhancement

2016-08-29 Thread Joe Percivall
- Moving users list to BCC

Hello Gunjan,

This seems like a good potential idea. The proper place to submit the 
suggestion is through the Apache NiFi Jira[1]. It can more easily be discussed 
and worked on there.

[1] https://issues.apache.org/jira/browse/NIFI


Suggestions/ideas from users are always welcome!

Joe 
- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com



On Tuesday, August 30, 2016 12:06 PM, Gunjan Dave  
wrote:



Seems like the message below did not get delivered.


On Mon, Aug 29, 2016, 12:30 PM Gunjan Dave 
wrote:

> Hi Team,
> I would like to propose the following enhancement, if seen feasible, to
> be incorporated in the provenance graph.
>
> The current graph only shows the processor type; rather, I would like to
> suggest that we actually put in the component name along with the
> processor type. That would make the graph more unique to each flow and
> more visually intuitive.
>
> Just a suggestion, not mandatory.
>


Re: Request for enhancement

2016-08-29 Thread Gunjan Dave
Seems like the message below did not get delivered.

On Mon, Aug 29, 2016, 12:30 PM Gunjan Dave 
wrote:

> Hi Team,
> I would like to propose the following enhancement, if seen feasible, to
> be incorporated in the provenance graph.
>
> The current graph only shows the processor type; rather, I would like to
> suggest that we actually put in the component name along with the
> processor type. That would make the graph more unique to each flow and
> more visually intuitive.
>
> Just a suggestion, not mandatory.
>


Re: ExecuteScript Processor - Control Flow

2016-08-29 Thread koustav choudhuri
[image: Inline image 2]


This is the code that I am using to call the shell script.

On Mon, Aug 29, 2016 at 4:15 AM, Nathamuni, Ramanujam 
wrote:

> I have a similar question – I have an ExecuteScript processor using Python
> to run code that produces an output file (/tmp/test.xml), but I'm not sure
> how to pass that file to the next processor without using an additional
> GetFile processor to pick up the file produced by the Python script.  I am
> very new to NiFi.
>
>
>
> Following is the need:
>
>
>
> 1.   READ CSV file from HDFS
>
> 2.   Execute python script – reads CSV file and produces XML  output
> file – example /tmp/test.xml .
>
> 3.  I need to process the /tmp/test.xml file using  SplitXML
> processor
>
> 4.  Put these into HDFS
>
>
>
>
>
> Thanks,
>
> Ram
>
> *From:* James Wing [mailto:jvw...@gmail.com]
> *Sent:* Monday, August 29, 2016 12:47 AM
> *To:* users@nifi.apache.org
> *Subject:* Re: ExecuteScript Processor - Control Flow
>
>
>
> Koustav,
>
> How are you running the Sqoop job?  Can you share some code?  Python is
> sequential by default, but your Sqoop job might run asynchronously.  I
> believe the answer depends on your code (or library) not only starting the
> Sqoop job, but also polling for its status until it is complete.
>
> Thanks,
>
> James
>
>
>
> On Sun, Aug 28, 2016 at 8:24 PM, koustav choudhuri 
> wrote:
>
> Hi All
>
>
>
> I have a Python script running on a NiFi server, which in turn calls a
> Sqoop job on a different server. The next step in the script is to use the
> flow file from the previous processor to continue to the next processor.
>
>
>
> So the Python script is like:
>
>
>
> 1. call the sqoop job on server 2
>
> 2. get the flow file from the session and continue
>
>
>
>
>
> Question:
>
> Will Step 2 wait until Step 1 completes?
>
> Or,
>
> As soon as the Sqoop job gets initiated through Step 1, does Step 2
> execute irrespective of whether Step 1 completes or not?
>
>
>
> Could be a dumb question, still asking.
>
>
>
>
>
>
>
>
>


RE: Kill-and-Fill Pattern?

2016-08-29 Thread Peter Wicks (pwicks)
Toivo,

I started down this path, but then came up with a broader solution (which I 
have not tested):


1.   Do a normal JSONToSQL

2.   Use MergeContent to group all of the FlowFiles from the same batch 
into a single new FlowFile using FlowFile Stream Merge Format.

3.   Update PutSQL to support Merged FlowFiles.

--Peter

From: Toivo Adams [mailto:toivo.ad...@gmail.com]
Sent: Sunday, August 28, 2016 7:27 AM
To: users@nifi.apache.org
Subject: Re: Kill-and-Fill Pattern?

Hi,
Could a new processor, PutAvroSQL, help?
The processor would use data in Avro format and insert all records at once.
Thanks,
toivo

2016-08-26 16:45 GMT+03:00 Peter Wicks (pwicks) 
>:
I have a source SQL table that I’m reading with a SQL select statement.  I want 
to kill and fill a destination SQL table with this source data on an interval.

My non kill-and-fill pattern is: ExecuteSQL -> Avro To JSON -> JSON To SQL -> 
PutSQL.

I’m trying to come up with a good way to delete existing data first before 
loading new data.
One option I’ve considered is to mark the original Avro file with a UUID and 
add this attribute as a field in the destination table; then do a split off, 
ReplaceText, and delete all rows where the UUID doesn’t match this batch.  I 
think this could work, but I’m worried about timing the SQL DELETE.  I kind of 
want the kill and the fill steps to happen in a single transaction.
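For what it's worth, the single-transaction part of this is straightforward at the SQL level. A minimal sketch, using an in-memory SQLite database as a stand-in for the real destination (table and column names here are illustrative, not from the thread):

```python
import sqlite3

# Set up a stand-in destination table with one stale row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dest (id INTEGER, batch_uuid TEXT)")
conn.execute("INSERT INTO dest VALUES (1, 'old-batch')")
conn.commit()

new_rows = [(10, "new-batch"), (11, "new-batch")]

# "Kill and fill" in one transaction: the connection context manager
# commits the DELETE and INSERTs together, or rolls both back on error,
# so readers never see a half-emptied or mixed-batch table.
with conn:
    conn.execute("DELETE FROM dest")
    conn.executemany("INSERT INTO dest VALUES (?, ?)", new_rows)

print(conn.execute("SELECT COUNT(*) FROM dest").fetchone()[0])  # 2
```

The harder part, as noted above, is arranging for NiFi to issue the DELETE and the batch of INSERTs inside the same database transaction rather than as separate FlowFiles.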

The other issue is what happens if PutSQL has to go down for a while due to
database downtime and I get several kill-and-fill batches piled up.  Is there a
way I can use backpressure to make sure only a single file gets converted from
JSON to SQL at a time, in order to avoid mixing batches?
I also considered FlowFile expiration, but is there a way I can tell NiFi to
only expire a FlowFile when a new FlowFile has entered the queue? Ex: 1 flow
file in queue, no expiration occurs; a 2nd (newer) FlowFile enters the queue,
then the first file expires itself.

Thanks,
  Peter



RE: ExecuteScript Processor - Control Flow

2016-08-29 Thread Nathamuni, Ramanujam
I have a similar question – I have an ExecuteScript processor using Python to
run code that produces an output file (/tmp/test.xml), but I'm not sure how to
pass that file to the next processor without using an additional GetFile
processor to pick up the file produced by the Python script.  I am very new to
NiFi.

Following is the need:


1.   READ CSV file from HDFS

2.   Execute python script – reads CSV file and produces XML  output file – 
example /tmp/test.xml .

3.  I need to process the /tmp/test.xml file using  SplitXML processor

4.  Put these into HDFS
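For steps 2 and 3 above, one common pattern is to have the ExecuteScript body itself copy the generated file into the outgoing flow file, so no separate GetFile is needed. A rough, untested sketch (the `session` and `REL_SUCCESS` objects are injected by ExecuteScript at runtime; the path is the one from the thread):

```python
# ExecuteScript body (Script Engine: python/Jython); 'session' and
# 'REL_SUCCESS' are provided by the processor, not defined here.
from org.apache.nifi.processor.io import OutputStreamCallback

class WriteXmlFile(OutputStreamCallback):
    def process(self, outputStream):
        # Copy the file the script produced into the flow file content.
        with open('/tmp/test.xml', 'rb') as f:
            outputStream.write(bytearray(f.read()))

flowFile = session.get()
if flowFile is not None:
    # ... run the CSV-to-XML conversion here ...
    flowFile = session.write(flowFile, WriteXmlFile())
    session.transfer(flowFile, REL_SUCCESS)
```

The success relationship can then be wired straight into SplitXML and on toward HDFS, with no intermediate GetFile.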


Thanks,
Ram
From: James Wing [mailto:jvw...@gmail.com]
Sent: Monday, August 29, 2016 12:47 AM
To: users@nifi.apache.org
Subject: Re: ExecuteScript Processor - Control Flow

Koustav,
How are you running the Sqoop job?  Can you share some code?  Python is 
sequential by default, but your Sqoop job might run asynchronously.  I believe 
the answer depends on your code (or library) not only starting the Sqoop job,
but also polling for its status until it is complete.
Thanks,
James

On Sun, Aug 28, 2016 at 8:24 PM, koustav choudhuri 
> wrote:
Hi All

I have a Python script running on a NiFi server, which in turn calls a Sqoop
job on a different server. The next step in the script is to use the flow file
from the previous processor to continue to the next processor.

So the Python script is like:

1. call the sqoop job on server 2
2. get the flow file from the session and continue


Question:
Will Step 2 wait until Step 1 completes?
Or,
As soon as the Sqoop job gets initiated through Step 1, does Step 2 execute
irrespective of whether Step 1 completes or not?

Could be a dumb question, still asking.





Re: Configuration Management of Flows - Proposed Book of Work

2016-08-29 Thread Gunjan Dave
Thanks Joe, I'll try to digest the information.

In the meantime, could you kindly point me to any specific JIRAs or links
about what has improved in NiFi 1.0 for using templates with version
control and a diff tool? That alone, at the moment, should solve a lot of
problems, I guess.

I will also wait for the many more great features to come with respect to
configuration management of flows.

On Mon, Aug 29, 2016, 1:05 PM Joe Witt  wrote:

> Gunjan
>
> We've long since supported the concept of flow templates.  These are
> powerful because they allow you to save, share, import already
> designed flows.  They have to date had their full potential limited by
> a few things:
>
> 1) The exported template was non-deterministic in terms of the XML it
> produced.  This made using typical version control and diff tools very
> difficult in terms of being able to quickly assess what has changed.
> This was addressed in the upcoming Apache NiFi 1.0 which is under
> vote.
>
> 2) The templates were too coupled to specific systems: if you had a URL
> for a database, then that URL was in the template, which made the
> template less portable to, say, a dev environment with a different URL.
> This too has been improved in the upcoming version, as it now supports a
> variable registry, and properties which take advantage of expression
> language statements can now take advantage of variable registry entries.
> We intend to do more there too [1].
>
> 3) We also need to provide a registry to make saving and sharing these
> templates easier than it is today [2].  This will allow centralized
> registries of templates that organizations can share between NiFi
> clusters and do things like support typical SDLC models.  There are
> many more things we can do with this, connecting templates to the
> extensions they reference and supporting things like multiple versions
> of those extensions [3].
>
> Finally, regarding configuration management in general there is no
> reason why at this point we cannot simply always create a commit/diff
> of the flow as changes are made and have them stored in a version
> control system.  We can then do things like tag data against the
> version of the flow it ran through and rollback to a given
> configuration state [4].
>
> Hopefully this helps give you a sense of the ideas, progress, and
> discussions that have occurred to date.
>
>
> [1] https://cwiki.apache.org/confluence/display/NIFI/Variable+Registry
> [2]
> https://cwiki.apache.org/confluence/display/NIFI/Extension+Repositories+%28aka+Extension+Registry%29+for+Dynamically-loaded+Extensions
> [3]
> https://cwiki.apache.org/confluence/display/NIFI/Multiple+Versions+of+the+Same+Extension
> [4]
> https://cwiki.apache.org/confluence/display/NIFI/Configuration+Management+of+Flows
>
>
> Thanks
> Joe
>
> On Sun, Aug 28, 2016 at 11:48 PM, Gunjan Dave
>  wrote:
> > Hello NiFi Team,
> > I understand that config mgmt of flows is part of the proposed road map.
> > Is there any tangible action which has started on this front? Any
> tentative
> > release plan even if very preliminary?
> > Any plans to integrate with Git internally within the framework itself to
> > version manage?
> > This will soon become a differentiating factor for choosing NiFi over
> other
> > products.
> >
> >
>


Re: Configuration Management of Flows - Proposed Book of Work

2016-08-29 Thread Joe Witt
Gunjan

We've long since supported the concept of flow templates.  These are
powerful because they allow you to save, share, import already
designed flows.  They have to date had their full potential limited by
a few things:

1) The exported template was non-deterministic in terms of the XML it
produced.  This made using typical version control and diff tools very
difficult in terms of being able to quickly assess what has changed.
This was addressed in the upcoming Apache NiFi 1.0 which is under
vote.

2) The templates were too coupled to specific systems: if you had a URL
for a database, then that URL was in the template, which made the template
less portable to, say, a dev environment with a different URL.  This too
has been improved in the upcoming version, as it now supports a variable
registry, and properties which take advantage of expression language
statements can now take advantage of variable registry entries.  We intend
to do more there too [1].

3) We also need to provide a registry to make saving and sharing these
templates easier than it is today [2].  This will allow centralized
registries of templates that organizations can share between NiFi
clusters and do things like support typical SDLC models.  There are
many more things we can do with this, connecting templates to the
extensions they reference and supporting things like multiple versions
of those extensions [3].

Finally, regarding configuration management in general there is no
reason why at this point we cannot simply always create a commit/diff
of the flow as changes are made and have them stored in a version
control system.  We can then do things like tag data against the
version of the flow it ran through and rollback to a given
configuration state [4].

Hopefully this helps give you a sense of the ideas, progress, and
discussions that have occurred to date.


[1] https://cwiki.apache.org/confluence/display/NIFI/Variable+Registry
[2] 
https://cwiki.apache.org/confluence/display/NIFI/Extension+Repositories+%28aka+Extension+Registry%29+for+Dynamically-loaded+Extensions
[3] 
https://cwiki.apache.org/confluence/display/NIFI/Multiple+Versions+of+the+Same+Extension
[4] 
https://cwiki.apache.org/confluence/display/NIFI/Configuration+Management+of+Flows


Thanks
Joe

On Sun, Aug 28, 2016 at 11:48 PM, Gunjan Dave
 wrote:
> Hello NiFi Team,
> I understand that config mgmt of flows is part of the proposed road map.
> Is there any tangible action which has started on this front? Any tentative
> release plan even if very preliminary?
> Any plans to integrate with Git internally within the framework itself to
> version manage?
> This will soon become a differentiating factor for choosing NiFi over other
> products.
>
>


Configuration Management of Flows - Proposed Book of Work

2016-08-29 Thread Gunjan Dave
Hello NiFi Team,
I understand that config mgmt of flows is part of the proposed road map.
Is there any tangible action which has started on this front? Any tentative
release plan even if very preliminary?
Any plans to integrate with Git internally within the framework itself to
version manage?
This will soon become a differentiating factor for choosing NiFi over other
products.