Re: ExecuteScript Processor - Control Flow
Koustav,

subprocess.call will indeed wait for the shell script to exit, but I believe you need to clarify whether the "run_sqoop_job.sh" script has logic to wait for the completion of the Sqoop job.

Thanks,
James

On Mon, Aug 29, 2016 at 12:01 PM, koustav choudhuri wrote:
> [image: Inline image 2]
>
> This is the code that I am using to call the shell script.
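The behavior James describes can be sketched outside NiFi. This is a minimal, hypothetical illustration (the echo command stands in for the real run_sqoop_job.sh, whose contents are not shown in the thread): subprocess.call() blocks until the launched command exits, but that only guarantees the *script* finished, not the Sqoop job it may have submitted asynchronously.

```python
import subprocess

# subprocess.call() blocks until the launched command exits, so step 2
# only runs after the shell script returns. Whether that also means the
# Sqoop job is finished depends on the script itself: if run_sqoop_job.sh
# merely submits the job and exits, call() returns while the job is still
# running on server 2. (The echo below is a stand-in for the real script.)
rc = subprocess.call(["sh", "-c", "echo 'submitting sqoop job'"])

# call() has returned, so the script is done; checking its exit code is
# the minimum gate before continuing to the next step of the flow.
if rc == 0:
    script_done = True   # safe to fetch the flow file and continue
else:
    script_done = False  # route the flow file to failure instead
```

If the script does not itself wait on the Sqoop job, the script (or the Python code calling it) would additionally need to poll the job's status until completion, as James suggests earlier in the thread.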
Re: NiFi reference to process group
Gunjan,

We have a feature proposal for this. Definitely a good idea that would help simplify flow development; it would let process groups act as functions. There is no clear timetable for when it might happen, but check out the feature proposal, comment on it, and join the discussion.

Thanks
Joe

On Aug 29, 2016 10:02 PM, "Gunjan Dave" wrote:
> Hi Team, does NiFi currently have the ability to reference a process group
> instead of making an actual connection? This would simplify the visual
> aspects of complex flows.
NiFi reference to process group
Hi Team, does NiFi currently have the ability to reference a process group instead of making an actual connection? This would simplify the visual aspects of complex flows.
Re: Request for enhancement
Hi Joe,

I don't seem to have access to the NiFi Jira to create one. Can access be given? Or, if not, could someone help by raising it?

On Tue, Aug 30, 2016, 7:49 AM Joe Percivall wrote:
> - Moving users list to BCC
>
> Hello Gunjan,
>
> This seems like a good potential idea. The proper place to submit the
> suggestion is through the Apache NiFi Jira [1]. It can more easily be
> discussed and worked on there.
>
> [1] https://issues.apache.org/jira/browse/NIFI
>
> Suggestions/ideas from users are always welcome!
>
> Joe
Re: Request for enhancement
- Moving users list to BCC

Hello Gunjan,

This seems like a good potential idea. The proper place to submit the suggestion is through the Apache NiFi Jira [1]. It can more easily be discussed and worked on there.

[1] https://issues.apache.org/jira/browse/NIFI

Suggestions/ideas from users are always welcome!

Joe
- - - - - -
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com

On Tuesday, August 30, 2016 12:06 PM, Gunjan Dave wrote:
> Seems like the below did not get delivered.
Re: Request for enhancement
Seems like the below did not get delivered.

On Mon, Aug 29, 2016, 12:30 PM Gunjan Dave wrote:
> Hi Team,
> I would like to propose that the following enhancement, if seen as feasible,
> be incorporated in the provenance graph.
>
> The current graph only shows the component type; instead, I would suggest
> putting in the component name along with the processor type. That would make
> the graph more unique to each flow and more visually intuitive.
>
> Just a suggestion, not mandatory.
Re: ExecuteScript Processor - Control Flow
[image: Inline image 2]

This is the code that I am using to call the shell script.

On Mon, Aug 29, 2016 at 4:15 AM, Nathamuni, Ramanujam wrote:
> I do have a similar question – I have an ExecuteScript processor using Python
> to run code that produces an output file (/tmp/test.xml), but I am not sure
> how to pass that file to the next processor without using an additional
> GetFile processor to pick up the file produced by the script. I am very new
> to NiFi.
RE: Kill-and-Fill Pattern?
Toivo,

I started down this path, but then came up with a broader solution (which I have not tested):

1. Do a normal JSONToSQL.
2. Use MergeContent to group all of the FlowFiles from the same batch into a single new FlowFile using the FlowFile Stream merge format.
3. Update PutSQL to support merged FlowFiles.

--Peter

From: Toivo Adams [mailto:toivo.ad...@gmail.com]
Sent: Sunday, August 28, 2016 7:27 AM
To: users@nifi.apache.org
Subject: Re: Kill-and-Fill Pattern?

Hi,

Could a new processor, PutAvroSQL, help? The processor would use data in Avro format and insert all records at once.

Thanks,
toivo

2016-08-26 16:45 GMT+03:00 Peter Wicks (pwicks):

I have a source SQL table that I'm reading with a SQL select statement. I want to kill and fill a destination SQL table with this source data on an interval.

My non kill-and-fill pattern is: ExecuteSQL -> Avro To JSON -> JSON To SQL -> PutSQL.

I'm trying to come up with a good way to delete existing data before loading new data. One option I've considered is to mark the original Avro file with a UUID and add this attribute as a field in the destination table; then do a split off, ReplaceText, and delete all rows where the UUID doesn't match this batch. I think this could work, but I'm worried about the timing of the SQL DELETE. I kind of want the kill and the fill steps to happen in a single transaction.

The other issue is what happens if PutSQL has to go down for a while due to database downtime and I get several kill-and-fill batches piled up. Is there a way I can use backpressure to make sure only a single file gets converted from JSON to SQL at a time, in order to avoid mixing batches? I also considered FlowFile expiration, but is there a way I can tell NiFi to only expire a FlowFile when a new FlowFile has entered the queue? Ex: with one FlowFile in the queue, no expiration occurs; when a second (newer) FlowFile enters the queue, the first file expires.

Thanks,
Peter
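The "kill and fill in a single transaction" concern from this thread can be illustrated outside NiFi. This is a minimal sketch using Python's sqlite3 module; the table, column, and batch names are made up for illustration. The point is that when the DELETE and the INSERTs share one transaction, either both take effect or neither does, so readers never observe a half-filled table.

```python
import sqlite3

# Hypothetical destination table with a batch marker column, as in the
# UUID idea discussed in the thread.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dest (id INTEGER, batch_uuid TEXT)")
conn.execute("INSERT INTO dest VALUES (1, 'old-batch')")
conn.commit()

new_rows = [(10, "new-batch"), (11, "new-batch")]

# 'with conn:' opens a transaction and commits on success (or rolls
# back on error), so the kill and the fill are atomic: if the insert
# fails, the delete is rolled back with it.
with conn:
    conn.execute("DELETE FROM dest")
    conn.executemany("INSERT INTO dest VALUES (?, ?)", new_rows)

rows = conn.execute("SELECT id, batch_uuid FROM dest ORDER BY id").fetchall()
print(rows)  # [(10, 'new-batch'), (11, 'new-batch')]
```

Whether PutSQL can be made to issue both statements inside one database transaction for a merged FlowFile is exactly the open question in Peter's proposed step 3.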
RE: ExecuteScript Processor - Control Flow
I do have a similar question – I have an ExecuteScript processor using Python to run code that produces an output file (/tmp/test.xml), but I am not sure how to pass that file to the next processor without using an additional GetFile processor to pick up the file produced by the script. I am very new to NiFi.

Following is the need:

1. Read a CSV file from HDFS.
2. Execute a Python script – it reads the CSV file and produces an XML output file, for example /tmp/test.xml.
3. Process the /tmp/test.xml file using the SplitXml processor.
4. Put the results into HDFS.

Thanks,
Ram

From: James Wing [mailto:jvw...@gmail.com]
Sent: Monday, August 29, 2016 12:47 AM
To: users@nifi.apache.org
Subject: Re: ExecuteScript Processor - Control Flow

Koustav,

How are you running the Sqoop job? Can you share some code? Python is sequential by default, but your Sqoop job might run asynchronously. I believe the answer depends on your code (or library) not only starting the Sqoop job, but polling for its status until it is complete.

Thanks,
James

On Sun, Aug 28, 2016 at 8:24 PM, koustav choudhuri wrote:

Hi All,

I have a Python script running on a NiFi server, which in turn calls a Sqoop job on a different server. The next step in the script is to use the flow file from the previous processor to continue to the next processor.

So the Python script is like:

1. Call the Sqoop job on server 2.
2. Get the flow file from the session and continue.

Question:

Will step 2 wait until step 1 completes? Or, as soon as the Sqoop job gets initiated through step 1, does step 2 execute irrespective of whether step 1 completes or not?

Could be a dumb question, still asking.

*
This e-mail may contain confidential or privileged information.
If you are not the intended recipient, please notify the sender
immediately and then delete it.

TIAA
*
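Ram's question (getting the script's output file into the flow without a separate GetFile step) comes down to copying the produced file's bytes into the FlowFile's content from inside ExecuteScript. Inside NiFi's Jython engine that write goes through session.write(flowFile, callback); the sketch below simulates the FlowFile's output stream with an io.BytesIO so the pattern can run standalone. The helper name write_file_into_stream and the temp file are hypothetical stand-ins for /tmp/test.xml.

```python
import io
import os
import tempfile

def write_file_into_stream(path, out_stream):
    """Copy the produced file's bytes into the flow file's content stream."""
    with open(path, "rb") as src:
        out_stream.write(src.read())

# Simulate the Python step producing an XML file (stand-in for /tmp/test.xml)...
fd, produced = tempfile.mkstemp(suffix=".xml")
with os.fdopen(fd, "wb") as f:
    f.write(b"<root><a/></root>")

# ...then, where ExecuteScript would call session.write(flowFile, callback),
# the callback streams the file into the FlowFile content. A BytesIO stands
# in for that output stream here.
flowfile_content = io.BytesIO()
write_file_into_stream(produced, flowfile_content)
os.remove(produced)

print(flowfile_content.getvalue())  # b'<root><a/></root>'
```

With that in place, the FlowFile leaving ExecuteScript already carries the XML, and it can be routed straight into SplitXml and PutHDFS without a GetFile pickup.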
Re: Configuration Management of Flows - Proposed Book of Work
Thanks Joe, I'll try to digest the information. In the meantime, could you kindly point to any specific Jiras or links on what has improved in NiFi 1.0 for using templates with version control and a diff tool? That alone should solve a lot of problems at the moment, I guess. I will also look forward to the many more great features to come with respect to configuration management of flows.
Re: Configuration Management of Flows - Proposed Book of Work
Gunjan,

We have long supported the concept of flow templates. These are powerful because they allow you to save, share, and import already-designed flows. To date, their full potential has been limited by a few things:

1) The exported template was non-deterministic in terms of the XML it produced. This made using typical version control and diff tools very difficult in terms of quickly assessing what has changed. This was addressed in the upcoming Apache NiFi 1.0, which is under vote.

2) The templates were too coupled to specific systems: for example, if you had a URL for a database, that URL was in the template, which made the template less portable to, say, a dev environment with a different URL. This too has been improved in the upcoming version, as it now supports a variable registry, and properties which take expression language statements can now take advantage of variable registry entries. We intend to do more there too [1].

3) We also need to provide a registry to make saving and sharing these templates easier than it is today [2]. This will allow centralized registries of templates that organizations can share between NiFi clusters, supporting things like typical SDLC models. There are many more things we can do with this, such as connecting templates to the extensions they reference and supporting multiple versions of those extensions [3].

Finally, regarding configuration management in general, there is no reason why at this point we cannot simply always create a commit/diff of the flow as changes are made and have them stored in a version control system. We can then do things like tag data against the version of the flow it ran through and roll back to a given configuration state [4].

Hopefully this helps give you a sense of the ideas, progress, and discussions that have occurred to date.

[1] https://cwiki.apache.org/confluence/display/NIFI/Variable+Registry
[2] https://cwiki.apache.org/confluence/display/NIFI/Extension+Repositories+%28aka+Extension+Registry%29+for+Dynamically-loaded+Extensions
[3] https://cwiki.apache.org/confluence/display/NIFI/Multiple+Versions+of+the+Same+Extension
[4] https://cwiki.apache.org/confluence/display/NIFI/Configuration+Management+of+Flows

Thanks
Joe

On Sun, Aug 28, 2016 at 11:48 PM, Gunjan Dave wrote:
> Hello NiFi Team,
> I understand that config mgmt of flows is part of the proposed road map.
> Is there any tangible action which has started on this front? Any tentative
> release plan, even if very preliminary?
> Any plans to integrate with Git internally within the framework itself for
> version management?
> This will soon become a differentiating factor for choosing NiFi over
> other products.
Configuration Management of Flows - Proposed Book of Work
Hello NiFi Team, I understand that config mgmt of flows is part of the proposed road map. Is there any tangible action which has started on this front? Any tentative release plan, even if very preliminary? Any plans to integrate with Git internally within the framework itself for version management? This will soon become a differentiating factor for choosing NiFi over other products.