Re: A bag of groovy questions regarding the ExecuteScript processor

Giovanni Lanzani Tue, 10 Oct 2017 07:46:09 -0700

So, I got around to creating a Python script (invoked from Gradle or your favorite build tool) that restarts all Groovy ExecuteScript processors with a non-empty Module Directory:


https://gist.github.com/gglanzani/708364f4d5288844fc63d692ebe47b51

The script is still in its infancy, and some things are hard-coded that shouldn't be. But I like the approach, as it is completely decoupled from any particular ExecuteScript operation (and tightly coupled with the deploy process).


Let me know what you think!

Currently it requires Python 3.5 or later (because of the type hints), but it works without external modules. It should be trivial to adapt to 3.4 (apart from the type hints, it probably works already).
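
For readers who don't want to open the gist, the selection step can be sketched roughly as follows. This is not the code from the gist. It is a minimal illustration that assumes the processor entities have already been fetched from the NiFi REST API as JSON, that each entity keeps its class name under `component.type` and its properties under `component.config.properties`, and that the Groovy engine is listed as `Groovy`. The helper name `processors_to_restart` and the sample data are made up for the example.

```python
# Hypothetical helper for illustration; not taken from the gist above.
EXECUTE_SCRIPT_TYPE = "org.apache.nifi.processors.script.ExecuteScript"


def processors_to_restart(entities, engine="Groovy"):
    """Return the ids of ExecuteScript processors that use `engine` and
    have a non-empty Module Directory property.

    `entities` is assumed to be a list of dicts shaped like the processor
    entities in a NiFi REST API response (an assumption of this sketch).
    """
    selected = []
    for entity in entities:
        component = entity.get("component", {})
        if component.get("type") != EXECUTE_SCRIPT_TYPE:
            continue
        properties = component.get("config", {}).get("properties", {})
        if properties.get("Script Engine") != engine:
            continue
        # An unset property may come back as None, hence the `or ""`.
        if (properties.get("Module Directory") or "").strip():
            selected.append(component["id"])
    return selected


# Made-up sample data, for illustration only.
sample = [
    {"component": {"id": "a1", "type": EXECUTE_SCRIPT_TYPE,
                   "config": {"properties": {
                       "Script Engine": "Groovy",
                       "Module Directory": "/opt/nifi/modules"}}}},
    {"component": {"id": "b2", "type": EXECUTE_SCRIPT_TYPE,
                   "config": {"properties": {
                       "Script Engine": "Groovy",
                       "Module Directory": None}}}},
    {"component": {"id": "c3",
                   "type": "org.apache.nifi.processors.standard.LogAttribute",
                   "config": {"properties": {}}}},
]

print(processors_to_restart(sample))
```

Keeping the selection as a pure function over already-fetched JSON means it can be checked without a running NiFi instance; the actual stop/start calls stay in the deploy script.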


Cheers,

Giovanni

On 4 Oct 2017, at 23:36, Giovanni Lanzani wrote:

Hi Matt,

Thanks for the answers.
session.get(N).each) Good to know, I thought a roll-back was inevitable with uncaught exceptions;
ScriptTester) Since you're here: I could only get the script to download after adding this to the `repositories` in the `.build`
```
    maven {
        url 'http://dl.bintray.com/mattyb149/maven/'
    }
```

Is that how it's supposed to work?
fatJar) I actually saw that with Gradle you can easily do something like this
```
shadowJar {
   dependencies {
      exclude(dependency('org.codehaus.groovy:.*'))
      exclude(dependency('commons-.*:.*'))
   }
}
```
That way the fat jar will be much smaller but still executable by NiFi. Without that, a 15 kB jar ends up being an 8 MB fat jar.
on-the-fly-reload) I'd rather hack the API than do that :) Are there any pointers/examples for this `InvokeScriptedProcessor`? It all seems rather new and esoteric, judging by its docs.
Cheers,

Giovanni

On 4 Oct 2017, at 18:33, Matt Burgess wrote:
Giovanni,

I second all of Andy's answers, they are spot-on. For the each()
construct, it is "safe" in the sense that you will be working with
one flow file at a time, but remember that there is only one
"session". If you throw an Exception from inside the each(), then it
will be caught by ExecuteScript (if not caught by your script), and
the entire session will be rolled back. You are probably better off
with the approach you outlined where you wrap the logic in the
try/catch and route to success/failure accordingly... unless an error
indicates a "retry all", then a rollback is likely what you want.
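
As a rough illustration of the try/catch approach, here is a minimal sketch in Python (the same shape applies to a Jython script). The names `process_batch`, `FakeSession` and `fail_on_bad` are made up for the example, and `FakeSession` is only a stand-in so the sketch can run outside NiFi; inside ExecuteScript you would pass the real `session`, `REL_SUCCESS` and `REL_FAILURE`. How Java exceptions surface in a real Jython script is not covered here.

```python
def process_batch(session, batch_size, handle, rel_success, rel_failure):
    """Route each flow file on its own, so that one bad flow file does not
    let an exception escape and roll back the whole session."""
    for flow_file in session.get(batch_size):
        if flow_file is None:
            continue
        try:
            handle(flow_file)
        except Exception:
            session.transfer(flow_file, rel_failure)
        else:
            session.transfer(flow_file, rel_success)


class FakeSession(object):
    """Stand-in for the NiFi session, only for trying the sketch locally."""

    def __init__(self, flow_files):
        self._flow_files = list(flow_files)
        self.transfers = []

    def get(self, max_results):
        # Hand out up to max_results queued items, like a batch get.
        batch = self._flow_files[:max_results]
        self._flow_files = self._flow_files[max_results:]
        return batch

    def transfer(self, flow_file, relationship):
        self.transfers.append((flow_file, relationship))


def fail_on_bad(flow_file):
    # Hypothetical processing step that rejects one particular input.
    if flow_file == "bad":
        raise ValueError("cannot process " + flow_file)


session = FakeSession(["ok-1", "bad", "ok-2"])
process_batch(session, 10, fail_on_bad, "success", "failure")
print(session.transfers)
```

If an error should mean "retry all", leave out the try/except and let the exception propagate, so that the rollback described above takes place.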

For the ScriptTester, I haven't yet added support for setting
attributes on incoming flow file(s), I am trying to think of a clean
way to allow them for arbitrary flow files such as when the --input
switch is specified. Suggestions are welcome :) For the first go-round
I might allow something such that attributes would be added to all
flow files, or at least for one coming in via STDIN.

For the single fat/shaded JAR, you can certainly do things that way,
but if you are using Groovy, Clojure, or Javascript/Nashorn, you can
put all the JARs in a single directory (not nested!) and just add the
directory to your Module Directory property. That might save you a
build/package step. Doesn't help with reloading though.

For the on-the-fly reload of an updated fat JAR, you could (at the
expense of performance) have the script load the JAR. At that point
you'd probably be better served with InvokeScriptedProcessor so you
could add a FileWatcher at startup, and reload the JAR from a separate
thread when changes are detected. In either case I believe you'd be
looking at creating a URLClassLoader with your fat JAR as the only
URL, and the current ClassLoader as its parent. Then you can set the
Thread's context classloader to the new one, and/or you may need to do
some more classloading voodoo.

Not sure if I covered all your questions/comments, but if not please
let me know and I will try again :)

Regards,
Matt


On Wed, Oct 4, 2017 at 3:18 AM, Giovanni Lanzani
<[email protected]> wrote:
Hi Andy,
That's very helpful, thanks! My comments are inline, while waiting for Matt to come
home :)

On 3 Oct 2017, at 22:44, Andy LoPresto wrote:

Giovanni,
A lot of great questions here. I’ll try to go through them but I hope Matt
weighs in as well (he is on vacation for the next few days though).
* The only time I am aware of that the Jars are reloaded is at processor restart (I believe this is the same for the script content if defined by a referenced file as well). The scriptingComponentHelper setup*() methods execute inside
ExecuteScript#setup(), which has the @OnScheduled annotation [1].
Is there anyone who has written some sort of script (I don't know if it is
possible) to query the NiFi API for all the (Groovy ExecuteScript)
processors using a particular module directory (we plan to use a single one
for everything), so that I could add a new step, after the shadowJar
deployment, that restarts all of them?
I imagine this would be a fairly common use case. Where I'm currently
working, we have the following workflow:
Have a single jar with all the code that the groovy scripts will need;
The groovy scripts will use that code with minimal boilerplate around it, so all the (non-NiFi) related code is in the jar. This makes it very easy to test the logic in the jar. We added some extra code to ensure the functions that the groovy scripts will call are "NiFi compatible" (right now it's just .getBytes(StandardCharsets.UTF_8)). We don't use Matt's framework because we need the incoming flowFile to have attributes, and I couldn't figure out how to do it :)
NiFi has a flow to fetch new master updates on the repo and compile the (fat) jar as a result. However, we would need to restart the ExecuteScript processors by hand and... no-no? :) A script would help greatly here (if nobody has one, I will dig into the API to see what's possible; I might just parse the whole xml file if there's no way to do so via the API).
* I’m not sure how other users bundle their dependencies, but shadow Jars would be fine for this use case, and Matt has referenced using them in his
script-tester article [2].
* Yes, while there are small idiosyncrasies with each language flavor, the NiFi-related domain is fairly consistent. In this case, iterating over a number of flowfiles for processing in a single Groovy script is fine.
Session.get(int) [3] is delegated to ProcessSession and returns
List<FlowFile>, so you can use any of the Groovy collections methods over
it.

So what happens in this case

def n = 0
session.get(N).each { flowFile ->
    if (n == 0) {
        // do something
    } else {
        // any uncaught exception will do for the question
        throw new Exception('something went wrong')
    }
    session.transfer(flowFile, REL_SUCCESS)
    n += 1
}
Will the first flowFile be successfully transferred, or will a rollback happen? (Note: I usually wrap the logic in try/catch and then, based on the
result, transfer the file to REL_SUCCESS/REL_FAILURE.)

Thanks again,

Giovanni
Hopefully this helps you, and if Matt or anyone else sees a mistake, they
can correct it and add their thoughts. Thanks.

[1] https://nifi.apache.org/docs/nifi-docs/html/developer-guide.html#onscheduled
[2] https://funnifi.blogspot.com/2016/06/testing-executescript-processor-scripts.html
[3] https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-core/src/main/java/org/apache/nifi/controller/repository/StandardProcessSession.java#L1520



Andy LoPresto
[email protected]
[email protected]
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69

On Oct 3, 2017, at 1:09 PM, Giovanni Lanzani
<[email protected]> wrote:

I apologize if this is specified elsewhere, but I couldn't find it.
I was wondering when the jars used by a particular Groovy script (in the ExecuteScript processor) are reloaded. I.e., if one jar is updated, when will the script pick up the new version? I know that upon restarting the processor the updated jar is picked up, but I was wondering on which other
occasions that happens;
Do people tend to use fat (shadow) jars for this sort of jars referenced by groovy scripts? I don't think it makes sense to keep track of all the
dependencies manually otherwise;
When using the {P,J}ython processor, I read Matt's advice to use the following
construct in the script:
for flowFile in session.get(N):
    if flowFile:
        # do your thing here
Does the same hold for Groovy, i.e. should someone do

session.get(N).each { flowFile ->
    // do your thing here
    if (condition) {
        session.transfer(flowFile, REL_SUCCESS)
    } else {
        session.transfer(flowFile, REL_FAILURE)
    }
}
Is this approach safe in Groovy inside an each? Or is this approach not
needed at all in Groovy, while it is needed in {P,J}ython?

Thanks in advance!

Giovanni
