Hi Andy,
That's very helpful, thanks! My comments are inline, while we wait for
Matt to come home :)
On 3 Oct 2017, at 22:44, Andy LoPresto wrote:
Giovanni,
A lot of great questions here. I’ll try to go through them, but I hope
Matt weighs in as well (he is on vacation for the next few days though).
* The only time I am aware of the Jars being reloaded is at processor
restart (I believe this is the same for the script content if defined by
a referenced file as well). The scriptingComponentHelper setup*() methods
execute inside ExecuteScript#setup(), which has the @OnScheduled
annotation [1].
Has anyone written some sort of script (I don't know if it is possible)
that queries the NiFi API for all the (Groovy ExecuteScript) processors
using a particular module directory (we plan to use a single one for
everything), so that I could add a new step, after the shadowJar
deployment, that restarts all of them?
I imagine this would be a fairly common use case. Where I'm currently
working, we have the following workflow:
We have a single jar with all the code that the Groovy scripts will need;
The Groovy scripts use that code with minimal boilerplate around it, so
all the non-NiFi-related code is in the jar. This makes it very easy to
test the logic in the jar. We added some extra code to ensure that the
functions the Groovy scripts call are "NiFi compatible" (right now it's
just .getBytes(StandardCharsets.UTF_8)). We don't use Matt's framework
because we need the incoming flowFile to have attributes, and I couldn't
figure out how to do it :)
NiFi has a flow that fetches new master updates from the repo and
compiles the (fat) jar as a result. However, we would then need to
restart the ExecuteScript processors by hand and... that's a no-no :)
A script would help greatly here (if nobody has one, I will dig into the
API to see what's possible. I might just parse the whole XML file if
there's no way to do so via the API).
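A rough sketch of what such a script could look like, in Python since that is easy to run outside NiFi. It only covers the selection logic and the request body; the HTTP calls are left out. Everything NiFi-specific here is an assumption from memory of the 1.x REST API and should be checked against the API docs for the version in use: the processor type string, the "Module Directory" property name, the shape of the processor entity JSON, and the idea of changing state with a PUT to /nifi-api/processors/{id}. The function names are hypothetical.

```python
import json

# Assumed fully qualified type of the ExecuteScript processor.
EXECUTE_SCRIPT_TYPE = "org.apache.nifi.processors.script.ExecuteScript"
# Assumed name of the property that points at the jar directory.
MODULE_DIR_PROPERTY = "Module Directory"


def uses_module_dir(entity, module_dir):
    """True if the entity is an ExecuteScript whose module directory
    property lists module_dir (the property may hold several paths)."""
    component = entity.get("component", {})
    if component.get("type") != EXECUTE_SCRIPT_TYPE:
        return False
    properties = component.get("config", {}).get("properties", {})
    configured = properties.get(MODULE_DIR_PROPERTY) or ""
    paths = [p.strip().rstrip("/") for p in configured.split(",")]
    return module_dir.rstrip("/") in paths


def running_matches(entities, module_dir):
    """Keep only matching processors that are currently running, so
    deliberately stopped processors are not started by accident."""
    return [
        e for e in entities
        if uses_module_dir(e, module_dir)
        and e.get("component", {}).get("state") == "RUNNING"
    ]


def state_change_body(entity, state):
    """JSON body asking NiFi to move one processor to `state`
    ("STOPPED" or "RUNNING"), echoing back the revision we were given."""
    return json.dumps({
        "revision": entity["revision"],
        "component": {"id": entity["id"], "state": state},
    })
```

The entities would come from listing the processors of each process group (assumed: GET /nifi-api/process-groups/{id}/processors, walking child groups as needed). Note that the revision changes after every update, so the body for the RUNNING request has to be built from the entity returned by the STOPPED request, not from the one fetched at the start. Filtering on the script engine as well would narrow this to the Groovy processors.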
* I’m not sure how other users bundle their dependencies, but shadow Jars
would be fine for this use case, and Matt has referenced using them in
his script-tester article [2].
* Yes, while there are small idiosyncrasies with each language flavor,
the NiFi-related domain is fairly consistent. In this case, iterating
over a number of flowfiles for processing in a single Groovy script is
fine. Session.get(int) [3] is delegated to ProcessSession and returns
List<FlowFile>, so you can use any of the Groovy collections methods over
it.
So what happens in this case?
def n = 0
session.get(N).each { flowFile ->
    if (n == 0) {
        // do something
    } else {
        throw new Exception('simulated failure')
    }
    session.transfer(flowFile, REL_SUCCESS)
    n += 1
}
Will the first flowFile be successfully transferred, or will a rollback
happen? (Note: I usually wrap the logic in try/catch and then, based on
the result, transfer the file to REL_SUCCESS/REL_FAILURE.)
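The try/catch approach from the note can be sketched as follows, in the Jython style used elsewhere in this thread. Each flow file is routed on its own, so a failure on one of them never leaves the batch half handled by an uncaught exception. My understanding is that ExecuteScript rolls back the whole session when the script throws, which would undo the earlier transfer too, but that is an assumption worth verifying against the processor source. process_batch, handle, and FakeSession are hypothetical names; FakeSession only stands in for the `session` object NiFi binds into the script, so the logic can be tried locally.

```python
def process_batch(session, batch_size, handle, rel_success, rel_failure):
    """Route every flow file individually instead of letting one
    exception escape the loop."""
    routed = {"success": 0, "failure": 0}
    for flow_file in session.get(batch_size):
        try:
            handle(flow_file)
        except Exception:
            # This flow file failed; send it to failure and keep going.
            session.transfer(flow_file, rel_failure)
            routed["failure"] += 1
        else:
            session.transfer(flow_file, rel_success)
            routed["success"] += 1
    return routed


class FakeSession(object):
    """Hypothetical stand-in for the NiFi session, for local tests only."""

    def __init__(self, flow_files):
        self.queue = list(flow_files)
        self.transfers = []

    def get(self, max_results):
        # Hand out at most max_results queued items, like session.get(N).
        batch = self.queue[:max_results]
        self.queue = self.queue[max_results:]
        return batch

    def transfer(self, flow_file, relationship):
        self.transfers.append((flow_file, relationship))
```

Inside ExecuteScript the call would pass the bound `session`, `REL_SUCCESS`, and `REL_FAILURE`. Under Jython, exceptions thrown by Java code may need their own except clause; I have not checked that here.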
Thanks again,
Giovanni
Hopefully this helps you, and if Matt or anyone else sees a mistake, I
hope they correct it and add their thoughts. Thanks.
[1] https://nifi.apache.org/docs/nifi-docs/html/developer-guide.html#onscheduled
[2] https://funnifi.blogspot.com/2016/06/testing-executescript-processor-scripts.html
[3] https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-core/src/main/java/org/apache/nifi/controller/repository/StandardProcessSession.java#L1520
Andy LoPresto
[email protected]
[email protected]
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69
On Oct 3, 2017, at 1:09 PM, Giovanni Lanzani
<[email protected]> wrote:
I apologize if this is specified elsewhere, but I couldn't find it.
I was wondering when the jars used by a particular Groovy script (in the
ExecuteScript processor) are reloaded. I.e. if one jar is updated, when
will the script pick up the new version? I know that upon restarting the
processor, the updated jar is considered, but I was wondering on which
other occasions that happens;
Do people tend to use fat (shadow) jars for the jars referenced by Groovy
scripts? I don't think it makes sense to keep track of all the
dependencies manually otherwise;
When using the {P,J}ython processor, I read Matt's advice to use the
following construct in the script:
for flowFile in session.get(N):
    if flowFile:
        # do your thing here
Does the same hold for Groovy, i.e. should someone do:
session.get(N).each { flowFile ->
    // do your thing here
    if (condition) {
        session.transfer(flowFile, REL_SUCCESS)
    } else {
        session.transfer(flowFile, REL_FAILURE)
    }
}
Is this approach safe in Groovy inside an each? Or is this approach not
needed at all in Groovy, while it is needed in {P,J}ython?
Thanks in advance!
Giovanni