I need to run a 3rd-party CLI (Linux) with the following requirements:
- It should be triggered by an incoming FlowFile whose attributes/content
contain the parameters for the CLI
- Accepts params via flags or environment variables
- Writes output to stdout as a stream of JSON objects
- The output might be huge (millions and millions of objects), which means
caching stdout is not an option - each line/object should be sent as a separate
FlowFile
- Errors/logs are written to stderr (which might be very chatty)
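
To make the requirements above concrete, here is a rough sketch in plain Python (outside NiFi) of the behaviour I am after. The function name `stream_json_lines` and its parameters are made up for illustration; it assumes the CLI prints one JSON object per line, and in NiFi each yielded line would become its own FlowFile.

```python
import os
import subprocess
import sys
import threading


def stream_json_lines(cmd, extra_env=None):
    """Yield each stdout line of `cmd` as soon as it is produced.

    `cmd` is the argument list (flags included); `extra_env` holds
    parameters passed as environment variables. Both are hypothetical
    stand-ins for values taken from FlowFile attributes.
    """
    env = dict(os.environ, **(extra_env or {}))
    proc = subprocess.Popen(
        cmd,
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
        bufsize=1,  # line-buffered on our side of the pipe
    )

    def drain_stderr():
        # Read stderr continuously so a chatty process cannot fill the
        # pipe and block; here the lines are simply forwarded as logs.
        for line in proc.stderr:
            sys.stderr.write(line)

    drainer = threading.Thread(target=drain_stderr, daemon=True)
    drainer.start()

    # Nothing is accumulated: each line is handed over immediately.
    for line in proc.stdout:
        line = line.rstrip("\n")
        if line:
            yield line

    proc.stdout.close()
    returncode = proc.wait()
    drainer.join()
    if returncode != 0:
        raise subprocess.CalledProcessError(returncode, cmd)
```

The point is that output is consumed line by line while the process is still running, and the exit status is only checked at the very end, after everything has already been passed on.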
Using ExecuteProcess is not an option (it cannot be triggered by an incoming
FlowFile), but the way it treats stdout is what is desired.
Using ExecuteStreamCommand is not an option as it buffers the output until the
binary exits with a status code 0.
Does anybody know if there’s a hybrid component somewhere out there? ;-)
Thank you in advance!
P.S. I’ve tried to write a wrapping script in Python using ExecuteScript, but:
- it looks like rather an overkill (JVM -> Jython -> Python -> System process -> …)
- scripting for NiFi does not provide a pleasant debugging experience
- I get weird random errors when moving the flow from machine to machine, even
between exact copies of VMs (like the example below).
> Caused by: javax.script.ScriptException: AttributeError: type object
> 'java.lang.Thread' has no attribute 'State' in <script> at line number 1
> at org.python.jsr223.PyScriptEngine.eval(PyScriptEngine.java:59)
> at org.python.jsr223.PyScriptEngine.eval(PyScriptEngine.java:31)