Hi Boris, Matt, Thank you for the prompt answers and suggestions.
@Matt, right, this capability would be great to have. I will proceed with submission of the improvement request. @Boris, the blog article is helpful - I’ve run into similar frustration cycles with hanging processor and manual killing of the processes. I’m going to give Groovy a try now - looks like ProcessBuilder with line-by-line reads from a process is the way to go in my case. Once we touched this topic, here’s another variation of process execution I had to deal with: - a FlowFile triggers execution of a CLI (parametrization via flags or environment) - subsequent FlowFiles’ content passed to CLI’s stdin - CLI’s producing output (line by line) to stdout (same Batch Duration expected) - once the flow of incoming data is over (might be indicated by a special FlowFile) the stdin is closed and that signals the CLI to exist properly So far, I implemented it using ExecuteStreamCommand with preliminary FlowFiles content aggregation (via MergeContent) to keep small input side and then split of the stdout (as a single FlowFile) into separate ones. Not the optional way to go and took some time to guess the merge size in order not to produce huge output (as it’s buffered). This sounds like a completely different type of a processor for processing a group (in terms of FBP)… Kind regards, Oleksandr > On 7. Mar 2018, at 19:32, Matt Burgess <[email protected]> wrote: > > Alexander, > > It sounds like you'd like to see the Batch Duration capability from > ExecuteProcess added to ExecuteStreamCommand, please feel free to > write a Jira case [1] for this improvement. > > In the meantime, I second Boris's thought on using Groovy to launch > your script, it's much more integrated with the NiFi API (as they are > both organic to the JVM), plus the Jython error you're running into > has not been fixed yet [2]. > > Regards, > Matt > > [1] https://issues.apache.org/jira/browse/NIFI > [2] http://bugs.jython.org/issue2642 > > On Wed, Mar 7, 2018 at 12:34 PM, Oleksandr Lobunets > <[email protected]> wrote: >> Hello everyone, >> >> I have a case of running the 3rd party CLI (linux) with the following >> behaviour: >> - Should be executed upon a FlowFile with attributes/content containing >> parameters to CLI >> - Accepts params via flags or environment variables >> - Writes output to stdout as a stream of JSON objects >> - The output might be huge (millions and millions of objects), which means >> caching stdout is not an option - each line/object should be sent as a >> separate FlowFile >> - The errors/log is written to stderr (might be very chatty) >> >> Using ExecuteProcessor is not an option (cannot be trigger by incoming >> FlowFile), but the way it treats stdout is what is desired. >> Using ExecuteStreamCommand is not an option as it buffers the output until >> the binary exists with a status code 0. >> >> Does anybody know if there’s a hybrid component somewhere out there? ;-) >> >> Thank you in advance! >> >> P.S. I’ve tried to write a wrapping script in Python using ExecuteScript >> processor, but: >> - it looks rather an overkill (JVM -> Jython -> Python -> System process -> >> …) >> - scripting for NiFi is not providing a pleasant debugging experience >> - I get weird random errors when moving flow from machine to machine - exact >> copies of VMs (like the example below). >> >>> Caused by: javax.script.ScriptException: AttributeError: type object >>> 'java.lang.Thread' has no attribute 'State' in <script> at line number 1 >>> at >>> org.python.jsr223.PyScriptEngine.scriptException(PyScriptEngine.java:222) >>> at org.python.jsr223.PyScriptEngine.eval(PyScriptEngine.java:59) >>> at org.python.jsr223.PyScriptEngine.eval(PyScriptEngine.java:31) >>> at >>> javax.script.AbstractScriptEngine.eval(AbstractScriptEngine.java:264) >>> at >>> org.apache.nifi.script.impl.JythonScriptEngineConfigurator.eval(JythonScriptEngineConfigurator.java:59) >>> at >>> org.apache.nifi.processors.script.ExecuteScript.onTrigger(ExecuteScript.java:220) >> >> >> Kind regards, >> Alexander
