Russ, I'd like to help make ExecuteScript as much of a no-brainer as possible. I agree that it is difficult to develop scripts inside the ExecuteScript "environment" due to a lack of debugging and such. To that end I wrote a "NiFi Script Tester" [1], which lets you provide a script file and has a few options for specifying input and output (blog post here [2]).
The original version only supported Groovy and JavaScript (the latter of which is built into the JRE); I wanted to keep the file size down and figured those were the easiest and most popular options. However, a number of people are using Jython in ExecuteScript, so I added Jython support and just released the 1.1.2 version of the NiFi Script Tester [3]. Besides the command-line utility, you could use it as a dependency in a small project for prototyping and/or debugging. I hope this is more helpful in terms of prototyping scripts outside the flow. I'm always open to questions and suggestions on how to make it better.

Another (more heavyweight) approach is to extend an existing unit test from the code, adding your script and whatever testing assertions, etc. This (in an IDE) would allow you to debug into the ExecuteScript processor (although probably not into the script evaluation itself) to see what's going on.

As an improvement to the scripting processors, reporting tasks, etc., I would like to add a custom Advanced UI such as the JoltTransformJson processor has, which could give context-sensitive highlighting for whichever language you are scripting in, and perhaps test/preview the script using a similar approach as I have in my NiFi Script Tester. I just don't have the Angular chops to take on the task alone, so I will be reaching out to folks with strong UI-fu.

What other improvements / features would make the scripting stuff easier to use? I would like to discuss and get workin' on them :)

Regards,
Matt

[1] https://github.com/mattyb149/nifi-script-tester/
[2] http://funnifi.blogspot.com/2016/06/testing-executescript-processor-scripts.html
[3] https://bintray.com/mattyb149/maven/download_file?file_path=mattyb149%2Fnifi-script-tester%2F1.1.2%2Fnifi-script-tester-1.1.2-all.jar

On Sat, May 6, 2017 at 10:13 AM, Russell Bateman <[email protected]> wrote:

> The very first custom processor I wrote was called ExecutePythonScript,
> right after working through the NiFi Rocks!
> JSON example. Then, I began to
> grok the scope of free, existing, standard processors, among them
> ExecuteProcess and ExecuteStreamCommand. Those did pretty much exactly what
> I was trying to do and I threw my code away when I found that this base was
> already covered. Months later, this Jython/ExecuteScript investigation
> exercise was my revisiting the issue for my users.
>
> What my users employ out of, then back into NiFi is a proprietary filesystem
> in-box. It's a sophisticated mechanism that amounts to a very tight and
> sharing-safe version of GetFile and PutFile with decent metadata support
> (hey, wow, sort of like the NiFi flowfile). We've used it for many years.
> When we adopted NiFi just over a year ago, we interfaced it with custom
> processors PutInbox and GetInbox. Once the flowfile goes to the filesystem,
> our folk are able to run their Python scripts on them. Thence, they return
> the result to a different in-box from which our NiFi ETL brings it back in
> as a flowfile. These guys have a hammer and everything else looks like a
> nail to them (they're no different than most of the rest of us in that
> respect).
>
> I don't personally like this because it's not a clean flow. It suffers from
> not being very turn-key, requires a high degree of understanding on the part
> of those that use this hybrid approach and makes it impossible to think of,
> measure, and plan for our NiFi ETL process on many levels (like provenance,
> performance, SLA, etc.). We constantly have to translate attributes into
> in-box metadata, then translate back.
>
> I have no opinion about whether a better Python (or other language) script
> is a good thing for NiFi generally, I just know that if ExecuteScript were a
> no-brainer to use, my guys would use it and it would keep them away from
> these little black-hole, Interstellar games in NiFi where the flowfiles
> leave, then reemerge.
>
> I only say this in response to Matt's response to us.
> My original
> contribution to this thread was only to provide our experience in case it
> was somehow relevant. As I've said, I look to providing everything we need
> here to stay in NiFi from beginning to end of our ETL process to gain its
> visibility and unity. Originally, I was promoting using some kind of AMQP to
> solve our ETL needs. One day, we stumbled upon NiFi. Since then, I've fallen
> in love with everything it does better than a home-brewed queue-messaging
> approach tying together inevitably disparate ETL applications. I especially
> love the UI, AbstractProcessor and the unit-test framework. It's pretty much
> a dream come true for me.
>
> Russ
>
>
> On 05/05/2017 05:24 PM, Matt Burgess wrote:
>
> Russell et al,
>
> I'd like to mention a couple of things here re: the context of the
> scripting processors:
>
> 1) The scripting processors were designed to use the JSR-223 spec for
> interfacing with a scripting language, in order to make adding new
> languages easier (ideally you just add the dependency to the scripting
> NAR). However the drawback is that the desired language must implement
> this spec, and Python does not but Jython does.
>
> 2) NiFi is a Java application, Python is a native one. Either you'd
> have to bring your own Python (and configure the processor with the
> location of it, and then you're pretty close to just using
> ExecuteStreamCommand), or we'd have to package a version for each OS
> (if we are allowed to), and still have to shell out to it from Java to
> the OS (again, that's what ExecuteStreamCommand does). There is some
> experimental work with JyNI to bridge the gap, but it's not ready for
> primetime.
>
> Having said that, there's a PR out there for a Groovy-specific
> scripting processor (ExecuteGroovyScript), which isn't limited by the
> JSR-223 spec and can make full use of all the Groovy goodness. That's
> made easier because Groovy is a JVM scripting language.
> However for
> Python we could make the user experience better by having an
> ExecutePythonScript processor; perhaps under the hood it is just a
> glorified ExecuteStreamCommand that lets you put the script into a
> processor property like ExecuteScript, but still shells out to a
> python on the OS. What do you think?
>
> Regards,
> Matt
>
>
> On Fri, May 5, 2017 at 7:12 PM, Joe Witt <[email protected]> wrote:
>
> It is worth discussing whether there is sufficient value to warrant
> keeping jython/python support in the processors or whether we should
> pull it. It is certainly something we can document as being highly
> limited but we don't really know how limited. Frankly given the
> performance I've seen with it I'd be ok removing it entirely. One is
> better off calling the script via a system call. Groovy is one that
> I've seen perform very well and be fully featured.
>
> On Fri, May 5, 2017 at 6:38 PM, Russell Bateman <[email protected]>
> wrote:
>
> We really want to use ExecuteScript because our end users are Pythonistas.
> They tend to punctuate their flows with the equivalent of PutFile and
> GetFile with Python scripts doing stuff on flowfiles that pass out of NiFi
> before returning into NiFi.
>
> However, we find it nearly impossible to replace even the tamest of
> undertakings. If there were a good set of NiFi/Python shims that, from
> PyCharm, etc., gave us the ability to prototype, test and debug before
> copying and pasting into ExecuteScript, that would be wonderful. It hasn't
> worked out that way. Most of our experience is copying, pasting into the
> processor property, only to find something wrong, sometimes syntax,
> sometimes something runtime.
>
> On their behalf, I played with this processor a few hours a while back.
> Another colleague too. Googling this underused tool hasn't been helpful, so
> the overall experience is negative so far.
> I can get most of the examples
> out there to work, but as soon as I try to do "real" work from my point of
> view, my plans sort of cave in.
>
> Likely the Groovy and/or Ruby options are better? But, we're not Groovy or
> Ruby guys here. I understand the problems with this tool and so I understand
> what the obstacles are to it growing stronger. The problems won't yield to a
> few hours one Saturday afternoon. They need better problem-logging underneath
> and better, more lenient Python support on top. The second one is tough,
> though.
>
> My approach is to minimize those black holes these guys put into their flows
> by creating custom processors for what I can't solve using standard
> processors.
>
> Trying not to be too negative here...
>
>
> On 05/05/2017 04:09 PM, Andre wrote:
>
> Mike,
>
> I believe it is possible to use requests under Jython; however, the process
> isn't very intuitive.
>
> I know one person who, if I recall correctly, has used it. Happy to try to
> find out how it is done.
>
> Cheers
>
> On Sat, May 6, 2017 at 4:57 AM, Mike Harding <[email protected]> wrote:
>
> Hi All, I'm now looking at using ExecuteScript and the python engine to
> execute HTTP requests using the requests module. I've tried referencing
> the requests module, but when I try to import requests I get a module
> reference error.
> I downloaded the module from here
> https://pypi.python.org/pypi/requests
> Not sure why it isn't picking it up. I've tried referencing the directory
> and the .py directly with no success.
> Any ideas where I'm going wrong?
> Cheers,
> Mike
>
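
On Mike's import error above: one possible cause, and this is an assumption since the thread never confirms it, is which directory ends up on the module search path. Python (and Jython) resolve `import requests` by looking for a `requests` package inside each directory on `sys.path`, so the entry has to be the directory that contains the package, not the package directory itself or a single .py file. The sketch below demonstrates that rule in plain CPython 3 with a hypothetical stand-in package named `nifi_demo_pkg`; it does not touch NiFi, and it says nothing about whether requests itself runs under Jython.

```python
import importlib
import os
import sys
import tempfile


def make_demo_package(root, name):
    """Create root/<name>/__init__.py, standing in for a downloaded package."""
    pkg_dir = os.path.join(root, name)
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as fh:
        fh.write("def get(url):\n    return 'GET ' + url\n")
    return pkg_dir


def can_import(name, search_dir):
    """Return True if `name` imports once `search_dir` is added to sys.path."""
    sys.modules.pop(name, None)       # forget any earlier successful import
    importlib.invalidate_caches()     # make the import system rescan directories
    sys.path.insert(0, search_dir)
    try:
        importlib.import_module(name)
        return True
    except ImportError:
        return False
    finally:
        sys.path.remove(search_dir)   # leave sys.path as we found it
        sys.modules.pop(name, None)


PACKAGE = "nifi_demo_pkg"             # hypothetical name, not a real library
site_dir = tempfile.mkdtemp()         # plays the role of a "modules" folder
pkg_dir = make_demo_package(site_dir, PACKAGE)

# Pointing at the folder that CONTAINS the package lets the import resolve.
parent_works = can_import(PACKAGE, site_dir)

# Pointing at the package folder itself does not.
package_dir_works = can_import(PACKAGE, pkg_dir)

print("parent directory on sys.path:", parent_works)
print("package directory on sys.path:", package_dir_works)
```

If the same rule is what is biting here, the thing to try would be pointing ExecuteScript's Module Directory property at the folder that holds the `requests` directory rather than at `requests` itself.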
