The very first custom processor I wrote was called ExecutePythonScript, right after working through the NiFi Rocks! JSON example. Then I began to grok the scope of the free, existing, standard processors, among them /ExecuteProcess/ and /ExecuteStreamCommand/. Those did pretty much exactly what I was trying to do, and I threw my code away when I found that this base was already covered. Months later, this Jython/ExecuteScript investigation was my way of revisiting the issue for my users.

What my users employ to get out of NiFi, then back in, is a proprietary filesystem in-box. It's a sophisticated mechanism that amounts to a very tight and sharing-safe version of GetFile and PutFile with decent metadata support (hey, wow, sort of like the NiFi flowfile). We've used it for many years. When we adopted NiFi just over a year ago, we interfaced it with custom processors /PutInbox/ and /GetInbox/. Once the flowfile goes to the filesystem, our folk are able to run their Python scripts on it. Thence, they return the result to a different in-box from which our NiFi ETL brings it back in as a flowfile. These guys have a hammer and everything else looks like a nail to them (they're no different than most of the rest of us in that respect).

I don't personally like this because it's not a clean flow. It suffers from not being very turn-key, requires a high degree of understanding on the part of those that use this hybrid approach and makes it impossible to think of, measure, and plan for our NiFi ETL process on many levels (like provenance, performance, SLA, etc.). We constantly have to translate attributes into in-box metadata, then translate back.

I have no opinion about whether a better Python (or other language) script is a good thing for NiFi generally; I just know that if /ExecuteScript/ were a no-brainer to use, my guys would use it and it would keep them away from these little black-hole, /Interstellar/ games in NiFi where the flowfiles leave, then reemerge.

I only say this in response to Matt's response to us. My original contribution to this thread was only to provide our experience in case it was somehow relevant. As I've said, I look to providing everything we need here to stay in NiFi from beginning to end of our ETL process to gain its _visibility_ and _unity_. Originally, I was promoting using some kind of AMQP to solve our ETL needs. One day, we stumbled upon NiFi. Since then, I've fallen in love with everything it does better than a home-brewed queue-messaging approach tying together inevitably disparate ETL applications. I especially love the UI, AbstractProcessor and the unit-test framework. It's pretty much a dream come true for me.

Russ


On 05/05/2017 05:24 PM, Matt Burgess wrote:
Russell et al,

I'd like to mention a couple of things here re: the context of the
scripting processors:

1) The scripting processors were designed to use the JSR-223 spec for
interfacing with a scripting language, in order to make adding new
languages easier (ideally you just add the dependency to the scripting
NAR). However the drawback is that the desired language must implement
this spec, and Python does not but Jython does.

2) NiFi is a Java application, Python is a native one. Either you'd
have to bring your own Python (and configure the processor with the
location of it, and then you're pretty close to just using
ExecuteStreamCommand), or we'd have to package a version for each OS
(if we are allowed to), and still have to shell out to it from Java to
the OS (again, that's what ExecuteStreamCommand does).  There is some
experimental work with JyNI to bridge the gap, but it's not ready for
primetime.

Having said that, there's a PR out there for a Groovy-specific
scripting processor (ExecuteGroovyScript), which isn't limited by the
JSR-223 spec and can make full use of all the Groovy goodness. That's
made easier because Groovy is a JVM scripting language.  However for
Python we could make the user experience better by having an
ExecutePythonScript processor; perhaps under the hood it is just a
glorified ExecuteStreamCommand that lets you put the script into a
processor property like ExecuteScript, but still shells out to a
python on the OS.  What do you think?

Regards,
Matt
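
A minimal sketch of the "script in a property, still shells out to a
python on the OS" idea described above. It is plain Python, not NiFi
code: run_script_on_content and SCRIPT are hypothetical names, and the
assumption is that flowfile content goes to the script on stdin and the
new content comes back on stdout, much as ExecuteStreamCommand does.

```python
import subprocess
import sys


def run_script_on_content(script_text, content, python_exe=sys.executable):
    """Pipe `content` (bytes) through an external Python running `script_text`.

    Hypothetical helper: it only mimics what a processor that shells out
    to the OS might do. `python_exe` stands in for a configured interpreter path.
    """
    result = subprocess.run(
        [python_exe, "-c", script_text],
        input=content,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=False,
    )
    if result.returncode != 0:
        # A real processor would more likely route to a failure relationship.
        raise RuntimeError(result.stderr.decode("utf-8", "replace"))
    return result.stdout


# Script text as it might be pasted into a processor property.
SCRIPT = "import sys; sys.stdout.write(sys.stdin.read().upper())"

output = run_script_on_content(SCRIPT, b"hello flowfile")
```

The trade-off is the one already named in the message: every invocation
pays for starting an OS process, in exchange for a real Python with its
native modules.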


On Fri, May 5, 2017 at 7:12 PM, Joe Witt wrote:
It is worth discussing whether there is sufficient value to warrant
keeping jython/python support in the processors or whether we should
pull it.  It is certainly something we can document as being highly
limited but we don't really know how limited.  Frankly given the
performance I've seen with it I'd be ok removing it entirely.  One is
better off calling the script via a system call.  Groovy is one that
I've seen perform very well and be fully featured.

On Fri, May 5, 2017 at 6:38 PM, Russell Bateman wrote:
We really want to use ExecuteScript because our end users are Pythonistas.
They tend to punctuate their flows with the equivalent of PutFile and
GetFile with Python scripts doing stuff on flowfiles that pass out of NiFi
before returning into NiFi.

However, we find it nearly impossible to replace even the tamest of
undertakings. If there were a good set of NiFi/Python shims that, from
PyCharm, etc., gave us the ability to prototype, test and debug before
copying and pasting into ExecuteScript, that would be wonderful. It hasn't
worked out that way. Most of our experience is copying, pasting into the
processor property, only to find something wrong, sometimes syntax,
sometimes something runtime.
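
A rough sketch of what such a shim could look like, so that a script
body can be exercised in an IDE before it is pasted into ExecuteScript.
Everything here is a hypothetical stand-in: FakeSession and FakeFlowFile
mimic only a handful of the methods that the real session and flowfile
objects expose, and run_script simply binds the variable names
(session, REL_SUCCESS, REL_FAILURE) that ExecuteScript provides.

```python
# Hypothetical stand-ins for objects ExecuteScript binds into a script.
# Only a few ProcessSession-style methods are mimicked; this is not NiFi code.

REL_SUCCESS = "success"
REL_FAILURE = "failure"


class FakeFlowFile(object):
    def __init__(self, attributes=None):
        self._attributes = dict(attributes or {})

    def getAttribute(self, name):
        return self._attributes.get(name)

    def getAttributes(self):
        return dict(self._attributes)


class FakeSession(object):
    def __init__(self, queued):
        self._queue = list(queued)
        self.transferred = []  # (flowfile, relationship) pairs, in order

    def get(self):
        # Next queued flowfile, or None when the queue is empty.
        return self._queue.pop(0) if self._queue else None

    def putAttribute(self, flowfile, name, value):
        # Return a new object so scripts must keep the returned reference.
        updated = flowfile.getAttributes()
        updated[name] = value
        return FakeFlowFile(updated)

    def transfer(self, flowfile, relationship):
        self.transferred.append((flowfile, relationship))


# Body of a script as it might later be pasted into the processor property.
SCRIPT_BODY = """
flowFile = session.get()
if flowFile is not None:
    name = flowFile.getAttribute('filename')
    flowFile = session.putAttribute(flowFile, 'processed.by', 'shim-demo')
    session.transfer(flowFile, REL_SUCCESS if name else REL_FAILURE)
"""


def run_script(body, session):
    # Execute the body with the same variable names the processor supplies.
    scope = {
        "session": session,
        "REL_SUCCESS": REL_SUCCESS,
        "REL_FAILURE": REL_FAILURE,
    }
    exec(body, scope)


session = FakeSession([FakeFlowFile({"filename": "a.txt"})])
run_script(SCRIPT_BODY, session)
result_flowfile, relationship = session.transferred[0]
```

A shim like this only catches syntax errors and plain logic mistakes. It
says nothing about Jython-specific behavior or about modules that are
unavailable inside the processor.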

On their behalf, I played with this processor a few hours a while back.
Another colleague too. Googling this underused tool hasn't been helpful, so
the overall experience is negative so far. I can get most of the examples
out there to work, but as soon as I try to do "real" work from my point of
view, my plans sort of cave in.

Likely the Groovy and/or Ruby options are better? But, we're not Groovy or
Ruby guys here. I understand the problems with this tool and so I understand
what the obstacles are to it growing stronger. The problems won't yield to a
few hours one Saturday afternoon. What's needed is better problem logging
underneath and better, more lenient Python support on top. The second one is tough,
though.

My approach is to minimize those black holes these guys put into their flows
by creating custom processors for what I can't solve using standard
processors.

Trying not to be too negative here...


On 05/05/2017 04:09 PM, Andre wrote:

Mike,

I believe it is possible to use requests under jython, however the process
isn't very intuitive.

I know someone who, if I recall correctly, has used it. Happy to try to find
out how it is done.

Cheers

On Sat, May 6, 2017 at 4:57 AM, Mike Harding wrote:
Hi All, I'm now looking at using ExecuteScript and the Python engine to
execute HTTP requests using the requests module. I've tried referencing
the requests module, but when I try to import requests I get a module
reference error.
I downloaded the module from here: https://pypi.python.org/pypi/requests
Not sure why it isn't picking it up. I've tried referencing the directory
and the .py directly with no success.
Any ideas where I'm going wrong?
Cheers,
Mike
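
One possible cause of an import error like this, sketched in plain
Python: the path entry has to be the directory that contains the
package folder, not the package folder itself and not a .py file
inside it. The package name fakerequests below is a hypothetical
stand-in created in a temporary directory; whether requests itself
then runs under Jython is a separate question this sketch does not
answer.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway "module directory" holding a stand-in package.
# `fakerequests` is a hypothetical name used only for this demo.
module_dir = tempfile.mkdtemp()
package_dir = os.path.join(module_dir, "fakerequests")
os.mkdir(package_dir)
with open(os.path.join(package_dir, "__init__.py"), "w") as handle:
    handle.write("def get(url):\n    return 'GET ' + url\n")

# Pointing the path at the package folder itself does not work...
sys.path.append(package_dir)
try:
    import fakerequests
    found_via_package_dir = True
except ImportError:
    found_via_package_dir = False
sys.path.remove(package_dir)

# ...but pointing it at the parent directory does.
sys.path.append(module_dir)
importlib.invalidate_caches()
import fakerequests

response = fakerequests.get("http://example.com")
```

In ExecuteScript the equivalent of the sys.path entry is, as far as I
know, the processor's Module Directory property, so the value to try
there would be the directory that directly contains the requests
folder from the unpacked download.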
