Stéphane, In 0.7.0 and forward, you will be able to set the number of concurrent tasks for ExecuteScript to whatever you like [1]. For InvokeScriptedProcessor, a current issue is that it only expects (and interacts with) a Processor interface, which includes an "initialize" method but doesn't check for an @OnScheduled annotation. Also, the initialize() method of Processor gets called when InvokeScriptedProcessor is scheduled and notices the script needs to be reloaded, which is when the Script File/Body, Engine, or Module Directory properties are modified (via the UI or REST or whatever). So theoretically the scripted processor's initialize() method is called when scheduled (as if it were an @OnScheduled), but only if something has changed. This could definitely be an improvement Jira where scripted processors can have their own annotated methods (especially @OnStopped since there is no -- even indirect -- call to something to stop the scripted processor). However this would only work for Jython [2], JRuby [3], Groovy (and any other included JSR-223 language that supports Java annotations). I've written this up as [4].
Regards, Matt [1] https://issues.apache.org/jira/browse/NIFI-1822 [2] http://www.fiber-space.de/jynx/doc/jannotations.html [3] https://github.com/jruby/jruby/wiki/JRuby-Reference#java_annotation [4] https://issues.apache.org/jira/browse/NIFI-2215 On Sun, Jul 10, 2016 at 7:42 PM, Stéphane Maarek <[email protected]> wrote: > Hi, > > I've been thinking about implementing a RethinkDB processor as I'm needing > one for my project. Right now, if I put my code inside of an ExecuteScript, > I basically connect to the database as many times as I'm inserting > documents, and that's rather inefficient (I believe). The best I can get is > to insert 90 documents a second. Also, it seems that I can't increase the > number of concurrent tasks on this processor. > > Here's my test code for reference (python): > import rethinkdb as r > r.connect('<myhost>', 28015).repl() > r.table('tv_shows').insert({ 'name': 'Star Trek TNG' > }).run(durability="soft", noreply=True) > flowFile = session.get() > session.transfer(flowFile, REL_SUCCESS) > > I have been thinking of doing some kind of implementation that's similar to > PutMongo. I see there is a @OnScheduled annotation that connects to the > database. Is this piece of code run every time a flowfile arrives, or is it > more "smartly" run? Also, can I, instead of going the long way and building > a NAR, use InvokeScriptedProcessor, alongside the @OnScheduled annotation? > > Finally, I seem to be quickly having some PermGen space issues. Is that > expected? > > Thanks, > Stephane
