Re: Create a PutRethinkDB processor

Matt Burgess Sun, 10 Jul 2016 17:47:51 -0700

Stéphane,

In 0.7.0 and forward, you will be able to set the number of concurrent
tasks for ExecuteScript to whatever you like [1].  For
InvokeScriptedProcessor, a current issue is that it only expects (and
interacts with) a Processor interface, which includes an "initialize"
method but doesn't check for an @OnScheduled annotation. Also, the
initialize() method of Processor gets called when
InvokeScriptedProcessor is scheduled and notices the script needs to
be reloaded, which is when the Script File/Body, Engine, or Module
Directory properties are modified (via the UI or REST or whatever). So
theoretically the scripted processor's initialize() method is called
when scheduled (as if it were an @OnScheduled), but only if something
has changed. This could definitely be an improvement Jira where
scripted processors can have their own annotated methods (especially
@OnStopped since there is no -- even indirect -- call to something to
stop the scripted processor). However this would only work for Jython
[2], JRuby [3],  Groovy (and any other included JSR-223 language that
supports Java annotations). I've written this up as [4].



Regards,
Matt

[1] https://issues.apache.org/jira/browse/NIFI-1822
[2] http://www.fiber-space.de/jynx/doc/jannotations.html
[3] https://github.com/jruby/jruby/wiki/JRuby-Reference#java_annotation
[4] https://issues.apache.org/jira/browse/NIFI-2215

On Sun, Jul 10, 2016 at 7:42 PM, Stéphane Maarek
<[email protected]> wrote:
> Hi,
>
> I've been thinking about implementing a RethinkDB processor as I'm needing
> one for my project. Right now, if I put my code inside of an ExecuteScript,
> I basically connect to the database as many times as I'm inserting
> documents, and that's rather inefficient (I believe). The best I can get is
> to insert 90 documents a second. Also, it seems that I can't increase the
> number of concurrent tasks on this processor.
>
> Here's my test code for reference (python):
> import rethinkdb as r
> r.connect('<myhost>', 28015).repl()
> r.table('tv_shows').insert({ 'name': 'Star Trek TNG'
> }).run(durability="soft", noreply=True)
> flowFile = session.get()
> session.transfer(flowFile, REL_SUCCESS)
>
> I have been thinking of doing some kind of implementation that's similar to
> PutMongo. I see there is a @OnScheduled annotation that connects to the
> database. Is this piece of code run every time a flowfile arrives, or is it
> more "smartly" run? Also, can I, instead of going the long way and building
> a NAR, use InvokeScriptedProcessor, alongside the @OnScheduled annotation?
>
> Finally, I seem to be quickly having some PermGen space issues. Is that
> expected?
>
> Thanks,
> Stephane

Re: Create a PutRethinkDB processor

Reply via email to