Hi Matt,

Thanks for your detailed response. So how do you recommend I proceed for
creating a good, stable PutRethinkDB processor? Do I really have to write it
entirely and then package it as a NAR? I can see that bringing some batching
optimizations too...
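
To make the batching idea concrete, here is a rough sketch only. The helper
names (batches, insert_in_batches, insert_fn) are hypothetical and not part
of NiFi or the RethinkDB driver; the point is one insert call per group of
documents instead of one per document.

```python
from itertools import islice


def batches(items, size):
    """Yield successive lists of at most `size` items from `items`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def insert_in_batches(insert_fn, documents, batch_size=100):
    """Call `insert_fn` once per batch and return the number of calls made.

    `insert_fn` is a hypothetical callable that takes a list of documents;
    it stands in for whatever actually writes to the database.
    """
    calls = 0
    for chunk in batches(documents, batch_size):
        insert_fn(chunk)
        calls += 1
    return calls
```

With the driver from my test code, insert_fn could presumably wrap something
like r.table('tv_shows').insert(docs).run(...), since insert accepts a list
of documents, using a connection opened once rather than once per flowfile.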

Thanks,
Stephane

On Mon, Jul 11, 2016 at 10:46 AM Matt Burgess <[email protected]> wrote:

> Stéphane,
>
> In 0.7.0 and forward, you will be able to set the number of concurrent
> tasks for ExecuteScript to whatever you like [1].  For
> InvokeScriptedProcessor, a current issue is that it only expects (and
> interacts with) a Processor interface, which includes an "initialize"
> method but doesn't check for an @OnScheduled annotation. Also, the
> initialize() method of Processor gets called when
> InvokeScriptedProcessor is scheduled and notices the script needs to
> be reloaded, which is when the Script File/Body, Engine, or Module
> Directory properties are modified (via the UI or REST or whatever). So
> theoretically the scripted processor's initialize() method is called
> when scheduled (as if it were an @OnScheduled), but only if something
> has changed. This could definitely be an improvement Jira where
> scripted processors can have their own annotated methods (especially
> @OnStopped since there is no -- even indirect -- call to something to
> stop the scripted processor). However this would only work for Jython
> [2], JRuby [3], Groovy (and any other included JSR-223 language that
> supports Java annotations). I've written this up as [4].
>
>
> Regards,
> Matt
>
> [1] https://issues.apache.org/jira/browse/NIFI-1822
> [2] http://www.fiber-space.de/jynx/doc/jannotations.html
> [3] https://github.com/jruby/jruby/wiki/JRuby-Reference#java_annotation
> [4] https://issues.apache.org/jira/browse/NIFI-2215
>
> On Sun, Jul 10, 2016 at 7:42 PM, Stéphane Maarek
> <[email protected]> wrote:
> > Hi,
> >
> > I've been thinking about implementing a RethinkDB processor as I'm
> > needing one for my project. Right now, if I put my code inside of an
> > ExecuteScript, I basically connect to the database as many times as I'm
> > inserting documents, and that's rather inefficient (I believe). The best
> > I can get is to insert 90 documents a second. Also, it seems that I
> > can't increase the number of concurrent tasks on this processor.
> >
> > Here's my test code for reference (python):
> > import rethinkdb as r
> > r.connect('<myhost>', 28015).repl()
> > r.table('tv_shows').insert({ 'name': 'Star Trek TNG'
> > }).run(durability="soft", noreply=True)
> > flowFile = session.get()
> > session.transfer(flowFile, REL_SUCCESS)
> >
> > I have been thinking of doing some kind of implementation that's similar
> > to PutMongo. I see there is a @OnScheduled annotation that connects to
> > the database. Is this piece of code run every time a flowfile arrives,
> > or is it more "smartly" run? Also, can I, instead of going the long way
> > and building a NAR, use InvokeScriptedProcessor, alongside the
> > @OnScheduled annotation?
> >
> > Finally, I seem to be quickly having some PermGen space issues. Is that
> > expected?
> >
> > Thanks,
> > Stephane
>
