Kelsey, have you looked at JSONTreeReader?
Thanks, Joe.

On Thu, Jun 21, 2018, 5:00 AM Kelsey RIDER <[email protected]> wrote:

> OK, thanks for the heads-up!
>
> If I could make another suggestion: could the JSONPathReader be made a
> little more dynamic? Currently you have to specify every single field…
>
> In my case (although I doubt I'm alone), I have several different
> collections with different schemas. My options are either to have one
> JSONPathReader with dozens of attributes, or else one Reader per collection
> type (but then I'd have to somehow dynamically choose which reader to use).
> It would be easier if there were a way to have a single expression
> (wildcards? Regex?) that could pick up several properties at once.
>
> *From:* Mike Thomsen <[email protected]>
> *Sent:* Thursday, June 21, 2018 13:06
> *To:* [email protected]
> *Subject:* Re: NiFi and Mongo
>
> Your general assessment of what you'd need is correct. It's a fairly
> easy component to build, and I'll throw up a Jira ticket for it. It would
> definitely be doable for NiFi 1.8.
>
> Expect the Mongo stuff to go through some real cleanup like this in 1.8.
> One of the other big changes is that I will be moving the processors to
> using a controller service as an optional configuration for the Mongo
> client, with the plan that by probably 1.9 all of the Mongo processors
> will drop their own client configurations and use the same pool (currently
> every processor instance maintains its own).
>
> On Thu, Jun 21, 2018 at 3:13 AM Kelsey RIDER <[email protected]> wrote:
>
> Hello,
>
> I've been experimenting with NiFi and MongoDB. I have a test collection
> with 1 million documents in it. Each document has the same flat JSON
> structure with 11 fields.
>
> My NiFi flow exposes a web service, which allows the user to fetch all the
> data in CSV format.
>
> However, 1M documents brings NiFi to its knees.
> Even after increasing the JVM's Xms and Xmx to 2G, I still get an
> OutOfMemoryError:
>
>     2018-06-20 11:27:43,428 WARN [Timer-Driven Process Thread-7]
>     o.a.n.controller.tasks.ConnectableTask Administratively Yielding ...
>     due to uncaught Exception: java.lang.OutOfMemoryError: Java heap space
>     java.lang.OutOfMemoryError: Java heap space
>         at java.util.Arrays.copyOf(Arrays.java:3332)
>         at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
>         at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448)
>         at java.lang.StringBuilder.append(StringBuilder.java:136)
>         at org.apache.nifi.processors.mongodb.GetMongo.buildBatch(GetMongo.java:222)
>         at org.apache.nifi.processors.mongodb.GetMongo.onTrigger(GetMongo.java:341)
>         at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
>         at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1147)
>         at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:175)
>         at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenScheduling...
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThr...
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPool...
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>
> I dug into the code, and discovered that the GetMongo processor takes all
> the Documents returned from Mongo, converts them to Strings, and
> concatenates them in a StringBuilder.
>
> My question is thus: is there a better way I should be doing this?
> The only idea I've had is to use a smaller batch size, but that would mean
> I'd just need a later processor to concatenate the batches in order to
> get one big CSV.
>
> Is there some sort of "GetMongoRecord" processor that reads each Mongo
> Document as a record, the way ExecuteSQL does? (I've done the same test
> with an SQL database, and it handles 1M records just fine.)
>
> Thanks for your help,
>
> Kelsey
>
> *Following changes to working-time regulations, if you receive this email
> before 7:00 AM, in the evening, over the weekend, or during your holidays,
> please do not handle it or reply immediately, except in cases of
> exceptional urgency.*
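The buildBatch behavior Kelsey found (converting every returned document to a String and appending it to one StringBuilder) can be contrasted with a streaming write in a minimal, self-contained sketch. This is not NiFi's actual code: the class and method names are illustrative, and a plain Iterator<String> stands in for a Mongo cursor.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

// Illustrative sketch (not NiFi code): streams each document to an
// OutputStream as it is read, so each document becomes garbage-collectable
// immediately and heap usage stays roughly constant regardless of how many
// documents the cursor returns.
public class StreamingVsBuffering {

    // Write one document per line; nothing is accumulated in memory.
    static void writeAllStreaming(Iterator<String> docs, OutputStream out) throws IOException {
        while (docs.hasNext()) {
            out.write(docs.next().getBytes(StandardCharsets.UTF_8));
            out.write('\n');
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a Mongo cursor over a large collection.
        Iterator<String> cursor = java.util.stream.IntStream.range(0, 1000)
                .mapToObj(i -> "{\"_id\": " + i + "}").iterator();
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        writeAllStreaming(cursor, sink);
        System.out.println(sink.toString(StandardCharsets.UTF_8).lines().count()); // prints 1000
    }
}
```

In a real processor the same constant-memory property comes from writing through the FlowFile's OutputStream inside the session's write callback instead of buffering the whole result first.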

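The smaller-batch workaround Kelsey mentions amounts to partitioning the query results so that no single in-memory buffer ever holds the whole collection. A minimal sketch of that partitioning, with illustrative names (not a NiFi API):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: split query results into consecutive batches so the
// largest buffer is bounded by the batch size, not by the result count.
public class BatchSplitter {

    // Returns consecutive batches of at most batchSize elements each.
    static <T> List<List<T>> partition(List<T> docs, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < docs.size(); i += batchSize) {
            batches.add(new ArrayList<>(docs.subList(i, Math.min(i + batchSize, docs.size()))));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> docs = new ArrayList<>();
        for (int i = 0; i < 10; i++) docs.add(i);
        List<List<Integer>> batches = partition(docs, 3);
        System.out.println(batches.size()); // prints 4
        System.out.println(batches.get(3)); // prints [9]
    }
}
```

The trade-off is exactly the one raised in the thread: a downstream processor (e.g. MergeContent) then has to reassemble the batches if the user really needs one big CSV.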