Jeff,

This seems to be a bit different, as the processor is reporting data as
having been written, and there is a listing of one FlowFile of 381 MB being
transferred out of the processor.  Could you provide more detail on how the
data is not being sent out in the manner you anticipated?  If you can
narrow the issue down further, let us know.  It may also be helpful to
start a separate thread so we can track the two issues independently as we
work through them.

Thanks!

Adam,

I found a sizable JSON file to work against and have been doing some initial
exploration.  With large files, it is certainly a nontrivial process.  On
cursory inspection, a good portion of the processing time seems to be spent
on validation.  There are some ways to tweak the strictness of this in the
supporting library, but I will have to dive in a bit more.
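
For concreteness, here is a minimal sketch of the kind of tuning I have in
mind, assuming the Jayway json-path library that backs these processors.
Whether these options actually reduce validation cost on large inputs is
exactly what I still need to verify:

    import com.jayway.jsonpath.Configuration;
    import com.jayway.jsonpath.DocumentContext;
    import com.jayway.jsonpath.JsonPath;
    import com.jayway.jsonpath.Option;

    import java.util.List;

    public class SplitExploration {
        public static void main(String[] args) {
            String json = "[{\"id\":1,\"stat\":\"a\"},{\"id\":2,\"stat\":\"b\"}]";

            // Relax path-evaluation strictness; parse-time validation is
            // handled by the underlying JsonProvider (json-smart by default).
            Configuration conf = Configuration.builder()
                    .options(Option.SUPPRESS_EXCEPTIONS)
                    .build();

            DocumentContext ctx = JsonPath.using(conf).parse(json);
            List<Object> elements = ctx.read("$"); // the top-level array
            System.out.println(elements.size());   // prints 2
        }
    }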



On Thu, Sep 24, 2015 at 8:14 PM, Jeff <[email protected]> wrote:

>
>
>
> I’m having a very similar problem.  The process picks up the file, and a
> custom processor does its thing, but no data is sent out.
>
>
>
>
> On Sep 24, 2015, at 5:56 PM, Adam Williams <[email protected]>
> wrote:
>
> For SplitJson I am using just "$" to try to get the array into individual
> objects.  It worked on a small subset, but a large file seems to just hang.
>
> ------------------------------
> From: [email protected]
> Date: Thu, 24 Sep 2015 18:54:06 -0400
> Subject: Re: Array into MongoDB
> To: [email protected]
>
> Bryan is correct about the backing library reading everything into memory
> to do the evaluation.
>
> Might I ask what expression you are using?
>
> On Thu, Sep 24, 2015 at 6:44 PM, Adam Williams <[email protected]
> > wrote:
>
> I tried it even with 6GB and no luck.  It's receiving the FlowFiles, but
> nothing happens afterward.  If I do it with a small subset (3 JSON
> objects) it works perfectly.  When I throw the 180MB file at it, it just
> spins: no logging, no errors, etc.  Very odd.  Any thoughts?
>
> Thanks
>
> ------------------------------
> From: [email protected]
> To: [email protected]
> Subject: RE: Array into MongoDB
> Date: Thu, 24 Sep 2015 21:23:35 +0000
>
>
> Bryan,
>
> I think that is what's happening; the fans are spinning like crazy.  This
> is my current bootstrap.conf.  I will bump it up; are there any other
> settings I should bump too?
>
> java.arg.2=-Xms512m
> java.arg.3=-Xmx2048m
>
> Thanks
>
> ------------------------------
> Date: Thu, 24 Sep 2015 17:20:27 -0400
> Subject: Re: Array into MongoDB
> From: [email protected]
> To: [email protected]
>
> One other thing I thought of... I think the JSON processors read the
> entire FlowFile content into memory to do the splitting/evaluating, so I
> wonder if you are running into a memory issue with a 180MB JSON file.
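>
> As a rough back-of-the-envelope estimate (the multiplier is my
> assumption, not a measured figure): a parsed JSON object tree is often
> 4-5x the size of the raw text, so a 180MB file could need close to 1GB
> of heap on its own before any splitting happens.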
>
> Are you running with the default configuration of 512 MB set in
> conf/bootstrap.conf?  If so, it would be interesting to see what happens
> if you bump that up.
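>
> For example (purely illustrative values, not a tuned recommendation), in
> conf/bootstrap.conf:
>
> java.arg.2=-Xms1024m
> java.arg.3=-Xmx4096m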
>
> On Thu, Sep 24, 2015 at 5:06 PM, Bryan Bende <[email protected]> wrote:
>
> Adam,
>
> Based on that message, I suspect that the PutMongo processor does not
> support sending in an array of documents, since it looks like it expects
> the first character to be the start of a document and not an array.
>
> With regards to the SplitJson processor, if you set the JSON Path to $,
> it should split at the top level and send each of your two documents out
> on the splits relationship.
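>
> For example, an input FlowFile containing
>
> [{"id":1, "stat":"something"},{"id":2, "stat":"anothersomething"}]
>
> would come out as two FlowFiles on splits: {"id":1, "stat":"something"}
> and {"id":2, "stat":"anothersomething"}.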
>
> -Bryan
>
>
> On Thu, Sep 24, 2015 at 4:36 PM, Adam Williams <[email protected]
> > wrote:
>
> I have an array of JSON objects that I am trying to put into Mongo, but I
> keep hitting this on the PutMongo processor:
>
> ERROR [Timer-Driven Process Thread-1] o.a.nifi.processors.mongodb.PutMongo
> PutMongo[id=c576f8cc-6e21-4881-a7cd-6e3881838a91] Failed to insert
> StandardFlowFileRecord[uuid=2c670a40-7934-4bc6-b054-1cba23fe7b0f,claim=StandardContentClaim
> [resourceClaim=StandardResourceClaim[id=1443125646319-1, container=default,
> section=1], offset=0,
> length=208380820],offset=0,name=test.json,size=208380820] into MongoDB due
> to org.bson.BsonInvalidOperationException: readStartDocument can only be
> called when CurrentBSONType is DOCUMENT, not when CurrentBSONType is
> ARRAY.: org.bson.BsonInvalidOperationException: readStartDocument can only
> be called when CurrentBSONType is DOCUMENT, not when CurrentBSONType is
> ARRAY.
>
>
> I tried to use the SplitJson processor to split the array into segments,
> but in my experience I can't pull out each JSON object.  The SplitJson
> processor just hangs and never produces logs or any output at all.  The
> structure of my data is:
>
> [{"id":1, "stat":"something"},{"id":2, "stat":"anothersomething"}]
>
> The JSON file itself is pretty large (>100 MB).
>
> Thank you
>
>
>
