I’m having a very similar problem. The process picks up the file, a custom
processor does its thing, but no data is sent out.
> On Sep 24, 2015, at 5:56 PM, Adam Williams <[email protected]> wrote:
>
> For SplitJson I am using just "$" to try and split the array into individual
> objects. It worked on a small subset, but a large file seems to just hang.
>
> From: [email protected]
> Date: Thu, 24 Sep 2015 18:54:06 -0400
> Subject: Re: Array into MongoDB
> To: [email protected]
>
> Bryan is correct about the backing library reading everything into memory to
> do the evaluation.
>
> Might I ask what expression you are using?
>
> On Thu, Sep 24, 2015 at 6:44 PM, Adam Williams <[email protected]> wrote:
> I tried it even with 6GB and no luck. It's receiving the flowfiles, but
> nothing is happening after. If I do it with a small subset (3 JSON objects)
> it works perfectly. When I throw the 180MB file in, it just spins, with no
> logging, errors, etc. Very odd. Any thoughts?
>
> Thanks
>
> From: [email protected]
> To: [email protected]
> Subject: RE: Array into MongoDB
> Date: Thu, 24 Sep 2015 21:23:35 +0000
>
>
> Bryan,
>
> I think that is what's happening, fans spinning like crazy. This is my
> current bootstrap.conf. I will bump it up; are there any other settings I
> should bump too?
>
> java.arg.2=-Xms512m
> java.arg.3=-Xmx2048m
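
[Editor's note: for readers following along, bumping the heap means raising these same two properties in conf/bootstrap.conf. The values below are illustrative only; a later reply in this thread mentions trying as much as 6GB.]

```properties
# conf/bootstrap.conf -- JVM heap settings (illustrative values)
java.arg.2=-Xms1024m
java.arg.3=-Xmx6144m
```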
>
> Thanks
>
> Date: Thu, 24 Sep 2015 17:20:27 -0400
> Subject: Re: Array into MongoDB
> From: [email protected]
> To: [email protected]
>
> One other thing I thought of... I think the JSON processors read the entire
> FlowFile content into memory to do the splitting/evaluating, so I wonder if
> you are running into a memory issue with a 180MB JSON file.
>
> Are you running with the default configuration of 512MB set in
> conf/bootstrap.conf? If so, it would be interesting to see what happens if
> you bump that up.
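
[Editor's note: a rough sketch of the memory point, not NiFi code. The stdlib decoder's `raw_decode` can walk a top-level JSON array one element at a time, so only one parsed document is materialized at once, whereas parsing the whole 180MB array materializes every document simultaneously.]

```python
import json

def iter_array_items(text: str):
    """Incrementally yield the elements of a top-level JSON array.

    Only one parsed document exists at a time (the raw text is still in
    memory; a true streaming parser would also chunk the bytes)."""
    decoder = json.JSONDecoder()
    idx = text.index("[") + 1
    while True:
        # skip whitespace and element separators
        while idx < len(text) and text[idx] in " \t\r\n,":
            idx += 1
        if idx >= len(text) or text[idx] == "]":
            return
        obj, idx = decoder.raw_decode(text, idx)
        yield obj
```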
>
> On Thu, Sep 24, 2015 at 5:06 PM, Bryan Bende <[email protected]> wrote:
> Adam,
>
> Based on that message I suspect that MongoDB does not support sending in an
> array of documents, since it looks like it expects the first character to be
> the start of a document and not an array.
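
[Editor's note: an illustrative check in Python, not NiFi code, that makes the distinction concrete. A top-level array parses to a list rather than a single document, which matches the BsonInvalidOperationException below: the BSON reader expects a DOCUMENT but finds an ARRAY.]

```python
import json

# Same shape as the data described at the start of this thread.
sample = '[{"id":1, "stat":"something"},{"id":2, "stat":"anothersomething"}]'

parsed = json.loads(sample)
# The top level is a list (ARRAY in BSON terms), not a dict (DOCUMENT).
assert isinstance(parsed, list)
# Each element, taken on its own, IS a document PutMongo could insert.
assert all(isinstance(doc, dict) for doc in parsed)
```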
>
> With regards to the SplitJson processor, if you set the JSON Path to $ then
> it should split at the top-level and send out each of your two documents on
> the splits relationship.
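
[Editor's note: a minimal sketch of that splitting behavior for JSON Path "$", not NiFi's actual implementation. It also shows why very large inputs hurt: the entire content is parsed into memory before any split is emitted.]

```python
import json

def split_json(flowfile_content: str, json_path: str = "$") -> list[str]:
    """Sketch of SplitJson with JSON Path "$": parse the top-level array
    and emit one split (one serialized document) per element. The whole
    input is read into memory, which is why a very large array can stall
    the processor or exhaust the heap."""
    if json_path != "$":
        raise NotImplementedError("only the top-level path is sketched here")
    return [json.dumps(doc) for doc in json.loads(flowfile_content)]

splits = split_json('[{"id":1, "stat":"something"},{"id":2, "stat":"anothersomething"}]')
# Each split is now a single document that PutMongo can accept.
```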
>
> -Bryan
>
>
> On Thu, Sep 24, 2015 at 4:36 PM, Adam Williams <[email protected]> wrote:
> I have an array of JSON objects I am trying to put into Mongo, but I keep
> hitting this on the PutMongo processor:
>
> ERROR [Timer-Driven Process Thread-1] o.a.nifi.processors.mongodb.PutMongo
> PutMongo[id=c576f8cc-6e21-4881-a7cd-6e3881838a91] Failed to insert
> StandardFlowFileRecord[uuid=2c670a40-7934-4bc6-b054-1cba23fe7b0f,claim=StandardContentClaim
> [resourceClaim=StandardResourceClaim[id=1443125646319-1, container=default,
> section=1], offset=0,
> length=208380820],offset=0,name=test.json,size=208380820] into MongoDB due to
> org.bson.BsonInvalidOperationException: readStartDocument can only be called
> when CurrentBSONType is DOCUMENT, not when CurrentBSONType is ARRAY.:
> org.bson.BsonInvalidOperationException: readStartDocument can only be called
> when CurrentBSONType is DOCUMENT, not when CurrentBSONType is ARRAY.
>
>
> I tried to use the SplitJson processor to split the array into segments, but
> in my experience I can't pull out each JSON object. The SplitJson processor
> just hangs and never produces logs or any output at all. The structure of my
> data is:
>
> [{"id":1, "stat":"something"},{"id":2, "stat":"anothersomething"}]
>
> The JSON file itself is pretty large (>100 MB).
>
> Thank you