Array into MongoDB
I have an array of JSON objects I am trying to put into Mongo, but I keep hitting this on the PutMongo processor:

ERROR [Timer-Driven Process Thread-1] o.a.nifi.processors.mongodb.PutMongo PutMongo[id=c576f8cc-6e21-4881-a7cd-6e3881838a91] Failed to insert StandardFlowFileRecord[uuid=2c670a40-7934-4bc6-b054-1cba23fe7b0f,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1443125646319-1, container=default, section=1], offset=0, length=208380820],offset=0,name=test.json,size=208380820] into MongoDB due to org.bson.BsonInvalidOperationException: readStartDocument can only be called when CurrentBSONType is DOCUMENT, not when CurrentBSONType is ARRAY.

I tried to use the SplitJson processor to split the array into segments, but in my experience I can't pull out each JSON object. The SplitJson processor just hangs and never produces logs or any output at all. The structure of my data is:

[{"id":1, "stat":"something"},{"id":2, "stat":"anothersomething"}]

The JSON file itself is pretty large (>100 MB).

Thank you
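The exception itself points at the mismatch: PutMongo's BSON reader calls readStartDocument, but the top-level JSON value in this flowfile is an array. A minimal sketch in plain Python (illustrative only, not the NiFi or MongoDB driver API) of why the payload shape matters:

```python
import json

# The flowfile content is a top-level JSON array, not a single document.
payload = '[{"id":1, "stat":"something"},{"id":2, "stat":"anothersomething"}]'

parsed = json.loads(payload)
print(type(parsed).__name__)  # list -> BSON sees ARRAY, not DOCUMENT

# Each element, taken on its own, is a valid document and could be
# inserted individually once the array is split.
for doc in parsed:
    print(json.dumps(doc))
```

This is why a small array of documents inserts fine once split, while the unsplit file fails regardless of size.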
Re: Array into MongoDB
Bryan is correct about the backing library reading everything into memory to do the evaluation. Might I ask what expression you are using?

On Thu, Sep 24, 2015 at 6:44 PM, Adam Williams <aaronfwilli...@outlook.com> wrote:

> I tried it even with 6GB and no luck. It's receiving the flowfiles, but nothing is happening after. If I do it with a small subset (3 JSON objects) it works perfectly. When I throw the 180MB file at it, it just spins: no logging, no errors, etc. Very odd. Any thoughts?
>
> Thanks
>
> ----------
> From: aaronfwilli...@outlook.com
> To: users@nifi.apache.org
> Subject: RE: Array into MongoDB
> Date: Thu, 24 Sep 2015 21:23:35 +
>
> Bryan,
>
> I think that is what's happening, fans spinning like crazy. This is my current bootstrap.conf; I will bump it up. Are there any other settings I should bump too?
>
> java.arg.2=-Xms512m
> java.arg.3=-Xmx2048m
>
> Thanks
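For reference, bumping the heap means editing the two java.arg lines quoted above in conf/bootstrap.conf. The values below are only an example shape, not a recommendation from this thread:

```
# conf/bootstrap.conf (example values only)
java.arg.2=-Xms1024m
java.arg.3=-Xmx4096m
```

NiFi must be restarted for bootstrap.conf changes to take effect.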
Re: Array into MongoDB
One other thing I thought of... I think the JSON processors read the entire FlowFile content into memory to do the splitting/evaluating, so I wonder if you are running into a memory issue with a 180MB JSON file.

Are you running with the default configuration of 512mb set in conf/bootstrap.conf? If so, it would be interesting to see what happens if you bump that up.

On Thu, Sep 24, 2015 at 5:06 PM, Bryan Bende wrote:

> Adam,
>
> Based on that message I suspect that MongoDB does not support sending in an array of documents, since it looks like it expects the first character to be the start of a document and not an array.
>
> With regards to the SplitJson processor, if you set the JSON Path to $ then it should split at the top level and send out each of your two documents on the splits relationship.
>
> -Bryan
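Bryan's suggestion can be emulated outside NiFi: the JsonPath expression $ selects the top-level value (the array), and SplitJson then emits one flowfile per element on the splits relationship. A rough Python sketch of that behavior (plain Python, not the NiFi API):

```python
import json

content = '[{"id":1, "stat":"something"},{"id":2, "stat":"anothersomething"}]'

# JsonPath "$" selects the top-level array; SplitJson then emits one
# flowfile per element on the "splits" relationship (emulated here).
splits = [json.dumps(element) for element in json.loads(content)]

for flowfile_content in splits:
    print(flowfile_content)
```

Each resulting payload starts with { rather than [, which is the shape PutMongo's readStartDocument expects.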
Re: Array into MongoDB
I’m having a very similar problem. The process picks up the file, a custom processor does its thing, but no data is sent out.

> On Sep 24, 2015, at 5:56 PM, Adam Williams <aaronfwilli...@outlook.com> wrote:
>
> For SplitJson I am using just "$" to try and get the array into individual objects. It worked on a small subset, but a large one seems to just hang.
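The unresolved problem in this thread is that SplitJson buffers the entire 180MB array in memory before splitting. One possible workaround, sketched here entirely outside NiFi (stdlib Python; iter_json_array is an illustrative helper, not a NiFi processor), is to stream the array from disk and emit one document at a time:

```python
import io
import json

def iter_json_array(stream, chunk_size=65536):
    """Yield the elements of a top-level JSON array incrementally,
    without loading the whole file into memory. Sketch only: assumes
    the array elements are JSON objects, not bare numbers or strings."""
    decoder = json.JSONDecoder()
    buf = stream.read(chunk_size).lstrip()
    if not buf.startswith("["):
        raise ValueError("expected a top-level JSON array")
    buf = buf[1:]
    while True:
        buf = buf.lstrip().lstrip(",").lstrip()
        if buf.startswith("]"):
            return
        try:
            # raw_decode parses one complete JSON value from the front
            # of the buffer and reports where it ended.
            doc, end = decoder.raw_decode(buf)
        except ValueError:
            more = stream.read(chunk_size)
            if not more:
                raise  # truncated input
            buf += more
            continue
        yield doc
        buf = buf[end:]

# Tiny chunk_size just to exercise the refill path on the sample data.
sample = '[{"id":1, "stat":"something"},{"id":2, "stat":"anothersomething"}]'
for doc in iter_json_array(io.StringIO(sample), chunk_size=8):
    print(doc["id"], doc["stat"])
```

Each yielded document could then be written out (or inserted) individually, keeping memory use bounded by the largest single element rather than the whole file.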