Array into MongoDB

2015-09-24 Thread Adam Williams
I have an array of JSON objects that I am trying to put into Mongo, but I keep
hitting this error on the PutMongo processor:
ERROR [Timer-Driven Process Thread-1] o.a.nifi.processors.mongodb.PutMongo 
PutMongo[id=c576f8cc-6e21-4881-a7cd-6e3881838a91] Failed to insert 
StandardFlowFileRecord[uuid=2c670a40-7934-4bc6-b054-1cba23fe7b0f,claim=StandardContentClaim
 [resourceClaim=StandardResourceClaim[id=1443125646319-1, container=default, 
section=1], offset=0, length=208380820],offset=0,name=test.json,size=208380820] 
into MongoDB due to org.bson.BsonInvalidOperationException: readStartDocument 
can only be called when CurrentBSONType is DOCUMENT, not when CurrentBSONType 
is ARRAY.: org.bson.BsonInvalidOperationException: readStartDocument can only 
be called when CurrentBSONType is DOCUMENT, not when CurrentBSONType is ARRAY.

I tried to use the SplitJson processor to split the array into segments, but
so far I can't pull out each JSON object.  The SplitJson processor just
hangs and never produces logs or any output at all.  The structure of my data
is:
[{"id":1, "stat":"something"},{"id":2, "stat":"anothersomething"}]
The JSON file itself is pretty large (>100 MB).
Thank you 
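
For reference, here is a minimal Python sketch of what "one document per array element" looks like for the sample data above. It only illustrates the idea and is not how SplitJson or PutMongo are implemented. The function name split_top_level_array is made up for this example. It parses the whole input in memory, so it only suits small inputs like the sample.

```python
import json


def split_top_level_array(text):
    """Parse a JSON array and return each element as its own JSON document string."""
    items = json.loads(text)
    if not isinstance(items, list):
        raise ValueError("expected a top-level JSON array")
    # Re-serialize every element so each result is a standalone JSON document.
    return [json.dumps(item) for item in items]


# Sample data with the same structure as in the message above.
sample = '[{"id":1, "stat":"something"},{"id":2, "stat":"anothersomething"}]'
documents = split_top_level_array(sample)
for doc in documents:
    print(doc)
```

Each resulting string starts with "{" rather than "[", which is the shape the error message says the BSON reader expects.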

Re: Array into MongoDB

2015-09-24 Thread Aldrin Piri
Bryan is correct about the backing library reading everything into memory
to do the evaluation.

Might I ask what expression you are using?
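
Since the concern is the whole file being held in memory, here is a hedged Python sketch of a different technique: decoding a top-level array one element at a time while reading the input in chunks. This is not what NiFi's JSON processors do. The names iter_array_items and chunk_size are hypothetical, and separator validation is deliberately lenient to keep the sketch short.

```python
import io
import json


def iter_array_items(stream, chunk_size=65536):
    """Yield the elements of a top-level JSON array, reading the text stream in chunks."""
    decoder = json.JSONDecoder()
    buf = ""
    started = False
    while True:
        buf = buf.lstrip()
        if not started and buf:
            if buf[0] != "[":
                raise ValueError("expected a top-level JSON array")
            buf = buf[1:].lstrip()
            started = True
        if started:
            if buf.startswith(","):
                buf = buf[1:].lstrip()  # skip the separator between elements
            if buf.startswith("]"):
                return  # end of the array
            if buf:
                try:
                    item, end = decoder.raw_decode(buf)
                except json.JSONDecodeError:
                    pass  # element is probably incomplete; read more data below
                else:
                    # Only accept the element once something follows it, so a
                    # number cut off at a chunk boundary is not emitted early.
                    if end < len(buf):
                        yield item
                        buf = buf[end:]
                        continue
        chunk = stream.read(chunk_size)
        if not chunk:
            raise ValueError("unexpected end of input inside JSON array")
        buf += chunk


# A tiny chunk size forces elements to span several reads.
sample = '[{"id":1, "stat":"something"},{"id":2, "stat":"anothersomething"}]'
items = list(iter_array_items(io.StringIO(sample), chunk_size=8))
print(items)
```

Only the current element and at most one chunk need to be buffered at a time, as long as the input is well formed. Malformed input makes the buffer grow until the end of the stream before the error is raised.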

On Thu, Sep 24, 2015 at 6:44 PM, Adam Williams <aaronfwilli...@outlook.com>
wrote:

> I tried it even with 6GB and no luck.  It's receiving the flowfiles, but
> nothing is happening after.  If I do it with a small subset (3 JSON
> objects) it works perfectly.  When I throw the 180MB file at it, it just
> spins: no logging, no errors, etc.  Very odd.  Any thoughts?
>
> Thanks
>
> --
> From: aaronfwilli...@outlook.com
> To: users@nifi.apache.org
> Subject: RE: Array into MongoDB
> Date: Thu, 24 Sep 2015 21:23:35 +
>
>
> Bryan,
>
> I think that is what's happening; the fans are spinning like crazy.  This
> is my current bootstrap.conf.  I will bump it up.  Are there any other
> settings I should bump too?
>
> java.arg.2=-Xms512m
>
> java.arg.3=-Xmx2048m
>
> Thanks
>
> ------
> Date: Thu, 24 Sep 2015 17:20:27 -0400
> Subject: Re: Array into MongoDB
> From: bbe...@gmail.com
> To: users@nifi.apache.org
>
> One other thing I thought of... I think the JSON processors read the
> entire FlowFile content into memory to do the splitting/evaluating, so I
> wonder if you are running into a memory issue with a 180MB JSON file.
>
> Are you running with the default configuration of 512mb set in
> conf/bootstrap.conf ?  If so it would be interesting to see what happens if
> you bump that up.
>
> On Thu, Sep 24, 2015 at 5:06 PM, Bryan Bende <bbe...@gmail.com> wrote:
>
> Adam,
>
> Based on that message I suspect that MongoDB does not support sending in
> an array of documents, since it looks like it expects the first character
> to be the start of a document and not an array.
>
> With regards to the SplitJson processor, if you set the JSON Path to $
> then it should split at the top-level and send out each of your two
> documents on the splits relationship.
>
> -Bryan


Re: Array into MongoDB

2015-09-24 Thread Bryan Bende
One other thing I thought of... I think the JSON processors read the entire
FlowFile content into memory to do the splitting/evaluating, so I wonder if
you are running into a memory issue with a 180MB JSON file.

Are you running with the default configuration of 512mb set in
conf/bootstrap.conf ?  If so it would be interesting to see what happens if
you bump that up.
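
As an illustration of what bumping the heap in conf/bootstrap.conf might look like, using the same java.arg.2 and java.arg.3 keys that appear elsewhere in this thread. The 4096m value is an arbitrary example, not a recommendation, and the thread reports that even 6GB did not resolve the hang.

```properties
# conf/bootstrap.conf (illustrative values only)
# Initial JVM heap size
java.arg.2=-Xms512m
# Maximum JVM heap size; 4096m is an assumed example value
java.arg.3=-Xmx4096m
```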



Re: Array into MongoDB

2015-09-24 Thread Jeff



I’m having a very similar problem.  The process picks up the file, a custom
processor does its thing, but no data is sent out.

> On Sep 24, 2015, at 5:56 PM, Adam Williams <aaronfwilli...@outlook.com> wrote:
> 
> For SplitJson I am using just "$" to try to get the array into individual
> objects.  It worked on a small subset, but a large file seems to just hang.