Hi James,

I don't have a solution for you off the top of my head. But I can tell you
the failure is because something is trying to allocate an array longer than
the maximum value of a Java int (2,147,483,647). So memory is not the
limiting factor.
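
In case it helps to see why the heap size doesn't matter: a Java array can't
have more than Integer.MAX_VALUE elements, so a single byte[] tops out just
under 2 GB. Here's a tiny standalone sketch (plain Java, nothing NiFi-specific;
the 9 GB figure is just taken from your mail):

    public class ArrayLimitDemo {
        public static void main(String[] args) {
            // Roughly the size of a 9 GB flowfile, in bytes.
            long fileSizeBytes = 9L * 1024 * 1024 * 1024;
            // The hard cap on Java array length: 2,147,483,647.
            long maxArrayLength = Integer.MAX_VALUE;

            System.out.println("File size in bytes:    " + fileSizeBytes);
            System.out.println("Max Java array length: " + maxArrayLength);
            System.out.println("Fits in one array?     " + (fileSizeBytes <= maxArrayLength));
        }
    }

So anything in the chain that tries to hold the whole content (or the flattened
result) in one array will fail on a 9 GB file no matter how much heap you give
the JVM.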

-Eric

On Fri, Jun 14, 2024, 10:59 AM James McMahon <jsmcmah...@gmail.com> wrote:

> I have a json file, incoming.json. It is 9 GB in size.
>
> I want to flatten the json so that I can tabulate the number of times each
> key appears. I am using the FlattenJson 2.0.0-M2 processor, with
> this configuration:
>
> Separator                    .
> Flatten Mode                 normal
> Ignore Reserved Characters   false
> Return Type                  flatten
> Character Set                UTF-8
> Pretty Print JSON            true
>
> This processor has worked so far on json files as large as 2 GB. But this
> 9 GB one is causing this issue:
>
> FlattenJson[id=ea2650e2-8974-1ff7-2da9-a0f2cd303258] Processing halted: 
> yielding [1 sec]: java.lang.OutOfMemoryError: Required array length 
> 2147483639 + 9 is too large
>
>
> htop confirms I have 92 GB of memory on my EC2 instance, and the NiFi heap 
> shows it has 88 GB of that dedicated for its use.
>
>
> How can I handle large json files in this processor? Breaking the file up 
> does not seem to be an option, because it would most likely break the 
> integrity of the json structure.
>
>
> What options do I have?
>
>
