I have a JSON file, incoming.json, that is 9 GB in size.

I want to flatten the JSON so that I can tabulate the number of times each
key appears. I am using the FlattenJson processor (NiFi 2.0.0-M2) with
this configuration:

Separator                   .
Flatten Mode                normal
Ignore Reserved Characters  false
Return Type                 flatten
Character Set               UTF-8
Pretty Print JSON           true
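
For clarity, the kind of flattening and key tabulation I'm after looks roughly like this (a plain Python sketch with made-up input, not the NiFi processor itself):

```python
import json
from collections import Counter

def flatten_keys(obj, prefix="", sep="."):
    """Recursively yield dotted key paths from nested JSON, e.g. 'a.b'."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}{sep}{key}" if prefix else key
            yield path
            yield from flatten_keys(value, path, sep)
    elif isinstance(obj, list):
        # List elements share their parent's key path.
        for value in obj:
            yield from flatten_keys(value, prefix, sep)

doc = json.loads('{"a": {"b": 1, "c": [{"b": 2}]}}')
counts = Counter(flatten_keys(doc))
print(counts)  # Counter({'a': 1, 'a.b': 1, 'a.c': 1, 'a.c.b': 1})
```

Each nested key becomes a dot-separated path, and the Counter gives the per-key occurrence counts I want to end up with.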

This processor has worked so far on JSON files as large as 2 GB, but the 9
GB file causes this error:

FlattenJson[id=ea2650e2-8974-1ff7-2da9-a0f2cd303258] Processing
halted: yielding [1 sec]: java.lang.OutOfMemoryError: Required array
length 2147483639 + 9 is too large


htop confirms I have 92 GB of memory on my EC2 instance, and the NiFi
heap is configured with 88 GB of that. The 2147483639 figure suggests the
processor is hitting the JVM's maximum single-array size (roughly 2 GB per
array), not the heap limit.


How can I handle large JSON files with this processor? Naively splitting
the file does not seem to be an option, since that would most likely break
the integrity of the JSON structure.


What options do I have?
