I have a JSON file, incoming.json, that is 9 GB in size. I want to flatten the JSON so that I can tabulate the number of times each key appears. I am using the FlattenJson 2.0.0-M2 processor with this configuration:
```
Separator:                    .
Flatten Mode:                 normal
Ignore Reserved Characters:   false
Return Type:                  flatten
Character Set:                UTF-8
Pretty Print JSON:            true
```

This processor has worked so far on JSON files as large as 2 GB, but the 9 GB one is causing this error:

```
FlattenJson[id=ea2650e2-8974-1ff7-2da9-a0f2cd303258] Processing halted: yielding [1 sec]: java.lang.OutOfMemoryError: Required array length 2147483639 + 9 is too large
```

htop confirms I have 92 GB of memory on my EC2 instance, and NiFi shows 88 GB of that dedicated to its heap. I notice that 2147483639 is just shy of Java's Integer.MAX_VALUE, so it looks like the processor is trying to read the entire flowfile into a single array, which Java caps at roughly 2 GB no matter how large the heap is.

How can I handle large JSON files in this processor? Breaking the file up does not seem to be an option, because splitting it would most likely corrupt the integrity of the JSON structure. What options do I have?
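To clarify what I mean by tabulating keys: conceptually I want something like the following sketch (plain Python, outside NiFi, on a toy document). This recursive in-memory tally is of course exactly what does not scale to 9 GB, which is the heart of my problem:

```python
import json
from collections import Counter

def count_keys(node, counts):
    """Recursively tally every key occurrence in a parsed JSON tree."""
    if isinstance(node, dict):
        for key, value in node.items():
            counts[key] += 1          # count this key once per appearance
            count_keys(value, counts)  # descend into nested objects/arrays
    elif isinstance(node, list):
        for item in node:
            count_keys(item, counts)
    return counts

doc = json.loads('{"a": {"b": 1, "a": [{"b": 2}]}}')
print(count_keys(doc, Counter()))
# Counter({'a': 2, 'b': 2})
```

The flattened output of FlattenJson gives me the same information (each dotted path contains the keys), which is why I reached for that processor first.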