Hi James, I don't have a solution for you off the top of my head. But I can tell you the failure is because the processor is trying to build a single array longer than the maximum value of a Java int (roughly 2.1 billion elements, the JVM's cap on any one array). So memory is not the limiting factor.
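
To put it another way: the JVM caps every single array at roughly Integer.MAX_VALUE entries, so anything that buffers the whole 9 GB flowfile into one byte[] or String will hit that wall long before your 88 GB heap is exhausted. Here is a rough, self-contained sketch in plain Java (nothing NiFi-specific, and the class name is just for the demo) that reproduces the same kind of error once one contiguous buffer crosses that limit:

    import java.io.ByteArrayOutputStream;

    public class ArrayLengthLimitDemo {
        public static void main(String[] args) {
            // A single Java array can hold at most about Integer.MAX_VALUE
            // elements, no matter how large the heap is. So buffering a
            // 9 GB document into one byte[] or String cannot work.
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[64 * 1024 * 1024]; // append 64 MB at a time

            try {
                while (true) {
                    // Each write may force the backing array to grow;
                    // growth stops once it reaches the JVM's array cap.
                    buffer.write(chunk, 0, chunk.length);
                }
            } catch (OutOfMemoryError e) {
                // Given enough heap (say -Xmx8g or more), recent JDKs report
                // something like: "Required array length <n> + <m> is too large"
                System.out.println("Failed after ~" + buffer.size()
                        + " bytes: " + e.getMessage());
            }
        }
    }

Which is consistent with what you're seeing: the heap is nowhere near full, but the buffer simply can't get any bigger.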
-Eric

On Fri, Jun 14, 2024, 10:59 AM James McMahon <jsmcmah...@gmail.com> wrote:

> I have a json file, incoming.json. It is 9 GB in size.
>
> I want to flatten the json so that I can tabulate the number of times each
> key appears. I am using a FlattenJson 2.0.0-M2 processor, with this
> configuration:
>
> Separator                    .
> Flatten Mode                 normal
> Ignore Reserved Characters   false
> Return Type                  flatten
> Character Set                UTF-8
> Pretty Print JSON            true
>
> This processor has worked so far on json files as large as 2 GB. But this
> 9 GB one is causing this issue:
>
> FlattenJson[id=ea2650e2-8974-1ff7-2da9-a0f2cd303258] Processing halted:
> yielding [1 sec]: java.lang.OutOfMemoryError: Required array length
> 2147483639 + 9 is too large
>
> htop confirms I have 92 GB of memory on my EC2 instance, and the NiFi heap
> shows it has 88 GB of that dedicated for its use.
>
> How can I handle large json files in this processor? It would seem that
> breaking the file up is not an option, because it would most likely violate
> the integrity of the json structure.
>
> What options do I have?