If we consider streaming for SplitJson (or a new version of it), we wouldn't be able to support the "micro-batch" functionality that SplitJson has today (the fragment.count attribute, for example). That might not be a concern, or it might warrant a new processor (SplitJsonStreaming, e.g.).
Regards,
Matt

On Thu, Nov 17, 2016 at 11:36 AM, Aldrin Piri <[email protected]> wrote:

> The backing library of the Json processors does indeed require loading the
> entire doc into memory. We should make sure this consideration is
> documented if not already.
>
> Could be an interesting idea to not tie SplitJson to this library given
> that it might not need all the functionality of JsonPath and would likely
> be a good candidate for streaming.
>
> On Thu, Nov 17, 2016 at 11:23 Mark Payne <[email protected]> wrote:
>>
>> Hi Mike,
>>
>> Certainly, I would recommend trying to change the max heap to, say, 2 GB
>> and see if that gives you what you need.
>> Looking at the code, it does look like this processor may not be the most
>> efficient in how it parses the JSON.
>> There are libraries, for example, that provide a "streaming JSON"
>> interface, but this processor loads the entire JSON
>> into heap and then creates an object model from it.
>>
>> Also, what do you have set for Max Concurrent Tasks? If you have
>> multiple threads running simultaneously, each one could
>> be using up quite a lot of heap.
>>
>> Thanks,
>> -Mark
>>
>> On Nov 17, 2016, at 10:54 AM, Mike Harding <[email protected]> wrote:
>>
>> ...just for info, in bootstrap.conf my heap size is as follows:
>>
>> java.arg.2=-Xms512m
>> java.arg.3=-Xmx512m
>>
>> Would it be a simple case of increasing this? The size of the flowfile
>> JSON array is 35 MB.
>>
>> Mike
>>
>> On 17 November 2016 at 15:47, Mike Harding <[email protected]> wrote:
>>>
>>> Hi All,
>>>
>>> I have a flowfile containing a JSON array with 30k objects that I am
>>> trying to split into separate flowfiles for downstream processing.
>>>
>>> The problem is the processor reports a GC Overhead Limit Exceeded
>>> warning and administratively yields.
>>>
>>> Is there any way of setting up a back-pressure option or some changes
>>> to the nifi config to best address this?
>>>
>>> Thanks,
>>> Mike
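[Editor's note] Mark's immediate suggestion above, raising the max heap to around 2 GB, maps onto the two bootstrap.conf lines Mike quoted. A minimal sketch of the change (the 2g value is illustrative; size it to your host's available memory, and keep -Xms and -Xmx matched as the original file does):

```
# conf/bootstrap.conf — raise initial and max heap from 512 MB to 2 GB
java.arg.2=-Xms2g
java.arg.3=-Xmx2g
```

bootstrap.conf is only read at startup, so NiFi must be restarted for the new heap settings to take effect.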
