Tim,

AFAIK, JsonSlurper uses lazy evaluation when parsing the JSON string. So you might want to extract your JsonSlurper-centric logic into a standalone script and run and time it with groovy or groovysh (both ship with the standard Groovy distribution). If you have a very large JSON document, it's possible that it's doing a lot more work while you format the data than is immediately obvious.
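To see whether the time goes into the parse or into the formatting that follows it, it can help to time each phase separately. Below is a minimal sketch in plain Java, which runs on the same JVM as Groovy. Everything in it is a hypothetical stand-in for illustration: the class name PhaseTimer, the timed helper, the synthetic timestamps, and the output date pattern. In your own script you would wrap new JsonSlurper().parseText(text) in one phase and your date formatting in another.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

public class PhaseTimer {
    // Runs one phase of work, records its wall-clock duration in milliseconds
    // under the given label, and returns whatever the phase produced.
    static <T> T timed(String label, Supplier<T> work, Map<String, Long> timings) {
        long start = System.nanoTime();
        T result = work.get();
        timings.put(label, (System.nanoTime() - start) / 1_000_000);
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> timings = new LinkedHashMap<>();

        // Hypothetical stand-in for values pulled out of a large parsed JSON
        // document: 200,000 ISO-8601 local timestamps.
        List<String> raw = timed("build input", () -> {
            List<String> values = new ArrayList<>();
            LocalDateTime base = LocalDateTime.of(2018, 5, 1, 14, 54);
            for (int i = 0; i < 200_000; i++) {
                values.add(base.plusSeconds(i).toString());
            }
            return values;
        }, timings);

        // Hypothetical formatting phase: re-parse each timestamp and render it
        // with a different pattern, similar in spirit to date reformatting.
        DateTimeFormatter out = DateTimeFormatter.ofPattern("yyyy/MM/dd HH:mm:ss");
        List<String> formatted = timed("format dates", () -> {
            List<String> result = new ArrayList<>(raw.size());
            for (String s : raw) {
                result.add(LocalDateTime.parse(s).format(out));
            }
            return result;
        }, timings);

        System.out.println(formatted.size() + " values formatted");
        timings.forEach((label, ms) -> System.out.println(label + ": " + ms + " ms"));
    }
}
```

Running it prints the number of formatted values and one duration per labeled phase. Absolute numbers depend on the machine and on JVM warm-up, so a single run is only a rough indication of where the time goes.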
On Wed, May 2, 2018 at 11:03 AM Timothy Tschampel wrote:
> Thanks Matt. The majority of messages are < 10 KB, however I am seeing
> some that are > 10MB in their raw flat file format before being transformed
> into their more verbose JSON. So possibly 2-3x that for max size. Before
> reaching this processor all messages are well-formed JSON and using the
> UTF-8 charset. The script is doing a bunch of date formatting and other
> formatting that is too complex for the JOLT processor.
>
> > On May 1, 2018, at 7:14 PM, Matt Burgess wrote:
> >
> > Timothy,
> >
> > I haven't seen anything that can cause this to hang, in the Groovy
> > source code it might seem to "hang" [1] if there's a crazy large
> > input; how big are your flow files going into the ExecuteScript
> > processor? If size is not the issue, then perhaps there's an
> > assumption about character sets that causes a problem, etc. Are your
> > input files well-formed JSON, and if so, are they ROUSs? (Rodents Of
> > Unusual Size)? Are they encoded with unicode or other character sets?
> >
> > I'll try to reproduce this locally and get to the bottom of it, if
> > it's a Groovy bug that has been fixed we can upgrade the version, if
> > it's a bug that isn't fixed then hopefully there's a workaround with
> > ValidateRecord. Lastly, may I ask what your script is doing to the
> > flow files? Perhaps there are existing processors and/or techniques we
> > could use to get it done...
> >
> > Regards,
> > Matt
> >
> > [1] https://github.com/apache/groovy/blob/GROOVY_2_4_5/subprojects/groovy-json/src/main/java/groovy/json/internal/JsonParserCharArray.java#L108
> >
> > On Tue, May 1, 2018 at 2:54 PM, Timothy Tschampel wrote:
> >> I have a flow which periodically hangs and messages begin to queue
> >> behind an ExecuteScript component using groovy. Once this happens the
> >> component can’t be stopped or restarted. Restarting alone does not move
> >> things along; only a restart after purging the content/flow file
> >> repositories seems to help. I’m not seeing any errors in the logs. Thread
> >> dumps show the same “ScriptXX.run” running at the same place
> >> (groovy.json.internal.JsonParserCharArray.decodeJsonObject(JsonParserCharArray.java:108))
> >> and possibly several others running in slightly different paths originated
> >> from groovy.json.JsonSlurper.parseText(JsonSlurper.java:205). Is there
> >> anything special about the groovy json parser with regards to configuring
> >> the processor or using within a Nifi flow? I have attached 2 dumps of
> >> this happening at different times. I have a suspicion that possibly a
> >> mangled message is causing a problem, but I would have expected some sort
> >> of error in this case.
