Tim,

AFAIK, JsonSlurper uses lazy evaluation when parsing the JSON string. So
you might want to extract your JsonSlurper-centric logic and run/time this
with groovy or groovysh (both are in the standard Groovy distribution).
It's possible that it's doing a lot more work during the formatting of the
data than is immediately obvious if you have a very large JSON document.
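
Something along these lines could be a starting point: a rough sketch that times the parse separately from a walk over the parsed result, so that any work deferred past parseText() shows up in the second number. The generated payload, the field names, and the record count are made up for illustration; you would swap in one of your real messages and your actual formatting logic.

```groovy
import groovy.json.JsonOutput
import groovy.json.JsonSlurper

// Hypothetical stand-in for a large message. For a real test, replace this
// with the text of one of your own files, read as UTF-8.
def records = (1..50000).collect { i ->
    [id: i, ts: '2018-05-02T11:03:00Z', name: 'record-' + i]
}
String json = JsonOutput.toJson([records: records])

// Time only the call to parseText().
long t0 = System.nanoTime()
def parsed = new JsonSlurper().parseText(json)
def parseMs = (System.nanoTime() - t0) / 1000000

// Touch every record so that work done on access is counted here.
long t1 = System.nanoTime()
int count = 0
parsed.records.each { r ->
    if (r.name.length() > 0) {
        count++
    }
}
def walkMs = (System.nanoTime() - t1) / 1000000

println "chars=${json.length()} parse=${parseMs} ms walk=${walkMs} ms records=${count}"
```

Saved to a file (the name is up to you), this runs with the groovy command, or it can be pasted into groovysh.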

On Wed, May 2, 2018 at 11:03 AM Timothy Tschampel <
[email protected]> wrote:

> Thanks Matt. The majority of messages are < 10 KB, however I am seeing
> some that are > 10 MB in their raw flat file format before being transformed
> into their more verbose JSON. So possibly 2-3x that for max size. Before
> reaching this processor all messages are well-formed JSON and using the
> UTF-8 charset. The script is doing a bunch of date formatting and other
> formatting that is too complex for the JOLT processor.
>
>
>
> > On May 1, 2018, at 7:14 PM, Matt Burgess <[email protected]> wrote:
> >
> > Timothy,
> >
> > I haven't seen anything that can cause this to hang. Looking at the
> > Groovy source code, it might seem to "hang" [1] if there's a crazy
> > large input; how big are your flow files going into the ExecuteScript
> > processor?  If size is not the issue, then perhaps there's an
> > assumption about character sets that causes a problem, etc. Are your
> > input files well-formed JSON, and if so, are they ROUSs (Rodents Of
> > Unusual Size)? Are they encoded with Unicode or other character sets?
> >
> > I'll try to reproduce this locally and get to the bottom of it. If
> > it's a Groovy bug that has been fixed, we can upgrade the version; if
> > it's a bug that isn't fixed, then hopefully there's a workaround with
> > ValidateRecord.  Lastly, may I ask what your script is doing to the
> > flow files? Perhaps there are existing processors and/or techniques we
> > could use to get it done...
> >
> > Regards,
> > Matt
> >
> > [1]
> > https://github.com/apache/groovy/blob/GROOVY_2_4_5/subprojects/groovy-json/src/main/java/groovy/json/internal/JsonParserCharArray.java#L108
> >
> >
> > On Tue, May 1, 2018 at 2:54 PM, Timothy Tschampel
> > <[email protected]> wrote:
> >> I have a flow which periodically hangs and messages begin to queue
> >> behind an ExecuteScript component using groovy.  Once this happens the
> >> component can’t be stopped or restarted.  Restarting alone does not move
> >> things along; only a restart after purging the content/flow file
> >> repositories seems to help.  I’m not seeing any errors in the logs. Thread
> >> dumps show the same “ScriptXX.run” running at the same place
> >> (groovy.json.internal.JsonParserCharArray.decodeJsonObject(JsonParserCharArray.java:108))
> >> and possibly several others running in slightly different paths originated
> >> from groovy.json.JsonSlurper.parseText(JsonSlurper.java:205).   Is there
> >> anything special about the groovy json parser with regards to configuring
> >> the processor or using within a NiFi flow?   I have attached 2 dumps of
> >> this happening at different times.  I have a suspicion that possibly a
> >> mangled message is causing a problem, but I would have expected some sort
> >> of error in this case.
>
>
