I was trying to avoid checking every single file, since that would hurt
performance. I could run ReplaceText on each file before parsing the
records, but the files can be 100-200 MB, and that slows things down a bit.
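
For reference, a minimal sketch of what a streaming scrub could look like if it were done in a script rather than with ReplaceText. It copies a Reader to a Writer one character at a time and drops anything outside the XML 1.0 Char production, so the whole file never has to sit in memory. The class and method names (XmlCharScrubber, scrub, isAllowedBmp) are made up for illustration, it only handles raw characters (not numeric references such as &#1;), and I have not measured whether it is any faster than ReplaceText on 100-200 MB files.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical helper, not part of NiFi.
class XmlCharScrubber {

    // XML 1.0 Char production for the Basic Multilingual Plane.
    // Supplementary code points are handled as surrogate pairs in scrub().
    static boolean isAllowedBmp(char c) {
        return c == 0x9 || c == 0xA || c == 0xD
                || (c >= 0x20 && c <= 0xD7FF)
                || (c >= 0xE000 && c <= 0xFFFD);
    }

    // Copies in to out, dropping characters XML 1.0 does not allow.
    // Returns the number of characters dropped.
    static long scrub(Reader in, Writer out) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        BufferedWriter writer = new BufferedWriter(out);
        long dropped = 0;
        int pendingHigh = -1; // high surrogate waiting for its low half
        int ch;
        while ((ch = reader.read()) != -1) {
            char c = (char) ch;
            if (pendingHigh != -1) {
                if (Character.isLowSurrogate(c)) {
                    // Valid pair: a code point in U+10000..U+10FFFF
                    writer.write(pendingHigh);
                    writer.write(c);
                    pendingHigh = -1;
                    continue;
                }
                dropped++; // unpaired high surrogate
                pendingHigh = -1;
            }
            if (Character.isHighSurrogate(c)) {
                pendingHigh = c;
            } else if (isAllowedBmp(c)) {
                writer.write(c);
            } else {
                dropped++; // control char, lone low surrogate, U+FFFE or U+FFFF
            }
        }
        if (pendingHigh != -1) {
            dropped++; // input ended on an unpaired high surrogate
        }
        writer.flush();
        return dropped;
    }

    public static void main(String[] args) throws IOException {
        StringWriter cleaned = new StringWriter();
        long dropped = scrub(new StringReader("<a>ok\u0001\u000Bx</a>"), cleaned);
        System.out.println(dropped + " dropped, result: " + cleaned);
    }
}
```

The returned count is there so a caller could skip rewriting content that came back with nothing dropped.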

Thanks
Shawn

From: Mike Thomsen <mikerthom...@gmail.com>
Sent: Friday, October 26, 2018 11:38 AM
To: users@nifi.apache.org
Subject: Re: ScriptedRecordReader Error Handling

As a backup to that, you can also write a Groovy script for ExecuteScript that 
uses StAX to iterate over the XML data. It won't care about schemas (Avro or 
XML) and stuff like that; it just checks that the XML is well-formed.
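
A rough sketch of that kind of check, written in plain Java since StAX (javax.xml.stream) ships with the JDK and can be called the same way from Groovy. It only walks the events and reports whether the parser got to the end without an error; the class and method names (StaxWellFormedCheck, isWellFormed) are made up for illustration, and wiring it into ExecuteScript (reading the FlowFile content, routing, setting attributes) is left out.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of NiFi.
class StaxWellFormedCheck {

    // Returns true if the stream can be read to the end without a parse error.
    static boolean isWellFormed(InputStream in) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Syntax check only: no DTD processing, no external entities
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        XMLStreamReader reader = null;
        try {
            reader = factory.createXMLStreamReader(in);
            while (reader.hasNext()) {
                reader.next(); // throws XMLStreamException on malformed input
            }
            return true;
        } catch (XMLStreamException e) {
            return false;
        } finally {
            if (reader != null) {
                try {
                    reader.close();
                } catch (XMLStreamException ignored) {
                    // nothing useful to do if close fails
                }
            }
        }
    }

    static InputStream utf8(String s) {
        return new ByteArrayInputStream(s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed(utf8("<a><b>fine</b></a>")));
        System.out.println(isWellFormed(utf8("<a><b>broken</a>")));
    }
}
```

Because the reader streams the events, the check should not need to hold a whole file in memory, though it still means one full pass over each file.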

On Fri, Oct 26, 2018 at 11:42 AM Joe Witt <joe.w...@gmail.com> wrote:
Can't your logic detect the strange characters and then apply its
behavior?  Alternatively, you could perhaps use ValidateRecord and
have its reader only understand the good records.  It should kick out
the bad records and you can then apply deeper processing on them.

Thanks
On Fri, Oct 26, 2018 at 11:36 AM Shawn Weeks <swe...@weeksconsulting.us> wrote:
>
> Is there any way for a ScriptedRecordReader to set an attribute on a FlowFile 
> when there is an error? I have a situation where I've written a Groovy script 
> to parse XML into a specific record structure and occasionally the incoming 
> data has characters not allowed in XML. Unfortunately the system that 
> generates the XML is doing it through string manipulation instead of actually 
> understanding XML so it crams all kinds of junk characters in the data. I'd 
> rather not scrub every file as some of them can be large so I was trying to 
> figure out a way to only scrub them on exception.
>
>
> Thanks
>
> Shawn Weeks
