Richard,

It sounds like your scripted reader is responsible for parsing the Avro? In 
short, the Record appears to have an Avro Utf8 value, not a String, in the 
field you’re looking at. You could call .toString() on that Utf8 object, or you 
could configure the Avro reader to return Strings instead of Utf8 objects.
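
The first option can be sketched as follows. Avro's org.apache.avro.util.Utf8 implements CharSequence, so calling .toString() on it yields a plain java.lang.String; the StringBuilder below is a stand-in for a Utf8 value so the snippet runs without the Avro jar on the classpath:

```java
public class Utf8ToStringDemo {
    public static void main(String[] args) {
        // Stand-in for a Utf8 field value pulled out of the Record.
        // Utf8, like StringBuilder, is a CharSequence, and toString()
        // gives you an ordinary String you can pass to other APIs.
        CharSequence raw = new StringBuilder("some field value");
        String asString = raw.toString();
        System.out.println(asString);
    }
}
```

The reader-side alternative is to have Avro hand back String values in the first place (for example via the "avro.java.string" schema property), so the script never sees a Utf8 object at all.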

Thanks
-Mark

On Feb 15, 2024, at 5:52 AM, Richard Beare <[email protected]> wrote:

Hi,
This is a test pipeline reading pdf files from disk. It begins with a GetFile 
processor feeding a ConvertRecord processor that uses a scripted reader and an 
AvroRecordSetWriter with a generic (inferred) schema on the output side.

The scripted reader places the file content in a "content" field:

List<RecordField> recordFields = []
recordFields.add(new RecordField("content",
    RecordFieldType.ARRAY.getArrayDataType(RecordFieldType.BYTE.getDataType())))
schema = new SimpleRecordSchema(recordFields)

This bit seems OK.

The next step is an UpdateRecord processor which adds other fields to mimic the 
real case of pulling them out of a DB - age, gender etc., all of which are 
dummies - plus a timestamp based on the filename, via the following expression 
language:

${filename:substringBeforeLast('.'):substringAfterLast('_'):toDate('yyyyMMdd'):format("yyyy-MM-dd HH:mm:ss")}
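
For reference, that expression-language chain is roughly equivalent to the 
following plain Java (the filename here is a hypothetical example):

```java
import java.text.SimpleDateFormat;
import java.util.Date;

public class FilenameToTimestamp {
    public static void main(String[] args) throws Exception {
        // Hypothetical filename of the form <something>_<yyyyMMdd>.pdf
        String filename = "patient_notes_20240115.pdf";
        String base  = filename.substring(0, filename.lastIndexOf('.')); // substringBeforeLast('.')
        String stamp = base.substring(base.lastIndexOf('_') + 1);        // substringAfterLast('_')
        Date d = new SimpleDateFormat("yyyyMMdd").parse(stamp);          // toDate('yyyyMMdd')
        String out = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").format(d); // format(...)
        System.out.println(out); // 2024-01-15 00:00:00
    }
}
```

Note that the result of the expression is a formatted string; whether the 
writer treats the field as a timestamp or as text then depends on the schema it 
uses.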

If I explicitly set the schema for the record writer to include

{"name":"Visit_DateTime","type": {"type" : "long", "logicalType" : "timestamp-millis"}},

then I can get the following converter, a Groovy script that converts to JSON 
for transmission to a web service, to deal with the dates as follows:

Date VisitTimeValue = null
VisitTimeValue = new Date(currRecord.get(TimeStampFieldName))
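
A minimal sketch of that conversion, assuming the field really does carry a 
long of epoch milliseconds (record values come back as Object, so casting 
through Number is the safe route; the value below is hypothetical):

```java
import java.util.Date;

public class MillisToDate {
    public static void main(String[] args) {
        // With a timestamp-millis logical type the underlying value is a
        // long of epoch milliseconds. Example value is hypothetical.
        Object value = 1705276800000L;
        Date visitTime = new Date(((Number) value).longValue());
        System.out.println(visitTime.getTime()); // 1705276800000
    }
}
```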


I guess I thought this approach was overly complex. Given that I'm using Date 
functions in the expression language I hoped that the generic avro writer would 
correctly infer the schema so that I didn't have to explicitly provide one. Is 
this approach the right one? Is there a way I can isolate the expectation of a 
date component inside the groovy file only?

I hope this is clear.
Thanks for your help.


On Thu, Feb 15, 2024 at 9:38 AM Mark Payne <[email protected]> wrote:
Hey Richard,

I think you’d need to explain more about what you’re doing in your groovy 
script. What processor are you using? What’s the script doing? Is it parsing 
Avro data?

On Jan 29, 2024, at 12:26 AM, Richard Beare <[email protected]> wrote:

Anyone able to offer assistance with this?

I think my problem relates to correctly specifying types using expression 
languages and using schema inference from groovy.

On Tue, Jan 23, 2024 at 2:20 PM Richard Beare <[email protected]> wrote:
Hi,
What is the right way to deal with dates in the following context.

I'm using the UpdateRecord processor to add a datestamp field to a record 
(derived from a filename attribute inserted by the GetFile processor).

/Visit_DateTime:
${filename:substringBeforeLast('.'):substringAfterLast('_'):toDate('yyyyMMdd'):format("yyyy-MM-dd'T'HH:mm:ss'Z'")}

Inside the groovy script I'm attempting to convert to date as follows:

VisitTimeValue = new Date(currRecord.get(Visit_DateTime as String))

However I always get messages about "could not find matching constructor for 
java.util.Date(org.apache.avro.util.Utf8)".

I have a previously working version, from a slightly different context which 
did a cast to long: Date((long)currRecord.get....). In that case the record was 
created by a database query.

The eventual use of VisitTimeValue is to dump it into a flowfile attribute.

It seems to me that the type of the date field is not being correctly inferred 
by the avro reader/writers after I create it with the expression language. 
Alternatively, perhaps I should be using different date handling tools inside 
groovy.
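
One alternative along those lines is to parse the field's text form instead of 
handing it to the Date constructor. toString() works on both Utf8 and String, 
so a sketch like the following (pattern and value are assumed examples) 
sidesteps the constructor mismatch entirely:

```java
import java.text.SimpleDateFormat;
import java.util.Date;

public class ParseFieldDate {
    public static void main(String[] args) throws Exception {
        // Stand-in for a Utf8 or String field value; toString() is safe
        // on either. The pattern must match what the writer produced.
        Object fieldValue = "2024-01-15T00:00:00Z";
        Date d = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'")
                .parse(fieldValue.toString());
        System.out.println(d);
    }
}
```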

All advice welcome.
Thanks


