Shawn,

Sorry, I didn't realize Hive Streaming was your endpoint; could've
saved you the trouble :) Also, about that boolean: my point was to use
the nullable flag to mark the field as non-nullable:

public RecordField(final String fieldName, final DataType dataType,
final boolean nullable)

which avoids creating the nullable union, in the hopes of skipping
the check to see "what kind of array" your object was.

Regards,
Matt

P.S. If ORC supports byte[], then Hive should too. If you are using
PutHiveStreaming, it's possible this has been fixed for Hive 3 and
its NiFi counterpart PutHive3Streaming (and/or PutORC).
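
P.P.S. The two byte-array conversions that come up in the thread below
(boxing a byte[] into a Byte[], which is what ArrayUtils.toObject does,
and wrapping one in a ByteBuffer) can be sketched in plain Java. This is
just an illustrative stdlib sketch (the class name is made up), not
NiFi-specific code:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class ByteConversions {

    // Box a primitive byte[] into a Byte[] via autoboxing
    // (the same thing Apache Commons ArrayUtils.toObject does).
    static Byte[] box(byte[] raw) {
        Byte[] boxed = new Byte[raw.length];
        for (int i = 0; i < raw.length; i++) {
            boxed[i] = raw[i]; // autoboxing byte -> Byte
        }
        return boxed;
    }

    public static void main(String[] args) {
        byte[] raw = {1, 2, 3};

        Byte[] boxed = box(raw);
        System.out.println(Arrays.toString(boxed)); // [1, 2, 3]

        // Wrap the same bytes in a ByteBuffer; no copy is made,
        // the buffer is backed by the original array.
        ByteBuffer buf = ByteBuffer.wrap(raw);
        System.out.println(buf.remaining()); // 3
        System.out.println(buf.get(0));      // 1
    }
}
```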

On Mon, Sep 17, 2018 at 3:49 PM Shawn Weeks <[email protected]> wrote:
>
> This may be all a useless endeavor anyway as the Hive Streaming Processor 
> doesn't support Binary data.
>
>
> [java.io.IOException: JsonSerDe does not support BINARY type]
>
>
> I did however get the Avro file to generate by changing the type.
>
>
> def recordSchema = new SimpleRecordSchema(
> [
> new RecordField('id',RecordFieldType.STRING.dataType),
> new RecordField('file_name',RecordFieldType.STRING.dataType),
> new RecordField('file_path',RecordFieldType.STRING.dataType),
> new RecordField('file_size',RecordFieldType.LONG.dataType),
> new RecordField('flow_file',RecordFieldType.ARRAY.getArrayDataType(RecordFieldType.BYTE.dataType),(boolean)false)
> ]
> )
>
> Thanks
>
> Shawn Weeks
>
> ________________________________
> From: Shawn Weeks <[email protected]>
> Sent: Monday, September 17, 2018 2:36:40 PM
> To: [email protected]
> Subject: Re: Scripted Record Reader - Missing Something Obvious
>
>
> It's not happy about the default either. I'm just trying to create a field of 
> bytes in an Avro file. Should I be using the type BYTE directly instead of 
> inside an ARRAY?
>
> Caused by: java.lang.IllegalArgumentException: Cannot set the default value 
> for field [flow_file] to [false] because that is not a valid value for Data 
> Type [ARRAY[BYTE]]
>         at 
> org.apache.nifi.serialization.record.RecordField.<init>(RecordField.java:65)
>         at 
> org.apache.nifi.serialization.record.RecordField.<init>(RecordField.java:44)
>         at sun.reflect.GeneratedConstructorAccessor599.newInstance(Unknown 
> Source)
>         at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>         at 
> org.codehaus.groovy.reflection.CachedConstructor.invoke(CachedConstructor.java:80)
>         at 
> org.codehaus.groovy.runtime.callsite.ConstructorSite$ConstructorSiteNoUnwrapNoCoerce.callConstructor(ConstructorSite.java:105)
>         at 
> org.codehaus.groovy.runtime.callsite.AbstractCallSite.callConstructor(AbstractCallSite.java:263)
>         at GroovyRecordReader.<init>(Script19.groovy:34)
>         at sun.reflect.GeneratedConstructorAccessor598.newInstance(Unknown 
> Source)
>         at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>         at 
> org.codehaus.groovy.reflection.CachedConstructor.invoke(CachedConstructor.java:80)
>         at 
> org.codehaus.groovy.runtime.callsite.ConstructorSite$ConstructorSiteNoUnwrapNoCoerce.callConstructor(ConstructorSite.java:105)
>         at 
> org.codehaus.groovy.runtime.callsite.AbstractCallSite.callConstructor(AbstractCallSite.java:263)
>         at GroovyRecordReaderFactory.createRecordReader(Script19.groovy:85)
>         at sun.reflect.GeneratedMethodAccessor703.invoke(Unknown Source)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at 
> org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)
>         at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
>         at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1210)
>         at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1019)
>         at 
> org.codehaus.groovy.runtime.InvokerHelper.invokePogoMethod(InvokerHelper.java:919)
>         at 
> org.codehaus.groovy.runtime.InvokerHelper.invokeMethod(InvokerHelper.java:902)
>         at 
> org.codehaus.groovy.jsr223.GroovyScriptEngineImpl.invokeImpl(GroovyScriptEngineImpl.java:398)
>         ... 23 common frames omitted
>
> Thanks
>
> Shawn
>
> ________________________________
> From: Matt Burgess <[email protected]>
> Sent: Monday, September 17, 2018 2:24:58 PM
> To: [email protected]
> Subject: Re: Scripted Record Reader - Missing Something Obvious
>
> The ByteBuffer check wasn't added until 1.7.0, so it may not be in HDF
> 3.1.2. However, if that is the place in the code that's causing
> trouble, you may need to convert to Avro's Array class or Java's List.
> Still just a hunch, though; I haven't tried it :)
>
> The other thing is that if you will be outputting anything but null in
> that field (e.g., an empty string), you don't need a nullable
> RecordField, so you can add a boolean 'false' param to your
> RecordField constructor for that field, and that may get you around
> the union issue.
>
> Regards,
> Matt
>
> On Mon, Sep 17, 2018 at 3:19 PM Shawn Weeks <[email protected]> wrote:
> >
> > It's the Hortonworks variation on 1.5. HDF 3.1.2. I'll try the ByteBuffer
> >
> > ________________________________
> > From: Matt Burgess <[email protected]>
> > Sent: Monday, September 17, 2018 2:18:30 PM
> > To: [email protected]
> > Subject: Re: Scripted Record Reader - Missing Something Obvious
> >
> > Shawn,
> >
> > What version of NiFi are you using? I'm wondering if you are running
> > into NIFI-4232 [1]  or NIFI-4857 [2] or something like that. If
> > isCompatibleDataType() is the offender, try putting your byte[] into a
> > ByteBuffer.
> >
> > Regards,
> > Matt
> >
> > [1] https://issues.apache.org/jira/browse/NIFI-4232
> > [2] https://issues.apache.org/jira/browse/NIFI-4857
> >
> > On Mon, Sep 17, 2018 at 3:06 PM Shawn Weeks <[email protected]> 
> > wrote:
> > >
> > > So I've tried the Apache Commons ArrayUtils.toObject, and now I get this 
> > > error, which isn't much different.
> > >
> > >
> > > new MapRecord(recordSchema,[
> > > 'id':variables.uuid,
> > > 'file_name':variables."original_filename",
> > > 'file_path':variables.path,
> > > 'file_size':variables."file.size",
> > > 'flow_file':ArrayUtils.toObject(inputStream.getBytes())
> > > ])
> > >
> > >
> > > 2018-09-17 13:59:05,291 ERROR [Timer-Driven Process Thread-220] 
> > > o.a.nifi.processors.standard.MergeRecord 
> > > MergeRecord[id=b81d3d84-4ffc-110b-8276-64270d885e10] Failed to create 
> > > merged FlowFile from 1 input FlowFiles; routing originals to failure: 
> > > org.apache.nifi.serialization.record.util.IllegalTypeConversionException: 
> > > Cannot convert value [Ljava.lang.Byte;@bd6d229 of type class 
> > > [Ljava.lang.Byte; because no compatible types exist in the UNION for 
> > > field flow_file
> > > org.apache.nifi.serialization.record.util.IllegalTypeConversionException: 
> > > Cannot convert value [Ljava.lang.Byte;@bd6d229 of type class 
> > > [Ljava.lang.Byte; because no compatible types exist in the UNION for 
> > > field flow_file
> > >         at 
> > > org.apache.nifi.avro.AvroTypeUtil.convertUnionFieldValue(AvroTypeUtil.java:708)
> > >         at 
> > > org.apache.nifi.avro.AvroTypeUtil.convertToAvroObject(AvroTypeUtil.java:607)
> > >         at 
> > > org.apache.nifi.avro.AvroTypeUtil.createAvroRecord(AvroTypeUtil.java:456)
> > >         at 
> > > org.apache.nifi.avro.WriteAvroResultWithSchema.writeRecord(WriteAvroResultWithSchema.java:60)
> > >         at 
> > > org.apache.nifi.serialization.AbstractRecordSetWriter.write(AbstractRecordSetWriter.java:59)
> > >         at 
> > > org.apache.nifi.processors.standard.merge.RecordBin.offer(RecordBin.java:142)
> > >         at 
> > > org.apache.nifi.processors.standard.merge.RecordBinManager.add(RecordBinManager.java:154)
> > >         at 
> > > org.apache.nifi.processors.standard.MergeRecord.binFlowFile(MergeRecord.java:327)
> > >         at 
> > > org.apache.nifi.processors.standard.MergeRecord.onTrigger(MergeRecord.java:294)
> > >         at 
> > > org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1124)
> > >         at 
> > > org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:147)
> > >         at 
> > > org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:47)
> > >         at 
> > > org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:128)
> > >         at 
> > > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> > >         at 
> > > java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> > >         at 
> > > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> > >         at 
> > > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> > >         at 
> > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> > >         at 
> > > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> > >         at java.lang.Thread.run(Thread.java:748)
> > >
> > > ________________________________
> > > From: Mark Payne <[email protected]>
> > > Sent: Monday, September 17, 2018 1:54:57 PM
> > > To: [email protected]
> > > Subject: Re: Scripted Record Reader - Missing Something Obvious
> > >
> > > Shawn,
> > >
> > > I believe the issue that you're running into is that you're defining the 
> > > 'flow_file' field as an Array of type Byte,
> > > which means that it expects as its value an object of type Byte[], 
> > > but you are passing it an object of type byte[].
> > > You'd have to create a Byte[] instead, using the object wrapper rather 
> > > than the primitive byte array.
> > >
> > > Thanks
> > > -Mark
> > >
> > >
> > > On Sep 17, 2018, at 2:12 PM, Shawn Weeks <[email protected]> 
> > > wrote:
> > >
> > > Let's say I'm trying to store the contents of a flow file in a byte array 
> > > column in Avro. I've defined a recordSchema in Groovy like this.
> > >
> > > def recordSchema = new SimpleRecordSchema(
> > > [
> > > new RecordField('id',RecordFieldType.STRING.dataType),
> > > new RecordField('file_name',RecordFieldType.STRING.dataType),
> > > new RecordField('file_path',RecordFieldType.STRING.dataType),
> > > new RecordField('file_size',RecordFieldType.LONG.dataType),
> > > new RecordField('flow_file',RecordFieldType.ARRAY.getArrayDataType(RecordFieldType.BYTE.dataType))
> > >  ]
> > > )
> > >
> > > And I'm attempting to populate nextRecord with this.
> > >
> > > new MapRecord(recordSchema,[
> > > 'id':variables.uuid,
> > > 'file_name':variables."original_filename",
> > > 'file_path':variables.path,
> > > 'file_size':variables."file.size",
> > > 'flow_file':inputStream.getBytes()
> > > ])
> > >
> > >
> > > And I get this error. I'm missing something about how to store the 
> > > contents of the flow file as a binary column in Avro, but I'm not sure 
> > > what I'm missing.
> > >
> > > org.apache.nifi.serialization.record.util.IllegalTypeConversionException: 
> > > Cannot convert value [B@6c4141cf of type class [B because no compatible 
> > > types exist in the UNION for field flow_file
> > >         at 
> > > org.apache.nifi.avro.AvroTypeUtil.convertUnionFieldValue(AvroTypeUtil.java:708)
> > >         at 
> > > org.apache.nifi.avro.AvroTypeUtil.convertToAvroObject(AvroTypeUtil.java:607)
> > >         at 
> > > org.apache.nifi.avro.AvroTypeUtil.createAvroRecord(AvroTypeUtil.java:456)
> > >         at 
> > > org.apache.nifi.avro.WriteAvroResultWithSchema.writeRecord(WriteAvroResultWithSchema.java:60)
> > >         at 
> > > org.apache.nifi.serialization.AbstractRecordSetWriter.write(AbstractRecordSetWriter.java:59)
> > >         at 
> > > org.apache.nifi.processors.standard.merge.RecordBin.offer(RecordBin.java:142)
> > >         at 
> > > org.apache.nifi.processors.standard.merge.RecordBinManager.add(RecordBinManager.java:154)
> > >         at 
> > > org.apache.nifi.processors.standard.MergeRecord.binFlowFile(MergeRecord.java:327)
> > >         at 
> > > org.apache.nifi.processors.standard.MergeRecord.onTrigger(MergeRecord.java:294)
> > >         at 
> > > org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1124)
> > >         at 
> > > org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:147)
> > >         at 
> > > org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:47)
> > >         at 
> > > org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:128)
> > >         at 
> > > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> > >         at 
> > > java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> > >         at 
> > > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> > >         at 
> > > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> > >         at 
> > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> > >         at 
> > > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> > >         at java.lang.Thread.run(Thread.java:748)
> > >
> > >
> > > Thanks
> > > Shawn Weeks
> > >
> > >
