[jira] [Commented] (NIFI-5667) Hive3 PutOrc processors, error when using nestled Avro Record types
[ https://issues.apache.org/jira/browse/NIFI-5667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16650268#comment-16650268 ]

ASF GitHub Bot commented on NIFI-5667:
--

Github user asfgit closed the pull request at:

    https://github.com/apache/nifi/pull/3057

> Hive3 PutOrc processors, error when using nestled Avro Record types
> ---
>
>                 Key: NIFI-5667
>                 URL: https://issues.apache.org/jira/browse/NIFI-5667
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Extensions
>         Environment: Centos 7 and Docker Image from Hortonworks
>            Reporter: Viking Karstorp
>            Assignee: Matt Burgess
>            Priority: Major
>             Fix For: 1.8.0
>
>
> I have been testing out the new PutOrc processor that was introduced in 1.7
> to see if I can replace the ConvertAvroToOrc processor I currently use.
> When I sent some of the complex Avro messages in my flow through it, I
> encountered the following error (see the full stack trace further down):
>
> java.lang.IllegalArgumentException: Error converting object of type
> org.apache.nifi.serialization.record.MapRecord to ORC type
>
> The older ConvertAvroToOrc processor processed the flowfile without issues.
> Also of note: the PutOrc processor handles the flowfile fine if it contains
> only the schema and no Avro data. The error seems to be related to nested
> "Record" types.
>
> How to reproduce:
>
> Avro schema: bug.avsc
> {code}
> {
>   "name": "nifi_hive3_test",
>   "namespace": "analytics.models.test",
>   "type": "record",
>   "fields": [
>     {
>       "name": "Serial",
>       "type": {
>         "name": "Serial",
>         "namespace": "analytics.models.common.serial",
>         "type": "record",
>         "fields": [
>           {
>             "name": "Serial",
>             "type": "long"
>           }
>         ]
>       }
>     }
>   ]
> }
> {code}
>
> Small Python script to create an Avro file.
> {code}
> import avro.schema
> from avro.datafile import DataFileReader, DataFileWriter
> from avro.io import DatumReader, DatumWriter
>
> schema = avro.schema.parse(open("bug.avsc", "rb").read())
> writer = DataFileWriter(open("bug.avro", "wb"), DatumWriter(), schema)
> writer.append({'Serial': {'Serial': 110881615L}})
> writer.close()
>
> # Print what was written to the Avro file
> reader1 = DataFileReader(open("bug.avro", "rb"), DatumReader())
> for user in reader1:
>     print user
> {code}
>
> Then just load the Avro file using ListFile -> FetchFile.
>
> Full error message:
> {code}
> 2018-10-06 15:54:10,201 ERROR [Timer-Driven Process Thread-8]
> org.apache.nifi.processors.orc.PutORC
> PutORC[id=8be207cb-b16e-3578-1765-1c9e0c0aa383] Failed to write due to
> java.lang.IllegalArgumentException: Error converting object of type
> org.apache.nifi.serialization.record.MapRecord to ORC type
> struct: java.lang.IllegalArgumentException: Error converting
> object of type org.apache.nifi.serialization.record.MapRecord to ORC type
> struct
> java.lang.IllegalArgumentException: Error converting object of type
> org.apache.nifi.serialization.record.MapRecord to ORC type
> struct
>     at org.apache.hadoop.hive.ql.io.orc.NiFiOrcUtils.convertToORCObject(NiFiOrcUtils.java:206)
>     at org.apache.nifi.processors.orc.record.ORCHDFSRecordWriter.write(ORCHDFSRecordWriter.java:71)
>     at org.apache.nifi.processors.orc.record.ORCHDFSRecordWriter.write(ORCHDFSRecordWriter.java:91)
>     at org.apache.nifi.processors.hadoop.AbstractPutHDFSRecord.lambda$null$0(AbstractPutHDFSRecord.java:324)
>     at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2218)
>     at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2186)
>     at org.apache.nifi.processors.hadoop.AbstractPutHDFSRecord.lambda$onTrigger$1(AbstractPutHDFSRecord.java:305)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:360)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1662)
>     at org.apache.nifi.processors.hadoop.AbstractPutHDFSRecord.onTrigger(AbstractPutHDFSRecord.java:272)
>     at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
>     at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1165)
>     at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:203)
>     at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>     at
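The reproduction in this report hinges on a field whose type is itself an Avro record. As a rough aid for checking whether another schema has the same shape, here is a minimal Python 3 sketch using only the standard library. The helper name `nested_record_paths` is hypothetical and not part of NiFi or the Avro library, and the sketch only follows record fields and union branches; records nested inside arrays or maps are not detected.

```python
import json


def nested_record_paths(schema, prefix=""):
    """Return dotted paths of fields whose type is itself an Avro record.

    Hypothetical helper for illustration only; not a NiFi or Avro API.
    """
    paths = []
    for field in schema.get("fields", []):
        field_type = field["type"]
        path = prefix + field["name"]
        # A union is written as a list of types; inspect every branch.
        branches = field_type if isinstance(field_type, list) else [field_type]
        for branch in branches:
            if isinstance(branch, dict) and branch.get("type") == "record":
                paths.append(path)
                # Recurse so deeper levels of nesting are reported too.
                paths.extend(nested_record_paths(branch, path + "."))
    return paths


# The schema from the report (bug.avsc), inlined so the sketch is self-contained.
BUG_SCHEMA = json.loads("""
{
  "name": "nifi_hive3_test",
  "namespace": "analytics.models.test",
  "type": "record",
  "fields": [
    {
      "name": "Serial",
      "type": {
        "name": "Serial",
        "namespace": "analytics.models.common.serial",
        "type": "record",
        "fields": [{"name": "Serial", "type": "long"}]
      }
    }
  ]
}
""")

if __name__ == "__main__":
    print(nested_record_paths(BUG_SCHEMA))
```

A schema for which this returns a non-empty list contains at least one nested record field, which is the shape the reporter associates with the error.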
[jira] [Commented] (NIFI-5667) Hive3 PutOrc processors, error when using nestled Avro Record types
[ https://issues.apache.org/jira/browse/NIFI-5667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16650266#comment-16650266 ]

ASF subversion and git services commented on NIFI-5667:
---

Commit ce25ae54196a318cbfcdb4dfe178607f4ac135c6 in nifi's branch refs/heads/master from [~ca9mbu]
[ https://git-wip-us.apache.org/repos/asf?p=nifi.git;h=ce25ae5 ]

NIFI-5667: Add nested record support for PutORC
NIFI-5667: Fixed default table name
NIFI-5667: Fixed handling of binary types
NIFI-5667: Added backticks in Hive DDL generation

This closes #3057.

Signed-off-by: Bryan Bende
[jira] [Commented] (NIFI-5667) Hive3 PutOrc processors, error when using nestled Avro Record types
[ https://issues.apache.org/jira/browse/NIFI-5667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16650259#comment-16650259 ]

ASF GitHub Bot commented on NIFI-5667:
--

Github user bbende commented on the issue:

    https://github.com/apache/nifi/pull/3057

    Looks good to me, was able to verify the functionality, going to merge
[jira] [Commented] (NIFI-5667) Hive3 PutOrc processors, error when using nestled Avro Record types
[ https://issues.apache.org/jira/browse/NIFI-5667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16648425#comment-16648425 ]

ASF GitHub Bot commented on NIFI-5667:
--

Github user mattyb149 commented on the issue:

    https://github.com/apache/nifi/pull/3057

    Updated the rest of the backticks, thanks @VikingK and @bbende for your reviews!
[jira] [Commented] (NIFI-5667) Hive3 PutOrc processors, error when using nestled Avro Record types
[ https://issues.apache.org/jira/browse/NIFI-5667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647596#comment-16647596 ]

ASF GitHub Bot commented on NIFI-5667:
--------------------------------------

Github user VikingK commented on the issue:

    https://github.com/apache/nifi/pull/3057

@mattyb149 I checked out the new backtick DDL patch you applied. It seems it only backticks the fields inside the complex data structures; the top-level column names are not backticked. For example, ProductId, Items, PropertyMap, Serial and Metadata should be backticked as well to protect against bad names.

```
hive.ddl
CREATE EXTERNAL TABLE IF NOT EXISTS Sales (ProductId INT, Items ARRAY>, PropertyMap MAP, Serial STRUCT<`Serial`:BIGINT, `Date`:TIMESTAMP, `SystemId`:BIGINT, `T`:BIGINT, `IncludeStartTx`:BOOLEAN>, Metadata STRUCT<`AvroTs`:TIMESTAMP>) STORED AS ORC
```
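The observation in the comment above (nested fields quoted, top-level columns not) can be checked mechanically. The following is only a rough sketch: `top_level_columns` and `unquoted_columns` are hypothetical helper names, the DDL string is a shortened made-up example rather than the actual `hive.ddl` attribute, and the parser assumes identifiers contain no commas, brackets or parentheses.

```python
def top_level_columns(ddl):
    """Split the column list of a CREATE TABLE statement at depth-0 commas."""
    start = ddl.index("(") + 1        # first "(" opens the column list
    depth = 0                         # nesting depth inside <...> and (...)
    current, columns = [], []
    for ch in ddl[start:]:
        if ch in "<(":
            depth += 1
        elif ch in ">)":
            if ch == ")" and depth == 0:
                break                 # closing parenthesis of the column list
            depth -= 1
        if ch == "," and depth == 0:  # a comma between two top-level columns
            columns.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    columns.append("".join(current).strip())
    return columns


def unquoted_columns(ddl):
    """Return the names of top-level columns that are not backtick-quoted."""
    return [c.split()[0] for c in top_level_columns(ddl) if not c.startswith("`")]


# Hypothetical DDL: two unquoted top-level columns, one quoted.
ddl = ("CREATE EXTERNAL TABLE IF NOT EXISTS Sales (ProductId INT, "
       "Serial STRUCT<`Serial`:BIGINT, `Date`:TIMESTAMP>, "
       "`Metadata` STRUCT<`AvroTs`:TIMESTAMP>) STORED AS ORC")
unquoted = unquoted_columns(ddl)
print(unquoted)  # ['ProductId', 'Serial']
```

An empty result would indicate that every top-level column is protected against reserved words.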
[jira] [Commented] (NIFI-5667) Hive3 PutOrc processors, error when using nestled Avro Record types
[ https://issues.apache.org/jira/browse/NIFI-5667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16646982#comment-16646982 ]

ASF GitHub Bot commented on NIFI-5667:
--------------------------------------

Github user VikingK commented on the issue:

    https://github.com/apache/nifi/pull/3057

@mattyb149 awesome work. I was going to suggest the backtick feature, since it's a pain when downstream systems send all kinds of weird names, such as Timestamp. My current workaround is an ugly Groovy script that eliminates them. I'll run some tests on the fixes tomorrow.
[jira] [Commented] (NIFI-5667) Hive3 PutOrc processors, error when using nestled Avro Record types
[ https://issues.apache.org/jira/browse/NIFI-5667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16646959#comment-16646959 ]

ASF GitHub Bot commented on NIFI-5667:
--------------------------------------

Github user mattyb149 commented on the issue:

    https://github.com/apache/nifi/pull/3057

I found the issue and pushed a new commit with the fix. Your test data also exercised another part of the code I hadn't thought of: your "as" field is a keyword in Hive, so when I used the generated hive.ddl attribute to create the table on top of the ORC files, it didn't work. The other change in this commit is to backtick-quote the field names to protect against field names that are reserved words.
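To illustrate the backtick-quoting idea described in the comment above, here is a minimal sketch. It is not the code from the pull request: `quote_identifier` and `struct_type` are hypothetical names, and the field list (including the `BINARY` type chosen for the "as" field) is made up for the example. Doubling an embedded backtick follows Hive's quoted-identifier rules.

```python
def quote_identifier(name):
    """Wrap a Hive identifier in backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def struct_type(fields):
    """Render STRUCT<`name`:TYPE, ...> from (name, hive_type) pairs."""
    body = ", ".join(f"{quote_identifier(name)}:{hive_type}"
                     for name, hive_type in fields)
    return f"STRUCT<{body}>"


# "as" is a Hive keyword; once quoted it can be used as a field name.
print(struct_type([("as", "BINARY"), ("Date", "TIMESTAMP")]))
# STRUCT<`as`:BINARY, `Date`:TIMESTAMP>
```

Quoting every name unconditionally avoids having to maintain a list of reserved words, which varies between Hive versions.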
[jira] [Commented] (NIFI-5667) Hive3 PutOrc processors, error when using nestled Avro Record types
[ https://issues.apache.org/jira/browse/NIFI-5667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16646525#comment-16646525 ]

ASF GitHub Bot commented on NIFI-5667:
--------------------------------------

Github user bbende commented on the issue:

    https://github.com/apache/nifi/pull/3057

OK, using your test.avro going into PutORC with an AvroReader that uses the embedded schema, I can get the error you showed earlier. I also tested ConvertRecord using the AvroReader and JsonWriter, and that works and produces the following JSON:

```
[ {
  "OItems": [{
    "Od" : 9,
    "HS" : [47,119,61,61],
    "AS" : [65,65,61,61],
    "NS" : "0"
  }]
} ]
```
[jira] [Commented] (NIFI-5667) Hive3 PutOrc processors, error when using nestled Avro Record types
[ https://issues.apache.org/jira/browse/NIFI-5667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16646488#comment-16646488 ]

ASF GitHub Bot commented on NIFI-5667:
--------------------------------------

Github user VikingK commented on the issue:

    https://github.com/apache/nifi/pull/3057

Maybe it's because I am missing a name: on the array level, and that's causing the JSON to pick 'array'. I'll test later tonight.
[jira] [Commented] (NIFI-5667) Hive3 PutOrc processors, error when using nestled Avro Record types
[ https://issues.apache.org/jira/browse/NIFI-5667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16646458#comment-16646458 ]

ASF GitHub Bot commented on NIFI-5667:
--------------------------------------

Github user VikingK commented on the issue:

    https://github.com/apache/nifi/pull/3057

@bbende weird, I tried it again and got the same error. Here is the output from avro-tools; I have also attached my test.avro message.

```
java -jar avro-tools-1.8.2.jar getschema test.avro
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
{
  "type" : "record",
  "name" : "IOlist",
  "namespace" : "analytics.models.its",
  "fields" : [ {
    "name" : "OItems",
    "type" : [ "null", {
      "type" : "array",
      "items" : {
        "type" : "record",
        "name" : "ISC",
        "namespace" : "analytics.models.its.iolist.oitems",
        "fields" : [ {
          "name" : "Od",
          "type" : [ "null", "long" ]
        }, {
          "name" : "HS",
          "type" : [ "null", "bytes" ]
        }, {
          "name" : "AS",
          "type" : [ "null", "bytes" ]
        }, {
          "name" : "NS",
          "type" : [ "null", "string" ]
        } ]
      }
    } ]
  } ]
}

java -jar avro-tools-1.8.2.jar tojson test.avro
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
{"OItems":{"array":[{"Od":{"long":9},"HS":{"bytes":"/w=="},"AS":{"bytes":"AA=="},"NS":{"string":"0"}}]}}
```

[test.zip](https://github.com/apache/nifi/files/2469204/test.zip)
[jira] [Commented] (NIFI-5667) Hive3 PutOrc processors, error when using nestled Avro Record types
[ https://issues.apache.org/jira/browse/NIFI-5667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16646429#comment-16646429 ]

ASF GitHub Bot commented on NIFI-5667:
--------------------------------------

Github user bbende commented on the issue:

    https://github.com/apache/nifi/pull/3057

@VikingK your schema defines OItems as an array, but in the JSON OItems is not an array; it's an object with a field called array. Running with that schema and the example JSON, I get:

```
Caused by: java.lang.ClassCastException: org.codehaus.jackson.node.ObjectNode cannot be cast to org.codehaus.jackson.node.ArrayNode
    at org.apache.nifi.json.JsonTreeRowRecordReader.convertField(JsonTreeRowRecordReader.java:188)
    at org.apache.nifi.json.JsonTreeRowRecordReader.convertJsonNodeToRecord(JsonTreeRowRecordReader.java:118)
    at org.apache.nifi.json.JsonTreeRowRecordReader.convertJsonNodeToRecord(JsonTreeRowRecordReader.java:83)
    at org.apache.nifi.json.JsonTreeRowRecordReader.convertJsonNodeToRecord(JsonTreeRowRecordReader.java:74)
    at org.apache.nifi.json.AbstractJsonRowRecordReader.nextRecord(AbstractJsonRowRecordReader.java:92)
```

That makes sense, because the OItems field is not an array but the schema says it is. I'm trying to figure out how to reproduce the other error you showed.
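The mismatch discussed in the comment above comes from Avro's JSON encoding: a non-null union value is written as a single-key object whose key is the branch type name (`{"array": [...]}`, `{"long": 9}`), which a plain JSON reader does not expect. The sketch below, offered only as an illustration and not as anything NiFi does, strips those wrappers using the schema. `unwrap` and `branch_type_name` are hypothetical helpers; rendering bytes as a list of integers is an assumption chosen to match the ConvertRecord output shown elsewhere in this thread.

```python
import json


def branch_type_name(branch):
    """Name used as the wrapper key for a union branch in Avro's JSON encoding."""
    if isinstance(branch, str):
        return branch                      # primitive such as "long" or "bytes"
    if "name" in branch:                   # record, enum, fixed: full name
        ns = branch.get("namespace")
        return f"{ns}.{branch['name']}" if ns else branch["name"]
    return branch["type"]                  # "array" or "map"


def unwrap(schema, value):
    """Convert an Avro-JSON-encoded value to plain JSON, guided by its schema."""
    if isinstance(schema, list):           # union: null, or {"<branch>": value}
        if value is None:
            return None
        (name, inner), = value.items()
        for branch in schema:
            if branch_type_name(branch) == name:
                return unwrap(branch, inner)
        raise ValueError(f"no union branch named {name!r}")
    if isinstance(schema, dict):
        kind = schema["type"]
        if kind == "record":
            return {f["name"]: unwrap(f["type"], value[f["name"]])
                    for f in schema["fields"]}
        if kind == "array":
            return [unwrap(schema["items"], item) for item in value]
        return unwrap(kind, value)         # e.g. {"type": "long"}
    if schema == "bytes":                  # bytes are encoded as a Latin-1 string
        return list(value.encode("iso-8859-1"))
    return value


schema = json.loads("""
{"type": "record", "name": "IOlist", "namespace": "analytics.models.its",
 "fields": [{"name": "OItems", "type": ["null", {"type": "array", "items":
   {"type": "record", "name": "ISC",
    "namespace": "analytics.models.its.iolist.oitems",
    "fields": [{"name": "Od", "type": ["null", "long"]},
               {"name": "HS", "type": ["null", "bytes"]},
               {"name": "AS", "type": ["null", "bytes"]},
               {"name": "NS", "type": ["null", "string"]}]}}]}]}
""")
encoded = json.loads('{"OItems":{"array":[{"Od":{"long":9},"HS":{"bytes":"/w=="},'
                     '"AS":{"bytes":"AA=="},"NS":{"string":"0"}}]}}')
plain = unwrap(schema, encoded)
print(json.dumps(plain))
```

With the schema and record from this thread, OItems becomes a real JSON array, so a JSON reader configured with the same schema would no longer hit the ObjectNode/ArrayNode cast.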
[jira] [Commented] (NIFI-5667) Hive3 PutOrc processors, error when using nestled Avro Record types
[ https://issues.apache.org/jira/browse/NIFI-5667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16645356#comment-16645356 ]

ASF GitHub Bot commented on NIFI-5667:
--------------------------------------

Github user VikingK commented on the issue:

    https://github.com/apache/nifi/pull/3057

Sorry for the late answer, I got bogged down.

Avro:
```
{
  "name": "IOlist",
  "namespace": "analytics.models.its",
  "type": "record",
  "fields": [
    {
      "name": "OItems",
      "type": [
        "null",
        {
          "type": "array",
          "items": {
            "name": "ISC",
            "namespace": "analytics.models.its.iolist.oitems",
            "type": "record",
            "fields": [
              { "name": "Od", "type": [ "null", "long" ] },
              { "name": "HS", "type": [ "null", "bytes" ] },
              { "name": "AS", "type": [ "null", "bytes" ] },
              { "name": "NS", "type": [ "null", "string" ] }
            ]
          }
        }
      ]
    }
  ]
}
```
JSON:
```
{
  "OItems" : {
    "array" : [ {
      "Od" : { "long" : 9 },
      "HS" : { "bytes" : "/w==" },
      "AS" : { "bytes" : "AA==" },
      "NS" : { "string" : "0" }
    } ]
  }
}
```

> Hive3 PutOrc processors, error when using nestled Avro Record types
> -------------------------------------------------------------------
>
> Key: NIFI-5667
> URL: https://issues.apache.org/jira/browse/NIFI-5667
> Project: Apache NiFi
> Issue Type: Bug
> Components: Extensions
> Environment: Centos 7 and Docker Image from Hortonworks
> Reporter: Viking Karstorp
> Assignee: Matt Burgess
> Priority: Major
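
In the JSON export from this comment, every non-null union value is wrapped in an object keyed by its branch type (for example `{ "long" : 9 }`), which is how Avro's JSON encoding marks the chosen union branch. To see the plain record structure, a rough Python sketch can strip those wrappers. The helper `unwrap` is hypothetical, and it assumes that any single-key object whose key is one of the listed type names is a union wrapper; a real record with a single field literally named `long` would be misread.

```python
import json

# The JSON export from the comment, inlined for a self-contained sketch.
EXPORT = """
{ "OItems" : { "array" : [ { "Od" : { "long" : 9 }, "HS" : { "bytes" : "/w==" },
  "AS" : { "bytes" : "AA==" }, "NS" : { "string" : "0" } } ] } }
"""

# Assumption: only these unnamed Avro types occur as union branches here.
BRANCH_NAMES = {"array", "map", "long", "int", "float", "double",
                "boolean", "bytes", "string"}


def unwrap(value):
    """Strip {"<branch type>": value} union wrappers recursively (hypothetical helper)."""
    if isinstance(value, dict):
        if len(value) == 1:
            (key, inner), = value.items()
            if key in BRANCH_NAMES:
                return unwrap(inner)
        return {k: unwrap(v) for k, v in value.items()}
    if isinstance(value, list):
        return [unwrap(v) for v in value]
    return value


print(unwrap(json.loads(EXPORT)))
```

The result is one top-level field, `OItems`, holding a list with a single record of four fields, i.e. a nullable array of records whose fields are themselves nullable.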
[jira] [Commented] (NIFI-5667) Hive3 PutOrc processors, error when using nestled Avro Record types
[ https://issues.apache.org/jira/browse/NIFI-5667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16645145#comment-16645145 ]

ASF GitHub Bot commented on NIFI-5667:
--------------------------------------

Github user mattyb149 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/3057#discussion_r224126176

    --- Diff: nifi-nar-bundles/nifi-hive-bundle/nifi-hive3-processors/src/main/java/org/apache/nifi/processors/orc/PutORC.java ---
    @@ -157,19 +155,17 @@ public String getDefaultCompressionType(final ProcessorInitializationContext con
         public HDFSRecordWriter createHDFSRecordWriter(final ProcessContext context, final FlowFile flowFile, final Configuration conf, final Path path, final RecordSchema schema)
                 throws IOException, SchemaNotFoundException {
    -        final Schema avroSchema = AvroTypeUtil.extractAvroSchema(schema);
    -
             final long stripeSize = context.getProperty(STRIPE_SIZE).asDataSize(DataUnit.B).longValue();
             final int bufferSize = context.getProperty(BUFFER_SIZE).asDataSize(DataUnit.B).intValue();
             final CompressionKind compressionType = CompressionKind.valueOf(context.getProperty(COMPRESSION_TYPE).getValue());
             final boolean normalizeForHive = context.getProperty(HIVE_FIELD_NAMES).asBoolean();
    -        TypeInfo orcSchema = NiFiOrcUtils.getOrcField(avroSchema, normalizeForHive);
    +        TypeInfo orcSchema = NiFiOrcUtils.getOrcSchema(schema, normalizeForHive);
             final Writer orcWriter = NiFiOrcUtils.createWriter(path, conf, orcSchema, stripeSize, compressionType, bufferSize);
             final String hiveTableName = context.getProperty(HIVE_TABLE_NAME).isSet()
                     ? context.getProperty(HIVE_TABLE_NAME).evaluateAttributeExpressions(flowFile).getValue()
    -                : NiFiOrcUtils.normalizeHiveTableName(avroSchema.getFullName());
    +                : NiFiOrcUtils.normalizeHiveTableName(schema.toString());// TODO
    --- End diff --

    Yep that's not the right thing to put there :) Will investigate getting a name from the record somehow, or defaulting to a hardcoded table name if none is provided.

> Hive3 PutOrc processors, error when using nestled Avro Record types
> -------------------------------------------------------------------
>
> Key: NIFI-5667
> URL: https://issues.apache.org/jira/browse/NIFI-5667
> Project: Apache NiFi
> Issue Type: Bug
> Components: Extensions
> Environment: Centos 7 and Docker Image from Hortonworks
> Reporter: Viking Karstorp
> Assignee: Matt Burgess
> Priority: Major
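
The fallback discussed in this review comment (use the Hive Table Name property when set, otherwise a name taken from the record, otherwise a hardcoded default) could look roughly like the Python sketch below. Everything here is illustrative: `resolve_table_name`, `normalize_table_name`, and `DEFAULT_TABLE_NAME` are hypothetical names, and the normalization rule is an assumption rather than what `NiFiOrcUtils.normalizeHiveTableName` actually does.

```python
import re

# Hypothetical hardcoded fallback; the thread does not say what name would be used.
DEFAULT_TABLE_NAME = "my_table"


def normalize_table_name(name):
    """Assumed rule: replace every character that is not a letter, digit, or '_'."""
    return re.sub(r"[^A-Za-z0-9_]", "_", name)


def resolve_table_name(configured, record_name=None):
    """Pick the table name in the order proposed in the review comment."""
    if configured:               # the Hive Table Name property wins when set
        return configured
    if record_name:              # otherwise derive a name from the record, if it has one
        return normalize_table_name(record_name)
    return DEFAULT_TABLE_NAME    # last resort: a fixed default


print(resolve_table_name(None, "analytics.models.test.nifi_hive3_test"))
print(resolve_table_name("", None))
```

The point of the ordering is that an explicitly configured name is never overridden, and that a record without a usable name still produces valid DDL.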
[jira] [Commented] (NIFI-5667) Hive3 PutOrc processors, error when using nestled Avro Record types
[ https://issues.apache.org/jira/browse/NIFI-5667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16645097#comment-16645097 ]

ASF GitHub Bot commented on NIFI-5667:
--------------------------------------

Github user mattyb149 commented on the issue:

    https://github.com/apache/nifi/pull/3057

Can you share the schema of the data, and possibly a JSON export of your Avro file? I couldn't reproduce this with an array of ints, and @bbende ran successfully with an array of records.

> Hive3 PutOrc processors, error when using nestled Avro Record types
> -------------------------------------------------------------------
>
> Key: NIFI-5667
> URL: https://issues.apache.org/jira/browse/NIFI-5667
> Project: Apache NiFi
> Issue Type: Bug
> Components: Extensions
> Affects Versions: 1.8.0, 1.7.1
> Environment: Centos 7 and Docker Image from Hortonworks
> Reporter: Viking Karstorp
> Assignee: Matt Burgess
> Priority: Major
[jira] [Commented] (NIFI-5667) Hive3 PutOrc processors, error when using nestled Avro Record types
[ https://issues.apache.org/jira/browse/NIFI-5667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644914#comment-16644914 ]

ASF GitHub Bot commented on NIFI-5667:
--------------------------------------

Github user VikingK commented on the issue:

    https://github.com/apache/nifi/pull/3057

Hi, I have been verifying the pull request and I came across this error for one of my Avro messages. I am currently working to figure out which part of the message caused this.
```
2018-10-10 14:14:42,489 ERROR [Timer-Driven Process Thread-6] org.apache.nifi.processors.orc.PutORC PutORC[id=0430e7ab-99f1-3e25-c491-935245567fa3] Failed to write due to java.lang.ClassCastException: org.apache.hadoop.hive.serde2.typeinfo.PrimitiveTypeInfo cannot be cast to org.apache.hadoop.hive.serde2.typeinfo.ListTypeInfo: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.typeinfo.PrimitiveTypeInfo cannot be cast to org.apache.hadoop.hive.serde2.typeinfo.ListTypeInfo
java.lang.ClassCastException: org.apache.hadoop.hive.serde2.typeinfo.PrimitiveTypeInfo cannot be cast to org.apache.hadoop.hive.serde2.typeinfo.ListTypeInfo
    at org.apache.hadoop.hive.ql.io.orc.NiFiOrcUtils.convertToORCObject(NiFiOrcUtils.java:127)
    at org.apache.hadoop.hive.ql.io.orc.NiFiOrcUtils.convertToORCObject(NiFiOrcUtils.java:177)
    at org.apache.hadoop.hive.ql.io.orc.NiFiOrcUtils.lambda$convertToORCObject$0(NiFiOrcUtils.java:129)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
    at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
    at org.apache.hadoop.hive.ql.io.orc.NiFiOrcUtils.convertToORCObject(NiFiOrcUtils.java:130)
    at org.apache.nifi.processors.orc.record.ORCHDFSRecordWriter.write(ORCHDFSRecordWriter.java:73)
    at org.apache.nifi.processors.orc.record.ORCHDFSRecordWriter.write(ORCHDFSRecordWriter.java:94)
    at org.apache.nifi.processors.hadoop.AbstractPutHDFSRecord.lambda$null$0(AbstractPutHDFSRecord.java:324)
    at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2235)
    at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2203)
    at org.apache.nifi.processors.hadoop.AbstractPutHDFSRecord.lambda$onTrigger$1(AbstractPutHDFSRecord.java:305)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:360)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1662)
    at org.apache.nifi.processors.hadoop.AbstractPutHDFSRecord.onTrigger(AbstractPutHDFSRecord.java:272)
    at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
    at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1165)
    at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:203)
    at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
```

> Hive3 PutOrc processors, error when using nestled Avro Record types
> -------------------------------------------------------------------
>
> Key: NIFI-5667
> URL: https://issues.apache.org/jira/browse/NIFI-5667
> Project: Apache NiFi
> Issue Type: Bug
> Components: Extensions
> Affects Versions: 1.8.0, 1.7.1
> Environment: Centos 7 and Docker Image from Hortonworks
> Reporter: Viking Karstorp
> Assignee: Matt Burgess
> Priority: Major
[jira] [Commented] (NIFI-5667) Hive3 PutOrc processors, error when using nestled Avro Record types
[ https://issues.apache.org/jira/browse/NIFI-5667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644217#comment-16644217 ]

ASF GitHub Bot commented on NIFI-5667:
--------------------------------------

Github user mattyb149 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/3057#discussion_r223893228

    --- Diff: nifi-nar-bundles/nifi-hive-bundle/nifi-hive3-processors/src/main/java/org/apache/nifi/processors/orc/PutORC.java ---
    @@ -157,19 +155,17 @@ public String getDefaultCompressionType(final ProcessorInitializationContext con
         public HDFSRecordWriter createHDFSRecordWriter(final ProcessContext context, final FlowFile flowFile, final Configuration conf, final Path path, final RecordSchema schema)
                 throws IOException, SchemaNotFoundException {
    -        final Schema avroSchema = AvroTypeUtil.extractAvroSchema(schema);
    -
             final long stripeSize = context.getProperty(STRIPE_SIZE).asDataSize(DataUnit.B).longValue();
             final int bufferSize = context.getProperty(BUFFER_SIZE).asDataSize(DataUnit.B).intValue();
             final CompressionKind compressionType = CompressionKind.valueOf(context.getProperty(COMPRESSION_TYPE).getValue());
             final boolean normalizeForHive = context.getProperty(HIVE_FIELD_NAMES).asBoolean();
    -        TypeInfo orcSchema = NiFiOrcUtils.getOrcField(avroSchema, normalizeForHive);
    +        TypeInfo orcSchema = NiFiOrcUtils.getOrcSchema(schema, normalizeForHive);
             final Writer orcWriter = NiFiOrcUtils.createWriter(path, conf, orcSchema, stripeSize, compressionType, bufferSize);
             final String hiveTableName = context.getProperty(HIVE_TABLE_NAME).isSet()
                     ? context.getProperty(HIVE_TABLE_NAME).evaluateAttributeExpressions(flowFile).getValue()
    -                : NiFiOrcUtils.normalizeHiveTableName(avroSchema.getFullName());
    +                : NiFiOrcUtils.normalizeHiveTableName(schema.toString());// TODO
    --- End diff --

    I admit I hadn't tested this part. The TODO should be removed, but we likely need a way to get at the "name" of the top-level record if the Hive Table Name property is not set. Then again, I haven't seen anyone rely on the schema's full name as the table name; the Hive Table Name property is the recommended way to set this for the generated DDL. Welcome all comments though :)

> Hive3 PutOrc processors, error when using nestled Avro Record types
> -------------------------------------------------------------------
>
> Key: NIFI-5667
> URL: https://issues.apache.org/jira/browse/NIFI-5667
> Project: Apache NiFi
> Issue Type: Bug
> Components: Extensions
> Affects Versions: 1.8.0, 1.7.1
> Environment: Centos 7 and Docker Image from Hortonworks
> Reporter: Viking Karstorp
> Assignee: Matt Burgess
> Priority: Major
[jira] [Commented] (NIFI-5667) Hive3 PutOrc processors, error when using nestled Avro Record types
[ https://issues.apache.org/jira/browse/NIFI-5667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644216#comment-16644216 ]

ASF GitHub Bot commented on NIFI-5667:
--------------------------------------

Github user mattyb149 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/3057#discussion_r223892664

    --- Diff: nifi-nar-bundles/nifi-hive-bundle/nifi-hive3-processors/src/main/java/org/apache/hadoop/hive/ql/io/orc/NiFiOrcUtils.java ---
    @@ -163,19 +161,30 @@ public static Object convertToORCObject(TypeInfo typeInfo, Object o, final boole
                             .mapToObj((element) -> convertToORCObject(TypeInfoFactory.getPrimitiveTypeInfo("boolean"), element == 1, hiveFieldNames))
                             .collect(Collectors.toList());
                 }
    -            if (o instanceof GenericData.Array) {
    -                GenericData.Array array = ((GenericData.Array) o);
    -                // The type information in this case is interpreted as a List
    -                TypeInfo listTypeInfo = ((ListTypeInfo) typeInfo).getListElementTypeInfo();
    -                return array.stream().map((element) -> convertToORCObject(listTypeInfo, element, hiveFieldNames)).collect(Collectors.toList());
    -            }
                 if (o instanceof List) {
                     return o;
                 }
    +            if (o instanceof Record) {
    --- End diff --

    This is actually the part that fixes the nested records issue. The rest is that from the Record, we can only get RecordSchema info, where the original util methods required Avro schema/type info. The other changes are a consequence of this, to replace Avro-specific stuff with NiFi Record API stuff.

> Hive3 PutOrc processors, error when using nestled Avro Record types
> -------------------------------------------------------------------
>
> Key: NIFI-5667
> URL: https://issues.apache.org/jira/browse/NIFI-5667
> Project: Apache NiFi
> Issue Type: Bug
> Components: Extensions
> Affects Versions: 1.8.0, 1.7.1
> Environment: Centos 7 and Docker Image from Hortonworks
> Reporter: Viking Karstorp
> Assignee: Matt Burgess
> Priority: Major
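
The added `instanceof Record` branch is described here as the actual fix: the converter has to recognise a nested record and recurse into it using the record's own schema. The Python sketch below illustrates that idea only. The `Record` class and both converter functions are stand-ins invented for this example, not NiFi or Hive classes, and type information is modelled as plain strings and `("array", element_type)` tuples.

```python
class Record:
    """Stand-in for a record: an ordered schema (field name -> type) plus values."""

    def __init__(self, schema, values):
        self.schema = schema
        self.values = values


PRIMITIVES = (bool, int, float, str, bytes)


def convert_flat(type_info, value):
    """Converter without a record branch: a nested Record falls through to the error."""
    if value is None or isinstance(value, PRIMITIVES):
        return value
    if isinstance(value, list):
        return value
    raise ValueError(
        f"Error converting object of type {type(value).__name__} to ORC type {type_info}")


def convert_nested(type_info, value):
    """Same converter with an extra branch that recurses into nested records."""
    if value is None or isinstance(value, PRIMITIVES):
        return value
    if isinstance(value, list):
        _, element_type = type_info  # ("array", element_type)
        return [convert_nested(element_type, v) for v in value]
    if isinstance(value, Record):
        # Field types come from the record's own schema, not from the caller.
        return [convert_nested(value.schema[name], value.values.get(name))
                for name in value.schema]
    raise ValueError(
        f"Error converting object of type {type(value).__name__} to ORC type {type_info}")


inner = Record({"Serial": "long"}, {"Serial": 110881615})
outer = Record({"Serial": "record"}, {"Serial": inner})

print(convert_nested("record", outer))
try:
    convert_flat("struct", inner)
except ValueError as err:
    print(err)
```

In this sketch a struct is represented as a list of field values in schema order; the flat converter fails on the inner record with a message of the same shape as the one in the report, while the recursive one handles records nested directly or inside arrays.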
[jira] [Commented] (NIFI-5667) Hive3 PutOrc processors, error when using nestled Avro Record types
[ https://issues.apache.org/jira/browse/NIFI-5667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644214#comment-16644214 ]

ASF GitHub Bot commented on NIFI-5667:
--------------------------------------

Github user mattyb149 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/3057#discussion_r223892405

    --- Diff: nifi-nar-bundles/nifi-extension-utils/nifi-record-utils/nifi-mock-record-utils/src/main/java/org/apache/nifi/serialization/record/MockRecordParser.java ---
    @@ -58,6 +58,10 @@ public void addSchemaField(final String fieldName, final RecordFieldType type, b
             fields.add(new RecordField(fieldName, type.getDataType(), isNullable));
         }

    +    public void addSchemaField(final RecordField recordField) {
    --- End diff --

    This preserves the full data type of the field. RecordFieldType.getDataType(), which is called from the other add() methods, returns the "base" type, such as "ARRAY" instead of "ARRAY[INT]", for equality purposes.

> Hive3 PutOrc processors, error when using nestled Avro Record types
> -------------------------------------------------------------------
>
> Key: NIFI-5667
> URL: https://issues.apache.org/jira/browse/NIFI-5667
> Project: Apache NiFi
> Issue Type: Bug
> Components: Extensions
> Affects Versions: 1.8.0, 1.7.1
> Environment: Centos 7 and Docker Image from Hortonworks
> Reporter: Viking Karstorp
> Assignee: Matt Burgess
> Priority: Major
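
The difference between the "base" type and the full data type mentioned in this comment can be illustrated with a small Python sketch. The classes below are simplified stand-ins written for this example (they are not the NiFi `RecordField`, `DataType`, or `MockRecordParser` classes), and the method names `add_by_type` and `add_field` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataType:
    """Stand-in data type: a base kind plus an optional element type."""
    field_type: str
    element_type: Optional["DataType"] = None

    def __str__(self):
        if self.element_type is None:
            return self.field_type
        return f"{self.field_type}[{self.element_type}]"


@dataclass(frozen=True)
class RecordField:
    name: str
    data_type: DataType


class MockSchema:
    """Collects fields the two ways discussed in the comment."""

    def __init__(self):
        self.fields = []

    def add_by_type(self, name, field_type):
        # Only the base kind is known here, so the element type is lost.
        self.fields.append(RecordField(name, DataType(field_type)))

    def add_field(self, field):
        # The caller supplies the complete field, so the full type survives.
        self.fields.append(field)


schema = MockSchema()
schema.add_by_type("ids", "ARRAY")
schema.add_field(RecordField("ids_full", DataType("ARRAY", DataType("INT"))))

for field in schema.fields:
    print(field.name, field.data_type)
```

A test that needs an array of ints (or an array of records) in its mock schema has to go through the second path, since the first one cannot express what the array contains.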
[jira] [Commented] (NIFI-5667) Hive3 PutOrc processors, error when using nestled Avro Record types
[ https://issues.apache.org/jira/browse/NIFI-5667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644209#comment-16644209 ] ASF GitHub Bot commented on NIFI-5667: -- GitHub user mattyb149 opened a pull request: https://github.com/apache/nifi/pull/3057 NIFI-5667: Add nested record support for PutORC The basic approach here is that I removed all references to Avro schemas/fields and replaced them with NiFi Record API concepts. This allows us to not have to switch back and forth, since Avro is not the de facto standard for schemas or flow file content (although we are still fairly tightly coupled to Avro schemas, but that's a different issue :) ) I'll comment in the PR on various parts of the code to note the "real" changes to fix the reported issue. ### For all changes: - [x] Is there a JIRA ticket associated with this PR? Is it referenced in the commit message? - [x] Does your PR title start with NIFI- where is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character. - [x] Has your PR been rebased against the latest commit within the target branch (typically master)? - [x] Is your initial contribution a single, squashed commit? ### For code changes: - [x] Have you ensured that the full suite of tests is executed via mvn -Pcontrib-check clean install at the root nifi folder? - [x] Have you written or updated unit tests to verify your changes? - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)? - [ ] If applicable, have you updated the LICENSE file, including the main LICENSE file under nifi-assembly? - [ ] If applicable, have you updated the NOTICE file, including the main NOTICE file found under nifi-assembly? - [ ] If adding new Properties, have you added .displayName in addition to .name (programmatic access) for each of the new properties? 
### For documentation related changes:
- [ ] Have you ensured that format looks appropriate for the output in which it is rendered?

### Note:
Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mattyb149/nifi NIFI-5667

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/nifi/pull/3057.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #3057

commit ab16d6f77c7dead859643492fe650685ffd7e4ba
Author: Matthew Burgess
Date: 2018-10-09T22:59:00Z

    NIFI-5667: Add nested record support for PutORC

> Hive3 PutOrc processors, error when using nestled Avro Record types
> -------------------------------------------------------------------
>
> Key: NIFI-5667
> URL: https://issues.apache.org/jira/browse/NIFI-5667
> Project: Apache NiFi
> Issue Type: Bug
> Components: Extensions
> Affects Versions: 1.8.0, 1.7.1
> Environment: Centos 7 and Docker Image from Hortonworks
> Reporter: Viking Karstorp
> Assignee: Matt Burgess
> Priority: Major
>
> I have been testing out the new PutOrc processor that was introduced in 1.7
> to see if I can replace the ConvertAvroToOrc processor I currently use.
> When I sent some of the complex Avro messages in my flow through it, I
> encountered the following error (see the full stack trace further down):
> java.lang.IllegalArgumentException: Error converting object of type
> org.apache.nifi.serialization.record.MapRecord to ORC type
> The older ConvertAvroToOrc processor processed the flowfile without issues.
> Also of note: the PutOrc processor handles the flowfile fine if it contains
> only the schema and no Avro data. It seems to be related to nested
> "Record" types.
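To illustrate the kind of nesting involved, here is a minimal Python sketch, not NiFi's actual implementation, that walks an Avro-style record schema recursively. It maps each record to an ORC `struct<...>` type string and converts a nested value into per-struct field lists. The `PRIMITIVES` table and the functions `orc_type` and `to_orc_value` are hypothetical names introduced only for this example, and the primitive mapping is an assumed subset.

```python
import json

# Schema from the report: a record whose only field is itself a record.
SCHEMA = json.loads("""
{
  "name": "nifi_hive3_test",
  "namespace": "analytics.models.test",
  "type": "record",
  "fields": [
    {
      "name": "Serial",
      "type": {
        "name": "Serial",
        "namespace": "analytics.models.common.serial",
        "type": "record",
        "fields": [{"name": "Serial", "type": "long"}]
      }
    }
  ]
}
""")

# Assumed (illustrative) mapping from Avro primitive names to Hive/ORC type names.
PRIMITIVES = {
    "boolean": "boolean",
    "int": "int",
    "long": "bigint",
    "float": "float",
    "double": "double",
    "string": "string",
    "bytes": "binary",
}


def is_record(avro_type):
    """True if avro_type is a record schema given as a dict."""
    return isinstance(avro_type, dict) and avro_type.get("type") == "record"


def orc_type(avro_type):
    """Build an ORC type string, recursing into nested records."""
    if is_record(avro_type):
        fields = ",".join(
            f"{field['name']}:{orc_type(field['type'])}"
            for field in avro_type["fields"]
        )
        return f"struct<{fields}>"
    if isinstance(avro_type, str) and avro_type in PRIMITIVES:
        return PRIMITIVES[avro_type]
    raise ValueError(f"type not handled in this sketch: {avro_type!r}")


def to_orc_value(value, avro_type):
    """Convert a record value to a list of field values, one list per struct."""
    if is_record(avro_type):
        return [
            to_orc_value(value[field["name"]], field["type"])
            for field in avro_type["fields"]
        ]
    return value  # primitives pass through unchanged


print(orc_type(SCHEMA))  # struct<Serial:struct<Serial:bigint>>
print(to_orc_value({"Serial": {"Serial": 110881615}}, SCHEMA))  # [[110881615]]
```

The point of the sketch is that the record branch has to call itself for every field; a converter that handles records only at the top level would fail on the inner `Serial` record, which matches the symptom described in the report.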
> How to reproduce:
> Avro schema: bug.avsc
> {code}
> {
>   "name": "nifi_hive3_test",
>   "namespace": "analytics.models.test",
>   "type": "record",
>   "fields": [
>     {
>       "name": "Serial",
>       "type": {
>         "name": "Serial",
>         "namespace": "analytics.models.common.serial",
>         "type": "record",
>         "fields": [
>           {
>             "name": "Serial",
>             "type": "long"
>           }
>         ]
>       }
>     }
>   ]
> }
> {code}
> Small Python script to create an Avro file.
> {code}
> import avro.schema
> from avro.datafile import DataFileReader, DataFileWriter
> from avro.io import DatumReader, DatumWriter
> schema = avro.schema.parse(open("bug.avsc",