[jira] [Commented] (NIFI-5667) Hive3 PutOrc processors, error when using nestled Avro Record types

2018-10-15 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-5667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16650268#comment-16650268
 ] 

ASF GitHub Bot commented on NIFI-5667:
--

Github user asfgit closed the pull request at:

https://github.com/apache/nifi/pull/3057


> Hive3 PutOrc processors, error when using nestled Avro Record types
> ---
>
> Key: NIFI-5667
> URL: https://issues.apache.org/jira/browse/NIFI-5667
> Project: Apache NiFi
>  Issue Type: Bug
>  Components: Extensions
> Environment: Centos 7 and Docker Image from Hortonworks
>Reporter: Viking Karstorp
>Assignee: Matt Burgess
>Priority: Major
> Fix For: 1.8.0
>
>
> I have been testing out the new PutOrc processor that was introduced in 1.7 
> to see if I can replace the ConvertAvroToOrc processer I currently use.
> When I sent in some of the complex Avro messages in my flow I encountered the 
> following error (see full stack further down) 
> java.lang.IllegalArgumentException: Error converting object of type 
> org.apache.nifi.serialization.record.MapRecord to ORC type The older 
> ConvertAvroToOrc processor processed the flowfile without issues. Also to 
> note is that the PutOrc processor handles the flowfile fine if there is no 
> Avro data with only the schema present. It seems to be related to nestled 
> "Record" types.
> How to reproduce:
> Avro schema: bug.avsc
> {code}
> {
>   "name": "nifi_hive3_test",
>   "namespace": "analytics.models.test",
>   "type": "record",
>   "fields": [
>{
>   "name": "Serial",
>   "type":
>   {
> "name": "Serial",
> "namespace": "analytics.models.common.serial",
> "type": "record",
> "fields": [
>   {
> "name": "Serial",
> "type": "long"
>   }
> ]
>   }
> }
>   ]
> }
> {code}
> Small python script to create an Avro file.
> {code}
> import avro.schema
> from avro.datafile import DataFileReader, DataFileWriter
> from avro.io import DatumReader, DatumWriter
> schema = avro.schema.parse(open("bug.avsc", "rb").read())
> writer = DataFileWriter(open("bug.avro", "wb"), DatumWriter(), schema)
> writer.append({'Serial': {'Serial': 110881615L}})
> writer.close()
> #Print whats entered into the avro file
> reader1 = DataFileReader(open("bug.avro", "rb"), DatumReader())
> for user in reader1:
> print user
> {code}
> Then just load the avro file using ListFIle -> FetchFile
> Full error message:
> {code}
> 2018-10-06 15:54:10,201 ERROR [Timer-Driven Process Thread-8] 
> org.apache.nifi.processors.orc.PutORC 
> PutORC[id=8be207cb-b16e-3578-1765-1c9e0c0aa383] Failed to write due to 
> java.lang.IllegalArgumentException: Error converting object of type 
> org.apache.nifi.serialization.record.MapRecord to ORC type 
> struct: java.lang.IllegalArgumentException: Error converting 
> object of type org.apache.nifi.serialization.record.MapRecord to ORC type 
> struct
> java.lang.IllegalArgumentException: Error converting object of type 
> org.apache.nifi.serialization.record.MapRecord to ORC type 
> struct
> at 
> org.apache.hadoop.hive.ql.io.orc.NiFiOrcUtils.convertToORCObject(NiFiOrcUtils.java:206)
> at 
> org.apache.nifi.processors.orc.record.ORCHDFSRecordWriter.write(ORCHDFSRecordWriter.java:71)
> at 
> org.apache.nifi.processors.orc.record.ORCHDFSRecordWriter.write(ORCHDFSRecordWriter.java:91)
> at 
> org.apache.nifi.processors.hadoop.AbstractPutHDFSRecord.lambda$null$0(AbstractPutHDFSRecord.java:324)
> at 
> org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2218)
> at 
> org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2186)
> at 
> org.apache.nifi.processors.hadoop.AbstractPutHDFSRecord.lambda$onTrigger$1(AbstractPutHDFSRecord.java:305)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:360)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1662)
> at 
> org.apache.nifi.processors.hadoop.AbstractPutHDFSRecord.onTrigger(AbstractPutHDFSRecord.java:272)
> at 
> org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
> at 
> org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1165)
> at 
> org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:203)
> at 
> org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> at 
> 

[jira] [Commented] (NIFI-5667) Hive3 PutOrc processors, error when using nestled Avro Record types

2018-10-15 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-5667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16650266#comment-16650266
 ] 

ASF subversion and git services commented on NIFI-5667:
---

Commit ce25ae54196a318cbfcdb4dfe178607f4ac135c6 in nifi's branch 
refs/heads/master from [~ca9mbu]
[ https://git-wip-us.apache.org/repos/asf?p=nifi.git;h=ce25ae5 ]

NIFI-5667: Add nested record support for PutORC

NIFI-5667: Fixed default table name

NIFI-5667: Fixed handling of binary types

NIFI-5667: Added backticks in Hive DDL generation

This closes #3057.

Signed-off-by: Bryan Bende 


> Hive3 PutOrc processors, error when using nestled Avro Record types
> ---
>
> Key: NIFI-5667
> URL: https://issues.apache.org/jira/browse/NIFI-5667
> Project: Apache NiFi
>  Issue Type: Bug
>  Components: Extensions
> Environment: Centos 7 and Docker Image from Hortonworks
>Reporter: Viking Karstorp
>Assignee: Matt Burgess
>Priority: Major
>
> I have been testing out the new PutOrc processor that was introduced in 1.7 
> to see if I can replace the ConvertAvroToOrc processer I currently use.
> When I sent in some of the complex Avro messages in my flow I encountered the 
> following error (see full stack further down) 
> java.lang.IllegalArgumentException: Error converting object of type 
> org.apache.nifi.serialization.record.MapRecord to ORC type The older 
> ConvertAvroToOrc processor processed the flowfile without issues. Also to 
> note is that the PutOrc processor handles the flowfile fine if there is no 
> Avro data with only the schema present. It seems to be related to nestled 
> "Record" types.
> How to reproduce:
> Avro schema: bug.avsc
> {code}
> {
>   "name": "nifi_hive3_test",
>   "namespace": "analytics.models.test",
>   "type": "record",
>   "fields": [
>{
>   "name": "Serial",
>   "type":
>   {
> "name": "Serial",
> "namespace": "analytics.models.common.serial",
> "type": "record",
> "fields": [
>   {
> "name": "Serial",
> "type": "long"
>   }
> ]
>   }
> }
>   ]
> }
> {code}
> Small python script to create an Avro file.
> {code}
> import avro.schema
> from avro.datafile import DataFileReader, DataFileWriter
> from avro.io import DatumReader, DatumWriter
> schema = avro.schema.parse(open("bug.avsc", "rb").read())
> writer = DataFileWriter(open("bug.avro", "wb"), DatumWriter(), schema)
> writer.append({'Serial': {'Serial': 110881615L}})
> writer.close()
> #Print whats entered into the avro file
> reader1 = DataFileReader(open("bug.avro", "rb"), DatumReader())
> for user in reader1:
> print user
> {code}
> Then just load the avro file using ListFIle -> FetchFile
> Full error message:
> {code}
> 2018-10-06 15:54:10,201 ERROR [Timer-Driven Process Thread-8] 
> org.apache.nifi.processors.orc.PutORC 
> PutORC[id=8be207cb-b16e-3578-1765-1c9e0c0aa383] Failed to write due to 
> java.lang.IllegalArgumentException: Error converting object of type 
> org.apache.nifi.serialization.record.MapRecord to ORC type 
> struct: java.lang.IllegalArgumentException: Error converting 
> object of type org.apache.nifi.serialization.record.MapRecord to ORC type 
> struct
> java.lang.IllegalArgumentException: Error converting object of type 
> org.apache.nifi.serialization.record.MapRecord to ORC type 
> struct
> at 
> org.apache.hadoop.hive.ql.io.orc.NiFiOrcUtils.convertToORCObject(NiFiOrcUtils.java:206)
> at 
> org.apache.nifi.processors.orc.record.ORCHDFSRecordWriter.write(ORCHDFSRecordWriter.java:71)
> at 
> org.apache.nifi.processors.orc.record.ORCHDFSRecordWriter.write(ORCHDFSRecordWriter.java:91)
> at 
> org.apache.nifi.processors.hadoop.AbstractPutHDFSRecord.lambda$null$0(AbstractPutHDFSRecord.java:324)
> at 
> org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2218)
> at 
> org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2186)
> at 
> org.apache.nifi.processors.hadoop.AbstractPutHDFSRecord.lambda$onTrigger$1(AbstractPutHDFSRecord.java:305)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:360)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1662)
> at 
> org.apache.nifi.processors.hadoop.AbstractPutHDFSRecord.onTrigger(AbstractPutHDFSRecord.java:272)
> at 
> org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
> at 
> org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1165)
> at 
> org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:203)
> at 
> 

[jira] [Commented] (NIFI-5667) Hive3 PutOrc processors, error when using nestled Avro Record types

2018-10-15 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-5667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16650263#comment-16650263
 ] 

ASF subversion and git services commented on NIFI-5667:
---

Commit ce25ae54196a318cbfcdb4dfe178607f4ac135c6 in nifi's branch 
refs/heads/master from [~ca9mbu]
[ https://git-wip-us.apache.org/repos/asf?p=nifi.git;h=ce25ae5 ]

NIFI-5667: Add nested record support for PutORC

NIFI-5667: Fixed default table name

NIFI-5667: Fixed handling of binary types

NIFI-5667: Added backticks in Hive DDL generation

This closes #3057.

Signed-off-by: Bryan Bende 


> Hive3 PutOrc processors, error when using nestled Avro Record types
> ---
>
> Key: NIFI-5667
> URL: https://issues.apache.org/jira/browse/NIFI-5667
> Project: Apache NiFi
>  Issue Type: Bug
>  Components: Extensions
> Environment: Centos 7 and Docker Image from Hortonworks
>Reporter: Viking Karstorp
>Assignee: Matt Burgess
>Priority: Major
>
> I have been testing out the new PutOrc processor that was introduced in 1.7 
> to see if I can replace the ConvertAvroToOrc processer I currently use.
> When I sent in some of the complex Avro messages in my flow I encountered the 
> following error (see full stack further down) 
> java.lang.IllegalArgumentException: Error converting object of type 
> org.apache.nifi.serialization.record.MapRecord to ORC type The older 
> ConvertAvroToOrc processor processed the flowfile without issues. Also to 
> note is that the PutOrc processor handles the flowfile fine if there is no 
> Avro data with only the schema present. It seems to be related to nestled 
> "Record" types.
> How to reproduce:
> Avro schema: bug.avsc
> {code}
> {
>   "name": "nifi_hive3_test",
>   "namespace": "analytics.models.test",
>   "type": "record",
>   "fields": [
>{
>   "name": "Serial",
>   "type":
>   {
> "name": "Serial",
> "namespace": "analytics.models.common.serial",
> "type": "record",
> "fields": [
>   {
> "name": "Serial",
> "type": "long"
>   }
> ]
>   }
> }
>   ]
> }
> {code}
> Small python script to create an Avro file.
> {code}
> import avro.schema
> from avro.datafile import DataFileReader, DataFileWriter
> from avro.io import DatumReader, DatumWriter
> schema = avro.schema.parse(open("bug.avsc", "rb").read())
> writer = DataFileWriter(open("bug.avro", "wb"), DatumWriter(), schema)
> writer.append({'Serial': {'Serial': 110881615L}})
> writer.close()
> #Print whats entered into the avro file
> reader1 = DataFileReader(open("bug.avro", "rb"), DatumReader())
> for user in reader1:
> print user
> {code}
> Then just load the avro file using ListFIle -> FetchFile
> Full error message:
> {code}
> 2018-10-06 15:54:10,201 ERROR [Timer-Driven Process Thread-8] 
> org.apache.nifi.processors.orc.PutORC 
> PutORC[id=8be207cb-b16e-3578-1765-1c9e0c0aa383] Failed to write due to 
> java.lang.IllegalArgumentException: Error converting object of type 
> org.apache.nifi.serialization.record.MapRecord to ORC type 
> struct: java.lang.IllegalArgumentException: Error converting 
> object of type org.apache.nifi.serialization.record.MapRecord to ORC type 
> struct
> java.lang.IllegalArgumentException: Error converting object of type 
> org.apache.nifi.serialization.record.MapRecord to ORC type 
> struct
> at 
> org.apache.hadoop.hive.ql.io.orc.NiFiOrcUtils.convertToORCObject(NiFiOrcUtils.java:206)
> at 
> org.apache.nifi.processors.orc.record.ORCHDFSRecordWriter.write(ORCHDFSRecordWriter.java:71)
> at 
> org.apache.nifi.processors.orc.record.ORCHDFSRecordWriter.write(ORCHDFSRecordWriter.java:91)
> at 
> org.apache.nifi.processors.hadoop.AbstractPutHDFSRecord.lambda$null$0(AbstractPutHDFSRecord.java:324)
> at 
> org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2218)
> at 
> org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2186)
> at 
> org.apache.nifi.processors.hadoop.AbstractPutHDFSRecord.lambda$onTrigger$1(AbstractPutHDFSRecord.java:305)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:360)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1662)
> at 
> org.apache.nifi.processors.hadoop.AbstractPutHDFSRecord.onTrigger(AbstractPutHDFSRecord.java:272)
> at 
> org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
> at 
> org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1165)
> at 
> org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:203)
> at 
> 

[jira] [Commented] (NIFI-5667) Hive3 PutOrc processors, error when using nestled Avro Record types

2018-10-15 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-5667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16650264#comment-16650264
 ] 

ASF subversion and git services commented on NIFI-5667:
---

Commit ce25ae54196a318cbfcdb4dfe178607f4ac135c6 in nifi's branch 
refs/heads/master from [~ca9mbu]
[ https://git-wip-us.apache.org/repos/asf?p=nifi.git;h=ce25ae5 ]

NIFI-5667: Add nested record support for PutORC

NIFI-5667: Fixed default table name

NIFI-5667: Fixed handling of binary types

NIFI-5667: Added backticks in Hive DDL generation

This closes #3057.

Signed-off-by: Bryan Bende 


> Hive3 PutOrc processors, error when using nestled Avro Record types
> ---
>
> Key: NIFI-5667
> URL: https://issues.apache.org/jira/browse/NIFI-5667
> Project: Apache NiFi
>  Issue Type: Bug
>  Components: Extensions
> Environment: Centos 7 and Docker Image from Hortonworks
>Reporter: Viking Karstorp
>Assignee: Matt Burgess
>Priority: Major
>
> I have been testing out the new PutOrc processor that was introduced in 1.7 
> to see if I can replace the ConvertAvroToOrc processer I currently use.
> When I sent in some of the complex Avro messages in my flow I encountered the 
> following error (see full stack further down) 
> java.lang.IllegalArgumentException: Error converting object of type 
> org.apache.nifi.serialization.record.MapRecord to ORC type The older 
> ConvertAvroToOrc processor processed the flowfile without issues. Also to 
> note is that the PutOrc processor handles the flowfile fine if there is no 
> Avro data with only the schema present. It seems to be related to nestled 
> "Record" types.
> How to reproduce:
> Avro schema: bug.avsc
> {code}
> {
>   "name": "nifi_hive3_test",
>   "namespace": "analytics.models.test",
>   "type": "record",
>   "fields": [
>{
>   "name": "Serial",
>   "type":
>   {
> "name": "Serial",
> "namespace": "analytics.models.common.serial",
> "type": "record",
> "fields": [
>   {
> "name": "Serial",
> "type": "long"
>   }
> ]
>   }
> }
>   ]
> }
> {code}
> Small python script to create an Avro file.
> {code}
> import avro.schema
> from avro.datafile import DataFileReader, DataFileWriter
> from avro.io import DatumReader, DatumWriter
> schema = avro.schema.parse(open("bug.avsc", "rb").read())
> writer = DataFileWriter(open("bug.avro", "wb"), DatumWriter(), schema)
> writer.append({'Serial': {'Serial': 110881615L}})
> writer.close()
> #Print whats entered into the avro file
> reader1 = DataFileReader(open("bug.avro", "rb"), DatumReader())
> for user in reader1:
> print user
> {code}
> Then just load the avro file using ListFIle -> FetchFile
> Full error message:
> {code}
> 2018-10-06 15:54:10,201 ERROR [Timer-Driven Process Thread-8] 
> org.apache.nifi.processors.orc.PutORC 
> PutORC[id=8be207cb-b16e-3578-1765-1c9e0c0aa383] Failed to write due to 
> java.lang.IllegalArgumentException: Error converting object of type 
> org.apache.nifi.serialization.record.MapRecord to ORC type 
> struct: java.lang.IllegalArgumentException: Error converting 
> object of type org.apache.nifi.serialization.record.MapRecord to ORC type 
> struct
> java.lang.IllegalArgumentException: Error converting object of type 
> org.apache.nifi.serialization.record.MapRecord to ORC type 
> struct
> at 
> org.apache.hadoop.hive.ql.io.orc.NiFiOrcUtils.convertToORCObject(NiFiOrcUtils.java:206)
> at 
> org.apache.nifi.processors.orc.record.ORCHDFSRecordWriter.write(ORCHDFSRecordWriter.java:71)
> at 
> org.apache.nifi.processors.orc.record.ORCHDFSRecordWriter.write(ORCHDFSRecordWriter.java:91)
> at 
> org.apache.nifi.processors.hadoop.AbstractPutHDFSRecord.lambda$null$0(AbstractPutHDFSRecord.java:324)
> at 
> org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2218)
> at 
> org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2186)
> at 
> org.apache.nifi.processors.hadoop.AbstractPutHDFSRecord.lambda$onTrigger$1(AbstractPutHDFSRecord.java:305)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:360)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1662)
> at 
> org.apache.nifi.processors.hadoop.AbstractPutHDFSRecord.onTrigger(AbstractPutHDFSRecord.java:272)
> at 
> org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
> at 
> org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1165)
> at 
> org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:203)
> at 
> 

[jira] [Commented] (NIFI-5667) Hive3 PutOrc processors, error when using nestled Avro Record types

2018-10-15 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-5667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16650265#comment-16650265
 ] 

ASF subversion and git services commented on NIFI-5667:
---

Commit ce25ae54196a318cbfcdb4dfe178607f4ac135c6 in nifi's branch 
refs/heads/master from [~ca9mbu]
[ https://git-wip-us.apache.org/repos/asf?p=nifi.git;h=ce25ae5 ]

NIFI-5667: Add nested record support for PutORC

NIFI-5667: Fixed default table name

NIFI-5667: Fixed handling of binary types

NIFI-5667: Added backticks in Hive DDL generation

This closes #3057.

Signed-off-by: Bryan Bende 


> Hive3 PutOrc processors, error when using nestled Avro Record types
> ---
>
> Key: NIFI-5667
> URL: https://issues.apache.org/jira/browse/NIFI-5667
> Project: Apache NiFi
>  Issue Type: Bug
>  Components: Extensions
> Environment: Centos 7 and Docker Image from Hortonworks
>Reporter: Viking Karstorp
>Assignee: Matt Burgess
>Priority: Major
>
> I have been testing out the new PutOrc processor that was introduced in 1.7 
> to see if I can replace the ConvertAvroToOrc processer I currently use.
> When I sent in some of the complex Avro messages in my flow I encountered the 
> following error (see full stack further down) 
> java.lang.IllegalArgumentException: Error converting object of type 
> org.apache.nifi.serialization.record.MapRecord to ORC type The older 
> ConvertAvroToOrc processor processed the flowfile without issues. Also to 
> note is that the PutOrc processor handles the flowfile fine if there is no 
> Avro data with only the schema present. It seems to be related to nestled 
> "Record" types.
> How to reproduce:
> Avro schema: bug.avsc
> {code}
> {
>   "name": "nifi_hive3_test",
>   "namespace": "analytics.models.test",
>   "type": "record",
>   "fields": [
>{
>   "name": "Serial",
>   "type":
>   {
> "name": "Serial",
> "namespace": "analytics.models.common.serial",
> "type": "record",
> "fields": [
>   {
> "name": "Serial",
> "type": "long"
>   }
> ]
>   }
> }
>   ]
> }
> {code}
> Small python script to create an Avro file.
> {code}
> import avro.schema
> from avro.datafile import DataFileReader, DataFileWriter
> from avro.io import DatumReader, DatumWriter
> schema = avro.schema.parse(open("bug.avsc", "rb").read())
> writer = DataFileWriter(open("bug.avro", "wb"), DatumWriter(), schema)
> writer.append({'Serial': {'Serial': 110881615L}})
> writer.close()
> #Print whats entered into the avro file
> reader1 = DataFileReader(open("bug.avro", "rb"), DatumReader())
> for user in reader1:
> print user
> {code}
> Then just load the avro file using ListFIle -> FetchFile
> Full error message:
> {code}
> 2018-10-06 15:54:10,201 ERROR [Timer-Driven Process Thread-8] 
> org.apache.nifi.processors.orc.PutORC 
> PutORC[id=8be207cb-b16e-3578-1765-1c9e0c0aa383] Failed to write due to 
> java.lang.IllegalArgumentException: Error converting object of type 
> org.apache.nifi.serialization.record.MapRecord to ORC type 
> struct: java.lang.IllegalArgumentException: Error converting 
> object of type org.apache.nifi.serialization.record.MapRecord to ORC type 
> struct
> java.lang.IllegalArgumentException: Error converting object of type 
> org.apache.nifi.serialization.record.MapRecord to ORC type 
> struct
> at 
> org.apache.hadoop.hive.ql.io.orc.NiFiOrcUtils.convertToORCObject(NiFiOrcUtils.java:206)
> at 
> org.apache.nifi.processors.orc.record.ORCHDFSRecordWriter.write(ORCHDFSRecordWriter.java:71)
> at 
> org.apache.nifi.processors.orc.record.ORCHDFSRecordWriter.write(ORCHDFSRecordWriter.java:91)
> at 
> org.apache.nifi.processors.hadoop.AbstractPutHDFSRecord.lambda$null$0(AbstractPutHDFSRecord.java:324)
> at 
> org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2218)
> at 
> org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2186)
> at 
> org.apache.nifi.processors.hadoop.AbstractPutHDFSRecord.lambda$onTrigger$1(AbstractPutHDFSRecord.java:305)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:360)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1662)
> at 
> org.apache.nifi.processors.hadoop.AbstractPutHDFSRecord.onTrigger(AbstractPutHDFSRecord.java:272)
> at 
> org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
> at 
> org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1165)
> at 
> org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:203)
> at 
> 

[jira] [Commented] (NIFI-5667) Hive3 PutOrc processors, error when using nestled Avro Record types

2018-10-15 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-5667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16650259#comment-16650259
 ] 

ASF GitHub Bot commented on NIFI-5667:
--

Github user bbende commented on the issue:

https://github.com/apache/nifi/pull/3057
  
Looks good to me, was able to verify the functionality, going to merge


> Hive3 PutOrc processors, error when using nestled Avro Record types
> ---
>
> Key: NIFI-5667
> URL: https://issues.apache.org/jira/browse/NIFI-5667
> Project: Apache NiFi
>  Issue Type: Bug
>  Components: Extensions
> Environment: Centos 7 and Docker Image from Hortonworks
>Reporter: Viking Karstorp
>Assignee: Matt Burgess
>Priority: Major
>
> I have been testing out the new PutOrc processor that was introduced in 1.7 
> to see if I can replace the ConvertAvroToOrc processer I currently use.
> When I sent in some of the complex Avro messages in my flow I encountered the 
> following error (see full stack further down) 
> java.lang.IllegalArgumentException: Error converting object of type 
> org.apache.nifi.serialization.record.MapRecord to ORC type The older 
> ConvertAvroToOrc processor processed the flowfile without issues. Also to 
> note is that the PutOrc processor handles the flowfile fine if there is no 
> Avro data with only the schema present. It seems to be related to nestled 
> "Record" types.
> How to reproduce:
> Avro schema: bug.avsc
> {code}
> {
>   "name": "nifi_hive3_test",
>   "namespace": "analytics.models.test",
>   "type": "record",
>   "fields": [
>{
>   "name": "Serial",
>   "type":
>   {
> "name": "Serial",
> "namespace": "analytics.models.common.serial",
> "type": "record",
> "fields": [
>   {
> "name": "Serial",
> "type": "long"
>   }
> ]
>   }
> }
>   ]
> }
> {code}
> Small python script to create an Avro file.
> {code}
> import avro.schema
> from avro.datafile import DataFileReader, DataFileWriter
> from avro.io import DatumReader, DatumWriter
> schema = avro.schema.parse(open("bug.avsc", "rb").read())
> writer = DataFileWriter(open("bug.avro", "wb"), DatumWriter(), schema)
> writer.append({'Serial': {'Serial': 110881615L}})
> writer.close()
> #Print whats entered into the avro file
> reader1 = DataFileReader(open("bug.avro", "rb"), DatumReader())
> for user in reader1:
> print user
> {code}
> Then just load the avro file using ListFIle -> FetchFile
> Full error message:
> {code}
> 2018-10-06 15:54:10,201 ERROR [Timer-Driven Process Thread-8] 
> org.apache.nifi.processors.orc.PutORC 
> PutORC[id=8be207cb-b16e-3578-1765-1c9e0c0aa383] Failed to write due to 
> java.lang.IllegalArgumentException: Error converting object of type 
> org.apache.nifi.serialization.record.MapRecord to ORC type 
> struct: java.lang.IllegalArgumentException: Error converting 
> object of type org.apache.nifi.serialization.record.MapRecord to ORC type 
> struct
> java.lang.IllegalArgumentException: Error converting object of type 
> org.apache.nifi.serialization.record.MapRecord to ORC type 
> struct
> at 
> org.apache.hadoop.hive.ql.io.orc.NiFiOrcUtils.convertToORCObject(NiFiOrcUtils.java:206)
> at 
> org.apache.nifi.processors.orc.record.ORCHDFSRecordWriter.write(ORCHDFSRecordWriter.java:71)
> at 
> org.apache.nifi.processors.orc.record.ORCHDFSRecordWriter.write(ORCHDFSRecordWriter.java:91)
> at 
> org.apache.nifi.processors.hadoop.AbstractPutHDFSRecord.lambda$null$0(AbstractPutHDFSRecord.java:324)
> at 
> org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2218)
> at 
> org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2186)
> at 
> org.apache.nifi.processors.hadoop.AbstractPutHDFSRecord.lambda$onTrigger$1(AbstractPutHDFSRecord.java:305)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:360)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1662)
> at 
> org.apache.nifi.processors.hadoop.AbstractPutHDFSRecord.onTrigger(AbstractPutHDFSRecord.java:272)
> at 
> org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
> at 
> org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1165)
> at 
> org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:203)
> at 
> org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> at 
> 

[jira] [Commented] (NIFI-5667) Hive3 PutOrc processors, error when using nestled Avro Record types

2018-10-12 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-5667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16648425#comment-16648425
 ] 

ASF GitHub Bot commented on NIFI-5667:
--

Github user mattyb149 commented on the issue:

https://github.com/apache/nifi/pull/3057
  
Updated the rest of the backticks, thanks @VikingK and @bbende for your 
reviews!


> Hive3 PutOrc processors, error when using nestled Avro Record types
> ---
>
> Key: NIFI-5667
> URL: https://issues.apache.org/jira/browse/NIFI-5667
> Project: Apache NiFi
>  Issue Type: Bug
>  Components: Extensions
> Environment: Centos 7 and Docker Image from Hortonworks
>Reporter: Viking Karstorp
>Assignee: Matt Burgess
>Priority: Major
>
> I have been testing out the new PutOrc processor that was introduced in 1.7 
> to see if I can replace the ConvertAvroToOrc processer I currently use.
> When I sent in some of the complex Avro messages in my flow I encountered the 
> following error (see full stack further down) 
> java.lang.IllegalArgumentException: Error converting object of type 
> org.apache.nifi.serialization.record.MapRecord to ORC type The older 
> ConvertAvroToOrc processor processed the flowfile without issues. Also to 
> note is that the PutOrc processor handles the flowfile fine if there is no 
> Avro data with only the schema present. It seems to be related to nestled 
> "Record" types.
> How to reproduce:
> Avro schema: bug.avsc
> {code}
> {
>   "name": "nifi_hive3_test",
>   "namespace": "analytics.models.test",
>   "type": "record",
>   "fields": [
>{
>   "name": "Serial",
>   "type":
>   {
> "name": "Serial",
> "namespace": "analytics.models.common.serial",
> "type": "record",
> "fields": [
>   {
> "name": "Serial",
> "type": "long"
>   }
> ]
>   }
> }
>   ]
> }
> {code}
> Small python script to create an Avro file.
> {code}
> import avro.schema
> from avro.datafile import DataFileReader, DataFileWriter
> from avro.io import DatumReader, DatumWriter
> schema = avro.schema.parse(open("bug.avsc", "rb").read())
> writer = DataFileWriter(open("bug.avro", "wb"), DatumWriter(), schema)
> writer.append({'Serial': {'Serial': 110881615L}})
> writer.close()
> #Print whats entered into the avro file
> reader1 = DataFileReader(open("bug.avro", "rb"), DatumReader())
> for user in reader1:
> print user
> {code}
> Then just load the avro file using ListFIle -> FetchFile
> Full error message:
> {code}
> 2018-10-06 15:54:10,201 ERROR [Timer-Driven Process Thread-8] 
> org.apache.nifi.processors.orc.PutORC 
> PutORC[id=8be207cb-b16e-3578-1765-1c9e0c0aa383] Failed to write due to 
> java.lang.IllegalArgumentException: Error converting object of type 
> org.apache.nifi.serialization.record.MapRecord to ORC type 
> struct: java.lang.IllegalArgumentException: Error converting 
> object of type org.apache.nifi.serialization.record.MapRecord to ORC type 
> struct
> java.lang.IllegalArgumentException: Error converting object of type 
> org.apache.nifi.serialization.record.MapRecord to ORC type 
> struct
> at 
> org.apache.hadoop.hive.ql.io.orc.NiFiOrcUtils.convertToORCObject(NiFiOrcUtils.java:206)
> at 
> org.apache.nifi.processors.orc.record.ORCHDFSRecordWriter.write(ORCHDFSRecordWriter.java:71)
> at 
> org.apache.nifi.processors.orc.record.ORCHDFSRecordWriter.write(ORCHDFSRecordWriter.java:91)
> at 
> org.apache.nifi.processors.hadoop.AbstractPutHDFSRecord.lambda$null$0(AbstractPutHDFSRecord.java:324)
> at 
> org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2218)
> at 
> org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2186)
> at 
> org.apache.nifi.processors.hadoop.AbstractPutHDFSRecord.lambda$onTrigger$1(AbstractPutHDFSRecord.java:305)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:360)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1662)
> at 
> org.apache.nifi.processors.hadoop.AbstractPutHDFSRecord.onTrigger(AbstractPutHDFSRecord.java:272)
> at 
> org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
> at 
> org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1165)
> at 
> org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:203)
> at 
> org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> at 
> 

[jira] [Commented] (NIFI-5667) Hive3 PutOrc processors, error when using nestled Avro Record types

2018-10-12 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-5667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647596#comment-16647596
 ] 

ASF GitHub Bot commented on NIFI-5667:
--

Github user VikingK commented on the issue:

https://github.com/apache/nifi/pull/3057
  
@mattyb149 I checked out the new backtick ddl patch you applied. Its seems 
it only backticks the complex data structures, the "top" level ones are for 
example not ticked.
For example ProductId,Items,PropertyMap,Serial and Metadata should be 
backticked as well to protect from bad names.
```
hive.ddl
CREATE EXTERNAL TABLE IF NOT EXISTS Sales (ProductId INT, Items 
ARRAY>, PropertyMap MAP, 
Serial STRUCT<`Serial`:BIGINT, `Date`:TIMESTAMP, `SystemId`:BIGINT, `T`:BIGINT, 
`IncludeStartTx`:BOOLEAN>, Metadata STRUCT<`AvroTs`:TIMESTAMP>) STORED AS ORC
```


> Hive3 PutOrc processors, error when using nestled Avro Record types
> ---
>
> Key: NIFI-5667
> URL: https://issues.apache.org/jira/browse/NIFI-5667
> Project: Apache NiFi
>  Issue Type: Bug
>  Components: Extensions
> Environment: Centos 7 and Docker Image from Hortonworks
>Reporter: Viking Karstorp
>Assignee: Matt Burgess
>Priority: Major
>
> I have been testing out the new PutOrc processor that was introduced in 1.7 
> to see if I can replace the ConvertAvroToOrc processer I currently use.
> When I sent in some of the complex Avro messages in my flow I encountered the 
> following error (see full stack further down) 
> java.lang.IllegalArgumentException: Error converting object of type 
> org.apache.nifi.serialization.record.MapRecord to ORC type The older 
> ConvertAvroToOrc processor processed the flowfile without issues. Also to 
> note is that the PutOrc processor handles the flowfile fine if there is no 
> Avro data with only the schema present. It seems to be related to nestled 
> "Record" types.
> How to reproduce:
> Avro schema: bug.avsc
> {code}
> {
>   "name": "nifi_hive3_test",
>   "namespace": "analytics.models.test",
>   "type": "record",
>   "fields": [
>{
>   "name": "Serial",
>   "type":
>   {
> "name": "Serial",
> "namespace": "analytics.models.common.serial",
> "type": "record",
> "fields": [
>   {
> "name": "Serial",
> "type": "long"
>   }
> ]
>   }
> }
>   ]
> }
> {code}
> Small python script to create an Avro file.
> {code}
> import avro.schema
> from avro.datafile import DataFileReader, DataFileWriter
> from avro.io import DatumReader, DatumWriter
> schema = avro.schema.parse(open("bug.avsc", "rb").read())
> writer = DataFileWriter(open("bug.avro", "wb"), DatumWriter(), schema)
> writer.append({'Serial': {'Serial': 110881615L}})
> writer.close()
> #Print whats entered into the avro file
> reader1 = DataFileReader(open("bug.avro", "rb"), DatumReader())
> for user in reader1:
> print user
> {code}
> Then just load the avro file using ListFIle -> FetchFile
> Full error message:
> {code}
> 2018-10-06 15:54:10,201 ERROR [Timer-Driven Process Thread-8] 
> org.apache.nifi.processors.orc.PutORC 
> PutORC[id=8be207cb-b16e-3578-1765-1c9e0c0aa383] Failed to write due to 
> java.lang.IllegalArgumentException: Error converting object of type 
> org.apache.nifi.serialization.record.MapRecord to ORC type 
> struct: java.lang.IllegalArgumentException: Error converting 
> object of type org.apache.nifi.serialization.record.MapRecord to ORC type 
> struct
> java.lang.IllegalArgumentException: Error converting object of type 
> org.apache.nifi.serialization.record.MapRecord to ORC type 
> struct
> at 
> org.apache.hadoop.hive.ql.io.orc.NiFiOrcUtils.convertToORCObject(NiFiOrcUtils.java:206)
> at 
> org.apache.nifi.processors.orc.record.ORCHDFSRecordWriter.write(ORCHDFSRecordWriter.java:71)
> at 
> org.apache.nifi.processors.orc.record.ORCHDFSRecordWriter.write(ORCHDFSRecordWriter.java:91)
> at 
> org.apache.nifi.processors.hadoop.AbstractPutHDFSRecord.lambda$null$0(AbstractPutHDFSRecord.java:324)
> at 
> org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2218)
> at 
> org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2186)
> at 
> org.apache.nifi.processors.hadoop.AbstractPutHDFSRecord.lambda$onTrigger$1(AbstractPutHDFSRecord.java:305)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:360)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1662)
> at 
> org.apache.nifi.processors.hadoop.AbstractPutHDFSRecord.onTrigger(AbstractPutHDFSRecord.java:272)
> at 
> org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
> at 

[jira] [Commented] (NIFI-5667) Hive3 PutOrc processors, error when using nestled Avro Record types

2018-10-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-5667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16646982#comment-16646982
 ] 

ASF GitHub Bot commented on NIFI-5667:
--

Github user VikingK commented on the issue:

https://github.com/apache/nifi/pull/3057
  
@mattyb149 asome work, I was gonna suggest the backtick feature since its a 
pain when downstream systems sends all kinds of weird names, like Timestamp 
etc. My solution right now is an ugly groovy script that eliminates them.

I'll run some testing tomorrow on the fixes.


> Hive3 PutOrc processors, error when using nestled Avro Record types
> ---
>
> Key: NIFI-5667
> URL: https://issues.apache.org/jira/browse/NIFI-5667
> Project: Apache NiFi
>  Issue Type: Bug
>  Components: Extensions
> Environment: Centos 7 and Docker Image from Hortonworks
>Reporter: Viking Karstorp
>Assignee: Matt Burgess
>Priority: Major
>
> I have been testing out the new PutOrc processor that was introduced in 1.7 
> to see if I can replace the ConvertAvroToOrc processer I currently use.
> When I sent in some of the complex Avro messages in my flow I encountered the 
> following error (see full stack further down) 
> java.lang.IllegalArgumentException: Error converting object of type 
> org.apache.nifi.serialization.record.MapRecord to ORC type The older 
> ConvertAvroToOrc processor processed the flowfile without issues. Also to 
> note is that the PutOrc processor handles the flowfile fine if there is no 
> Avro data with only the schema present. It seems to be related to nestled 
> "Record" types.
> How to reproduce:
> Avro schema: bug.avsc
> {code}
> {
>   "name": "nifi_hive3_test",
>   "namespace": "analytics.models.test",
>   "type": "record",
>   "fields": [
>{
>   "name": "Serial",
>   "type":
>   {
> "name": "Serial",
> "namespace": "analytics.models.common.serial",
> "type": "record",
> "fields": [
>   {
> "name": "Serial",
> "type": "long"
>   }
> ]
>   }
> }
>   ]
> }
> {code}
> Small python script to create an Avro file.
> {code}
> import avro.schema
> from avro.datafile import DataFileReader, DataFileWriter
> from avro.io import DatumReader, DatumWriter
> schema = avro.schema.parse(open("bug.avsc", "rb").read())
> writer = DataFileWriter(open("bug.avro", "wb"), DatumWriter(), schema)
> writer.append({'Serial': {'Serial': 110881615L}})
> writer.close()
> #Print whats entered into the avro file
> reader1 = DataFileReader(open("bug.avro", "rb"), DatumReader())
> for user in reader1:
> print user
> {code}
> Then just load the avro file using ListFIle -> FetchFile
> Full error message:
> {code}
> 2018-10-06 15:54:10,201 ERROR [Timer-Driven Process Thread-8] 
> org.apache.nifi.processors.orc.PutORC 
> PutORC[id=8be207cb-b16e-3578-1765-1c9e0c0aa383] Failed to write due to 
> java.lang.IllegalArgumentException: Error converting object of type 
> org.apache.nifi.serialization.record.MapRecord to ORC type 
> struct: java.lang.IllegalArgumentException: Error converting 
> object of type org.apache.nifi.serialization.record.MapRecord to ORC type 
> struct
> java.lang.IllegalArgumentException: Error converting object of type 
> org.apache.nifi.serialization.record.MapRecord to ORC type 
> struct
> at 
> org.apache.hadoop.hive.ql.io.orc.NiFiOrcUtils.convertToORCObject(NiFiOrcUtils.java:206)
> at 
> org.apache.nifi.processors.orc.record.ORCHDFSRecordWriter.write(ORCHDFSRecordWriter.java:71)
> at 
> org.apache.nifi.processors.orc.record.ORCHDFSRecordWriter.write(ORCHDFSRecordWriter.java:91)
> at 
> org.apache.nifi.processors.hadoop.AbstractPutHDFSRecord.lambda$null$0(AbstractPutHDFSRecord.java:324)
> at 
> org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2218)
> at 
> org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2186)
> at 
> org.apache.nifi.processors.hadoop.AbstractPutHDFSRecord.lambda$onTrigger$1(AbstractPutHDFSRecord.java:305)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:360)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1662)
> at 
> org.apache.nifi.processors.hadoop.AbstractPutHDFSRecord.onTrigger(AbstractPutHDFSRecord.java:272)
> at 
> org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
> at 
> org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1165)
> at 
> org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:203)
> at 
> 

[jira] [Commented] (NIFI-5667) Hive3 PutOrc processors, error when using nestled Avro Record types

2018-10-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-5667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16646959#comment-16646959
 ] 

ASF GitHub Bot commented on NIFI-5667:
--

Github user mattyb149 commented on the issue:

https://github.com/apache/nifi/pull/3057
  
I found the issue and pushed a new commit with the fix. Also your test data 
exercised another part of the code I hadn't thought of, your "as" field is a 
keyword in Hive so when I used the generated hive.ddl attribute to create the 
table on top of the ORC files, it didn't work. The other change in this commit 
is to backtick-quote the field names to protect against field names that are 
reserved words.


> Hive3 PutOrc processors, error when using nestled Avro Record types
> ---
>
> Key: NIFI-5667
> URL: https://issues.apache.org/jira/browse/NIFI-5667
> Project: Apache NiFi
>  Issue Type: Bug
>  Components: Extensions
> Environment: Centos 7 and Docker Image from Hortonworks
>Reporter: Viking Karstorp
>Assignee: Matt Burgess
>Priority: Major
>
> I have been testing out the new PutOrc processor that was introduced in 1.7 
> to see if I can replace the ConvertAvroToOrc processer I currently use.
> When I sent in some of the complex Avro messages in my flow I encountered the 
> following error (see full stack further down) 
> java.lang.IllegalArgumentException: Error converting object of type 
> org.apache.nifi.serialization.record.MapRecord to ORC type The older 
> ConvertAvroToOrc processor processed the flowfile without issues. Also to 
> note is that the PutOrc processor handles the flowfile fine if there is no 
> Avro data with only the schema present. It seems to be related to nestled 
> "Record" types.
> How to reproduce:
> Avro schema: bug.avsc
> {code}
> {
>   "name": "nifi_hive3_test",
>   "namespace": "analytics.models.test",
>   "type": "record",
>   "fields": [
>{
>   "name": "Serial",
>   "type":
>   {
> "name": "Serial",
> "namespace": "analytics.models.common.serial",
> "type": "record",
> "fields": [
>   {
> "name": "Serial",
> "type": "long"
>   }
> ]
>   }
> }
>   ]
> }
> {code}
> Small python script to create an Avro file.
> {code}
> import avro.schema
> from avro.datafile import DataFileReader, DataFileWriter
> from avro.io import DatumReader, DatumWriter
> schema = avro.schema.parse(open("bug.avsc", "rb").read())
> writer = DataFileWriter(open("bug.avro", "wb"), DatumWriter(), schema)
> writer.append({'Serial': {'Serial': 110881615L}})
> writer.close()
> #Print whats entered into the avro file
> reader1 = DataFileReader(open("bug.avro", "rb"), DatumReader())
> for user in reader1:
> print user
> {code}
> Then just load the avro file using ListFIle -> FetchFile
> Full error message:
> {code}
> 2018-10-06 15:54:10,201 ERROR [Timer-Driven Process Thread-8] 
> org.apache.nifi.processors.orc.PutORC 
> PutORC[id=8be207cb-b16e-3578-1765-1c9e0c0aa383] Failed to write due to 
> java.lang.IllegalArgumentException: Error converting object of type 
> org.apache.nifi.serialization.record.MapRecord to ORC type 
> struct: java.lang.IllegalArgumentException: Error converting 
> object of type org.apache.nifi.serialization.record.MapRecord to ORC type 
> struct
> java.lang.IllegalArgumentException: Error converting object of type 
> org.apache.nifi.serialization.record.MapRecord to ORC type 
> struct
> at 
> org.apache.hadoop.hive.ql.io.orc.NiFiOrcUtils.convertToORCObject(NiFiOrcUtils.java:206)
> at 
> org.apache.nifi.processors.orc.record.ORCHDFSRecordWriter.write(ORCHDFSRecordWriter.java:71)
> at 
> org.apache.nifi.processors.orc.record.ORCHDFSRecordWriter.write(ORCHDFSRecordWriter.java:91)
> at 
> org.apache.nifi.processors.hadoop.AbstractPutHDFSRecord.lambda$null$0(AbstractPutHDFSRecord.java:324)
> at 
> org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2218)
> at 
> org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2186)
> at 
> org.apache.nifi.processors.hadoop.AbstractPutHDFSRecord.lambda$onTrigger$1(AbstractPutHDFSRecord.java:305)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:360)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1662)
> at 
> org.apache.nifi.processors.hadoop.AbstractPutHDFSRecord.onTrigger(AbstractPutHDFSRecord.java:272)
> at 
> org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
> at 
> org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1165)
> at 
> 

[jira] [Commented] (NIFI-5667) Hive3 PutOrc processors, error when using nestled Avro Record types

2018-10-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-5667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16646525#comment-16646525
 ] 

ASF GitHub Bot commented on NIFI-5667:
--

Github user bbende commented on the issue:

https://github.com/apache/nifi/pull/3057
  
Ok using your test.avro going into PutORC with an AvroReader that uses 
embedded schema I can get the error you showed earlier. 

I also tested ConvertRecord using the AvroReader and JsonWrtier and that 
works and produces the following JSON:

```
[
  {
"OItems": [{
  "Od" : 9,
  "HS" : [47,119,61,61],
  "AS" : [65,65,61,61],
  "NS" : "0"
}]
   }
]
```


> Hive3 PutOrc processors, error when using nestled Avro Record types
> ---
>
> Key: NIFI-5667
> URL: https://issues.apache.org/jira/browse/NIFI-5667
> Project: Apache NiFi
>  Issue Type: Bug
>  Components: Extensions
> Environment: Centos 7 and Docker Image from Hortonworks
>Reporter: Viking Karstorp
>Assignee: Matt Burgess
>Priority: Major
>
> I have been testing out the new PutOrc processor that was introduced in 1.7 
> to see if I can replace the ConvertAvroToOrc processer I currently use.
> When I sent in some of the complex Avro messages in my flow I encountered the 
> following error (see full stack further down) 
> java.lang.IllegalArgumentException: Error converting object of type 
> org.apache.nifi.serialization.record.MapRecord to ORC type The older 
> ConvertAvroToOrc processor processed the flowfile without issues. Also to 
> note is that the PutOrc processor handles the flowfile fine if there is no 
> Avro data with only the schema present. It seems to be related to nestled 
> "Record" types.
> How to reproduce:
> Avro schema: bug.avsc
> {code}
> {
>   "name": "nifi_hive3_test",
>   "namespace": "analytics.models.test",
>   "type": "record",
>   "fields": [
>{
>   "name": "Serial",
>   "type":
>   {
> "name": "Serial",
> "namespace": "analytics.models.common.serial",
> "type": "record",
> "fields": [
>   {
> "name": "Serial",
> "type": "long"
>   }
> ]
>   }
> }
>   ]
> }
> {code}
> Small python script to create an Avro file.
> {code}
> import avro.schema
> from avro.datafile import DataFileReader, DataFileWriter
> from avro.io import DatumReader, DatumWriter
> schema = avro.schema.parse(open("bug.avsc", "rb").read())
> writer = DataFileWriter(open("bug.avro", "wb"), DatumWriter(), schema)
> writer.append({'Serial': {'Serial': 110881615L}})
> writer.close()
> #Print whats entered into the avro file
> reader1 = DataFileReader(open("bug.avro", "rb"), DatumReader())
> for user in reader1:
> print user
> {code}
> Then just load the avro file using ListFIle -> FetchFile
> Full error message:
> {code}
> 2018-10-06 15:54:10,201 ERROR [Timer-Driven Process Thread-8] 
> org.apache.nifi.processors.orc.PutORC 
> PutORC[id=8be207cb-b16e-3578-1765-1c9e0c0aa383] Failed to write due to 
> java.lang.IllegalArgumentException: Error converting object of type 
> org.apache.nifi.serialization.record.MapRecord to ORC type 
> struct: java.lang.IllegalArgumentException: Error converting 
> object of type org.apache.nifi.serialization.record.MapRecord to ORC type 
> struct
> java.lang.IllegalArgumentException: Error converting object of type 
> org.apache.nifi.serialization.record.MapRecord to ORC type 
> struct
> at 
> org.apache.hadoop.hive.ql.io.orc.NiFiOrcUtils.convertToORCObject(NiFiOrcUtils.java:206)
> at 
> org.apache.nifi.processors.orc.record.ORCHDFSRecordWriter.write(ORCHDFSRecordWriter.java:71)
> at 
> org.apache.nifi.processors.orc.record.ORCHDFSRecordWriter.write(ORCHDFSRecordWriter.java:91)
> at 
> org.apache.nifi.processors.hadoop.AbstractPutHDFSRecord.lambda$null$0(AbstractPutHDFSRecord.java:324)
> at 
> org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2218)
> at 
> org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2186)
> at 
> org.apache.nifi.processors.hadoop.AbstractPutHDFSRecord.lambda$onTrigger$1(AbstractPutHDFSRecord.java:305)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:360)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1662)
> at 
> org.apache.nifi.processors.hadoop.AbstractPutHDFSRecord.onTrigger(AbstractPutHDFSRecord.java:272)
> at 
> org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
> at 
> org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1165)
> at 
> 

[jira] [Commented] (NIFI-5667) Hive3 PutOrc processors, error when using nestled Avro Record types

2018-10-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-5667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16646488#comment-16646488
 ] 

ASF GitHub Bot commented on NIFI-5667:
--

Github user VikingK commented on the issue:

https://github.com/apache/nifi/pull/3057
  
Maybe it's because I am missing a name: on the array level that's causing 
the Jason to pick 'array'. I'll test later tonight.


> Hive3 PutOrc processors, error when using nestled Avro Record types
> ---
>
> Key: NIFI-5667
> URL: https://issues.apache.org/jira/browse/NIFI-5667
> Project: Apache NiFi
>  Issue Type: Bug
>  Components: Extensions
> Environment: Centos 7 and Docker Image from Hortonworks
>Reporter: Viking Karstorp
>Assignee: Matt Burgess
>Priority: Major
>
> I have been testing out the new PutOrc processor that was introduced in 1.7 
> to see if I can replace the ConvertAvroToOrc processer I currently use.
> When I sent in some of the complex Avro messages in my flow I encountered the 
> following error (see full stack further down) 
> java.lang.IllegalArgumentException: Error converting object of type 
> org.apache.nifi.serialization.record.MapRecord to ORC type The older 
> ConvertAvroToOrc processor processed the flowfile without issues. Also to 
> note is that the PutOrc processor handles the flowfile fine if there is no 
> Avro data with only the schema present. It seems to be related to nestled 
> "Record" types.
> How to reproduce:
> Avro schema: bug.avsc
> {code}
> {
>   "name": "nifi_hive3_test",
>   "namespace": "analytics.models.test",
>   "type": "record",
>   "fields": [
>{
>   "name": "Serial",
>   "type":
>   {
> "name": "Serial",
> "namespace": "analytics.models.common.serial",
> "type": "record",
> "fields": [
>   {
> "name": "Serial",
> "type": "long"
>   }
> ]
>   }
> }
>   ]
> }
> {code}
> Small python script to create an Avro file.
> {code}
> import avro.schema
> from avro.datafile import DataFileReader, DataFileWriter
> from avro.io import DatumReader, DatumWriter
> schema = avro.schema.parse(open("bug.avsc", "rb").read())
> writer = DataFileWriter(open("bug.avro", "wb"), DatumWriter(), schema)
> writer.append({'Serial': {'Serial': 110881615L}})
> writer.close()
> #Print whats entered into the avro file
> reader1 = DataFileReader(open("bug.avro", "rb"), DatumReader())
> for user in reader1:
> print user
> {code}
> Then just load the avro file using ListFIle -> FetchFile
> Full error message:
> {code}
> 2018-10-06 15:54:10,201 ERROR [Timer-Driven Process Thread-8] 
> org.apache.nifi.processors.orc.PutORC 
> PutORC[id=8be207cb-b16e-3578-1765-1c9e0c0aa383] Failed to write due to 
> java.lang.IllegalArgumentException: Error converting object of type 
> org.apache.nifi.serialization.record.MapRecord to ORC type 
> struct: java.lang.IllegalArgumentException: Error converting 
> object of type org.apache.nifi.serialization.record.MapRecord to ORC type 
> struct
> java.lang.IllegalArgumentException: Error converting object of type 
> org.apache.nifi.serialization.record.MapRecord to ORC type 
> struct
> at 
> org.apache.hadoop.hive.ql.io.orc.NiFiOrcUtils.convertToORCObject(NiFiOrcUtils.java:206)
> at 
> org.apache.nifi.processors.orc.record.ORCHDFSRecordWriter.write(ORCHDFSRecordWriter.java:71)
> at 
> org.apache.nifi.processors.orc.record.ORCHDFSRecordWriter.write(ORCHDFSRecordWriter.java:91)
> at 
> org.apache.nifi.processors.hadoop.AbstractPutHDFSRecord.lambda$null$0(AbstractPutHDFSRecord.java:324)
> at 
> org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2218)
> at 
> org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2186)
> at 
> org.apache.nifi.processors.hadoop.AbstractPutHDFSRecord.lambda$onTrigger$1(AbstractPutHDFSRecord.java:305)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:360)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1662)
> at 
> org.apache.nifi.processors.hadoop.AbstractPutHDFSRecord.onTrigger(AbstractPutHDFSRecord.java:272)
> at 
> org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
> at 
> org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1165)
> at 
> org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:203)
> at 
> org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> 

[jira] [Commented] (NIFI-5667) Hive3 PutOrc processors, error when using nestled Avro Record types

2018-10-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-5667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16646458#comment-16646458
 ] 

ASF GitHub Bot commented on NIFI-5667:
--

Github user VikingK commented on the issue:

https://github.com/apache/nifi/pull/3057
  
@bbende wierd, I tried it again and I got the same error, here is the 
output from avro tools and I also attached my test.avro message

```
java -jar avro-tools-1.8.2.jar getschema test.avro
log4j:WARN No appenders could be found for logger 
(org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for 
more info.
{
  "type" : "record",
  "name" : "IOlist",
  "namespace" : "analytics.models.its",
  "fields" : [ {
"name" : "OItems",
"type" : [ "null", {
  "type" : "array",
  "items" : {
"type" : "record",
"name" : "ISC",
"namespace" : "analytics.models.its.iolist.oitems",
"fields" : [ {
  "name" : "Od",
  "type" : [ "null", "long" ]
}, {
  "name" : "HS",
  "type" : [ "null", "bytes" ]
}, {
  "name" : "AS",
  "type" : [ "null", "bytes" ]
}, {
  "name" : "NS",
  "type" : [ "null", "string" ]
} ]
  }
} ]
  } ]
}

java -jar avro-tools-1.8.2.jar tojson test.avro
log4j:WARN No appenders could be found for logger 
(org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for 
more info.

{"OItems":{"array":[{"Od":{"long":9},"HS":{"bytes":"/w=="},"AS":{"bytes":"AA=="},"NS":{"string":"0"}}]}}
```

[test.zip](https://github.com/apache/nifi/files/2469204/test.zip)


> Hive3 PutOrc processors, error when using nestled Avro Record types
> ---
>
> Key: NIFI-5667
> URL: https://issues.apache.org/jira/browse/NIFI-5667
> Project: Apache NiFi
>  Issue Type: Bug
>  Components: Extensions
> Environment: Centos 7 and Docker Image from Hortonworks
>Reporter: Viking Karstorp
>Assignee: Matt Burgess
>Priority: Major
>
> I have been testing out the new PutOrc processor that was introduced in 1.7 
> to see if I can replace the ConvertAvroToOrc processer I currently use.
> When I sent in some of the complex Avro messages in my flow I encountered the 
> following error (see full stack further down) 
> java.lang.IllegalArgumentException: Error converting object of type 
> org.apache.nifi.serialization.record.MapRecord to ORC type The older 
> ConvertAvroToOrc processor processed the flowfile without issues. Also to 
> note is that the PutOrc processor handles the flowfile fine if there is no 
> Avro data with only the schema present. It seems to be related to nestled 
> "Record" types.
> How to reproduce:
> Avro schema: bug.avsc
> {code}
> {
>   "name": "nifi_hive3_test",
>   "namespace": "analytics.models.test",
>   "type": "record",
>   "fields": [
>{
>   "name": "Serial",
>   "type":
>   {
> "name": "Serial",
> "namespace": "analytics.models.common.serial",
> "type": "record",
> "fields": [
>   {
> "name": "Serial",
> "type": "long"
>   }
> ]
>   }
> }
>   ]
> }
> {code}
> Small python script to create an Avro file.
> {code}
> import avro.schema
> from avro.datafile import DataFileReader, DataFileWriter
> from avro.io import DatumReader, DatumWriter
> schema = avro.schema.parse(open("bug.avsc", "rb").read())
> writer = DataFileWriter(open("bug.avro", "wb"), DatumWriter(), schema)
> writer.append({'Serial': {'Serial': 110881615L}})
> writer.close()
> #Print whats entered into the avro file
> reader1 = DataFileReader(open("bug.avro", "rb"), DatumReader())
> for user in reader1:
> print user
> {code}
> Then just load the avro file using ListFIle -> FetchFile
> Full error message:
> {code}
> 2018-10-06 15:54:10,201 ERROR [Timer-Driven Process Thread-8] 
> org.apache.nifi.processors.orc.PutORC 
> PutORC[id=8be207cb-b16e-3578-1765-1c9e0c0aa383] Failed to write due to 
> java.lang.IllegalArgumentException: Error converting object of type 
> org.apache.nifi.serialization.record.MapRecord to ORC type 
> struct: java.lang.IllegalArgumentException: Error converting 
> object of type org.apache.nifi.serialization.record.MapRecord to ORC type 
> struct
> java.lang.IllegalArgumentException: Error converting object of type 
> org.apache.nifi.serialization.record.MapRecord 

[jira] [Commented] (NIFI-5667) Hive3 PutOrc processors, error when using nestled Avro Record types

2018-10-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-5667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16646429#comment-16646429
 ] 

ASF GitHub Bot commented on NIFI-5667:
--

Github user bbende commented on the issue:

https://github.com/apache/nifi/pull/3057
  
@VikingK in your schema it has OItems defined as an array, but then in the 
JSON OItems is not an array, its an object with a field called array. So 
running with that schema and example JSON I get: 
```

Caused by: java.lang.ClassCastException: 
org.codehaus.jackson.node.ObjectNode cannot be cast to 
org.codehaus.jackson.node.ArrayNode
at 
org.apache.nifi.json.JsonTreeRowRecordReader.convertField(JsonTreeRowRecordReader.java:188)
at 
org.apache.nifi.json.JsonTreeRowRecordReader.convertJsonNodeToRecord(JsonTreeRowRecordReader.java:118)
at 
org.apache.nifi.json.JsonTreeRowRecordReader.convertJsonNodeToRecord(JsonTreeRowRecordReader.java:83)
at 
org.apache.nifi.json.JsonTreeRowRecordReader.convertJsonNodeToRecord(JsonTreeRowRecordReader.java:74)
at 
org.apache.nifi.json.AbstractJsonRowRecordReader.nextRecord(AbstractJsonRowRecordReader.java:92)
```
Which makes sense because the OItems field is not an array, but the schema 
says it is.

I'm trying to figure out how to reproduce the other error you showed.


> Hive3 PutOrc processors, error when using nestled Avro Record types
> ---
>
> Key: NIFI-5667
> URL: https://issues.apache.org/jira/browse/NIFI-5667
> Project: Apache NiFi
>  Issue Type: Bug
>  Components: Extensions
> Environment: Centos 7 and Docker Image from Hortonworks
>Reporter: Viking Karstorp
>Assignee: Matt Burgess
>Priority: Major
>
> I have been testing out the new PutOrc processor that was introduced in 1.7 
> to see if I can replace the ConvertAvroToOrc processer I currently use.
> When I sent in some of the complex Avro messages in my flow I encountered the 
> following error (see full stack further down) 
> java.lang.IllegalArgumentException: Error converting object of type 
> org.apache.nifi.serialization.record.MapRecord to ORC type The older 
> ConvertAvroToOrc processor processed the flowfile without issues. Also to 
> note is that the PutOrc processor handles the flowfile fine if there is no 
> Avro data with only the schema present. It seems to be related to nestled 
> "Record" types.
> How to reproduce:
> Avro schema: bug.avsc
> {code}
> {
>   "name": "nifi_hive3_test",
>   "namespace": "analytics.models.test",
>   "type": "record",
>   "fields": [
>{
>   "name": "Serial",
>   "type":
>   {
> "name": "Serial",
> "namespace": "analytics.models.common.serial",
> "type": "record",
> "fields": [
>   {
> "name": "Serial",
> "type": "long"
>   }
> ]
>   }
> }
>   ]
> }
> {code}
> Small python script to create an Avro file.
> {code}
> import avro.schema
> from avro.datafile import DataFileReader, DataFileWriter
> from avro.io import DatumReader, DatumWriter
> schema = avro.schema.parse(open("bug.avsc", "rb").read())
> writer = DataFileWriter(open("bug.avro", "wb"), DatumWriter(), schema)
> writer.append({'Serial': {'Serial': 110881615L}})
> writer.close()
> #Print whats entered into the avro file
> reader1 = DataFileReader(open("bug.avro", "rb"), DatumReader())
> for user in reader1:
> print user
> {code}
> Then just load the avro file using ListFIle -> FetchFile
> Full error message:
> {code}
> 2018-10-06 15:54:10,201 ERROR [Timer-Driven Process Thread-8] 
> org.apache.nifi.processors.orc.PutORC 
> PutORC[id=8be207cb-b16e-3578-1765-1c9e0c0aa383] Failed to write due to 
> java.lang.IllegalArgumentException: Error converting object of type 
> org.apache.nifi.serialization.record.MapRecord to ORC type 
> struct: java.lang.IllegalArgumentException: Error converting 
> object of type org.apache.nifi.serialization.record.MapRecord to ORC type 
> struct
> java.lang.IllegalArgumentException: Error converting object of type 
> org.apache.nifi.serialization.record.MapRecord to ORC type 
> struct
> at 
> org.apache.hadoop.hive.ql.io.orc.NiFiOrcUtils.convertToORCObject(NiFiOrcUtils.java:206)
> at 
> org.apache.nifi.processors.orc.record.ORCHDFSRecordWriter.write(ORCHDFSRecordWriter.java:71)
> at 
> org.apache.nifi.processors.orc.record.ORCHDFSRecordWriter.write(ORCHDFSRecordWriter.java:91)
> at 
> org.apache.nifi.processors.hadoop.AbstractPutHDFSRecord.lambda$null$0(AbstractPutHDFSRecord.java:324)
> at 
> org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2218)
> at 
> org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2186)
> at 
> 

[jira] [Commented] (NIFI-5667) Hive3 PutOrc processors, error when using nestled Avro Record types

2018-10-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-5667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16645356#comment-16645356
 ] 

ASF GitHub Bot commented on NIFI-5667:
--

Github user VikingK commented on the issue:

https://github.com/apache/nifi/pull/3057
  
Sorry for the late answer got bogged down.

Avro:
```
{
  "name": "IOlist",
  "namespace": "analytics.models.its",
  "type": "record",
  "fields": [
{
  "name": "OItems",
  "type": [
"null",
{
  "type": "array",
  "items": {
"name": "ISC",
"namespace": "analytics.models.its.iolist.oitems",
"type": "record",
"fields": [
  {
"name": "Od",
"type": [
  "null",
  "long"
]
  },
  {
"name": "HS",
"type": [
  "null",
  "bytes"
]
  },
  {
"name": "AS",
"type": [
  "null",
  "bytes"
]
  },
  {
"name": "NS",
"type": [
  "null",
  "string"
]
  }
]
  }
}
  ]
}
  ]
}
```

JSON
```
{
  "OItems" : {
"array" : [ {
  "Od" : {
"long" : 9
  },
  "HS" : {
"bytes" : "/w=="
  },
  "AS" : {
"bytes" : "AA=="
  },
  "NS" : {
"string" : "0"
  }
} ]
  }
}
```


> Hive3 PutOrc processors, error when using nestled Avro Record types
> ---
>
> Key: NIFI-5667
> URL: https://issues.apache.org/jira/browse/NIFI-5667
> Project: Apache NiFi
>  Issue Type: Bug
>  Components: Extensions
> Environment: Centos 7 and Docker Image from Hortonworks
>Reporter: Viking Karstorp
>Assignee: Matt Burgess
>Priority: Major
>
> I have been testing out the new PutOrc processor that was introduced in 1.7 
> to see if I can replace the ConvertAvroToOrc processer I currently use.
> When I sent in some of the complex Avro messages in my flow I encountered the 
> following error (see full stack further down) 
> java.lang.IllegalArgumentException: Error converting object of type 
> org.apache.nifi.serialization.record.MapRecord to ORC type The older 
> ConvertAvroToOrc processor processed the flowfile without issues. Also to 
> note is that the PutOrc processor handles the flowfile fine if there is no 
> Avro data with only the schema present. It seems to be related to nestled 
> "Record" types.
> How to reproduce:
> Avro schema: bug.avsc
> {code}
> {
>   "name": "nifi_hive3_test",
>   "namespace": "analytics.models.test",
>   "type": "record",
>   "fields": [
>{
>   "name": "Serial",
>   "type":
>   {
> "name": "Serial",
> "namespace": "analytics.models.common.serial",
> "type": "record",
> "fields": [
>   {
> "name": "Serial",
> "type": "long"
>   }
> ]
>   }
> }
>   ]
> }
> {code}
> Small python script to create an Avro file.
> {code}
> import avro.schema
> from avro.datafile import DataFileReader, DataFileWriter
> from avro.io import DatumReader, DatumWriter
> schema = avro.schema.parse(open("bug.avsc", "rb").read())
> writer = DataFileWriter(open("bug.avro", "wb"), DatumWriter(), schema)
> writer.append({'Serial': {'Serial': 110881615L}})
> writer.close()
> #Print whats entered into the avro file
> reader1 = DataFileReader(open("bug.avro", "rb"), DatumReader())
> for user in reader1:
> print user
> {code}
> Then just load the avro file using ListFIle -> FetchFile
> Full error message:
> {code}
> 2018-10-06 15:54:10,201 ERROR [Timer-Driven Process Thread-8] 
> org.apache.nifi.processors.orc.PutORC 
> PutORC[id=8be207cb-b16e-3578-1765-1c9e0c0aa383] Failed to write due to 
> java.lang.IllegalArgumentException: Error converting object of type 
> org.apache.nifi.serialization.record.MapRecord to ORC type 
> struct: java.lang.IllegalArgumentException: Error converting 
> object of type org.apache.nifi.serialization.record.MapRecord to ORC type 
> struct
> java.lang.IllegalArgumentException: Error converting object of type 
> org.apache.nifi.serialization.record.MapRecord to ORC type 
> struct
> at 
> 

[jira] [Commented] (NIFI-5667) Hive3 PutOrc processors, error when using nestled Avro Record types

2018-10-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-5667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16645145#comment-16645145
 ] 

ASF GitHub Bot commented on NIFI-5667:
--

Github user mattyb149 commented on a diff in the pull request:

https://github.com/apache/nifi/pull/3057#discussion_r224126176
  
--- Diff: 
nifi-nar-bundles/nifi-hive-bundle/nifi-hive3-processors/src/main/java/org/apache/nifi/processors/orc/PutORC.java
 ---
@@ -157,19 +155,17 @@ public String getDefaultCompressionType(final 
ProcessorInitializationContext con
 public HDFSRecordWriter createHDFSRecordWriter(final ProcessContext 
context, final FlowFile flowFile, final Configuration conf, final Path path, 
final RecordSchema schema)
 throws IOException, SchemaNotFoundException {
 
-final Schema avroSchema = AvroTypeUtil.extractAvroSchema(schema);
-
 final long stripeSize = 
context.getProperty(STRIPE_SIZE).asDataSize(DataUnit.B).longValue();
 final int bufferSize = 
context.getProperty(BUFFER_SIZE).asDataSize(DataUnit.B).intValue();
 final CompressionKind compressionType = 
CompressionKind.valueOf(context.getProperty(COMPRESSION_TYPE).getValue());
 final boolean normalizeForHive = 
context.getProperty(HIVE_FIELD_NAMES).asBoolean();
-TypeInfo orcSchema = NiFiOrcUtils.getOrcField(avroSchema, 
normalizeForHive);
+TypeInfo orcSchema = NiFiOrcUtils.getOrcSchema(schema, 
normalizeForHive);
 final Writer orcWriter = NiFiOrcUtils.createWriter(path, conf, 
orcSchema, stripeSize, compressionType, bufferSize);
 final String hiveTableName = 
context.getProperty(HIVE_TABLE_NAME).isSet()
 ? 
context.getProperty(HIVE_TABLE_NAME).evaluateAttributeExpressions(flowFile).getValue()
-: 
NiFiOrcUtils.normalizeHiveTableName(avroSchema.getFullName());
+: 
NiFiOrcUtils.normalizeHiveTableName(schema.toString());// TODO
--- End diff --

Yep that's not the right thing to put there :) Will investigate getting a 
name from the record somehow, or defaulting to a hardcoded table name if none 
is provided.


> Hive3 PutOrc processors, error when using nestled Avro Record types
> ---
>
> Key: NIFI-5667
> URL: https://issues.apache.org/jira/browse/NIFI-5667
> Project: Apache NiFi
>  Issue Type: Bug
>  Components: Extensions
> Environment: Centos 7 and Docker Image from Hortonworks
>Reporter: Viking Karstorp
>Assignee: Matt Burgess
>Priority: Major
>
> I have been testing out the new PutOrc processor that was introduced in 1.7 
> to see if I can replace the ConvertAvroToOrc processer I currently use.
> When I sent in some of the complex Avro messages in my flow I encountered the 
> following error (see full stack further down) 
> java.lang.IllegalArgumentException: Error converting object of type 
> org.apache.nifi.serialization.record.MapRecord to ORC type The older 
> ConvertAvroToOrc processor processed the flowfile without issues. Also to 
> note is that the PutOrc processor handles the flowfile fine if there is no 
> Avro data with only the schema present. It seems to be related to nestled 
> "Record" types.
> How to reproduce:
> Avro schema: bug.avsc
> {code}
> {
>   "name": "nifi_hive3_test",
>   "namespace": "analytics.models.test",
>   "type": "record",
>   "fields": [
>{
>   "name": "Serial",
>   "type":
>   {
> "name": "Serial",
> "namespace": "analytics.models.common.serial",
> "type": "record",
> "fields": [
>   {
> "name": "Serial",
> "type": "long"
>   }
> ]
>   }
> }
>   ]
> }
> {code}
> Small python script to create an Avro file.
> {code}
> import avro.schema
> from avro.datafile import DataFileReader, DataFileWriter
> from avro.io import DatumReader, DatumWriter
> schema = avro.schema.parse(open("bug.avsc", "rb").read())
> writer = DataFileWriter(open("bug.avro", "wb"), DatumWriter(), schema)
> writer.append({'Serial': {'Serial': 110881615L}})
> writer.close()
> #Print whats entered into the avro file
> reader1 = DataFileReader(open("bug.avro", "rb"), DatumReader())
> for user in reader1:
> print user
> {code}
> Then just load the avro file using ListFIle -> FetchFile
> Full error message:
> {code}
> 2018-10-06 15:54:10,201 ERROR [Timer-Driven Process Thread-8] 
> org.apache.nifi.processors.orc.PutORC 
> PutORC[id=8be207cb-b16e-3578-1765-1c9e0c0aa383] Failed to write due to 
> java.lang.IllegalArgumentException: Error converting object of type 
> org.apache.nifi.serialization.record.MapRecord to ORC type 
> struct: java.lang.IllegalArgumentException: Error converting 
> object of type 

[jira] [Commented] (NIFI-5667) Hive3 PutOrc processors, error when using nestled Avro Record types

2018-10-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-5667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16645097#comment-16645097
 ] 

ASF GitHub Bot commented on NIFI-5667:
--

Github user mattyb149 commented on the issue:

https://github.com/apache/nifi/pull/3057
  
Can you share the schema of the data, and possibly a JSON export of your 
Avro file? I couldn't reproduce this with an array of ints, and @bbende ran 
successfully with an array of records.


> Hive3 PutOrc processors, error when using nestled Avro Record types
> ---
>
> Key: NIFI-5667
> URL: https://issues.apache.org/jira/browse/NIFI-5667
> Project: Apache NiFi
>  Issue Type: Bug
>  Components: Extensions
>Affects Versions: 1.8.0, 1.7.1
> Environment: Centos 7 and Docker Image from Hortonworks
>Reporter: Viking Karstorp
>Assignee: Matt Burgess
>Priority: Major
>
> I have been testing out the new PutOrc processor that was introduced in 1.7 
> to see if I can replace the ConvertAvroToOrc processer I currently use.
> When I sent in some of the complex Avro messages in my flow I encountered the 
> following error (see full stack further down) 
> java.lang.IllegalArgumentException: Error converting object of type 
> org.apache.nifi.serialization.record.MapRecord to ORC type The older 
> ConvertAvroToOrc processor processed the flowfile without issues. Also to 
> note is that the PutOrc processor handles the flowfile fine if there is no 
> Avro data with only the schema present. It seems to be related to nestled 
> "Record" types.
> How to reproduce:
> Avro schema: bug.avsc
> {code}
> {
>   "name": "nifi_hive3_test",
>   "namespace": "analytics.models.test",
>   "type": "record",
>   "fields": [
>{
>   "name": "Serial",
>   "type":
>   {
> "name": "Serial",
> "namespace": "analytics.models.common.serial",
> "type": "record",
> "fields": [
>   {
> "name": "Serial",
> "type": "long"
>   }
> ]
>   }
> }
>   ]
> }
> {code}
> Small python script to create an Avro file.
> {code}
> import avro.schema
> from avro.datafile import DataFileReader, DataFileWriter
> from avro.io import DatumReader, DatumWriter
> schema = avro.schema.parse(open("bug.avsc", "rb").read())
> writer = DataFileWriter(open("bug.avro", "wb"), DatumWriter(), schema)
> writer.append({'Serial': {'Serial': 110881615L}})
> writer.close()
> #Print whats entered into the avro file
> reader1 = DataFileReader(open("bug.avro", "rb"), DatumReader())
> for user in reader1:
> print user
> {code}
> Then just load the avro file using ListFIle -> FetchFile
> Full error message:
> {code}
> 2018-10-06 15:54:10,201 ERROR [Timer-Driven Process Thread-8] 
> org.apache.nifi.processors.orc.PutORC 
> PutORC[id=8be207cb-b16e-3578-1765-1c9e0c0aa383] Failed to write due to 
> java.lang.IllegalArgumentException: Error converting object of type 
> org.apache.nifi.serialization.record.MapRecord to ORC type 
> struct: java.lang.IllegalArgumentException: Error converting 
> object of type org.apache.nifi.serialization.record.MapRecord to ORC type 
> struct
> java.lang.IllegalArgumentException: Error converting object of type 
> org.apache.nifi.serialization.record.MapRecord to ORC type 
> struct
> at 
> org.apache.hadoop.hive.ql.io.orc.NiFiOrcUtils.convertToORCObject(NiFiOrcUtils.java:206)
> at 
> org.apache.nifi.processors.orc.record.ORCHDFSRecordWriter.write(ORCHDFSRecordWriter.java:71)
> at 
> org.apache.nifi.processors.orc.record.ORCHDFSRecordWriter.write(ORCHDFSRecordWriter.java:91)
> at 
> org.apache.nifi.processors.hadoop.AbstractPutHDFSRecord.lambda$null$0(AbstractPutHDFSRecord.java:324)
> at 
> org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2218)
> at 
> org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2186)
> at 
> org.apache.nifi.processors.hadoop.AbstractPutHDFSRecord.lambda$onTrigger$1(AbstractPutHDFSRecord.java:305)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:360)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1662)
> at 
> org.apache.nifi.processors.hadoop.AbstractPutHDFSRecord.onTrigger(AbstractPutHDFSRecord.java:272)
> at 
> org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
> at 
> org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1165)
> at 
> org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:203)
> at 
> org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
> at 

[jira] [Commented] (NIFI-5667) Hive3 PutOrc processors, error when using nestled Avro Record types

2018-10-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-5667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644914#comment-16644914
 ] 

ASF GitHub Bot commented on NIFI-5667:
--

Github user VikingK commented on the issue:

https://github.com/apache/nifi/pull/3057
  
Hi, I have been verifying the pull request and I came across this error for 
one of my Avro messages,
I am currently working to figure out which part of the message caused this.

```
2018-10-10 14:14:42,489 ERROR [Timer-Driven Process Thread-6] 
org.apache.nifi.processors.orc.PutORC 
PutORC[id=0430e7ab-99f1-3e25-c491-935245567fa3] Failed to write due to 
java.lang.ClassCa
stException: org.apache.hadoop.hive.serde2.typeinfo.PrimitiveTypeInfo 
cannot be cast to org.apache.hadoop.hive.serde2.typeinfo.ListTypeInfo: 
java.lang.ClassCastException: org.apache.hadoop
.hive.serde2.typeinfo.PrimitiveTypeInfo cannot be cast to 
org.apache.hadoop.hive.serde2.typeinfo.ListTypeInfo
java.lang.ClassCastException: 
org.apache.hadoop.hive.serde2.typeinfo.PrimitiveTypeInfo cannot be cast to 
org.apache.hadoop.hive.serde2.typeinfo.ListTypeInfo
at 
org.apache.hadoop.hive.ql.io.orc.NiFiOrcUtils.convertToORCObject(NiFiOrcUtils.java:127)
at 
org.apache.hadoop.hive.ql.io.orc.NiFiOrcUtils.convertToORCObject(NiFiOrcUtils.java:177)
at 
org.apache.hadoop.hive.ql.io.orc.NiFiOrcUtils.lambda$convertToORCObject$0(NiFiOrcUtils.java:129)
at 
java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at 
java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
at 
java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at 
java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at 
java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
at 
java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at 
java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
at 
org.apache.hadoop.hive.ql.io.orc.NiFiOrcUtils.convertToORCObject(NiFiOrcUtils.java:130)
at 
org.apache.nifi.processors.orc.record.ORCHDFSRecordWriter.write(ORCHDFSRecordWriter.java:73)
at 
org.apache.nifi.processors.orc.record.ORCHDFSRecordWriter.write(ORCHDFSRecordWriter.java:94)
at 
org.apache.nifi.processors.hadoop.AbstractPutHDFSRecord.lambda$null$0(AbstractPutHDFSRecord.java:324)
at 
org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2235)
at 
org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2203)
at 
org.apache.nifi.processors.hadoop.AbstractPutHDFSRecord.lambda$onTrigger$1(AbstractPutHDFSRecord.java:305)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:360)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1662)
at 
org.apache.nifi.processors.hadoop.AbstractPutHDFSRecord.onTrigger(AbstractPutHDFSRecord.java:272)
at 
org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
at 
org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1165)
at 
org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:203)
at 
org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

```


> Hive3 PutOrc processors, error when using nestled Avro Record types
> ---
>
> Key: NIFI-5667
> URL: https://issues.apache.org/jira/browse/NIFI-5667
> Project: Apache NiFi
>  Issue Type: Bug
>  Components: Extensions
>Affects Versions: 1.8.0, 1.7.1
> Environment: Centos 7 and Docker Image from Hortonworks
>Reporter: Viking Karstorp
>Assignee: Matt Burgess
>Priority: Major
>
> I have been testing out the new 

[jira] [Commented] (NIFI-5667) Hive3 PutOrc processors, error when using nestled Avro Record types

2018-10-09 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-5667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644217#comment-16644217
 ] 

ASF GitHub Bot commented on NIFI-5667:
--

Github user mattyb149 commented on a diff in the pull request:

https://github.com/apache/nifi/pull/3057#discussion_r223893228
  
--- Diff: 
nifi-nar-bundles/nifi-hive-bundle/nifi-hive3-processors/src/main/java/org/apache/nifi/processors/orc/PutORC.java
 ---
@@ -157,19 +155,17 @@ public String getDefaultCompressionType(final 
ProcessorInitializationContext con
 public HDFSRecordWriter createHDFSRecordWriter(final ProcessContext 
context, final FlowFile flowFile, final Configuration conf, final Path path, 
final RecordSchema schema)
 throws IOException, SchemaNotFoundException {
 
-final Schema avroSchema = AvroTypeUtil.extractAvroSchema(schema);
-
 final long stripeSize = 
context.getProperty(STRIPE_SIZE).asDataSize(DataUnit.B).longValue();
 final int bufferSize = 
context.getProperty(BUFFER_SIZE).asDataSize(DataUnit.B).intValue();
 final CompressionKind compressionType = 
CompressionKind.valueOf(context.getProperty(COMPRESSION_TYPE).getValue());
 final boolean normalizeForHive = 
context.getProperty(HIVE_FIELD_NAMES).asBoolean();
-TypeInfo orcSchema = NiFiOrcUtils.getOrcField(avroSchema, 
normalizeForHive);
+TypeInfo orcSchema = NiFiOrcUtils.getOrcSchema(schema, 
normalizeForHive);
 final Writer orcWriter = NiFiOrcUtils.createWriter(path, conf, 
orcSchema, stripeSize, compressionType, bufferSize);
 final String hiveTableName = 
context.getProperty(HIVE_TABLE_NAME).isSet()
 ? 
context.getProperty(HIVE_TABLE_NAME).evaluateAttributeExpressions(flowFile).getValue()
-: 
NiFiOrcUtils.normalizeHiveTableName(avroSchema.getFullName());
+: 
NiFiOrcUtils.normalizeHiveTableName(schema.toString());// TODO
--- End diff --

I admit I hadn't tested this part, the TODO should be removed but we likely 
need a way to get at the "name" of the top-level record if the Hive Table Name 
property is not set. Then again, I haven't seen anyone rely on the schema's 
full name as the table name, the Hive Table Name property is the recommended 
way to set this for the generated DDL. Welcome all comments though :)


> Hive3 PutOrc processors, error when using nestled Avro Record types
> ---
>
> Key: NIFI-5667
> URL: https://issues.apache.org/jira/browse/NIFI-5667
> Project: Apache NiFi
>  Issue Type: Bug
>  Components: Extensions
>Affects Versions: 1.8.0, 1.7.1
> Environment: Centos 7 and Docker Image from Hortonworks
>Reporter: Viking Karstorp
>Assignee: Matt Burgess
>Priority: Major
>
> I have been testing out the new PutOrc processor that was introduced in 1.7 
> to see if I can replace the ConvertAvroToOrc processer I currently use.
> When I sent in some of the complex Avro messages in my flow I encountered the 
> following error (see full stack further down) 
> java.lang.IllegalArgumentException: Error converting object of type 
> org.apache.nifi.serialization.record.MapRecord to ORC type The older 
> ConvertAvroToOrc processor processed the flowfile without issues. Also to 
> note is that the PutOrc processor handles the flowfile fine if there is no 
> Avro data with only the schema present. It seems to be related to nestled 
> "Record" types.
> How to reproduce:
> Avro schema: bug.avsc
> {code}
> {
>   "name": "nifi_hive3_test",
>   "namespace": "analytics.models.test",
>   "type": "record",
>   "fields": [
>{
>   "name": "Serial",
>   "type":
>   {
> "name": "Serial",
> "namespace": "analytics.models.common.serial",
> "type": "record",
> "fields": [
>   {
> "name": "Serial",
> "type": "long"
>   }
> ]
>   }
> }
>   ]
> }
> {code}
> Small python script to create an Avro file.
> {code}
> import avro.schema
> from avro.datafile import DataFileReader, DataFileWriter
> from avro.io import DatumReader, DatumWriter
> schema = avro.schema.parse(open("bug.avsc", "rb").read())
> writer = DataFileWriter(open("bug.avro", "wb"), DatumWriter(), schema)
> writer.append({'Serial': {'Serial': 110881615L}})
> writer.close()
> #Print whats entered into the avro file
> reader1 = DataFileReader(open("bug.avro", "rb"), DatumReader())
> for user in reader1:
> print user
> {code}
> Then just load the avro file using ListFIle -> FetchFile
> Full error message:
> {code}
> 2018-10-06 15:54:10,201 ERROR [Timer-Driven Process Thread-8] 
> org.apache.nifi.processors.orc.PutORC 
> 

[jira] [Commented] (NIFI-5667) Hive3 PutOrc processors, error when using nestled Avro Record types

2018-10-09 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-5667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644216#comment-16644216
 ] 

ASF GitHub Bot commented on NIFI-5667:
--

Github user mattyb149 commented on a diff in the pull request:

https://github.com/apache/nifi/pull/3057#discussion_r223892664
  
--- Diff: 
nifi-nar-bundles/nifi-hive-bundle/nifi-hive3-processors/src/main/java/org/apache/hadoop/hive/ql/io/orc/NiFiOrcUtils.java
 ---
@@ -163,19 +161,30 @@ public static Object convertToORCObject(TypeInfo 
typeInfo, Object o, final boole
 .mapToObj((element) -> 
convertToORCObject(TypeInfoFactory.getPrimitiveTypeInfo("boolean"), element == 
1, hiveFieldNames))
 .collect(Collectors.toList());
 }
-if (o instanceof GenericData.Array) {
-GenericData.Array array = ((GenericData.Array) o);
-// The type information in this case is interpreted as a 
List
-TypeInfo listTypeInfo = ((ListTypeInfo) 
typeInfo).getListElementTypeInfo();
-return array.stream().map((element) -> 
convertToORCObject(listTypeInfo, element, 
hiveFieldNames)).collect(Collectors.toList());
-}
 if (o instanceof List) {
 return o;
 }
+if (o instanceof Record) {
--- End diff --

This is actually the part that fixes the nested records issue. The rest is 
that from the Record, we can only get RecordSchema info, where the original 
util methods required Avro schema/type info. The other changes are a 
consequence of this, to replace Avro-specific stuff with NiFi Record API stuff.


> Hive3 PutOrc processors, error when using nestled Avro Record types
> ---
>
> Key: NIFI-5667
> URL: https://issues.apache.org/jira/browse/NIFI-5667
> Project: Apache NiFi
>  Issue Type: Bug
>  Components: Extensions
>Affects Versions: 1.8.0, 1.7.1
> Environment: Centos 7 and Docker Image from Hortonworks
>Reporter: Viking Karstorp
>Assignee: Matt Burgess
>Priority: Major
>
> I have been testing out the new PutOrc processor that was introduced in 1.7 
> to see if I can replace the ConvertAvroToOrc processer I currently use.
> When I sent in some of the complex Avro messages in my flow I encountered the 
> following error (see full stack further down) 
> java.lang.IllegalArgumentException: Error converting object of type 
> org.apache.nifi.serialization.record.MapRecord to ORC type The older 
> ConvertAvroToOrc processor processed the flowfile without issues. Also to 
> note is that the PutOrc processor handles the flowfile fine if there is no 
> Avro data with only the schema present. It seems to be related to nestled 
> "Record" types.
> How to reproduce:
> Avro schema: bug.avsc
> {code}
> {
>   "name": "nifi_hive3_test",
>   "namespace": "analytics.models.test",
>   "type": "record",
>   "fields": [
>{
>   "name": "Serial",
>   "type":
>   {
> "name": "Serial",
> "namespace": "analytics.models.common.serial",
> "type": "record",
> "fields": [
>   {
> "name": "Serial",
> "type": "long"
>   }
> ]
>   }
> }
>   ]
> }
> {code}
> Small python script to create an Avro file.
> {code}
> import avro.schema
> from avro.datafile import DataFileReader, DataFileWriter
> from avro.io import DatumReader, DatumWriter
> schema = avro.schema.parse(open("bug.avsc", "rb").read())
> writer = DataFileWriter(open("bug.avro", "wb"), DatumWriter(), schema)
> writer.append({'Serial': {'Serial': 110881615L}})
> writer.close()
> #Print whats entered into the avro file
> reader1 = DataFileReader(open("bug.avro", "rb"), DatumReader())
> for user in reader1:
> print user
> {code}
> Then just load the avro file using ListFIle -> FetchFile
> Full error message:
> {code}
> 2018-10-06 15:54:10,201 ERROR [Timer-Driven Process Thread-8] 
> org.apache.nifi.processors.orc.PutORC 
> PutORC[id=8be207cb-b16e-3578-1765-1c9e0c0aa383] Failed to write due to 
> java.lang.IllegalArgumentException: Error converting object of type 
> org.apache.nifi.serialization.record.MapRecord to ORC type 
> struct: java.lang.IllegalArgumentException: Error converting 
> object of type org.apache.nifi.serialization.record.MapRecord to ORC type 
> struct
> java.lang.IllegalArgumentException: Error converting object of type 
> org.apache.nifi.serialization.record.MapRecord to ORC type 
> struct
> at 
> org.apache.hadoop.hive.ql.io.orc.NiFiOrcUtils.convertToORCObject(NiFiOrcUtils.java:206)
> at 
> org.apache.nifi.processors.orc.record.ORCHDFSRecordWriter.write(ORCHDFSRecordWriter.java:71)
> at 
> 

[jira] [Commented] (NIFI-5667) Hive3 PutOrc processors, error when using nestled Avro Record types

2018-10-09 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-5667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644214#comment-16644214
 ] 

ASF GitHub Bot commented on NIFI-5667:
--

Github user mattyb149 commented on a diff in the pull request:

https://github.com/apache/nifi/pull/3057#discussion_r223892405
  
--- Diff: 
nifi-nar-bundles/nifi-extension-utils/nifi-record-utils/nifi-mock-record-utils/src/main/java/org/apache/nifi/serialization/record/MockRecordParser.java
 ---
@@ -58,6 +58,10 @@ public void addSchemaField(final String fieldName, final 
RecordFieldType type, b
 fields.add(new RecordField(fieldName, type.getDataType(), 
isNullable));
 }
 
+public void addSchemaField(final RecordField recordField) {
--- End diff --

This preserves the full data type of the field. 
RecordFieldType.getDataType(), which is called from the other add() methods, 
returns the "base" type, such as "ARRAY" instead of "ARRAY[INT]", for equality 
purposes.


> Hive3 PutOrc processors, error when using nestled Avro Record types
> ---
>
> Key: NIFI-5667
> URL: https://issues.apache.org/jira/browse/NIFI-5667
> Project: Apache NiFi
>  Issue Type: Bug
>  Components: Extensions
>Affects Versions: 1.8.0, 1.7.1
> Environment: Centos 7 and Docker Image from Hortonworks
>Reporter: Viking Karstorp
>Assignee: Matt Burgess
>Priority: Major
>
> I have been testing out the new PutOrc processor that was introduced in 1.7 
> to see if I can replace the ConvertAvroToOrc processer I currently use.
> When I sent in some of the complex Avro messages in my flow I encountered the 
> following error (see full stack further down) 
> java.lang.IllegalArgumentException: Error converting object of type 
> org.apache.nifi.serialization.record.MapRecord to ORC type The older 
> ConvertAvroToOrc processor processed the flowfile without issues. Also to 
> note is that the PutOrc processor handles the flowfile fine if there is no 
> Avro data with only the schema present. It seems to be related to nestled 
> "Record" types.
> How to reproduce:
> Avro schema: bug.avsc
> {code}
> {
>   "name": "nifi_hive3_test",
>   "namespace": "analytics.models.test",
>   "type": "record",
>   "fields": [
>{
>   "name": "Serial",
>   "type":
>   {
> "name": "Serial",
> "namespace": "analytics.models.common.serial",
> "type": "record",
> "fields": [
>   {
> "name": "Serial",
> "type": "long"
>   }
> ]
>   }
> }
>   ]
> }
> {code}
> Small python script to create an Avro file.
> {code}
> import avro.schema
> from avro.datafile import DataFileReader, DataFileWriter
> from avro.io import DatumReader, DatumWriter
> schema = avro.schema.parse(open("bug.avsc", "rb").read())
> writer = DataFileWriter(open("bug.avro", "wb"), DatumWriter(), schema)
> writer.append({'Serial': {'Serial': 110881615L}})
> writer.close()
> #Print whats entered into the avro file
> reader1 = DataFileReader(open("bug.avro", "rb"), DatumReader())
> for user in reader1:
> print user
> {code}
> Then just load the avro file using ListFIle -> FetchFile
> Full error message:
> {code}
> 2018-10-06 15:54:10,201 ERROR [Timer-Driven Process Thread-8] 
> org.apache.nifi.processors.orc.PutORC 
> PutORC[id=8be207cb-b16e-3578-1765-1c9e0c0aa383] Failed to write due to 
> java.lang.IllegalArgumentException: Error converting object of type 
> org.apache.nifi.serialization.record.MapRecord to ORC type 
> struct: java.lang.IllegalArgumentException: Error converting 
> object of type org.apache.nifi.serialization.record.MapRecord to ORC type 
> struct
> java.lang.IllegalArgumentException: Error converting object of type 
> org.apache.nifi.serialization.record.MapRecord to ORC type 
> struct
> at 
> org.apache.hadoop.hive.ql.io.orc.NiFiOrcUtils.convertToORCObject(NiFiOrcUtils.java:206)
> at 
> org.apache.nifi.processors.orc.record.ORCHDFSRecordWriter.write(ORCHDFSRecordWriter.java:71)
> at 
> org.apache.nifi.processors.orc.record.ORCHDFSRecordWriter.write(ORCHDFSRecordWriter.java:91)
> at 
> org.apache.nifi.processors.hadoop.AbstractPutHDFSRecord.lambda$null$0(AbstractPutHDFSRecord.java:324)
> at 
> org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2218)
> at 
> org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2186)
> at 
> org.apache.nifi.processors.hadoop.AbstractPutHDFSRecord.lambda$onTrigger$1(AbstractPutHDFSRecord.java:305)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:360)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1662)
> at 
> 

[jira] [Commented] (NIFI-5667) Hive3 PutOrc processors, error when using nestled Avro Record types

2018-10-09 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-5667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644209#comment-16644209
 ] 

ASF GitHub Bot commented on NIFI-5667:
--

GitHub user mattyb149 opened a pull request:

https://github.com/apache/nifi/pull/3057

NIFI-5667: Add nested record support for PutORC

The basic approach here is that I removed all references to Avro 
schemas/fields and replaced them with NiFi Record API concepts. This allows us 
to not have to switch back and forth, since Avro is not the de facto standard 
for schemas or flow file content (although we are still fairly tightly coupled 
to Avro schemas, but that's a different issue :) )

I'll comment in the PR on various parts of the code to note the "real" 
changes to fix the reported issue.

### For all changes:
- [x] Is there a JIRA ticket associated with this PR? Is it referenced 
 in the commit message?

- [x] Does your PR title start with NIFI- where  is the JIRA number 
you are trying to resolve? Pay particular attention to the hyphen "-" character.

- [x] Has your PR been rebased against the latest commit within the target 
branch (typically master)?

- [x] Is your initial contribution a single, squashed commit?

### For code changes:
- [x] Have you ensured that the full suite of tests is executed via mvn 
-Pcontrib-check clean install at the root nifi folder?
- [x] Have you written or updated unit tests to verify your changes?
- [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)? 
- [ ] If applicable, have you updated the LICENSE file, including the main 
LICENSE file under nifi-assembly?
- [ ] If applicable, have you updated the NOTICE file, including the main 
NOTICE file found under nifi-assembly?
- [ ] If adding new Properties, have you added .displayName in addition to 
.name (programmatic access) for each of the new properties?

### For documentation related changes:
- [ ] Have you ensured that format looks appropriate for the output in 
which it is rendered?

### Note:
Please ensure that once the PR is submitted, you check travis-ci for build 
issues and submit an update to your PR as soon as possible.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mattyb149/nifi NIFI-5667

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/nifi/pull/3057.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3057


commit ab16d6f77c7dead859643492fe650685ffd7e4ba
Author: Matthew Burgess 
Date:   2018-10-09T22:59:00Z

NIFI-5667: Add nested record support for PutORC




> Hive3 PutOrc processors, error when using nestled Avro Record types
> ---
>
> Key: NIFI-5667
> URL: https://issues.apache.org/jira/browse/NIFI-5667
> Project: Apache NiFi
>  Issue Type: Bug
>  Components: Extensions
>Affects Versions: 1.8.0, 1.7.1
> Environment: Centos 7 and Docker Image from Hortonworks
>Reporter: Viking Karstorp
>Assignee: Matt Burgess
>Priority: Major
>
> I have been testing out the new PutOrc processor that was introduced in 1.7 
> to see if I can replace the ConvertAvroToOrc processer I currently use.
> When I sent in some of the complex Avro messages in my flow I encountered the 
> following error (see full stack further down) 
> java.lang.IllegalArgumentException: Error converting object of type 
> org.apache.nifi.serialization.record.MapRecord to ORC type The older 
> ConvertAvroToOrc processor processed the flowfile without issues. Also to 
> note is that the PutOrc processor handles the flowfile fine if there is no 
> Avro data with only the schema present. It seems to be related to nestled 
> "Record" types.
> How to reproduce:
> Avro schema: bug.avsc
> {code}
> {
>   "name": "nifi_hive3_test",
>   "namespace": "analytics.models.test",
>   "type": "record",
>   "fields": [
>{
>   "name": "Serial",
>   "type":
>   {
> "name": "Serial",
> "namespace": "analytics.models.common.serial",
> "type": "record",
> "fields": [
>   {
> "name": "Serial",
> "type": "long"
>   }
> ]
>   }
> }
>   ]
> }
> {code}
> Small python script to create an Avro file.
> {code}
> import avro.schema
> from avro.datafile import DataFileReader, DataFileWriter
> from avro.io import DatumReader, DatumWriter
> schema = avro.schema.parse(open("bug.avsc",