Hi Mike,
I have created the Hive external table on top of the Parquet files and am able to read it from Hive. When querying Hive from Spark, these are the errors I get.

For the decimal type (in Hive, the data type is decimal(12,5)):

Caused by: org.apache.spark.sql.execution.QueryExecutionException: Parquet column cannot be converted in file hdfs://ip-10-0-0-216.ap-south-1.compute.internal:8020/user/hermes/nifi_test1/test_pqt2_dcm/4963966040134. Column: [dc_type], Expected: DecimalType(12,5), Found: BINARY
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:192)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:109)

For the time column, which was converted into TIME_MILLIS (in Hive, the data type is int):

Caused by: org.apache.spark.sql.AnalysisException: Parquet type not yet supported: INT32 (TIME_MILLIS);
    at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.typeNotImplemented$1(ParquetSchemaConverter.scala:105)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.convertPrimitiveField(ParquetSchemaConverter.scala:141)

Thanks,
Mohit

From: Mike Thomsen <[email protected]>
Sent: 30 May 2018 17:28
To: [email protected]
Subject: Re: Unable to read the Parquet file written by NiFi through Spark when Logical Data Type is set to true.

What's the error from Spark? Logical data types are just a variant on existing data types in Avro 1.8.

On Wed, May 30, 2018 at 7:54 AM Mohit <[email protected]> wrote:

Hi all,

I'm fetching data from an RDBMS and writing it to Parquet using the PutParquet processor. I'm not able to read the data from Spark when Logical Data Type is set to true, although I am able to read it from Hive. Do I have to set some specific properties in the PutParquet processor to make it readable from Spark as well?

Regards,
Mohit
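For the TIME_MILLIS error above, Spark's Parquet schema converter rejects the logical type outright, so one possible workaround is to read the column as a plain int (as Hive does) and decode it on the application side: per the Parquet format spec, TIME_MILLIS stores milliseconds since midnight. A minimal sketch in Python, assuming you can get the raw int out; the helper name is illustrative, not part of NiFi or Spark:

```python
from datetime import time

def time_millis_to_time(millis: int) -> time:
    # TIME_MILLIS is defined as milliseconds since midnight,
    # so valid values are 0 .. 86_399_999 inclusive.
    if not 0 <= millis < 86_400_000:
        raise ValueError(f"TIME_MILLIS value out of range: {millis}")
    seconds, ms = divmod(millis, 1000)
    minutes, second = divmod(seconds, 60)
    hour, minute = divmod(minutes, 60)
    # datetime.time takes microseconds, so scale the millisecond part.
    return time(hour, minute, second, ms * 1000)

# 49_530_250 ms after midnight → 13:45:30.250
print(time_millis_to_time(49_530_250))
```

This keeps the data readable from both engines: Hive already exposes the column as int, and Spark never has to interpret the unsupported logical type.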
