Hi Mike,

 

I have created the Hive external table on top of the Parquet files and am able to
read it from Hive.

 

While querying Hive from Spark, these are the errors:

 

For the decimal type, I get the following error (in Hive, the data type is
decimal(12,5)):

Caused by: org.apache.spark.sql.execution.QueryExecutionException: Parquet column cannot be converted in file hdfs://ip-10-0-0-216.ap-south-1.compute.internal:8020/user/hermes/nifi_test1/test_pqt2_dcm/4963966040134. Column: [dc_type], Expected: DecimalType(12,5), Found: BINARY
  at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:192)
  at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:109)
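For reference, Parquet stores a DECIMAL as a big-endian two's-complement unscaled integer inside a BINARY (or FIXED) field, with a logical-type annotation carrying the precision and scale; when Spark does not see that annotation it only finds BINARY and refuses the conversion. A minimal sketch of the decoding itself in plain Python (the raw value below is hypothetical, for a decimal(12,5) column):

```python
from decimal import Decimal

def parquet_binary_to_decimal(raw: bytes, scale: int) -> Decimal:
    """Decode Parquet's DECIMAL encoding: a big-endian two's-complement
    unscaled integer, scaled down by the declared scale."""
    unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    return Decimal(unscaled).scaleb(-scale)

# Hypothetical raw bytes for the unscaled value 1234567 with scale 5:
raw = (1234567).to_bytes(4, byteorder="big", signed=True)
print(parquet_binary_to_decimal(raw, 5))  # 12.34567
```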

 

For the time type (which was converted into TIME_MILLIS), I get the following
error (in Hive, the data type is int):

Caused by: org.apache.spark.sql.AnalysisException: Parquet type not yet supported: INT32 (TIME_MILLIS);
  at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.typeNotImplemented$1(ParquetSchemaConverter.scala:105)
  at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.convertPrimitiveField(ParquetSchemaConverter.scala:141)
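For reference, TIME_MILLIS annotates an INT32 holding the number of milliseconds after midnight, which Spark's Parquet reader rejects outright. If the raw int can be read some other way, the conversion itself is trivial; a minimal sketch in plain Python (the sample value is hypothetical):

```python
import datetime

def time_millis_to_time(millis: int) -> datetime.time:
    """Interpret a Parquet TIME_MILLIS value: milliseconds since midnight."""
    seconds, ms = divmod(millis, 1000)
    minutes, sec = divmod(seconds, 60)
    hours, minute = divmod(minutes, 60)
    return datetime.time(hours, minute, sec, ms * 1000)

# Hypothetical value: 49,530,250 ms after midnight is 13:45:30.250
print(time_millis_to_time(49_530_250))  # 13:45:30.250000
```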

  

Thanks,

Mohit

 

From: Mike Thomsen <[email protected]> 
Sent: 30 May 2018 17:28
To: [email protected]
Subject: Re: Unable to read the Parquet file written by NiFi through Spark when 
Logical Data Type is set to true.

 

What's the error from Spark? Logical data types are just a variant on existing 
data types in Avro 1.8.

 

On Wed, May 30, 2018 at 7:54 AM Mohit <[email protected]> wrote:

Hi all,

 

I’m fetching the data from an RDBMS and writing it to Parquet using the
PutParquet processor. I’m not able to read the data from Spark when Logical Data
Type is set to true, though I’m able to read it from Hive.

Do I have to set some specific properties in the PutParquet processor to make it
readable from Spark as well?

 

Regards,

Mohit
