Compare the docs for logical types:

https://avro.apache.org/docs/1.7.5/spec.html
https://avro.apache.org/docs/1.7.7/spec.html
https://avro.apache.org/docs/1.8.2/spec.html
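For reference, the key difference those spec pages show is that from 1.8 onward a logical type is just an annotation on an underlying primitive type. A minimal sketch of what such a schema looks like (field name here is hypothetical, chosen to match the column in the error below):

```python
import json

# Sketch of an Avro 1.8-style decimal logical type annotation: the
# underlying type is plain "bytes", and "logicalType"/"precision"/"scale"
# are extra attributes. A reader that does not understand logical types
# (e.g. an older Avro) simply ignores the annotation and sees raw bytes.
schema = {
    "type": "record",
    "name": "example",
    "fields": [
        {
            "name": "dc_type",
            "type": {
                "type": "bytes",
                "logicalType": "decimal",
                "precision": 12,
                "scale": 5,
            },
        }
    ],
}

print(json.dumps(schema["fields"][0]["type"], sort_keys=True))
```

That "ignore the annotation, fall back to the underlying type" behavior is also why a mismatched reader reports the column as BINARY rather than failing to parse the schema.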
It looks like Avro didn't support logical types at all in 1.7.5, dipped their toes into the water in 1.7.7, and then went all in on 1.8.X, if the docs are a good indicator. I think you're going to need to take this to the Avro list if you want to really get a solid answer beyond that.

On Thu, May 31, 2018 at 6:01 AM Mohit <[email protected]> wrote:

> Hi,
>
> Spark 2.3 has avro-1.7.7.jar. But Hive 1.2 uses avro-1.7.5.jar, and I'm
> able to read the Parquet file from Hive.
>
> Not sure if this is the reason.
>
> Thanks,
> Mohit
>
> *From:* Mike Thomsen <[email protected]>
> *Sent:* 31 May 2018 14:15
> *To:* [email protected]
> *Subject:* Re: Unable to read the Parquet file written by NiFi through
> Spark when Logical Data Type is set to true.
>
> Maybe check to see which version of Avro is bundled with your deployment
> of Spark?
>
> On Thu, May 31, 2018 at 3:26 AM Mohit <[email protected]> wrote:
>
>> Hi Mike,
>>
>> I have created a Hive external table on top of the Parquet file and am
>> able to read it from Hive.
>>
>> While querying Hive from Spark, these are the errors:
>>
>> For the decimal type, I get the following error (in Hive, the data type
>> is decimal(12,5)):
>>
>> Caused by: org.apache.spark.sql.execution.QueryExecutionException: Parquet column cannot be converted in file hdfs://ip-10-0-0-216.ap-south-1.compute.internal:8020/user/hermes/nifi_test1/test_pqt2_dcm/4963966040134.
>> Column: [dc_type], Expected: DecimalType(12,5), Found: BINARY
>>   at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:192)
>>   at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:109)
>>
>> For the time column, which was converted into TIME_MILLIS (in Hive, the
>> data type is int):
>>
>> Caused by: org.apache.spark.sql.AnalysisException: Parquet type not yet supported: INT32 (TIME_MILLIS);
>>   at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.typeNotImplemented$1(ParquetSchemaConverter.scala:105)
>>   at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.convertPrimitiveField(ParquetSchemaConverter.scala:141)
>>
>> Thanks,
>> Mohit
>>
>> *From:* Mike Thomsen <[email protected]>
>> *Sent:* 30 May 2018 17:28
>> *To:* [email protected]
>> *Subject:* Re: Unable to read the Parquet file written by NiFi through
>> Spark when Logical Data Type is set to true.
>>
>> What's the error from Spark? Logical data types are just a variant on
>> existing data types in Avro 1.8.
>>
>> On Wed, May 30, 2018 at 7:54 AM Mohit <[email protected]> wrote:
>>
>>> Hi all,
>>>
>>> I'm fetching the data from an RDBMS and writing it to Parquet using the
>>> PutParquet processor. I'm not able to read the data from Spark when
>>> Logical Data Type is true, though I am able to read it from Hive.
>>>
>>> Do I have to set some specific properties in the PutParquet processor
>>> to make it readable from Spark as well?
>>>
>>> Regards,
>>> Mohit
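Not a fix, but for anyone debugging the raw values behind those two errors: per the Parquet spec, a DECIMAL stored as BINARY is the unscaled value as big-endian two's-complement bytes, and TIME_MILLIS is an INT32 count of milliseconds since midnight. A stdlib-only sketch (sample values are made up for illustration):

```python
from datetime import time
from decimal import Decimal

def decode_decimal(raw: bytes, scale: int) -> Decimal:
    # Parquet DECIMAL (BINARY encoding): unscaled integer stored as
    # big-endian two's-complement bytes, then scaled down by 10**scale.
    unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    return Decimal(unscaled).scaleb(-scale)

def decode_time_millis(millis: int) -> time:
    # Parquet TIME_MILLIS: INT32 milliseconds since midnight.
    seconds, ms = divmod(millis, 1000)
    minutes, s = divmod(seconds, 60)
    h, m = divmod(minutes, 60)
    return time(h, m, s, ms * 1000)

# decimal(12,5): unscaled 123456789 with scale 5 -> 1234.56789
print(decode_decimal((123456789).to_bytes(4, "big", signed=True), 5))  # 1234.56789
# 14:15 expressed as milliseconds since midnight
print(decode_time_millis(14 * 3_600_000 + 15 * 60_000))  # 14:15:00
```

This is just what the annotations mean on the wire; whether Spark's Parquet reader understands them is the version question discussed above.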
