Re: Unable to read the Parquet file written by NiFi through Spark when Logical Data Type is set to true.

2018-05-31 Thread Mike Thomsen
Compare the docs for logical types:

https://avro.apache.org/docs/1.7.5/spec.html
https://avro.apache.org/docs/1.7.7/spec.html
https://avro.apache.org/docs/1.8.2/spec.html

It looks like Avro didn't support logical types at all in 1.7.5, dipped
its toes in the water with 1.7.7, and then went all in with 1.8.x, if the
docs are a good indicator. I think you're going to need to take this to the
Avro list if you want a solid answer beyond that.
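
To make that concrete, here is a minimal sketch (assuming Avro 1.8.x on the
classpath; the object and record names are made up, and the field name and
precision/scale are borrowed from the decimal(12,5) column further down the
thread) of how a logical type is declared. It is just an annotation layered
on a plain underlying type, which lines up with the "Found: BINARY" error
reported below:

import org.apache.avro.Schema

object LogicalTypeDemo extends App {
  // An Avro 1.8-style schema: "decimal" is a logicalType annotation on
  // top of the underlying "bytes" type.
  val json =
    """{
      |  "type": "record",
      |  "name": "Row",
      |  "fields": [
      |    {"name": "dc_type",
      |     "type": {"type": "bytes", "logicalType": "decimal",
      |              "precision": 12, "scale": 5}}
      |  ]
      |}""".stripMargin

  val schema = new Schema.Parser().parse(json)
  // Parsers that predate logical types still accept this schema but
  // ignore the annotation, so the field is exposed as raw bytes.
  println(schema.getField("dc_type").schema())
}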


RE: Unable to read the Parquet file written by NiFi through Spark when Logical Data Type is set to true.

2018-05-31 Thread Mohit
Hi,

Spark 2.3 bundles avro-1.7.7.jar, while Hive 1.2 uses avro-1.7.5.jar, and I'm
able to read the Parquet file from Hive.

Not sure if this is the reason.

Thanks,

Mohit



Re: Unable to read the Parquet file written by NiFi through Spark when Logical Data Type is set to true.

2018-05-31 Thread Mike Thomsen
Maybe check to see which version of Avro is bundled with your deployment of
Spark?
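
A quick way to confirm which Avro jar the Spark driver actually loaded (a
hedged one-liner for spark-shell, using only standard JDK reflection) is to
ask a core Avro class where it came from:

// Prints the location of the jar that provided org.apache.avro.Schema,
// e.g. .../jars/avro-1.7.7.jar
println(classOf[org.apache.avro.Schema].getProtectionDomain.getCodeSource.getLocation)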



RE: Unable to read the Parquet file written by NiFi through Spark when Logical Data Type is set to true.

2018-05-31 Thread Mohit
Hi Mike,

I have created a Hive external table on top of the Parquet files and am able
to read them from Hive.

While querying Hive from Spark, these are the errors.

For the decimal type, I get the following error (in Hive, the data type is
decimal(12,5)):

Caused by: org.apache.spark.sql.execution.QueryExecutionException: Parquet column cannot be converted in file hdfs://ip-10-0-0-216.ap-south-1.compute.internal:8020/user/hermes/nifi_test1/test_pqt2_dcm/4963966040134. Column: [dc_type], Expected: DecimalType(12,5), Found: BINARY
        at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:192)
        at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:109)
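
One cheap thing worth trying for the decimal/BINARY mismatch, with no
guarantee it applies here, is falling back from Spark's vectorized Parquet
reader to the row-based one, since the decimal conversion path differs
between the two in some Spark versions:

// In spark-shell, before running the query; this is a standard
// Spark SQL configuration flag.
spark.conf.set("spark.sql.parquet.enableVectorizedReader", "false")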

 

For the time column (which was converted into TIME_MILLIS), the error is the
following (in Hive, the data type is int):

Caused by: org.apache.spark.sql.AnalysisException: Parquet type not yet supported: INT32 (TIME_MILLIS);
        at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.typeNotImplemented$1(ParquetSchemaConverter.scala:105)
        at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.convertPrimitiveField(ParquetSchemaConverter.scala:141)
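
Since Spark takes the physical column types from the Parquet footers rather
than from the Hive table definition, reading the files directly should
reproduce the same schema-conversion failure; a quick check from spark-shell
(directory path taken from the decimal error above) that at least rules out
the metastore:

// Expect the same TIME_MILLIS AnalysisException here if the footer
// schema is the problem.
spark.read
  .parquet("hdfs://ip-10-0-0-216.ap-south-1.compute.internal:8020/user/hermes/nifi_test1/test_pqt2_dcm")
  .printSchema()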

  

Thanks,

Mohit

 



Re: Unable to read the Parquet file written by NiFi through Spark when Logical Data Type is set to true.

2018-05-30 Thread Mike Thomsen
What's the error from Spark? Logical data types are just a variant on
existing data types in Avro 1.8.
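
"Variant on existing data types" is literal: for a decimal, the value Avro
stores is still plain bytes holding a big-endian two's-complement unscaled
integer, and the logical type only tells the reader how to interpret it (per
the Avro spec linked earlier in the thread). A hedged sketch of that
interpretation step; decodeDecimal is a hypothetical helper, not an Avro API:

import java.math.{BigDecimal, BigInteger}
import java.nio.ByteBuffer

// For a decimal(precision, scale) bytes field, the buffer holds the
// big-endian two's-complement unscaled value; applying the scale
// recovers the number.
def decodeDecimal(buf: ByteBuffer, scale: Int): BigDecimal = {
  val bytes = new Array[Byte](buf.remaining())
  buf.get(bytes)
  new BigDecimal(new BigInteger(bytes), scale)
}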



Unable to read the Parquet file written by NiFi through Spark when Logical Data Type is set to true.

2018-05-30 Thread Mohit
Hi all,

I'm fetching data from an RDBMS and writing it to Parquet using the
PutParquet processor. I'm not able to read the data from Spark when Logical
Data Type is set to true, though I am able to read it from Hive.

Do I have to set some specific properties in the PutParquet processor to
make it readable from Spark as well?

 

Regards,

Mohit