[jira] [Commented] (SPARK-26437) Decimal data becomes bigint to query, unable to query

2019-05-09 Thread Xiao Li (JIRA)


[ https://issues.apache.org/jira/browse/SPARK-26437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16836878#comment-16836878 ]

Xiao Li commented on SPARK-26437:
---------------------------------

Even if we do not use our native ORC reader, Spark 3.0 will be able to read it 
when the Hadoop 3.2 profile is enabled, since we upgraded the Hive execution JAR 
from 1.2.1 to 2.3.4. See the PR https://github.com/apache/spark/pull/24391
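
For reference, a minimal sketch of that setup (the exact build profiles below are an assumption, not from the comment; `HiveVersionInfo` ships in the bundled hive-common jar):

{code}
// Build with the Hadoop 3.2 profile (assumed profile set):
//   ./build/mvn -Phadoop-3.2 -Phive -Phive-thriftserver -DskipTests clean package
// Then check the bundled Hive execution version from spark-shell:
scala> org.apache.hive.common.util.HiveVersionInfo.getVersion
res0: String = 2.3.4
{code}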

> Decimal data becomes bigint to query, unable to query
> -----------------------------------------------------
>
> Key: SPARK-26437
> URL: https://issues.apache.org/jira/browse/SPARK-26437
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.6.3, 2.0.2, 2.1.3, 2.2.2, 2.3.1
> Reporter: zengxl
> Priority: Major
> Fix For: 3.0.0
>
>
> This is my SQL:
> create table tmp.tmp_test_6387_1224_spark stored as ORCFile as select 0.00 as a
> select a from tmp.tmp_test_6387_1224_spark
> The resulting table definition is:
> CREATE TABLE `tmp.tmp_test_6387_1224_spark`(
>   `a` decimal(2,2))
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
> When I query this table (with Hive or Spark SQL, the exception is the same), the following exception is thrown:
> Caused by: java.io.EOFException: Reading BigInteger past EOF from compressed stream Stream for column 1 kind DATA position: 0 length: 0 range: 0 offset: 0 limit: 0
>     at org.apache.hadoop.hive.ql.io.orc.SerializationUtils.readBigInteger(SerializationUtils.java:176)
>     at org.apache.hadoop.hive.ql.io.orc.TreeReaderFactory$DecimalTreeReader.next(TreeReaderFactory.java:1264)
>     at org.apache.hadoop.hive.ql.io.orc.TreeReaderFactory$StructTreeReader.next(TreeReaderFactory.java:2004)
>     at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:1039)





[jira] [Commented] (SPARK-26437) Decimal data becomes bigint to query, unable to query

2019-01-02 Thread zengxl (JIRA)


[ https://issues.apache.org/jira/browse/SPARK-26437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16732701#comment-16732701 ]

zengxl commented on SPARK-26437:


Thanks [~dongjoon]




[jira] [Commented] (SPARK-26437) Decimal data becomes bigint to query, unable to query

2018-12-27 Thread Dongjoon Hyun (JIRA)


[ https://issues.apache.org/jira/browse/SPARK-26437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16729945#comment-16729945 ]

Dongjoon Hyun commented on SPARK-26437:
---------------------------------------

Hi, [~zengxl].

Thank you for reporting. This is a very old issue, present since Apache Spark 1.x, 
that occurs when you use `decimal`. Please note the `CAST` and `decimal` in the 
following example. Since Spark 2.0, a `0.00` literal is interpreted as `Decimal`, 
so you hit this issue without the cast, too. This is fixed on the `master` branch 
and will be released as Apache Spark 3.0.0.

{code}
scala> sc.version
res0: String = 1.6.3

scala> sql("drop table spark_orc")
scala> sql("create table spark_orc stored as orc as select cast(0.00 as decimal(2,2)) as a")
scala> sql("select * from spark_orc").show
...
Caused by: java.io.EOFException: Reading BigInteger past EOF from compressed stream Stream for column 1 kind DATA position: 0 length: 0 range: 0 offset: 0 limit: 0
{code}
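
Since the literal typing drives this, it can be checked directly; a minimal sketch on a Spark 2.x shell (consistent with the `decimal(2,2)` column in the reported table definition):

{code}
scala> sql("select 0.00 as a").printSchema
root
 |-- a: decimal(2,2) (nullable = false)
{code}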

If you are interested, here are the details.

First, the underlying ORC issue (HIVE-13083) was fixed in Hive 1.3.0, but Spark 
still uses the embedded Hive 1.2.1. To avoid the underlying ORC issue, you can 
use the new ORC data source (`set spark.sql.orc.impl=native`). So, in Spark 2.4.0, 
you can use the `USING` syntax to avoid this:

{code}
scala> sql("create table spark_orc using orc as select 0.00 as a")
scala> sql("select * from spark_orc").show
+----+
|   a|
+----+
|0.00|
+----+

scala> spark.version
res2: String = 2.4.0
{code}
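
In Spark 2.4, the same native ORC write path can also be reached through the DataFrame writer; a minimal sketch (the table name `spark_orc_df` is hypothetical):

{code}
scala> sql("select 0.00 as a").write.format("orc").saveAsTable("spark_orc_df")  // creates a data source table using the native ORC writer
scala> sql("select * from spark_orc_df").show
+----+
|   a|
+----+
|0.00|
+----+
{code}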

Second, SPARK-22977 introduced a regression on CTAS in Spark 2.3.0, which was 
recently fixed by SPARK-25271 (Hive CTAS commands should use a data source if it 
is convertible) for Apache Spark 3.0.0. In Spark 3.0.0, you can use the 
`STORED AS ORC` syntax without this problem:
{code}
scala> sql("create table spark_orc stored as orc as select 0.00 as a")
scala> sql("select * from spark_orc").show
+----+
|   a|
+----+
|0.00|
+----+

scala> spark.version
res3: String = 3.0.0-SNAPSHOT
{code}

So, I'll close this issue since this is fixed in 3.0.0.

cc [~cloud_fan], [~viirya], [~smilegator], [~hyukjin.kwon]




[jira] [Commented] (SPARK-26437) Decimal data becomes bigint to query, unable to query

2018-12-27 Thread Dongjoon Hyun (JIRA)


[ https://issues.apache.org/jira/browse/SPARK-26437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16729925#comment-16729925 ]

Dongjoon Hyun commented on SPARK-26437:
---------------------------------------

Thanks, [~mgaido].




[jira] [Commented] (SPARK-26437) Decimal data becomes bigint to query, unable to query

2018-12-27 Thread Marco Gaido (JIRA)


[ https://issues.apache.org/jira/browse/SPARK-26437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16729526#comment-16729526 ]

Marco Gaido commented on SPARK-26437:
-------------------------------------

cc [~dongjoon]
