[jira] [Commented] (SPARK-25135) insert datasource table may all null when select from view on parquet

2018-09-02 Thread Yuming Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16601728#comment-16601728
 ] 

Yuming Wang commented on SPARK-25135:
-

[https://github.com/apache/spark/pull/22311]

[https://github.com/apache/spark/pull/22287]

We are trying to fix it.
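Until a fix lands, one possible workaround (an illustration only, not necessarily what the PRs above do) is to alias the select list to the exact column-name case declared on the target table, so the files written by the INSERT carry the same names as the table schema. A minimal sketch against the reproduction below:

{code:scala}
// Hypothetical workaround sketch (not verified against the linked PRs):
// make the query output names match table2's declared COL1/COL2 exactly,
// so the files written by the INSERT use the same case as the table schema.
spark.sql(
  """INSERT OVERWRITE TABLE table2
    |SELECT col1 AS COL1, col2 AS COL2 FROM view1""".stripMargin)

// If the written files now carry COL1/COL2, the read side should no longer
// resolve the columns to null.
spark.table("table2").show()
{code}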

 

> insert datasource table may all null when select from view on parquet
> -
>
> Key: SPARK-25135
> URL: https://issues.apache.org/jira/browse/SPARK-25135
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0, 2.3.1
>Reporter: Yuming Wang
>Priority: Blocker
>  Labels: Parquet, correctness
>
> This happens with Parquet.
> How to reproduce with Parquet:
> {code:scala}
> val path = "/tmp/spark/parquet"
> val cnt = 30
> spark.range(cnt).selectExpr("cast(id as bigint) as col1", "cast(id as bigint) as col2").write.mode("overwrite").parquet(path)
> spark.sql(s"CREATE TABLE table1(col1 bigint, col2 bigint) using parquet location '$path'")
> spark.sql("create view view1 as select col1, col2 from table1 where col1 > -20")
> spark.sql("create table table2 (COL1 BIGINT, COL2 BIGINT) using parquet")
> spark.sql("insert overwrite table table2 select COL1, COL2 from view1")
> spark.table("table2").show
> {code}
> FYI, the following is the ORC behavior for comparison:
> {code}
> scala> val path = "/tmp/spark/orc"
> scala> val cnt = 30
> scala> spark.range(cnt).selectExpr("cast(id as bigint) as col1", "cast(id as bigint) as col2").write.mode("overwrite").orc(path)
> scala> spark.sql(s"CREATE TABLE table1(col1 bigint, col2 bigint) using orc location '$path'")
> scala> spark.sql("create view view1 as select col1, col2 from table1 where col1 > -20")
> scala> spark.sql("create table table2 (COL1 BIGINT, COL2 BIGINT) using orc")
> scala> spark.sql("insert overwrite table table2 select COL1, COL2 from view1")
> scala> spark.table("table2").show
> +----+----+
> |COL1|COL2|
> +----+----+
> |  15|  15|
> |  16|  16|
> |  17|  17|
> ...
> {code}






[jira] [Commented] (SPARK-25135) insert datasource table may all null when select from view on parquet

2018-08-30 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598111#comment-16598111
 ] 

Saisai Shao commented on SPARK-25135:
-

What's the ETA of this issue [~yumwang]?







[jira] [Commented] (SPARK-25135) insert datasource table may all null when select from view on parquet

2018-08-30 Thread Yuming Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16597742#comment-16597742
 ] 

Yuming Wang commented on SPARK-25135:
-

[~dongjoon] ORC also has this issue. Reproduction code:
{code:scala}
withTempPath { dir =>
  val path = dir.getCanonicalPath
  val cnt = 30
  val table1Path = s"$path/table1"
  val table2Path = s"$path/table2"
  val data =
    spark.range(cnt).selectExpr("cast(id as bigint) as col1", "cast(id % 3 as bigint) as col2")
  data.write.mode(SaveMode.Overwrite).orc(table1Path)
  withTable("table1", "table2", "table3") {
    spark.sql(
      s"CREATE TABLE table1(col1 bigint, col2 bigint) using orc location '$table1Path'")
    spark.sql(
      s"CREATE TABLE table2(COL1 bigint, COL2 bigint) using orc location '$table2Path'")

    withView("view1") {
      spark.sql("CREATE VIEW view1 as select col1, col2 from table1 where col1 > -20")
      spark.sql("INSERT OVERWRITE TABLE table2 select COL1, COL2 from view1")
      checkAnswer(spark.table("table2"), data)
      assert(spark.read.orc(table2Path).schema === spark.table("table2").schema)
    }
  }
}
{code}
Running it, the schema assertion fails with:

{noformat}
Expected :StructType(StructField(COL1,LongType,true), StructField(COL2,LongType,true))
Actual   :StructType(StructField(col1,LongType,true), StructField(col2,LongType,true))
{noformat}
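
That is, the INSERT writes the data files with the query's lower-case column names rather than the table's declared upper-case names. For Parquet on 2.3.x this tends to surface as all-null reads, since (as far as I can tell) field resolution against the Parquet file schema is case-sensitive at read time. A minimal, self-contained sketch of that read-side effect; the path and column names here are illustrative only:

{code:scala}
import org.apache.spark.sql.types.{LongType, StructField, StructType}

// Write a Parquet file whose physical field names are lower-case...
val demoPath = "/tmp/spark/case_demo"  // hypothetical path, not from the repro above
spark.range(5).selectExpr("id as col1", "id as col2")
  .write.mode("overwrite").parquet(demoPath)

// ...then read it back declaring the columns in upper case, as table2 does.
val upperSchema = StructType(Seq(
  StructField("COL1", LongType),
  StructField("COL2", LongType)))

// On 2.3.x this is expected to show only nulls: the requested COL1/COL2
// fail to match the file's col1/col2 under case-sensitive field resolution.
spark.read.schema(upperSchema).parquet(demoPath).show()
{code}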








[jira] [Commented] (SPARK-25135) insert datasource table may all null when select from view on parquet

2018-08-30 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16597741#comment-16597741
 ] 

Apache Spark commented on SPARK-25135:
--

User 'wangyum' has created a pull request for this issue:
https://github.com/apache/spark/pull/22287







[jira] [Commented] (SPARK-25135) insert datasource table may all null when select from view on parquet

2018-08-26 Thread Dongjoon Hyun (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16593066#comment-16593066
 ] 

Dongjoon Hyun commented on SPARK-25135:
---

[~yumwang], could you update your PR to match this JIRA title? We need to be specific.



