[jira] [Resolved] (SPARK-25453) OracleIntegrationSuite IllegalArgumentException: Timestamp format must be yyyy-mm-dd hh:mm:ss[.fffffffff]

2018-09-30 Thread Xiao Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-25453.
-
   Resolution: Fixed
Fix Version/s: 2.4.0

> OracleIntegrationSuite IllegalArgumentException: Timestamp format must be 
> yyyy-mm-dd hh:mm:ss[.fffffffff]
> -
>
> Key: SPARK-25453
> URL: https://issues.apache.org/jira/browse/SPARK-25453
> Project: Spark
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 2.4.0
>Reporter: Yuming Wang
>Assignee: Chenxiao Mao
>Priority: Major
> Fix For: 2.4.0
>
>
> {noformat}
> - SPARK-22814 support date/timestamp types in partitionColumn *** FAILED ***
>   java.lang.IllegalArgumentException: Timestamp format must be yyyy-mm-dd 
> hh:mm:ss[.fffffffff]
>   at java.sql.Timestamp.valueOf(Timestamp.java:204)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.toInternalBoundValue(JDBCRelation.scala:183)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.columnPartition(JDBCRelation.scala:88)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:36)
>   at 
> org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:318)
>   at 
> org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
>   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
>   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:167)
>   at 
> org.apache.spark.sql.jdbc.OracleIntegrationSuite$$anonfun$18.apply(OracleIntegrationSuite.scala:445)
>   at 
> org.apache.spark.sql.jdbc.OracleIntegrationSuite$$anonfun$18.apply(OracleIntegrationSuite.scala:427)
>   ...{noformat}
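For context, SPARK-22814 extended JDBC partitioning ("partitionColumn", "lowerBound",
"upperBound", "numPartitions") to date/timestamp columns, and the stack trace above shows
the timestamp bounds being parsed with java.sql.Timestamp.valueOf, which only accepts the
"yyyy-mm-dd hh:mm:ss[.fffffffff]" form named in the exception. A minimal sketch of such a
read, with a hypothetical connection URL, table and column:

{code:java}
// Sketch only: a partitioned JDBC read on a TIMESTAMP column (SPARK-22814).
// The URL, table and column names below are hypothetical; the key point is that
// the bounds must parse as java.sql.Timestamp, i.e. yyyy-mm-dd hh:mm:ss[.fffffffff].
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:oracle:thin:@//dbhost:1521/service")  // hypothetical
  .option("dbtable", "datetime_tbl")                         // hypothetical
  .option("partitionColumn", "ts_col")                       // hypothetical TIMESTAMP column
  .option("lowerBound", "2018-07-06 05:50:00.0")
  .option("upperBound", "2018-07-20 19:30:00.0")
  .option("numPartitions", "3")
  .load()
{code}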



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25453) OracleIntegrationSuite IllegalArgumentException: Timestamp format must be yyyy-mm-dd hh:mm:ss[.fffffffff]

2018-09-30 Thread Xiao Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li reassigned SPARK-25453:
---

Assignee: Chenxiao Mao

> OracleIntegrationSuite IllegalArgumentException: Timestamp format must be 
> yyyy-mm-dd hh:mm:ss[.fffffffff]
> -
>
> Key: SPARK-25453
> URL: https://issues.apache.org/jira/browse/SPARK-25453
> Project: Spark
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 2.4.0
>Reporter: Yuming Wang
>Assignee: Chenxiao Mao
>Priority: Major
>
> {noformat}
> - SPARK-22814 support date/timestamp types in partitionColumn *** FAILED ***
>   java.lang.IllegalArgumentException: Timestamp format must be yyyy-mm-dd 
> hh:mm:ss[.fffffffff]
>   at java.sql.Timestamp.valueOf(Timestamp.java:204)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.toInternalBoundValue(JDBCRelation.scala:183)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.columnPartition(JDBCRelation.scala:88)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:36)
>   at 
> org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:318)
>   at 
> org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
>   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
>   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:167)
>   at 
> org.apache.spark.sql.jdbc.OracleIntegrationSuite$$anonfun$18.apply(OracleIntegrationSuite.scala:445)
>   at 
> org.apache.spark.sql.jdbc.OracleIntegrationSuite$$anonfun$18.apply(OracleIntegrationSuite.scala:427)
>   ...{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25579) Use quoted attribute names if needed in pushed ORC predicates

2018-09-30 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25579:


Assignee: Apache Spark  (was: Dongjoon Hyun)

> Use quoted attribute names if needed in pushed ORC predicates
> -
>
> Key: SPARK-25579
> URL: https://issues.apache.org/jira/browse/SPARK-25579
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Critical
>
> This issue aims to fix an ORC performance regression at Spark 2.4.0 RCs from 
> Spark 2.3.2. For column names with `.`, the pushed predicates are ignored.
> *Test Data*
> {code:java}
> scala> val df = spark.range(Int.MaxValue).sample(0.2).toDF("col.with.dot")
> scala> df.write.mode("overwrite").orc("/tmp/orc")
> {code}
> *Spark 2.3.2*
> {code:java}
> scala> spark.sql("set spark.sql.orc.impl=native")
> scala> spark.sql("set spark.sql.orc.filterPushdown=true")
> scala> spark.time(spark.read.orc("/tmp/orc").where("`col.with.dot` < 10").show)
> +------------+
> |col.with.dot|
> +------------+
> |           1|
> |           8|
> +------------+
> Time taken: 1486 ms
> scala> spark.time(spark.read.orc("/tmp/orc").where("`col.with.dot` < 10").show)
> +------------+
> |col.with.dot|
> +------------+
> |           1|
> |           8|
> +------------+
> Time taken: 163 ms
> {code}
> *Spark 2.4.0 RC2*
> {code:java}
> scala> spark.time(spark.read.orc("/tmp/orc").where("`col.with.dot` < 10").show)
> +------------+
> |col.with.dot|
> +------------+
> |           1|
> |           8|
> +------------+
> Time taken: 4087 ms
> scala> spark.time(spark.read.orc("/tmp/orc").where("`col.with.dot` < 10").show)
> +------------+
> |col.with.dot|
> +------------+
> |           1|
> |           8|
> +------------+
> Time taken: 1998 ms
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25579) Use quoted attribute names if needed in pushed ORC predicates

2018-09-30 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16633612#comment-16633612
 ] 

Apache Spark commented on SPARK-25579:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/22597

> Use quoted attribute names if needed in pushed ORC predicates
> -
>
> Key: SPARK-25579
> URL: https://issues.apache.org/jira/browse/SPARK-25579
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Critical
>
> This issue aims to fix an ORC performance regression at Spark 2.4.0 RCs from 
> Spark 2.3.2. For column names with `.`, the pushed predicates are ignored.
> *Test Data*
> {code:java}
> scala> val df = spark.range(Int.MaxValue).sample(0.2).toDF("col.with.dot")
> scala> df.write.mode("overwrite").orc("/tmp/orc")
> {code}
> *Spark 2.3.2*
> {code:java}
> scala> spark.sql("set spark.sql.orc.impl=native")
> scala> spark.sql("set spark.sql.orc.filterPushdown=true")
> scala> spark.time(spark.read.orc("/tmp/orc").where("`col.with.dot` < 10").show)
> +------------+
> |col.with.dot|
> +------------+
> |           1|
> |           8|
> +------------+
> Time taken: 1486 ms
> scala> spark.time(spark.read.orc("/tmp/orc").where("`col.with.dot` < 10").show)
> +------------+
> |col.with.dot|
> +------------+
> |           1|
> |           8|
> +------------+
> Time taken: 163 ms
> {code}
> *Spark 2.4.0 RC2*
> {code:java}
> scala> spark.time(spark.read.orc("/tmp/orc").where("`col.with.dot` < 10").show)
> +------------+
> |col.with.dot|
> +------------+
> |           1|
> |           8|
> +------------+
> Time taken: 4087 ms
> scala> spark.time(spark.read.orc("/tmp/orc").where("`col.with.dot` < 10").show)
> +------------+
> |col.with.dot|
> +------------+
> |           1|
> |           8|
> +------------+
> Time taken: 1998 ms
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25579) Use quoted attribute names if needed in pushed ORC predicates

2018-09-30 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25579:


Assignee: Dongjoon Hyun  (was: Apache Spark)

> Use quoted attribute names if needed in pushed ORC predicates
> -
>
> Key: SPARK-25579
> URL: https://issues.apache.org/jira/browse/SPARK-25579
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Critical
>
> This issue aims to fix an ORC performance regression at Spark 2.4.0 RCs from 
> Spark 2.3.2. For column names with `.`, the pushed predicates are ignored.
> *Test Data*
> {code:java}
> scala> val df = spark.range(Int.MaxValue).sample(0.2).toDF("col.with.dot")
> scala> df.write.mode("overwrite").orc("/tmp/orc")
> {code}
> *Spark 2.3.2*
> {code:java}
> scala> spark.sql("set spark.sql.orc.impl=native")
> scala> spark.sql("set spark.sql.orc.filterPushdown=true")
> scala> spark.time(spark.read.orc("/tmp/orc").where("`col.with.dot` < 10").show)
> +------------+
> |col.with.dot|
> +------------+
> |           1|
> |           8|
> +------------+
> Time taken: 1486 ms
> scala> spark.time(spark.read.orc("/tmp/orc").where("`col.with.dot` < 10").show)
> +------------+
> |col.with.dot|
> +------------+
> |           1|
> |           8|
> +------------+
> Time taken: 163 ms
> {code}
> *Spark 2.4.0 RC2*
> {code:java}
> scala> spark.time(spark.read.orc("/tmp/orc").where("`col.with.dot` < 10").show)
> +------------+
> |col.with.dot|
> +------------+
> |           1|
> |           8|
> +------------+
> Time taken: 4087 ms
> scala> spark.time(spark.read.orc("/tmp/orc").where("`col.with.dot` < 10").show)
> +------------+
> |col.with.dot|
> +------------+
> |           1|
> |           8|
> +------------+
> Time taken: 1998 ms
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25579) Use quoted attribute names if needed in pushed ORC predicates

2018-09-30 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16633613#comment-16633613
 ] 

Apache Spark commented on SPARK-25579:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/22597

> Use quoted attribute names if needed in pushed ORC predicates
> -
>
> Key: SPARK-25579
> URL: https://issues.apache.org/jira/browse/SPARK-25579
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Critical
>
> This issue aims to fix an ORC performance regression at Spark 2.4.0 RCs from 
> Spark 2.3.2. For column names with `.`, the pushed predicates are ignored.
> *Test Data*
> {code:java}
> scala> val df = spark.range(Int.MaxValue).sample(0.2).toDF("col.with.dot")
> scala> df.write.mode("overwrite").orc("/tmp/orc")
> {code}
> *Spark 2.3.2*
> {code:java}
> scala> spark.sql("set spark.sql.orc.impl=native")
> scala> spark.sql("set spark.sql.orc.filterPushdown=true")
> scala> spark.time(spark.read.orc("/tmp/orc").where("`col.with.dot` < 10").show)
> +------------+
> |col.with.dot|
> +------------+
> |           1|
> |           8|
> +------------+
> Time taken: 1486 ms
> scala> spark.time(spark.read.orc("/tmp/orc").where("`col.with.dot` < 10").show)
> +------------+
> |col.with.dot|
> +------------+
> |           1|
> |           8|
> +------------+
> Time taken: 163 ms
> {code}
> *Spark 2.4.0 RC2*
> {code:java}
> scala> spark.time(spark.read.orc("/tmp/orc").where("`col.with.dot` < 10").show)
> +------------+
> |col.with.dot|
> +------------+
> |           1|
> |           8|
> +------------+
> Time taken: 4087 ms
> scala> spark.time(spark.read.orc("/tmp/orc").where("`col.with.dot` < 10").show)
> +------------+
> |col.with.dot|
> +------------+
> |           1|
> |           8|
> +------------+
> Time taken: 1998 ms
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25579) Use quoted attribute names if needed in pushed ORC predicates

2018-09-30 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-25579:
--
Description: 
This issue aims to fix an ORC performance regression at Spark 2.4.0 RCs from 
Spark 2.3.2. For column names with `.`, the pushed predicates are ignored.

*Test Data*
{code:java}
scala> val df = spark.range(Int.MaxValue).sample(0.2).toDF("col.with.dot")
scala> df.write.mode("overwrite").orc("/tmp/orc")
{code}
*Spark 2.3.2*
{code:java}
scala> spark.sql("set spark.sql.orc.impl=native")
scala> spark.sql("set spark.sql.orc.filterPushdown=true")
scala> spark.time(spark.read.orc("/tmp/orc").where("`col.with.dot` < 10").show)
+------------+
|col.with.dot|
+------------+
|           1|
|           8|
+------------+

Time taken: 1486 ms

scala> spark.time(spark.read.orc("/tmp/orc").where("`col.with.dot` < 10").show)
+------------+
|col.with.dot|
+------------+
|           1|
|           8|
+------------+

Time taken: 163 ms
{code}
*Spark 2.4.0 RC2*
{code:java}
scala> spark.time(spark.read.orc("/tmp/orc").where("`col.with.dot` < 10").show)
+------------+
|col.with.dot|
+------------+
|           1|
|           8|
+------------+

Time taken: 4087 ms

scala> spark.time(spark.read.orc("/tmp/orc").where("`col.with.dot` < 10").show)
+------------+
|col.with.dot|
+------------+
|           1|
|           8|
+------------+

Time taken: 1998 ms
{code}

  was:
This issue aims to fix an ORC performance regression at Spark 2.4.0 RCs from 
Spark 2.3.2. For column names with `.`, the pushed predicates are ignored.

*Spark 2.3.2*
{code:java}
scala> val df = spark.range(Int.MaxValue).sample(0.2).toDF("col.with.dot")
scala> df.write.mode("overwrite").orc("/tmp/orc")
scala> spark.sql("set spark.sql.orc.impl=native")
scala> spark.sql("set spark.sql.orc.filterPushdown=true")

scala> spark.time(spark.read.orc("/tmp/orc").where("`col.with.dot` = 49995").show)
+------------+
|col.with.dot|
+------------+
|       49995|
+------------+

Time taken: 1509 ms

scala> spark.time(spark.read.orc("/tmp/orc").where("`col.with.dot` = 49995").show)
+------------+
|col.with.dot|
+------------+
|       49995|
+------------+

Time taken: 164 ms

scala> spark.time(spark.read.orc("/tmp/orc").where("`col.with.dot` = 49995").show)
+------------+
|col.with.dot|
+------------+
|       49995|
+------------+

Time taken: 140 ms
{code}
*Spark 2.4.0 RC2*
{code:java}
scala> spark.time(spark.read.orc("/tmp/orc").where("`col.with.dot` = 49995").show)
+------------+
|col.with.dot|
+------------+
|       49995|
+------------+

Time taken: 4257 ms

scala> spark.time(spark.read.orc("/tmp/orc").where("`col.with.dot` = 49995").show)
+------------+
|col.with.dot|
+------------+
|       49995|
+------------+

Time taken: 2246 ms

scala> spark.time(spark.read.orc("/tmp/orc").where("`col.with.dot` = 49995").show)
+------------+
|col.with.dot|
+------------+
|       49995|
+------------+

Time taken: 2472 ms{code}


> Use quoted attribute names if needed in pushed ORC predicates
> -
>
> Key: SPARK-25579
> URL: https://issues.apache.org/jira/browse/SPARK-25579
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Critical
>
> This issue aims to fix an ORC performance regression at Spark 2.4.0 RCs from 
> Spark 2.3.2. For column names with `.`, the pushed predicates are ignored.
> *Test Data*
> {code:java}
> scala> val df = spark.range(Int.MaxValue).sample(0.2).toDF("col.with.dot")
> scala> df.write.mode("overwrite").orc("/tmp/orc")
> {code}
> *Spark 2.3.2*
> {code:java}
> scala> spark.sql("set spark.sql.orc.impl=native")
> scala> spark.sql("set spark.sql.orc.filterPushdown=true")
> scala> spark.time(spark.read.orc("/tmp/orc").where("`col.with.dot` < 10").show)
> +------------+
> |col.with.dot|
> +------------+
> |           1|
> |           8|
> +------------+
> Time taken: 1486 ms
> scala> spark.time(spark.read.orc("/tmp/orc").where("`col.with.dot` < 10").show)
> +------------+
> |col.with.dot|
> +------------+
> |           1|
> |           8|
> +------------+
> Time taken: 163 ms
> {code}
> *Spark 2.4.0 RC2*
> {code:java}
> scala> spark.time(spark.read.orc("/tmp/orc").where("`col.with.dot` < 10").show)
> +------------+
> |col.with.dot|
> +------------+
> |           1|
> |           8|
> +------------+
> Time taken: 4087 ms
> scala> spark.time(spark.read.orc("/tmp/orc").where("`col.with.dot` < 10").show)
> +------------+
> |col.with.dot|
> +------------+
> |           1|
> |           8|
> +------------+
> Time taken: 1998 ms
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: 

[jira] [Updated] (SPARK-25579) Use quoted attribute names if needed in pushed ORC predicates

2018-09-30 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-25579:
--
Description: 
This issue aims to fix an ORC performance regression at Spark 2.4.0 RCs from 
Spark 2.3.2. For column names with `.`, the pushed predicates are ignored.

*Spark 2.3.2*
{code:java}
scala> val df = spark.range(Int.MaxValue).sample(0.2).toDF("col.with.dot")
scala> df.write.mode("overwrite").orc("/tmp/orc")
scala> spark.sql("set spark.sql.orc.impl=native")
scala> spark.sql("set spark.sql.orc.filterPushdown=true")

scala> spark.time(spark.read.orc("/tmp/orc").where("`col.with.dot` = 49995").show)
+------------+
|col.with.dot|
+------------+
|       49995|
+------------+

Time taken: 1509 ms

scala> spark.time(spark.read.orc("/tmp/orc").where("`col.with.dot` = 49995").show)
+------------+
|col.with.dot|
+------------+
|       49995|
+------------+

Time taken: 164 ms

scala> spark.time(spark.read.orc("/tmp/orc").where("`col.with.dot` = 49995").show)
+------------+
|col.with.dot|
+------------+
|       49995|
+------------+

Time taken: 140 ms
{code}
*Spark 2.4.0 RC2*
{code:java}
scala> spark.time(spark.read.orc("/tmp/orc").where("`col.with.dot` = 49995").show)
+------------+
|col.with.dot|
+------------+
|       49995|
+------------+

Time taken: 4257 ms

scala> spark.time(spark.read.orc("/tmp/orc").where("`col.with.dot` = 49995").show)
+------------+
|col.with.dot|
+------------+
|       49995|
+------------+

Time taken: 2246 ms

scala> spark.time(spark.read.orc("/tmp/orc").where("`col.with.dot` = 49995").show)
+------------+
|col.with.dot|
+------------+
|       49995|
+------------+

Time taken: 2472 ms{code}

  was:
This issue aims to fix an ORC performance regression at Spark 2.4.0 RCs from 
Spark 2.3.2. For column names with `.`, the pushed predicates are ignored.

*Spark 2.3.2*
{code:java}
scala> val df = spark.range(Int.MaxValue).sample(0.2).toDF("col.with.dot")
scala> df.write.mode("overwrite").orc("/tmp/orc")
scala> spark.sql("set spark.sql.orc.impl=native")
scala> spark.sql("set spark.sql.orc.filterPushdown=true")
scala> spark.time(spark.read.orc("/tmp/orc").where("`col.with.dot` = 
5").count)
Time taken: 803 ms
{code}
*Spark 2.4.0 RC2*
{code:java}
scala> spark.time(spark.read.orc("/tmp/orc").where("`col.with.dot` = 
5").count)
Time taken: 2405 ms{code}


> Use quoted attribute names if needed in pushed ORC predicates
> -
>
> Key: SPARK-25579
> URL: https://issues.apache.org/jira/browse/SPARK-25579
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Critical
>
> This issue aims to fix an ORC performance regression at Spark 2.4.0 RCs from 
> Spark 2.3.2. For column names with `.`, the pushed predicates are ignored.
> *Spark 2.3.2*
> {code:java}
> scala> val df = spark.range(Int.MaxValue).sample(0.2).toDF("col.with.dot")
> scala> df.write.mode("overwrite").orc("/tmp/orc")
> scala> spark.sql("set spark.sql.orc.impl=native")
> scala> spark.sql("set spark.sql.orc.filterPushdown=true")
> scala> spark.time(spark.read.orc("/tmp/orc").where("`col.with.dot` = 49995").show)
> +------------+
> |col.with.dot|
> +------------+
> |       49995|
> +------------+
> Time taken: 1509 ms
> scala> spark.time(spark.read.orc("/tmp/orc").where("`col.with.dot` = 49995").show)
> +------------+
> |col.with.dot|
> +------------+
> |       49995|
> +------------+
> Time taken: 164 ms
> scala> spark.time(spark.read.orc("/tmp/orc").where("`col.with.dot` = 49995").show)
> +------------+
> |col.with.dot|
> +------------+
> |       49995|
> +------------+
> Time taken: 140 ms
> {code}
> *Spark 2.4.0 RC2*
> {code:java}
> scala> spark.time(spark.read.orc("/tmp/orc").where("`col.with.dot` = 49995").show)
> +------------+
> |col.with.dot|
> +------------+
> |       49995|
> +------------+
> Time taken: 4257 ms
> scala> spark.time(spark.read.orc("/tmp/orc").where("`col.with.dot` = 49995").show)
> +------------+
> |col.with.dot|
> +------------+
> |       49995|
> +------------+
> Time taken: 2246 ms
> scala> spark.time(spark.read.orc("/tmp/orc").where("`col.with.dot` = 49995").show)
> +------------+
> |col.with.dot|
> +------------+
> |       49995|
> +------------+
> Time taken: 2472 ms{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25579) Use quoted attribute names if needed in pushed ORC predicates

2018-09-30 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-25579:
--
Description: 
This issue aims to fix an ORC performance regression at Spark 2.4.0 RCs from 
Spark 2.3.2. For column names with `.`, the pushed predicates are ignored.

*Spark 2.3.2*
{code:java}
scala> val df = spark.range(Int.MaxValue).sample(0.2).toDF("col.with.dot")
scala> df.write.mode("overwrite").orc("/tmp/orc")
scala> spark.sql("set spark.sql.orc.impl=native")
scala> spark.sql("set spark.sql.orc.filterPushdown=true")
scala> spark.time(spark.read.orc("/tmp/orc").where("`col.with.dot` = 
5").count)
Time taken: 803 ms
{code}
*Spark 2.4.0 RC2*
{code:java}
scala> spark.time(spark.read.orc("/tmp/orc").where("`col.with.dot` = 
5").count)
Time taken: 2405 ms{code}

  was:
This issue aims to fix an ORC performance regression at Spark 2.4.0 RCs from 
Spark 2.3.2. For column names with `.`, the pushed predicates are ignored.

*Spark 2.3.2*
{code:java}
scala> val df = spark.range(Int.MaxValue).sample(0.2).toDF("col.with.dot")
scala> df.write.mode("overwrite").orc("/tmp/orc")
scala> df.write.mode("overwrite").parquet("/tmp/parquet")
scala> spark.sql("set spark.sql.orc.impl=native")
scala> spark.sql("set spark.sql.orc.filterPushdown=true")
scala> spark.time(spark.read.orc("/tmp/orc").where("`col.with.dot` = 
5").count)
Time taken: 803 ms

scala> spark.time(spark.read.parquet("/tmp/parquet").where("`col.with.dot` = 
5").count)
Time taken: 5573 ms
{code}
*Spark 2.4.0 RC2*
{code:java}
scala> spark.time(spark.read.orc("/tmp/orc").where("`col.with.dot` = 
5").count)
Time taken: 2405 ms{code}


> Use quoted attribute names if needed in pushed ORC predicates
> -
>
> Key: SPARK-25579
> URL: https://issues.apache.org/jira/browse/SPARK-25579
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Critical
>
> This issue aims to fix an ORC performance regression at Spark 2.4.0 RCs from 
> Spark 2.3.2. For column names with `.`, the pushed predicates are ignored.
> *Spark 2.3.2*
> {code:java}
> scala> val df = spark.range(Int.MaxValue).sample(0.2).toDF("col.with.dot")
> scala> df.write.mode("overwrite").orc("/tmp/orc")
> scala> spark.sql("set spark.sql.orc.impl=native")
> scala> spark.sql("set spark.sql.orc.filterPushdown=true")
> scala> spark.time(spark.read.orc("/tmp/orc").where("`col.with.dot` = 
> 5").count)
> Time taken: 803 ms
> {code}
> *Spark 2.4.0 RC2*
> {code:java}
> scala> spark.time(spark.read.orc("/tmp/orc").where("`col.with.dot` = 
> 5").count)
> Time taken: 2405 ms{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25579) Use quoted attribute names if needed in pushed ORC predicates

2018-09-30 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-25579:
--
Priority: Critical  (was: Major)

> Use quoted attribute names if needed in pushed ORC predicates
> -
>
> Key: SPARK-25579
> URL: https://issues.apache.org/jira/browse/SPARK-25579
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Critical
>
> This issue aims to fix an ORC performance regression at Spark 2.4.0 RCs from 
> Spark 2.3.2. For column names with `.`, the pushed predicates are ignored.
> *Spark 2.3.2*
> {code:java}
> scala> val df = spark.range(Int.MaxValue).sample(0.2).toDF("col.with.dot")
> scala> df.write.mode("overwrite").orc("/tmp/orc")
> scala> df.write.mode("overwrite").parquet("/tmp/parquet")
> scala> spark.sql("set spark.sql.orc.impl=native")
> scala> spark.sql("set spark.sql.orc.filterPushdown=true")
> scala> spark.time(spark.read.orc("/tmp/orc").where("`col.with.dot` = 
> 5").count)
> Time taken: 803 ms
> scala> spark.time(spark.read.parquet("/tmp/parquet").where("`col.with.dot` = 
> 5").count)
> Time taken: 5573 ms
> {code}
> *Spark 2.4.0 RC2*
> {code:java}
> scala> spark.time(spark.read.orc("/tmp/orc").where("`col.with.dot` = 
> 5").count)
> Time taken: 2405 ms{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-25579) Use quoted attribute names if needed in pushed ORC predicates

2018-09-30 Thread Dongjoon Hyun (JIRA)
Dongjoon Hyun created SPARK-25579:
-

 Summary: Use quoted attribute names if needed in pushed ORC 
predicates
 Key: SPARK-25579
 URL: https://issues.apache.org/jira/browse/SPARK-25579
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.4.0
Reporter: Dongjoon Hyun


This issue aims to fix an ORC performance regression at Spark 2.4.0 RCs from 
Spark 2.3.2. For column names with `.`, the pushed predicates are ignored.

*Spark 2.3.2*
{code:java}
scala> val df = spark.range(Int.MaxValue).sample(0.2).toDF("col.with.dot")
scala> df.write.mode("overwrite").orc("/tmp/orc")
scala> df.write.mode("overwrite").parquet("/tmp/parquet")
scala> spark.sql("set spark.sql.orc.impl=native")
scala> spark.sql("set spark.sql.orc.filterPushdown=true")
scala> spark.time(spark.read.orc("/tmp/orc").where("`col.with.dot` = 
5").count)
Time taken: 803 ms

scala> spark.time(spark.read.parquet("/tmp/parquet").where("`col.with.dot` = 
5").count)
Time taken: 5573 ms
{code}
*Spark 2.4.0 RC2*
{code:java}
scala> spark.time(spark.read.orc("/tmp/orc").where("`col.with.dot` = 
5").count)
Time taken: 2405 ms{code}
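In other words, when a pushed ORC predicate references a column whose name contains a
dot, the attribute name needs to be back-quoted or the pushed filter is ignored and the
whole file is read. A rough sketch of the quoting idea (illustrative only, not the
actual Spark patch):

{code:java}
// Illustrative sketch of the quoting idea, not the actual Spark change:
// wrap an attribute name in backquotes before embedding it in a pushed ORC
// predicate, so "col.with.dot" is not read as a nested field reference.
def quoteAttributeNameIfNeeded(name: String): String =
  if (!name.contains("`") && name.contains(".")) s"`$name`" else name

quoteAttributeNameIfNeeded("col.with.dot")  // returns `col.with.dot`
quoteAttributeNameIfNeeded("plain_col")     // returns plain_col
{code}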



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25579) Use quoted attribute names if needed in pushed ORC predicates

2018-09-30 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-25579:
-

Assignee: Dongjoon Hyun

> Use quoted attribute names if needed in pushed ORC predicates
> -
>
> Key: SPARK-25579
> URL: https://issues.apache.org/jira/browse/SPARK-25579
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>
> This issue aims to fix an ORC performance regression at Spark 2.4.0 RCs from 
> Spark 2.3.2. For column names with `.`, the pushed predicates are ignored.
> *Spark 2.3.2*
> {code:java}
> scala> val df = spark.range(Int.MaxValue).sample(0.2).toDF("col.with.dot")
> scala> df.write.mode("overwrite").orc("/tmp/orc")
> scala> df.write.mode("overwrite").parquet("/tmp/parquet")
> scala> spark.sql("set spark.sql.orc.impl=native")
> scala> spark.sql("set spark.sql.orc.filterPushdown=true")
> scala> spark.time(spark.read.orc("/tmp/orc").where("`col.with.dot` = 
> 5").count)
> Time taken: 803 ms
> scala> spark.time(spark.read.parquet("/tmp/parquet").where("`col.with.dot` = 
> 5").count)
> Time taken: 5573 ms
> {code}
> *Spark 2.4.0 RC2*
> {code:java}
> scala> spark.time(spark.read.orc("/tmp/orc").where("`col.with.dot` = 
> 5").count)
> Time taken: 2405 ms{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25563) Spark application hangs If container allocate on lost Nodemanager

2018-09-30 Thread Hyukjin Kwon (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16633575#comment-16633575
 ] 

Hyukjin Kwon commented on SPARK-25563:
--

Please avoid setting the target version, which is usually reserved for committers.

> Spark application hangs If container allocate on lost Nodemanager
> -
>
> Key: SPARK-25563
> URL: https://issues.apache.org/jira/browse/SPARK-25563
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.1
>Reporter: devinduan
>Priority: Minor
>
>     I met an issue where, if I start a Spark application in YARN client mode, 
> the application sometimes hangs.
>     I checked the application logs: the container was allocated on a lost 
> NodeManager, but the AM doesn't retry to start another executor.
>     My Spark version is 2.3.1.
>     Here is my ApplicationMaster log.
>  
> 2018-09-26 05:21:15 INFO YarnRMClient:54 - Registering the ApplicationMaster
> 2018-09-26 05:21:15 INFO ConfiguredRMFailoverProxyProvider:100 - Failing over 
> to rm2 
> 2018-09-26 05:21:15 WARN Utils:66 - spark.executor.instances less than 
> spark.dynamicAllocation.minExecutors is invalid, ignoring its setting, please 
> update your configs.
> 2018-09-26 05:21:15 INFO Utils:54 - Using initial executors = 1, max of 
> spark.dynamicAllocation.initialExecutors, 
> spark.dynamicAllocation.minExecutors and spark.executor.instances
> 2018-09-26 05:21:15 INFO YarnAllocator:54 - Will request 1 executor 
> container(s), each with 24 core(s) and 20275 MB memory (including 1843 MB of 
> overhead)
> 2018-09-26 05:21:15 INFO YarnAllocator:54 - Submitted 1 unlocalized container 
> requests.
> 2018-09-26 05:21:15 INFO ApplicationMaster:54 - Started progress reporter 
> thread with (heartbeat : 3000, initial allocation : 200) intervals
> 2018-09-26 05:21:27 WARN YarnAllocator:66 - Cannot find executorId for 
> container: container_1532951609168_4721728_01_02
> 2018-09-26 05:21:27 INFO YarnAllocator:54 - Completed container 
> container_1532951609168_4721728_01_02 (state: COMPLETE, exit status: -100)
> 2018-09-26 05:21:27 WARN YarnAllocator:66 - Container marked as failed: 
> container_1532951609168_4721728_01_02. Exit status: -100. Diagnostics: 
> Container released on a *lost* node



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25563) Spark application hangs If container allocate on lost Nodemanager

2018-09-30 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-25563:
-
Target Version/s:   (was: 2.3.1)

> Spark application hangs If container allocate on lost Nodemanager
> -
>
> Key: SPARK-25563
> URL: https://issues.apache.org/jira/browse/SPARK-25563
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.1
>Reporter: devinduan
>Priority: Minor
>
>     I met an issue where, if I start a Spark application in YARN client mode, 
> the application sometimes hangs.
>     I checked the application logs: the container was allocated on a lost 
> NodeManager, but the AM doesn't retry to start another executor.
>     My Spark version is 2.3.1.
>     Here is my ApplicationMaster log.
>  
> 2018-09-26 05:21:15 INFO YarnRMClient:54 - Registering the ApplicationMaster
> 2018-09-26 05:21:15 INFO ConfiguredRMFailoverProxyProvider:100 - Failing over 
> to rm2 
> 2018-09-26 05:21:15 WARN Utils:66 - spark.executor.instances less than 
> spark.dynamicAllocation.minExecutors is invalid, ignoring its setting, please 
> update your configs.
> 2018-09-26 05:21:15 INFO Utils:54 - Using initial executors = 1, max of 
> spark.dynamicAllocation.initialExecutors, 
> spark.dynamicAllocation.minExecutors and spark.executor.instances
> 2018-09-26 05:21:15 INFO YarnAllocator:54 - Will request 1 executor 
> container(s), each with 24 core(s) and 20275 MB memory (including 1843 MB of 
> overhead)
> 2018-09-26 05:21:15 INFO YarnAllocator:54 - Submitted 1 unlocalized container 
> requests.
> 2018-09-26 05:21:15 INFO ApplicationMaster:54 - Started progress reporter 
> thread with (heartbeat : 3000, initial allocation : 200) intervals
> 2018-09-26 05:21:27 WARN YarnAllocator:66 - Cannot find executorId for 
> container: container_1532951609168_4721728_01_02
> 2018-09-26 05:21:27 INFO YarnAllocator:54 - Completed container 
> container_1532951609168_4721728_01_02 (state: COMPLETE, exit status: -100)
> 2018-09-26 05:21:27 WARN YarnAllocator:66 - Container marked as failed: 
> container_1532951609168_4721728_01_02. Exit status: -100. Diagnostics: 
> Container released on a *lost* node



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25538) incorrect row counts after distinct()

2018-09-30 Thread Kazuaki Ishizaki (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16633568#comment-16633568
 ] 

Kazuaki Ishizaki commented on SPARK-25538:
--

Thank you. I will check it tonight in Japan.

> incorrect row counts after distinct()
> -
>
> Key: SPARK-25538
> URL: https://issues.apache.org/jira/browse/SPARK-25538
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
> Environment: Reproduced on a Centos7 VM and from source in Intellij 
> on OS X.
>Reporter: Steven Rand
>Priority: Major
>  Labels: correctness
> Attachments: SPARK-25538-repro.tgz
>
>
> It appears that {{df.distinct.count}} can return incorrect values after 
> SPARK-23713. It's possible that other operations are affected as well; 
> {{distinct}} just happens to be the one that we noticed. I believe that this 
> issue was introduced by SPARK-23713 because I can't reproduce it until that 
> commit, and I've been able to reproduce it after that commit as well as with 
> {{tags/v2.4.0-rc1}}. 
> Below are example spark-shell sessions to illustrate the problem. 
> Unfortunately the data used in these examples can't be uploaded to this Jira 
> ticket. I'll try to create test data which also reproduces the issue, and 
> will upload that if I'm able to do so.
> Example from Spark 2.3.1, which behaves correctly:
> {code}
> scala> val df = spark.read.parquet("hdfs:///data")
> df: org.apache.spark.sql.DataFrame = []
> scala> df.count
> res0: Long = 123
> scala> df.distinct.count
> res1: Long = 115
> {code}
> Example from Spark 2.4.0-rc1, which returns different output:
> {code}
> scala> val df = spark.read.parquet("hdfs:///data")
> df: org.apache.spark.sql.DataFrame = []
> scala> df.count
> res0: Long = 123
> scala> df.distinct.count
> res1: Long = 116
> scala> df.sort("col_0").distinct.count
> res2: Long = 123
> scala> df.withColumnRenamed("col_0", "newName").distinct.count
> res3: Long = 115
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-25554) Avro logical types get ignored in SchemaConverters.toSqlType

2018-09-30 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-25554.
--
Resolution: Invalid

> Avro logical types get ignored in SchemaConverters.toSqlType
> 
>
> Key: SPARK-25554
> URL: https://issues.apache.org/jira/browse/SPARK-25554
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
> Environment: Below are the maven dependencies:
> {code:java}
> <dependency>
>     <groupId>org.apache.avro</groupId>
>     <artifactId>avro</artifactId>
>     <version>1.8.2</version>
> </dependency>
> <dependency>
>     <groupId>com.databricks</groupId>
>     <artifactId>spark-avro_2.11</artifactId>
>     <version>4.0.0</version>
> </dependency>
> <dependency>
>     <groupId>org.apache.spark</groupId>
>     <artifactId>spark-core_2.11</artifactId>
>     <version>2.3.0</version>
> </dependency>
> <dependency>
>     <groupId>org.apache.spark</groupId>
>     <artifactId>spark-sql_2.11</artifactId>
>     <version>2.3.0</version>
> </dependency>
> {code}
>Reporter: Yanan Li
>Priority: Major
>
> Having an Avro schema defined as follows:
> {code:java}
> {
>"namespace": "com.xxx.avro",
>"name": "Book",
>"type": "record",
>"fields": [{
>  "name": "name",
>  "type": ["null", "string"],
>  "default": null
>   }, {
>  "name": "author",
>  "type": ["null", "string"],
>  "default": null
>   }, {
>  "name": "published_date",
>  "type": ["null", {"type": "int", "logicalType": "date"}],
>  "default": null
>   }
>]
> }
> {code}
> In the Spark schema converted from the above Avro schema, the logical type 
> "date" gets ignored.
> {code:java}
> StructType(StructField(name,StringType,true),StructField(author,StringType,true),StructField(published_date,IntegerType,true))
> {code}
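For reference, a minimal spark-shell sketch of the conversion being described, assuming
the com.databricks spark-avro 4.0.0 artifact listed in the Environment above (the schema
is abbreviated to the field in question):

{code:java}
// Sketch only, assuming com.databricks:spark-avro_2.11:4.0.0 from the Environment.
import org.apache.avro.Schema
import com.databricks.spark.avro.SchemaConverters

val avroJson =
  """{"namespace": "com.xxx.avro", "name": "Book", "type": "record", "fields": [
    |  {"name": "published_date",
    |   "type": ["null", {"type": "int", "logicalType": "date"}],
    |   "default": null}
    |]}""".stripMargin

val avroSchema = new Schema.Parser().parse(avroJson)
// The report is that this prints IntegerType for published_date rather than a
// date type, i.e. the logicalType annotation is dropped during conversion.
println(SchemaConverters.toSqlType(avroSchema).dataType)
{code}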



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25543) Confusing log messages at DEBUG level, in K8s mode.

2018-09-30 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-25543:
--
Affects Version/s: (was: 2.5.0)
   2.4.0

> Confusing log messages at DEBUG level, in K8s mode.
> ---
>
> Key: SPARK-25543
> URL: https://issues.apache.org/jira/browse/SPARK-25543
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Prashant Sharma
>Assignee: Prashant Sharma
>Priority: Minor
> Fix For: 2.4.1, 2.5.0
>
>
> Steps to reproduce.
> Start spark shell by providing a K8s master. Then turn the debug log on, 
> {code}
> scala> sc.setLogLevel("DEBUG")
> {code}
> {code}
> sc.setLogLevel("DEBUG")
> scala> 2018-09-26 09:33:54 DEBUG ExecutorPodsLifecycleManager:58 - Removed 
> executors with ids  from Spark that were either found to be deleted or 
> non-existent in the cluster.
> 2018-09-26 09:33:55 DEBUG ExecutorPodsLifecycleManager:58 - Removed executors 
> with ids  from Spark that were either found to be deleted or non-existent in 
> the cluster.
> 2018-09-26 09:33:56 DEBUG ExecutorPodsLifecycleManager:58 - Removed executors 
> with ids  from Spark that were either found to be deleted or non-existent in 
> the cluster.
> 2018-09-26 09:33:56 DEBUG ExecutorPodsPollingSnapshotSource:58 - 
> Resynchronizing full executor pod state from Kubernetes.
> 2018-09-26 09:33:57 DEBUG ExecutorPodsAllocator:58 - Currently have 1 running 
> executors and 0 pending executors. Map() executors have been requested but 
> are pending appearance in the cluster.
> 2018-09-26 09:33:57 DEBUG ExecutorPodsAllocator:58 - Current number of 
> running executors is equal to the number of requested executors. Not scaling 
> up further.
> 2018-09-26 09:33:57 DEBUG ExecutorPodsLifecycleManager:58 - Removed executors 
> with ids  from Spark that were either found to be deleted or non-existent in 
> the cluster.
> 2018-09-26 09:33:58 DEBUG ExecutorPodsLifecycleManager:58 - Removed executors 
> with ids  from Spark that were either found to be deleted or non-existent in 
> the cluster.
> 2018-09-26 09:33:59 DEBUG ExecutorPodsLifecycleManager:58 - Removed executors 
> with ids  from Spark that were either found to be deleted or non-existent in 
> the cluster.
> 2018-09-26 09:34:00 DEBUG ExecutorPodsLifecycleManager:58 - Removed executors 
> with ids  from Spark that were either found to be deleted or non-existent in 
> the cluster.
> 2018-09-26 09:34:01 DEBUG ExecutorPodsLifecycleManager:58 - Removed executors 
> with ids  from Spark that were either found to be deleted or non-existent in 
> the cluster.
> 2018-09-26 09:34:02 DEBUG ExecutorPodsLifecycleManager:58 - Removed executors 
> with ids  from Spark that were either found to be deleted or non-existent in 
> the cluster.
> 2018-09-26 09:34:03 DEBUG ExecutorPodsLifecycleManager:58 - Removed executors 
> with ids  from Spark that were either found to be deleted or non-existent in 
> the cluster.
> 2018-09-26 09:34:04 DEBUG ExecutorPodsLifecycleManager:58 - Removed executors 
> with ids  from Spark that were either found to be deleted or non-existent in 
> the cluster.
> 2018-09-26 09:34:05 DEBUG ExecutorPodsLifecycleManager:58 - Removed executors 
> with ids  from Spark that were either found to be deleted or non-existent in 
> the cluster.
> 2018-09-26 09:34:06 DEBUG ExecutorPodsLifecycleManager:58 - Removed executors 
> with ids  from ...
> {code}
> The fix is easy: first check whether any executors were actually removed 
> before producing the log message.
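A small sketch of the guard being proposed (names are illustrative, not the actual
Spark code):

{code:java}
// Illustrative sketch: only emit the DEBUG line when executors were actually
// removed, instead of logging an empty id list on every poll.
def logRemovedExecutors(removedIds: Seq[Long], logDebug: String => Unit): Unit = {
  if (removedIds.nonEmpty) {
    logDebug(s"Removed executors with ids ${removedIds.mkString(",")} from Spark " +
      "that were either found to be deleted or non-existent in the cluster.")
  }
}
{code}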



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25543) Confusing log messages at DEBUG level, in K8s mode.

2018-09-30 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-25543:
-

Assignee: Prashant Sharma

> Confusing log messages at DEBUG level, in K8s mode.
> ---
>
> Key: SPARK-25543
> URL: https://issues.apache.org/jira/browse/SPARK-25543
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.5.0
>Reporter: Prashant Sharma
>Assignee: Prashant Sharma
>Priority: Minor
> Fix For: 2.4.1, 2.5.0
>
>
> Steps to reproduce.
> Start spark shell by providing a K8s master. Then turn the debug log on, 
> {code}
> scala> sc.setLogLevel("DEBUG")
> {code}
> {code}
> sc.setLogLevel("DEBUG")
> scala> 2018-09-26 09:33:54 DEBUG ExecutorPodsLifecycleManager:58 - Removed 
> executors with ids  from Spark that were either found to be deleted or 
> non-existent in the cluster.
> 2018-09-26 09:33:55 DEBUG ExecutorPodsLifecycleManager:58 - Removed executors 
> with ids  from Spark that were either found to be deleted or non-existent in 
> the cluster.
> 2018-09-26 09:33:56 DEBUG ExecutorPodsLifecycleManager:58 - Removed executors 
> with ids  from Spark that were either found to be deleted or non-existent in 
> the cluster.
> 2018-09-26 09:33:56 DEBUG ExecutorPodsPollingSnapshotSource:58 - 
> Resynchronizing full executor pod state from Kubernetes.
> 2018-09-26 09:33:57 DEBUG ExecutorPodsAllocator:58 - Currently have 1 running 
> executors and 0 pending executors. Map() executors have been requested but 
> are pending appearance in the cluster.
> 2018-09-26 09:33:57 DEBUG ExecutorPodsAllocator:58 - Current number of 
> running executors is equal to the number of requested executors. Not scaling 
> up further.
> 2018-09-26 09:33:57 DEBUG ExecutorPodsLifecycleManager:58 - Removed executors 
> with ids  from Spark that were either found to be deleted or non-existent in 
> the cluster.
> 2018-09-26 09:33:58 DEBUG ExecutorPodsLifecycleManager:58 - Removed executors 
> with ids  from Spark that were either found to be deleted or non-existent in 
> the cluster.
> 2018-09-26 09:33:59 DEBUG ExecutorPodsLifecycleManager:58 - Removed executors 
> with ids  from Spark that were either found to be deleted or non-existent in 
> the cluster.
> 2018-09-26 09:34:00 DEBUG ExecutorPodsLifecycleManager:58 - Removed executors 
> with ids  from Spark that were either found to be deleted or non-existent in 
> the cluster.
> 2018-09-26 09:34:01 DEBUG ExecutorPodsLifecycleManager:58 - Removed executors 
> with ids  from Spark that were either found to be deleted or non-existent in 
> the cluster.
> 2018-09-26 09:34:02 DEBUG ExecutorPodsLifecycleManager:58 - Removed executors 
> with ids  from Spark that were either found to be deleted or non-existent in 
> the cluster.
> 2018-09-26 09:34:03 DEBUG ExecutorPodsLifecycleManager:58 - Removed executors 
> with ids  from Spark that were either found to be deleted or non-existent in 
> the cluster.
> 2018-09-26 09:34:04 DEBUG ExecutorPodsLifecycleManager:58 - Removed executors 
> with ids  from Spark that were either found to be deleted or non-existent in 
> the cluster.
> 2018-09-26 09:34:05 DEBUG ExecutorPodsLifecycleManager:58 - Removed executors 
> with ids  from Spark that were either found to be deleted or non-existent in 
> the cluster.
> 2018-09-26 09:34:06 DEBUG ExecutorPodsLifecycleManager:58 - Removed executors 
> with ids  from ...
> {code}
> The fix is easy: first check whether any executors were actually removed 
> before producing the log message.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-25543) Confusing log messages at DEBUG level, in K8s mode.

2018-09-30 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-25543.
---
   Resolution: Fixed
Fix Version/s: 2.4.1
   2.5.0

Issue resolved by pull request 22565
[https://github.com/apache/spark/pull/22565]

> Confusing log messages at DEBUG level, in K8s mode.
> ---
>
> Key: SPARK-25543
> URL: https://issues.apache.org/jira/browse/SPARK-25543
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.5.0
>Reporter: Prashant Sharma
>Assignee: Prashant Sharma
>Priority: Minor
> Fix For: 2.5.0, 2.4.1
>
>
> Steps to reproduce.
> Start spark shell by providing a K8s master. Then turn the debug log on, 
> {code}
> scala> sc.setLogLevel("DEBUG")
> {code}
> {code}
> sc.setLogLevel("DEBUG")
> scala> 2018-09-26 09:33:54 DEBUG ExecutorPodsLifecycleManager:58 - Removed 
> executors with ids  from Spark that were either found to be deleted or 
> non-existent in the cluster.
> 2018-09-26 09:33:55 DEBUG ExecutorPodsLifecycleManager:58 - Removed executors 
> with ids  from Spark that were either found to be deleted or non-existent in 
> the cluster.
> 2018-09-26 09:33:56 DEBUG ExecutorPodsLifecycleManager:58 - Removed executors 
> with ids  from Spark that were either found to be deleted or non-existent in 
> the cluster.
> 2018-09-26 09:33:56 DEBUG ExecutorPodsPollingSnapshotSource:58 - 
> Resynchronizing full executor pod state from Kubernetes.
> 2018-09-26 09:33:57 DEBUG ExecutorPodsAllocator:58 - Currently have 1 running 
> executors and 0 pending executors. Map() executors have been requested but 
> are pending appearance in the cluster.
> 2018-09-26 09:33:57 DEBUG ExecutorPodsAllocator:58 - Current number of 
> running executors is equal to the number of requested executors. Not scaling 
> up further.
> 2018-09-26 09:33:57 DEBUG ExecutorPodsLifecycleManager:58 - Removed executors 
> with ids  from Spark that were either found to be deleted or non-existent in 
> the cluster.
> 2018-09-26 09:33:58 DEBUG ExecutorPodsLifecycleManager:58 - Removed executors 
> with ids  from Spark that were either found to be deleted or non-existent in 
> the cluster.
> 2018-09-26 09:33:59 DEBUG ExecutorPodsLifecycleManager:58 - Removed executors 
> with ids  from Spark that were either found to be deleted or non-existent in 
> the cluster.
> 2018-09-26 09:34:00 DEBUG ExecutorPodsLifecycleManager:58 - Removed executors 
> with ids  from Spark that were either found to be deleted or non-existent in 
> the cluster.
> 2018-09-26 09:34:01 DEBUG ExecutorPodsLifecycleManager:58 - Removed executors 
> with ids  from Spark that were either found to be deleted or non-existent in 
> the cluster.
> 2018-09-26 09:34:02 DEBUG ExecutorPodsLifecycleManager:58 - Removed executors 
> with ids  from Spark that were either found to be deleted or non-existent in 
> the cluster.
> 2018-09-26 09:34:03 DEBUG ExecutorPodsLifecycleManager:58 - Removed executors 
> with ids  from Spark that were either found to be deleted or non-existent in 
> the cluster.
> 2018-09-26 09:34:04 DEBUG ExecutorPodsLifecycleManager:58 - Removed executors 
> with ids  from Spark that were either found to be deleted or non-existent in 
> the cluster.
> 2018-09-26 09:34:05 DEBUG ExecutorPodsLifecycleManager:58 - Removed executors 
> with ids  from Spark that were either found to be deleted or non-existent in 
> the cluster.
> 2018-09-26 09:34:06 DEBUG ExecutorPodsLifecycleManager:58 - Removed executors 
> with ids  from ...
> {code}
> The fix is easy: first check whether any executors were actually removed 
> before producing the log message.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-25578) Update to Scala 2.12.7

2018-09-30 Thread Sean Owen (JIRA)
Sean Owen created SPARK-25578:
-

 Summary: Update to Scala 2.12.7
 Key: SPARK-25578
 URL: https://issues.apache.org/jira/browse/SPARK-25578
 Project: Spark
  Issue Type: Improvement
  Components: Build, Spark Core, SQL
Affects Versions: 2.4.0
Reporter: Sean Owen


We should use Scala 2.12.7 over 2.12.6 now, to pick up this fix. We ought to be 
able to back out a workaround in Spark if so.

[https://github.com/scala/scala/releases/tag/v2.12.7]

[https://github.com/scala/scala/pull/7156] 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23429) Add executor memory metrics to heartbeat and expose in executors REST API

2018-09-30 Thread Edwina Lu (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-23429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edwina Lu updated SPARK-23429:
--
Fix Version/s: 3.0.0

> Add executor memory metrics to heartbeat and expose in executors REST API
> -
>
> Key: SPARK-23429
> URL: https://issues.apache.org/jira/browse/SPARK-23429
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Edwina Lu
>Priority: Major
> Fix For: 3.0.0
>
>
> Add new executor level memory metrics ( jvmUsedMemory, onHeapExecutionMemory, 
> offHeapExecutionMemory, onHeapStorageMemory, offHeapStorageMemory, 
> onHeapUnifiedMemory, and offHeapUnifiedMemory), and expose these via the 
> executors REST API. This information will help provide insight into how 
> executor and driver JVM memory is used, and for the different memory regions. 
> It can be used to help determine good values for spark.executor.memory, 
> spark.driver.memory, spark.memory.fraction, and spark.memory.storageFraction.
> Add an ExecutorMetrics class, with jvmUsedMemory, onHeapExecutionMemory, 
> offHeapExecutionMemory, onHeapStorageMemory, and offHeapStorageMemory. This 
> will track the memory usage at the executor level. The new ExecutorMetrics 
> will be sent by executors to the driver as part of the Heartbeat. A heartbeat 
> will be added for the driver as well, to collect these metrics for the driver.
> Modify the EventLoggingListener to log ExecutorMetricsUpdate events if there 
> is a new peak value for one of the memory metrics for an executor and stage. 
> Only the ExecutorMetrics will be logged, and not the TaskMetrics, to minimize 
> additional logging. Analysis on a set of sample applications showed an 
> increase of 0.25% in the size of the Spark history log, with this approach.
> Modify the AppStatusListener to collect snapshots of peak values for each 
> memory metric. Each snapshot has the time, jvmUsedMemory, executionMemory and 
> storageMemory, and list of active stages.
> Add the new memory metrics (snapshots of peak values for each memory metric) 
> to the executors REST API.
> This is a subtask for SPARK-23206. Please refer to the design doc for that 
> ticket for more details.
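A rough sketch of what an executor-level snapshot with the metrics named above could
look like, together with the per-metric peak tracking the description mentions
(illustrative only, not the actual Spark class; the unified values are assumed here to
be execution + storage):

{code:java}
// Illustrative sketch: the executor-level memory metrics named in the description,
// plus a helper that keeps per-metric peaks as heartbeats arrive.
case class ExecutorMemoryMetrics(
    jvmUsedMemory: Long,
    onHeapExecutionMemory: Long,
    offHeapExecutionMemory: Long,
    onHeapStorageMemory: Long,
    offHeapStorageMemory: Long) {
  // Assumption: unified memory is modeled as execution + storage.
  def onHeapUnifiedMemory: Long = onHeapExecutionMemory + onHeapStorageMemory
  def offHeapUnifiedMemory: Long = offHeapExecutionMemory + offHeapStorageMemory
}

def updatePeaks(peak: ExecutorMemoryMetrics, latest: ExecutorMemoryMetrics): ExecutorMemoryMetrics =
  ExecutorMemoryMetrics(
    math.max(peak.jvmUsedMemory, latest.jvmUsedMemory),
    math.max(peak.onHeapExecutionMemory, latest.onHeapExecutionMemory),
    math.max(peak.offHeapExecutionMemory, latest.offHeapExecutionMemory),
    math.max(peak.onHeapStorageMemory, latest.onHeapStorageMemory),
    math.max(peak.offHeapStorageMemory, latest.offHeapStorageMemory))
{code}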



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25577) Add an on-off switch to display the executor additional columns

2018-09-30 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1669#comment-1669
 ] 

Apache Spark commented on SPARK-25577:
--

User 'LantaoJin' has created a pull request for this issue:
https://github.com/apache/spark/pull/22595

> Add an on-off switch to display the executor additional columns
> ---
>
> Key: SPARK-25577
> URL: https://issues.apache.org/jira/browse/SPARK-25577
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 2.3.2
>Reporter: Lantao Jin
>Priority: Major
> Attachments: Screen Shot 2018-09-30 at 5.45.56 PM.png, Screen Shot 
> 2018-09-30 at 5.46.06 PM.png
>
>
> [SPARK-17019|https://issues.apache.org/jira/browse/SPARK-17019] exposes 
> off-heap memory usage in the Web UI, but it keeps these additional columns 
> hidden by default. To see them, we have to change the CSS and rebuild 
> spark-core.jar, which is very inconvenient.
> {code}
> .on_heap_memory, .off_heap_memory {
>   display: none;
> }
> {code}
> So I add an on-off switch to show those additional columns. And in the future, 
> we won't be afraid to add more columns.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25577) Add an on-off switch to display the executor additional columns

2018-09-30 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25577:


Assignee: (was: Apache Spark)

> Add an on-off switch to display the executor additional columns
> ---
>
> Key: SPARK-25577
> URL: https://issues.apache.org/jira/browse/SPARK-25577
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 2.3.2
>Reporter: Lantao Jin
>Priority: Major
> Attachments: Screen Shot 2018-09-30 at 5.45.56 PM.png, Screen Shot 
> 2018-09-30 at 5.46.06 PM.png
>
>
> [SPARK-17019|https://issues.apache.org/jira/browse/SPARK-17019] exposes 
> off-heap memory usage in the Web UI, but it keeps these additional columns 
> hidden by default. To see them, we have to change the CSS and rebuild 
> spark-core.jar, which is very inconvenient.
> {code}
> .on_heap_memory, .off_heap_memory {
>   display: none;
> }
> {code}
> So I add an on-off switch to show those additional columns. And in the future, 
> we won't be afraid to add more columns.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25577) Add an on-off switch to display the executor additional columns

2018-09-30 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25577:


Assignee: Apache Spark

> Add an on-off switch to display the executor additional columns
> ---
>
> Key: SPARK-25577
> URL: https://issues.apache.org/jira/browse/SPARK-25577
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 2.3.2
>Reporter: Lantao Jin
>Assignee: Apache Spark
>Priority: Major
> Attachments: Screen Shot 2018-09-30 at 5.45.56 PM.png, Screen Shot 
> 2018-09-30 at 5.46.06 PM.png
>
>
> [SPARK-17019|https://issues.apache.org/jira/browse/SPARK-17019] exposes 
> off-heap memory usage in the Web UI, but it keeps these additional columns 
> hidden by default. To see them, we have to change the CSS and rebuild 
> spark-core.jar, which is very inconvenient.
> {code}
> .on_heap_memory, .off_heap_memory {
>   display: none;
> }
> {code}
> So I add an on-off switch to show those additional columns. And in the future, 
> we won't be afraid to add more columns.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25577) Add an on-off switch to display the executor additional columns

2018-09-30 Thread Lantao Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lantao Jin updated SPARK-25577:
---
Attachment: Screen Shot 2018-09-30 at 5.45.56 PM.png

> Add an on-off switch to display the executor additional columns
> ---
>
> Key: SPARK-25577
> URL: https://issues.apache.org/jira/browse/SPARK-25577
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 2.3.2
>Reporter: Lantao Jin
>Priority: Major
> Attachments: Screen Shot 2018-09-30 at 5.45.56 PM.png, Screen Shot 
> 2018-09-30 at 5.46.06 PM.png
>
>
> [SPARK-17019|https://issues.apache.org/jira/browse/SPARK-17019] exposed 
> off-heap memory usage in the Web UI, but it keeps these additional columns 
> hidden by default. To see them, we have to change the CSS and rebuild 
> spark-core.jar, which is very inconvenient.
> {code}
> .on_heap_memory, .off_heap_memory {
>   display: none;
> }
> {code}
> So I added an on-off switch to show those additional columns. In the future, we 
> won't be afraid to add more columns.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25577) Add an on-off switch to display the executor additional columns

2018-09-30 Thread Lantao Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lantao Jin updated SPARK-25577:
---
Attachment: Screen Shot 2018-09-30 at 5.46.06 PM.png

> Add an on-off switch to display the executor additional columns
> ---
>
> Key: SPARK-25577
> URL: https://issues.apache.org/jira/browse/SPARK-25577
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 2.3.2
>Reporter: Lantao Jin
>Priority: Major
> Attachments: Screen Shot 2018-09-30 at 5.45.56 PM.png, Screen Shot 
> 2018-09-30 at 5.46.06 PM.png
>
>
> [SPARK-17019|https://issues.apache.org/jira/browse/SPARK-17019] exposed 
> off-heap memory usage in the Web UI, but it keeps these additional columns 
> hidden by default. To see them, we have to change the CSS and rebuild 
> spark-core.jar, which is very inconvenient.
> {code}
> .on_heap_memory, .off_heap_memory {
>   display: none;
> }
> {code}
> So I added an on-off switch to show those additional columns. In the future, we 
> won't be afraid to add more columns.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-25577) Add an on-off switch to display the executor additional columns

2018-09-30 Thread Lantao Jin (JIRA)
Lantao Jin created SPARK-25577:
--

 Summary: Add an on-off switch to display the executor additional 
columns
 Key: SPARK-25577
 URL: https://issues.apache.org/jira/browse/SPARK-25577
 Project: Spark
  Issue Type: Improvement
  Components: Web UI
Affects Versions: 2.3.2
Reporter: Lantao Jin


[SPARK-17019|https://issues.apache.org/jira/browse/SPARK-17019] exposed off-heap 
memory usage in the Web UI, but it keeps these additional columns hidden by 
default. To see them, we have to change the CSS and rebuild spark-core.jar, 
which is very inconvenient.
{code}
.on_heap_memory, .off_heap_memory {
  display: none;
}
{code}

So I added an on-off switch to show those additional columns. In the future, we 
won't be afraid to add more columns.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-25523) Multi thread execute sparkSession.read().jdbc(url, table, properties) problem

2018-09-30 Thread huanghuai (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

huanghuai resolved SPARK-25523.
---
   Resolution: Cannot Reproduce
Fix Version/s: 2.3.0

The problem cannot be reproduced every time.

> Multi thread execute sparkSession.read().jdbc(url, table, properties) problem
> -
>
> Key: SPARK-25523
> URL: https://issues.apache.org/jira/browse/SPARK-25523
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
> Environment: IntelliJ IDEA, local mode
>Reporter: huanghuai
>Priority: Major
> Fix For: 2.3.0
>
>
> {code}
> // Imports needed for the snippet (reconstructed; not in the original report).
> import java.util.Properties;
> 
> import org.apache.spark.sql.Dataset;
> import org.apache.spark.sql.Row;
> import org.apache.spark.sql.SparkSession;
> import org.apache.spark.sql.jdbc.JdbcDialect;
> import org.apache.spark.sql.jdbc.JdbcDialects;
> 
> public static void test2() throws Exception {
>     String ckUrlPrefix = "jdbc:clickhouse://";
>     String quote = "`";
>     // Register a dialect so ClickHouse identifiers are quoted with backticks.
>     JdbcDialects.registerDialect(new JdbcDialect() {
>         @Override
>         public boolean canHandle(String url) {
>             return url.startsWith(ckUrlPrefix);
>         }
> 
>         @Override
>         public String quoteIdentifier(String colName) {
>             return quote + colName + quote;
>         }
>     });
> 
>     SparkSession spark = initSpark();
>     String ckUrl = "jdbc:clickhouse://192.168.2.148:8123/default";
>     Properties ckProp = new Properties();
>     ckProp.put("user", "default");
>     ckProp.put("password", "");
> 
>     String prestoUrl = "jdbc:presto://192.168.2.148:9002/mysql-xxx/xxx";
>     Properties prestoUrlProp = new Properties();
>     prestoUrlProp.put("user", "root");
>     prestoUrlProp.put("password", "");
> 
>     // new Thread(() -> {
>     //     spark.read().jdbc(ckUrl, "ontime", ckProp).show();
>     // }).start();
> 
>     System.out.println("--");
>     // Read from Presto in one thread ...
>     new Thread(() -> {
>         spark.read().jdbc(prestoUrl, "tx_user", prestoUrlProp).show();
>     }).start();
> 
>     System.out.println("--");
>     // ... and from Vertica in another thread, concurrently.
>     new Thread(() -> {
>         Dataset<Row> load = spark.read()
>                 .format("com.vertica.spark.datasource.DefaultSource")
>                 .option("host", "192.168.1.102")
>                 .option("port", 5433)
>                 .option("user", "dbadmin")
>                 .option("password", "manager")
>                 .option("db", "test")
>                 .option("dbschema", "public")
>                 .option("table", "customers")
>                 .load();
>         load.printSchema();
>         load.show();
>     }).start();
>     System.out.println("--");
> }
> 
> public static SparkSession initSpark() throws Exception {
>     return SparkSession.builder()
>             .master("spark://dsjkfb1:7077")  // spark://dsjkfb1:7077
>             .appName("Test")
>             .config("spark.executor.instances", 3)
>             .config("spark.executor.cores", 2)
>             .config("spark.cores.max", 6)
>             //.config("spark.default.parallelism", 1)
>             .config("spark.submit.deployMode", "client")
>             .config("spark.driver.memory", "2G")
>             .config("spark.executor.memory", "3G")
>             .config("spark.driver.maxResultSize", "2G")
>             .config("spark.local.dir", "d:\\tmp")
>             .config("spark.driver.host", "192.168.2.148")
>             .config("spark.scheduler.mode", "FAIR")
>             .config("spark.jars",
>                     "F:\\project\\xxx\\vertica-jdbc-7.0.1-0.jar," +
>                     "F:\\project\\xxx\\clickhouse-jdbc-0.1.40.jar," +
>                     "F:\\project\\xxx\\vertica-spark-connector-9.1-2.1.jar," +
>                     "F:\\project\\xxx\\presto-jdbc-0.189-mining.jar")
>             .getOrCreate();
> }
> {code}
>  
>  
> *The code above is the reproduction.*
> *Question: if I enable the Vertica JDBC read, the thread hangs forever.*
> *The driver log looks like this:*
>  
> 2018-09-26 10:32:51 INFO SharedState:54 - Setting 
> hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir 
> ('file:/C:/Users/admin/Desktop/test-project/sparktest/spark-warehouse/').
>  2018-09-26 10:32:51 INFO SharedState:54 - Warehouse path is 
> 'file:/C:/Users/admin/Desktop/test-project/sparktest/spark-warehouse/'.
>  2018-09-26 10:32:51 INFO ContextHandler:781 - Started 
> o.s.j.s.ServletContextHandler@2f70d6e2\{/SQL,null,AVAILABLE,@Spark}
>  2018-09-26 10:32:51 INFO ContextHandler:781 - Started 
> o.s.j.s.ServletContextHandler@1d66833d\{/SQL/json,null,AVAILABLE,@Spark}
>  2018-09-26 10:32:51 INFO ContextHandler:781 - Started 
> o.s.j.s.ServletContextHandler@65af6f3a\{/SQL/execution,null,AVAILABLE,@Spark}
>  2018-09-26 10:32:51 INFO ContextHandler:781 - Started 
> o.s.j.s.ServletContextHandler@55012968\{/SQL/execution/json,null,AVAILABLE,@Spark}
>  2018-09-26 10:32:51 INFO ContextHandler:781 - Started 
> o.s.j.s.ServletContextHandler@59e3f5aa\{/static/sql,null,AVAILABLE,@Spark}
>  2018-09-26 10:32:52 INFO StateStoreCoordinatorRef:54 - Registered 
> StateStoreCoordinator endpoint
>  2018-09-26 10:32:52 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:54 - 
> Registered executor NettyRpcEndpointRef(spark-client://Executor) 
> (192.168.4.232:49434) with ID 0
>  

[jira] [Commented] (SPARK-21569) Internal Spark class needs to be kryo-registered

2018-09-30 Thread Shaofeng SHI (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-21569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16633265#comment-16633265
 ] 

Shaofeng SHI commented on SPARK-21569:
--

Could this issue be prioritized? It has blocked our plan to upgrade to Spark 2.3.

> Internal Spark class needs to be kryo-registered
> 
>
> Key: SPARK-21569
> URL: https://issues.apache.org/jira/browse/SPARK-21569
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: Ryan Williams
>Priority: Major
>
> [Full repro here|https://github.com/ryan-williams/spark-bugs/tree/hf]
> As of 2.2.0, {{saveAsNewAPIHadoopFile}} jobs fail (when 
> {{spark.kryo.registrationRequired=true}}) with:
> {code}
> java.lang.IllegalArgumentException: Class is not registered: 
> org.apache.spark.internal.io.FileCommitProtocol$TaskCommitMessage
> Note: To register this class use: 
> kryo.register(org.apache.spark.internal.io.FileCommitProtocol$TaskCommitMessage.class);
>   at com.esotericsoftware.kryo.Kryo.getRegistration(Kryo.java:458)
>   at 
> com.esotericsoftware.kryo.util.DefaultClassResolver.writeClass(DefaultClassResolver.java:79)
>   at com.esotericsoftware.kryo.Kryo.writeClass(Kryo.java:488)
>   at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:593)
>   at 
> org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:315)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:383)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> This internal Spark class should be kryo-registered by Spark by default.
> This was not a problem in 2.1.1.
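> As a possible user-side workaround until Spark registers the class itself, the
> class can be registered explicitly from application code. A minimal sketch, not
> from this report; it assumes Kryo serialization is being enabled via SparkConf
> and reuses the class name from the error message above:
> {code}
> import org.apache.spark.SparkConf
> import org.apache.spark.sql.SparkSession
>
> // Register the internal class explicitly so that jobs running with
> // spark.kryo.registrationRequired=true do not fail on TaskCommitMessage.
> val conf = new SparkConf()
>   .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
>   .set("spark.kryo.registrationRequired", "true")
>   .registerKryoClasses(Array(
>     Class.forName("org.apache.spark.internal.io.FileCommitProtocol$TaskCommitMessage")))
>
> val spark = SparkSession.builder().config(conf).getOrCreate()
> {code}
> Alternatively, the same class name can be listed in the
> spark.kryo.classesToRegister configuration.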



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-25565) Add scala style checker to check add Locale.ROOT to .toLowerCase and .toUpperCase for internal calls

2018-09-30 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-25565.
--
   Resolution: Fixed
Fix Version/s: 2.5.0

Issue resolved by pull request 22581
[https://github.com/apache/spark/pull/22581]

> Add scala style checker to check add Locale.ROOT to .toLowerCase and 
> .toUpperCase for internal calls
> 
>
> Key: SPARK-25565
> URL: https://issues.apache.org/jira/browse/SPARK-25565
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 2.5.0
>Reporter: Yuming Wang
>Assignee: Hyukjin Kwon
>Priority: Minor
> Fix For: 2.5.0
>
>
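> For illustration only (not part of the original report), a minimal sketch of the
> pattern the proposed style check is meant to enforce for internal calls:
> {code}
> import java.util.Locale
>
> // Locale-independent case conversion for internal strings (SQL keywords,
> // config keys, etc.). Without Locale.ROOT the default locale leaks in: under
> // the Turkish locale, "I".toLowerCase returns a dotless "ı" and breaks
> // comparisons.
> val normalized = "SELECT".toLowerCase(Locale.ROOT)   // "select"
> {code}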




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25565) Add scala style checker to check add Locale.ROOT to .toLowerCase and .toUpperCase for internal calls

2018-09-30 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-25565:


Assignee: Hyukjin Kwon

> Add scala style checker to check add Locale.ROOT to .toLowerCase and 
> .toUpperCase for internal calls
> 
>
> Key: SPARK-25565
> URL: https://issues.apache.org/jira/browse/SPARK-25565
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 2.5.0
>Reporter: Yuming Wang
>Assignee: Hyukjin Kwon
>Priority: Minor
> Fix For: 2.5.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org