[jira] [Resolved] (SPARK-27251) @volatile var cannot be defined in case class in Scala 2.11

2019-03-23 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-27251.
---
   Resolution: Fixed
Fix Version/s: 3.0.0

This is resolved via https://github.com/apache/spark/pull/24178

> @volatile var cannot be defined in case class in Scala 2.11
> ---
>
> Key: SPARK-27251
> URL: https://issues.apache.org/jira/browse/SPARK-27251
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: John Zhuge
>Assignee: Takeshi Yamamuro
>Priority: Minor
> Fix For: 3.0.0
>
>
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.7-ubuntu-scala-2.11/507/consoleFull
> {noformat}
> [info] Compiling 371 Scala sources and 102 Java sources to 
> /home/jenkins/workspace/spark-master-test-maven-hadoop-2.7-ubuntu-scala-2.11/sql/core/target/scala-2.11/classes...
> [error] 
> /home/jenkins/workspace/spark-master-test-maven-hadoop-2.7-ubuntu-scala-2.11/sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala:162:
>  values cannot be volatile
> [error] @volatile var statsOfPlanToCache: Statistics)
> {noformat}
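> 
> As a quick illustration (a minimal sketch, not the Spark source; the class and field names below are hypothetical): Scala 2.11 rejects @volatile on a var declared in a case class parameter list, while the same field compiles when moved into the class body.
> {code}
> // Fails to compile on Scala 2.11 with "values cannot be volatile":
> // case class CachedStats(@volatile var stats: Long)
> 
> // A 2.11-compatible alternative: keep the constructor parameter plain and
> // declare the volatile mutable field inside the class body instead.
> case class CachedStats(initialStats: Long) {
>   @volatile var stats: Long = initialStats
> }
> {code}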






[jira] [Updated] (SPARK-27251) @volatile var cannot be defined in case class in Scala 2.11

2019-03-23 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-27251:
--
Component/s: Build

> @volatile var cannot be defined in case class in Scala 2.11
> ---
>
> Key: SPARK-27251
> URL: https://issues.apache.org/jira/browse/SPARK-27251
> Project: Spark
>  Issue Type: Bug
>  Components: Build, SQL
>Affects Versions: 3.0.0
>Reporter: John Zhuge
>Assignee: Takeshi Yamamuro
>Priority: Minor
> Fix For: 3.0.0
>
>
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.7-ubuntu-scala-2.11/507/consoleFull
> {noformat}
> [info] Compiling 371 Scala sources and 102 Java sources to 
> /home/jenkins/workspace/spark-master-test-maven-hadoop-2.7-ubuntu-scala-2.11/sql/core/target/scala-2.11/classes...
> [error] 
> /home/jenkins/workspace/spark-master-test-maven-hadoop-2.7-ubuntu-scala-2.11/sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala:162:
>  values cannot be volatile
> [error] @volatile var statsOfPlanToCache: Statistics)
> {noformat}






[jira] [Assigned] (SPARK-27251) @volatile var cannot be defined in case class in Scala 2.11

2019-03-23 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-27251:
-

Assignee: Takeshi Yamamuro

> @volatile var cannot be defined in case class in Scala 2.11
> ---
>
> Key: SPARK-27251
> URL: https://issues.apache.org/jira/browse/SPARK-27251
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: John Zhuge
>Assignee: Takeshi Yamamuro
>Priority: Minor
>
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.7-ubuntu-scala-2.11/507/consoleFull
> {noformat}
> [info] Compiling 371 Scala sources and 102 Java sources to 
> /home/jenkins/workspace/spark-master-test-maven-hadoop-2.7-ubuntu-scala-2.11/sql/core/target/scala-2.11/classes...
> [error] 
> /home/jenkins/workspace/spark-master-test-maven-hadoop-2.7-ubuntu-scala-2.11/sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala:162:
>  values cannot be volatile
> [error] @volatile var statsOfPlanToCache: Statistics)
> {noformat}






[jira] [Commented] (SPARK-27262) Add explicit UTF-8 Encoding to DESCRIPTION

2019-03-23 Thread Dongjoon Hyun (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16799906#comment-16799906
 ] 

Dongjoon Hyun commented on SPARK-27262:
---

Thank you, [~michaelchirico]. I added you to an Apache Spark contributor group 
and assigned this issue to you.

> Add explicit UTF-8 Encoding to DESCRIPTION
> --
>
> Key: SPARK-27262
> URL: https://issues.apache.org/jira/browse/SPARK-27262
> Project: Spark
>  Issue Type: Improvement
>  Components: R
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Michael Chirico
>Priority: Trivial
> Fix For: 3.0.0
>
>
> This will remove the following warning
> {code}
> Warning message:
> roxygen2 requires Encoding: UTF-8 
> {code}






[jira] [Assigned] (SPARK-27262) Add explicit UTF-8 Encoding to DESCRIPTION

2019-03-23 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-27262:
-

Assignee: Michael Chirico

> Add explicit UTF-8 Encoding to DESCRIPTION
> --
>
> Key: SPARK-27262
> URL: https://issues.apache.org/jira/browse/SPARK-27262
> Project: Spark
>  Issue Type: Improvement
>  Components: R
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Michael Chirico
>Priority: Trivial
> Fix For: 3.0.0
>
>
> This will remove the following warning
> {code}
> Warning message:
> roxygen2 requires Encoding: UTF-8 
> {code}






[jira] [Commented] (SPARK-27262) Add explicit UTF-8 Encoding to DESCRIPTION

2019-03-23 Thread Michael Chirico (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16799891#comment-16799891
 ] 

Michael Chirico commented on SPARK-27262:
-

This was my original PR (tagging with my Jira ID).

> Add explicit UTF-8 Encoding to DESCRIPTION
> --
>
> Key: SPARK-27262
> URL: https://issues.apache.org/jira/browse/SPARK-27262
> Project: Spark
>  Issue Type: Improvement
>  Components: R
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Trivial
> Fix For: 3.0.0
>
>
> This will remove the following warning
> {code}
> Warning message:
> roxygen2 requires Encoding: UTF-8 
> {code}






[jira] [Resolved] (SPARK-27154) Incomplete Execution for Spark/dev/run-test-jenkins.py

2019-03-23 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-27154.
---
  Resolution: Not A Problem
Target Version/s:   (was: 2.4.0)

Don't set Target or Shepherd.

I don't know, but this isn't a script for any end users to execute. It's for 
the CI env.

> Incomplete Execution for Spark/dev/run-test-jenkins.py
> --
>
> Key: SPARK-27154
> URL: https://issues.apache.org/jira/browse/SPARK-27154
> Project: Spark
>  Issue Type: Question
>  Components: jenkins
>Affects Versions: 2.4.0
> Environment: 
> JAVA_HOME="/usr/lib/jvm/java-8-openjdk-amd64" BUILD_DISPLAY_NAME="Jenkins 
> build" BUILD_URL="xxx " 
> GITHUB_PROJECT_URL="https://github.com/apache/spark" 
> GITHUB_OAUTH_KEY="xxx" 
> GITHUB_API_ENDPOINT="https://api.github.com/repos/apache/spark" 
> AMPLAB_JENKINS_BUILD_TOOL="maven" AMPLAB_JENKINS="True" 
> sha1="origin/pr/23560/merge" 
> ghprbActualCommit="d73cfb51941f99516b7878acace26db35ea72076" 
> ghprbActualCommitAuthor="jiafu.zh...@intel.com" 
> ghprbActualCommitAuthorEmail="jiafu.zh...@intel.com" 
> ghprbTriggerAuthor="Marcelo Vanzin" ghprbPullId=23560 
> ghprbTargetBranch="master" ghprbSourceBranch="thread_conf_separation" 
> GIT_BRANCH="thread_conf_separation" 
> ghprbPullAuthorEmail="jiafu.zh...@intel.com" ghprbPullDescription="GitHub 
> pull request #23560 of commit d73cfb51941f99516b7878acace26db35ea72076 
> automatically merged." ghprbPullTitle="[SPARK-26632][Core] Separate Thread 
> Configurations of Driver and Executor" 
> ghprbPullLink=https://api.github.com/repos/apache/spark/pulls/23560
>Reporter: Vaibhavd
>Priority: Major
>  Labels: CI, build
>
> When I run `Spark/dev/run-test-jenkins.py` with the following env variables set 
> (Ref: 
> [https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/103464/parameters/]), 
> the execution gets stuck at this point (build step):
>  
> {code:java}
> [INFO] --- scala-maven-plugin:3.4.4:compile (scala-compile-first) @ 
> spark-tags_2.12 ---
> [INFO] Using zinc server for incremental compilation
> [INFO] Toolchain in scala-maven-plugin: /usr/lib/jvm/java-8-openjdk-amd64
> {code}
> I am not sure what's going wrong. Am I missing some environment variable?
> When I run `/dev/run-tests.py` there is no problem.






[jira] [Updated] (SPARK-12611) test_infer_schema_to_local depended on old handling of missing value in row

2019-03-23 Thread Simon poortman (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-12611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon poortman updated SPARK-12611:
---
Attachment: Network Management Downloads.zip

> test_infer_schema_to_local depended on old handling of missing value in row
> ---
>
> Key: SPARK-12611
> URL: https://issues.apache.org/jira/browse/SPARK-12611
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Reporter: holdenk
>Assignee: holdenk
>Priority: Trivial
> Fix For: 1.6.1, 2.0.0
>
> Attachments: Network Management Downloads.zip
>
>
> test_infer_schema_to_local depended on the old handling of missing values in 
> row objects.






[jira] [Updated] (SPARK-27085) Migrate CSV to File Data Source V2

2019-03-23 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-27085:
--
Issue Type: Sub-task  (was: Task)
Parent: SPARK-23507

> Migrate CSV to File Data Source V2
> --
>
> Key: SPARK-27085
> URL: https://issues.apache.org/jira/browse/SPARK-27085
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
> Fix For: 3.0.0
>
>







[jira] [Resolved] (SPARK-27085) Migrate CSV to File Data Source V2

2019-03-23 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-27085.
---
  Resolution: Fixed
Assignee: Gengliang Wang
   Fix Version/s: 3.0.0
Target Version/s: 3.0.0

This is resolved via https://github.com/apache/spark/pull/24005

> Migrate CSV to File Data Source V2
> --
>
> Key: SPARK-27085
> URL: https://issues.apache.org/jira/browse/SPARK-27085
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
> Fix For: 3.0.0
>
>







[jira] [Updated] (SPARK-7150) SQLContext.range()

2019-03-23 Thread Simon poortman (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-7150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon poortman updated SPARK-7150:
--
Attachment: Network Management Downloads.zip

> SQLContext.range()
> --
>
> Key: SPARK-7150
> URL: https://issues.apache.org/jira/browse/SPARK-7150
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, SQL
>Reporter: Joseph K. Bradley
>Assignee: Adrian Wang
>Priority: Minor
>  Labels: starter
> Fix For: 1.4.0
>
> Attachments: Network Management Downloads.zip
>
>
> It would be handy to have easy ways to construct random columns for 
> DataFrames.  Proposed API:
> {code}
> class SQLContext {
>   // Return a DataFrame with a single column named "id" that has consecutive 
> value from 0 to n.
>   def range(n: Long): DataFrame
>   def range(n: Long, numPartitions: Int): DataFrame
> }
> {code}
> Usage:
> {code}
> // uniform distribution
> ctx.range(1000).select(rand())
> // normal distribution
> ctx.range(1000).select(randn())
> {code}
> We should add an RangeIterator that supports long start/stop position, and 
> then use it to create an RDD as the basis for this DataFrame.






[jira] [Resolved] (SPARK-27262) Add explicit UTF-8 Encoding to DESCRIPTION

2019-03-23 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-27262.
---
   Resolution: Fixed
Fix Version/s: 3.0.0

This is resolved via https://github.com/apache/spark/pull/23823

> Add explicit UTF-8 Encoding to DESCRIPTION
> --
>
> Key: SPARK-27262
> URL: https://issues.apache.org/jira/browse/SPARK-27262
> Project: Spark
>  Issue Type: Improvement
>  Components: R
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Trivial
> Fix For: 3.0.0
>
>
> This will remove the following warning
> {code}
> Warning message:
> roxygen2 requires Encoding: UTF-8 
> {code}






[jira] [Created] (SPARK-27262) Add explicit UTF-8 Encoding to DESCRIPTION

2019-03-23 Thread Dongjoon Hyun (JIRA)
Dongjoon Hyun created SPARK-27262:
-

 Summary: Add explicit UTF-8 Encoding to DESCRIPTION
 Key: SPARK-27262
 URL: https://issues.apache.org/jira/browse/SPARK-27262
 Project: Spark
  Issue Type: Improvement
  Components: R
Affects Versions: 3.0.0
Reporter: Dongjoon Hyun


This will remove the following warning
{code}
Warning message:
roxygen2 requires Encoding: UTF-8 
{code}
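
As a hedged sketch of the change (the surrounding fields are only illustrative; the point is the Encoding line in SparkR's DESCRIPTION file, R/pkg/DESCRIPTION in the Spark repo):
{code}
Package: SparkR
Type: Package
Version: 3.0.0
Encoding: UTF-8
{code}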






[jira] [Updated] (SPARK-27261) Spark submit passing multiple configurations not documented clearly

2019-03-23 Thread Sujith Chacko (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sujith Chacko updated SPARK-27261:
--
Priority: Trivial  (was: Major)

> Spark submit passing multiple configurations not documented clearly
> ---
>
> Key: SPARK-27261
> URL: https://issues.apache.org/jira/browse/SPARK-27261
> Project: Spark
>  Issue Type: Improvement
>  Components: Deploy
>Affects Versions: 2.4.0
>Reporter: Sujith Chacko
>Priority: Trivial
>
> Passing multiple configurations to spark-submit is not clearly documented and 
> no examples are given. It would be better to document this, since several 
> users have run into the problem and the current Spark documentation offers 
> little clarity on it.
>  
> Even while browsing I could see a few questions raised by users:
> https://community.hortonworks.com/questions/105022/spark-submit-multiple-configurations.html






[jira] [Commented] (SPARK-27261) Spark submit passing multiple configurations not documented clearly

2019-03-23 Thread Sujith Chacko (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16799763#comment-16799763
 ] 

Sujith Chacko commented on SPARK-27261:
---

cc [~cloud_fan] [~dongjoon]

> Spark submit passing multiple configurations not documented clearly
> ---
>
> Key: SPARK-27261
> URL: https://issues.apache.org/jira/browse/SPARK-27261
> Project: Spark
>  Issue Type: Improvement
>  Components: Deploy
>Affects Versions: 2.4.0
>Reporter: Sujith Chacko
>Priority: Major
>
> Passing multiple configurations to spark-submit is not clearly documented and 
> no examples are given. It would be better to document this, since several 
> users have run into the problem and the current Spark documentation offers 
> little clarity on it.
>  
> Even while browsing I could see a few questions raised by users:
> https://community.hortonworks.com/questions/105022/spark-submit-multiple-configurations.html






[jira] [Commented] (SPARK-27261) Spark submit passing multiple configurations not documented clearly

2019-03-23 Thread Sujith Chacko (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16799762#comment-16799762
 ] 

Sujith Chacko commented on SPARK-27261:
---

Will raise a PR to document the above scenario regarding multiple configuration 
usage during spark-submit/shell

> Spark submit passing multiple configurations not documented clearly
> ---
>
> Key: SPARK-27261
> URL: https://issues.apache.org/jira/browse/SPARK-27261
> Project: Spark
>  Issue Type: Improvement
>  Components: Deploy
>Affects Versions: 2.4.0
>Reporter: Sujith Chacko
>Priority: Major
>
> Passing multiple configurations to spark-submit is not clearly documented and 
> no examples are given. It would be better to document this, since several 
> users have run into the problem and the current Spark documentation offers 
> little clarity on it.
>  
> Even while browsing I could see a few questions raised by users:
> https://community.hortonworks.com/questions/105022/spark-submit-multiple-configurations.html






[jira] [Created] (SPARK-27261) Spark submit passing multiple configurations not documented clearly

2019-03-23 Thread Sujith Chacko (JIRA)
Sujith Chacko created SPARK-27261:
-

 Summary: Spark submit passing multiple configurations not 
documented clearly
 Key: SPARK-27261
 URL: https://issues.apache.org/jira/browse/SPARK-27261
 Project: Spark
  Issue Type: Improvement
  Components: Deploy
Affects Versions: 2.4.0
Reporter: Sujith Chacko


Passing multiple configurations to spark-submit is not clearly documented and no 
examples are given. It would be better to document this, since several users have 
run into the problem and the current Spark documentation offers little clarity on it.

Even while browsing I could see a few questions raised by users:

https://community.hortonworks.com/questions/105022/spark-submit-multiple-configurations.html
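
A minimal sketch of the kind of example the documentation could show (the application jar, class, and configuration values here are hypothetical): each configuration is passed as its own --conf key=value pair, quoted when the value contains spaces.
{code}
spark-submit \
  --class com.example.MyApp \
  --master yarn \
  --conf spark.eventLog.enabled=false \
  --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \
  myapp.jar
{code}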






[jira] [Created] (SPARK-27260) Upgrade to Kafka 2.2.0

2019-03-23 Thread Dongjoon Hyun (JIRA)
Dongjoon Hyun created SPARK-27260:
-

 Summary: Upgrade to Kafka 2.2.0
 Key: SPARK-27260
 URL: https://issues.apache.org/jira/browse/SPARK-27260
 Project: Spark
  Issue Type: Improvement
  Components: Structured Streaming
Affects Versions: 3.0.0
Reporter: Dongjoon Hyun


This issue updates the Kafka dependency to 2.2.0 to bring in the following 
improvements and bug fixes.

- https://issues.apache.org/jira/projects/KAFKA/versions/12344063







[jira] [Updated] (SPARK-27160) Incorrect Literal Casting of DecimalType in OrcFilters

2019-03-23 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-27160:
--
Fix Version/s: 2.3.4

> Incorrect Literal Casting of DecimalType in OrcFilters
> --
>
> Key: SPARK-27160
> URL: https://issues.apache.org/jira/browse/SPARK-27160
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Darcy Shen
>Assignee: Darcy Shen
>Priority: Major
>  Labels: correctness
> Fix For: 2.3.4, 2.4.1, 3.0.0
>
>
> DecimalType literals should not be cast to Long.
> E.g., for `df.filter("x < 3.14")`, assuming df (with x of DecimalType) reads from an 
> ORC table and uses the native ORC reader with predicate pushdown enabled, we 
> push the `x < 3.14` predicate down to the ORC reader via a 
> SearchArgument.
> OrcFilters constructs the SearchArgument, but does not handle the DecimalType 
> correctly: the previous implementation constructed `x < 3` from `x < 3.14`.
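> 
> A minimal repro sketch of the pattern described above (paths and column names are hypothetical; it assumes the native ORC reader with spark.sql.orc.filterPushdown enabled):
> {code}
> // Write a tiny ORC table with a DecimalType column.
> spark.sql("SELECT CAST(3.10 AS DECIMAL(9, 2)) AS x").write.mode("overwrite").orc("/tmp/decimal_orc")
> 
> // Read it back and filter on the decimal column. With pushdown enabled the
> // predicate reaches ORC as a SearchArgument; with the broken literal
> // conversion it is built from the truncated value 3 and can incorrectly
> // skip rows such as x = 3.10 that do satisfy x < 3.14.
> spark.conf.set("spark.sql.orc.filterPushdown", "true")
> spark.read.orc("/tmp/decimal_orc").filter("x < 3.14").show()
> {code}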






[jira] [Commented] (SPARK-27259) CLONE - Processing Compressed HDFS files with spark failing with error: "java.lang.IllegalArgumentException: requirement failed: length (-1) cannot be negative" from s

2019-03-23 Thread Simon poortman (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16799739#comment-16799739
 ] 

Simon poortman commented on SPARK-27259:


Fix this please for me.

> CLONE - Processing Compressed HDFS files with spark failing with error: 
> "java.lang.IllegalArgumentException: requirement failed: length (-1) cannot 
> be negative" from spark 2.2.X
> -
>
> Key: SPARK-27259
> URL: https://issues.apache.org/jira/browse/SPARK-27259
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.1, 2.2.2, 2.2.3, 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.4.0
>Reporter: Simon poortman
>Priority: Blocker
>
>  
> Since Spark 2.2.x, Spark jobs that process compressed HDFS files with a custom 
> input file format fail with the error "java.lang.IllegalArgumentException: 
> requirement failed: length (-1) cannot be negative". The custom input file 
> format returns a byte length of -1 for compressed file formats because 
> compressed HDFS files are not splittable, so for such formats the split has 
> offset 0 and length -1. Spark should treat a length of -1 as a valid split 
> for compressed file formats.
>  
> We observed that earlier versions of Spark did not have this validation; it 
> was introduced in Spark 2.2.x in the class InputFileBlockHolder. Spark 2.2.x 
> and later should therefore also accept -1 as a valid length for input splits.
>  
> +Below is the stack trace.+
>  Caused by: java.lang.IllegalArgumentException: requirement failed: length 
> (-1) cannot be negative
>   at scala.Predef$.require(Predef.scala:224)
>   at 
> org.apache.spark.rdd.InputFileBlockHolder$.set(InputFileBlockHolder.scala:70)
>   at org.apache.spark.rdd.HadoopRDD$$anon$1.(HadoopRDD.scala:226)
>   at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:214)
>   at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:94)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>   at org.apache.spark.scheduler.Task.run(Task.scala:109)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
>  
> +Below is the code snippet which caused this issue:+
> require(length >= 0, s"length ($length) cannot be negative")  // This validation caused the issue.
>  
> {code:java}
> // org.apache.spark.rdd.InputFileBlockHolder (spark-core)
>  
> def set(filePath: String, startOffset: Long, length: Long): Unit = {
>     require(filePath != null, "filePath cannot be null")
>     require(startOffset >= 0, s"startOffset ($startOffset) cannot be 
> negative")
>     require(length >= 0, s"length ($length) cannot be negative")  
>     inputBlock.set(new FileBlock(UTF8String.fromString(filePath), 
> startOffset, length))
>   }
> {code}
>  
> +Steps to reproduce the issue.+
>  Please refer the below code to reproduce the issue.  
> {code:java}
> import org.apache.hadoop.mapred.JobConf
> val hadoopConf = new JobConf()
> import org.apache.hadoop.mapred.FileInputFormat
> import org.apache.hadoop.fs.Path
> FileInputFormat.setInputPaths(hadoopConf, new 
> Path("/output656/part-r-0.gz"))    
> val records = 
> sc.hadoopRDD(hadoopConf,classOf[com.platform.custom.storagehandler.INFAInputFormat],
>  classOf[org.apache.hadoop.io.LongWritable], 
> classOf[org.apache.hadoop.io.Writable]) 
> records.count()
> {code}
>  






[jira] [Updated] (SPARK-24669) Managed table was not cleared of path after drop database cascade

2019-03-23 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-24669:
--
Fix Version/s: (was: 2.4.2)
   2.4.1

> Managed table was not cleared of path after drop database cascade
> -
>
> Key: SPARK-24669
> URL: https://issues.apache.org/jira/browse/SPARK-24669
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0, 2.3.1
>Reporter: Dong Jiang
>Assignee: Udbhav Agrawal
>Priority: Major
> Fix For: 2.3.4, 2.4.1, 3.0.0
>
>
> I can do the following in sequence
> # Create a managed table using path options
> # Drop the table via dropping the parent database cascade
> # Re-create the database and table with a different path
> # The new table shows data from the old path, not the new path
> {code}
> echo "first" > /tmp/first.csv
> echo "second" > /tmp/second.csv
> spark-shell
> spark.version
> res0: String = 2.3.0
> spark.sql("create database foo")
> spark.sql("create table foo.first (id string) using csv options 
> (path='/tmp/first.csv')")
> spark.table("foo.first").show()
> +-+
> |   id|
> +-+
> |first|
> +-+
> spark.sql("drop database foo cascade")
> spark.sql("create database foo")
> spark.sql("create table foo.first (id string) using csv options 
> (path='/tmp/second.csv')")
> "note, the path is different now, pointing to second.csv, but still showing 
> data from first file"
> spark.table("foo.first").show()
> +-+
> |   id|
> +-+
> |first|
> +-+
> "now, if I drop the table explicitly, instead of via dropping database 
> cascade, then it will be the correct result"
> spark.sql("drop table foo.first")
> spark.sql("create table foo.first (id string) using csv options 
> (path='/tmp/second.csv')")
> spark.table("foo.first").show()
> +--+
> |id|
> +--+
> |second|
> +--+
> {code}
> Same sequence failed in 2.3.1 as well.






[jira] [Updated] (SPARK-26604) Register channel for stream request

2019-03-23 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-26604:
--
Fix Version/s: (was: 2.4.2)
   2.4.1

> Register channel for stream request
> ---
>
> Key: SPARK-26604
> URL: https://issues.apache.org/jira/browse/SPARK-26604
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Liang-Chi Hsieh
>Assignee: Liang-Chi Hsieh
>Priority: Minor
> Fix For: 2.3.4, 2.4.1, 3.0.0
>
>
> Now in {{TransportRequestHandler.processStreamRequest}}, when a stream 
> request is processed, the stream id is not registered with the current 
> channel in the stream manager. It should do that so that, when the channel 
> gets terminated, the associated streams can be removed from stream requests too.






[jira] [Updated] (SPARK-25863) java.lang.UnsupportedOperationException: empty.max at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.updateAndGetCompilationStats(CodeGenerator.scala:1

2019-03-23 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-25863:
--
Fix Version/s: (was: 2.4.2)
   2.4.1

> java.lang.UnsupportedOperationException: empty.max at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.updateAndGetCompilationStats(CodeGenerator.scala:1475)
> -
>
> Key: SPARK-25863
> URL: https://issues.apache.org/jira/browse/SPARK-25863
> Project: Spark
>  Issue Type: Bug
>  Components: Optimizer, Spark Core
>Affects Versions: 2.3.1, 2.3.2
>Reporter: Ruslan Dautkhanov
>Assignee: Takeshi Yamamuro
>Priority: Major
>  Labels: cache, catalyst, code-generation
> Fix For: 2.3.4, 2.4.1, 3.0.0
>
>
> Failing task : 
> {noformat}
> An error occurred while calling o2875.collectToPython.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 58 
> in stage 21413.0 failed 4 times, most recent failure: Lost task 58.3 in stage 
> 21413.0 (TID 4057314, pc1udatahad117, executor 431): 
> java.lang.UnsupportedOperationException: empty.max
> at scala.collection.TraversableOnce$class.max(TraversableOnce.scala:229)
> at scala.collection.AbstractTraversable.max(Traversable.scala:104)
> at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.updateAndGetCompilationStats(CodeGenerator.scala:1475)
> at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:1418)
> at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1493)
> at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1490)
> at 
> org.spark_project.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
> at 
> org.spark_project.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
> at 
> org.spark_project.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
> at org.spark_project.guava.cache.LocalCache$Segment.get(LocalCache.java:2257)
> at org.spark_project.guava.cache.LocalCache.get(LocalCache.java:4000)
> at org.spark_project.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004)
> at 
> org.spark_project.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
> at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.compile(CodeGenerator.scala:1365)
> at 
> org.apache.spark.sql.catalyst.expressions.codegen.GeneratePredicate$.create(GeneratePredicate.scala:81)
> at 
> org.apache.spark.sql.catalyst.expressions.codegen.GeneratePredicate$.create(GeneratePredicate.scala:40)
> at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:1321)
> at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:1318)
> at org.apache.spark.sql.execution.SparkPlan.newPredicate(SparkPlan.scala:401)
> at 
> org.apache.spark.sql.execution.columnar.InMemoryTableScanExec$$anonfun$filteredCachedBatches$1.apply(InMemoryTableScanExec.scala:263)
> at 
> org.apache.spark.sql.execution.columnar.InMemoryTableScanExec$$anonfun$filteredCachedBatches$1.apply(InMemoryTableScanExec.scala:262)
> at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndexInternal$1$$anonfun$apply$24.apply(RDD.scala:818)
> at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndexInternal$1$$anonfun$apply$24.apply(RDD.scala:818)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
> at org.apache.spark.scheduler.Task.run(Task.scala:109)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
> at 
> 

[jira] [Updated] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2

2019-03-23 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-26742:
--
Fix Version/s: (was: 2.4.2)
   2.4.1

> Bump Kubernetes Client Version to 4.1.2
> ---
>
> Key: SPARK-26742
> URL: https://issues.apache.org/jira/browse/SPARK-26742
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Kubernetes
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Steve Davids
>Assignee: Jiaxin Shan
>Priority: Major
>  Labels: easyfix
> Fix For: 2.4.1, 3.0.0
>
>
> Spark 2.x uses Kubernetes Client 3.x, which is quite old, while the master 
> branch has 4.0. The client should be upgraded to 4.1.1 to get the broadest 
> Kubernetes compatibility support for newer clusters: 
> https://github.com/fabric8io/kubernetes-client#compatibility-matrix






[jira] [Updated] (SPARK-27019) Spark UI's SQL tab shows inconsistent values

2019-03-23 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-27019:
--
Fix Version/s: (was: 2.4.2)
   2.4.1

> Spark UI's SQL tab shows inconsistent values
> 
>
> Key: SPARK-27019
> URL: https://issues.apache.org/jira/browse/SPARK-27019
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Web UI
>Affects Versions: 2.4.0
>Reporter: peay
>Assignee: shahid
>Priority: Major
> Fix For: 2.4.1, 3.0.0
>
> Attachments: Screenshot from 2019-03-01 21-31-48.png, 
> application_1550040445209_4748, query-1-details.png, query-1-list.png, 
> query-job-1.png, screenshot-spark-ui-details.png, screenshot-spark-ui-list.png
>
>
> Since 2.4.0, I am frequently seeing broken outputs in the SQL tab of the 
> Spark UI, where submitted/duration make no sense, description has the ID 
> instead of the actual description.
> Clicking on the link to open a query, the SQL plan is missing as well.
> I have tried to increase `spark.scheduler.listenerbus.eventqueue.capacity` to 
> very large values like 30k out of paranoia that we may have too many events, 
> but to no avail. I have not identified anything particular that leads to 
> that: it doesn't occur in all my jobs, but it does occur in a lot of them 
> still.






[jira] [Updated] (SPARK-26606) parameters passed in extraJavaOptions are not being picked up

2019-03-23 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-26606:
--
Fix Version/s: (was: 2.4.2)
   2.4.1

> parameters passed in extraJavaOptions are not being picked up 
> --
>
> Key: SPARK-26606
> URL: https://issues.apache.org/jira/browse/SPARK-26606
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 2.3.1
>Reporter: Ravindra
>Assignee: Jungtaek Lim
>Priority: Major
>  Labels: java, spark
> Fix For: 2.3.4, 2.4.1, 3.0.0
>
>
> driver.extraJavaOptions and executor.extraJavaOptions are not being picked 
> up. Even though I see the parameters being passed fine in the Spark launch 
> command, these parameters are not picked up for some unknown reason. My 
> source code throws an error stating that the java params are empty.
>  
> This is my spark submit command: 
>     output=`spark-submit \
>  --class com.demo.myApp.App \
>  --conf 'spark.executor.extraJavaOptions=-Dapp.env=dev -Dapp.country=US 
> -Dapp.banner=ABC -Doracle.net.tns_admin=/work/artifacts/oracle/current 
> -Djava.security.egd=[file:/dev/./urandom|file:///dev/urandom]' \
>  --conf 'spark.driver.extraJavaOptions=-Dapp.env=dev -Dapp.country=US 
> -Dapp.banner=ABC -Doracle.net.tns_admin=/work/artifacts/oracle/current 
> -Djava.security.egd=[file:/dev/./urandom|file:///dev/urandom]' \
>  --executor-memory "$EXECUTOR_MEMORY" \
>  --executor-cores "$EXECUTOR_CORES" \
>  --total-executor-cores "$TOTAL_CORES" \
>  --driver-memory "$DRIVER_MEMORY" \
>  --deploy-mode cluster \
>  /home/spark/asm//current/myapp-*.jar 2>&1 &`
>  
>  
> Is there any other way I can access the java params without using 
> extraJavaOptions?
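> 
> As a side note, a minimal sketch (the property names are the ones from the submit command above; the reading code itself is hypothetical) of how such -D options are normally read once they actually reach the JVM, since they surface as plain system properties:
> {code}
> // -Dapp.env=dev etc. from extraJavaOptions become JVM system properties,
> // readable on the driver or executor via sys.props / System.getProperty.
> val env = sys.props.getOrElse("app.env", "<not set>")
> val country = sys.props.getOrElse("app.country", "<not set>")
> println(s"app.env=$env, app.country=$country")
> {code}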






[jira] [Updated] (SPARK-26932) Add a warning for Hive 2.1.1 ORC reader issue

2019-03-23 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-26932:
--
Fix Version/s: (was: 2.4.2)
   2.4.1

> Add a warning for Hive 2.1.1 ORC reader issue
> -
>
> Key: SPARK-26932
> URL: https://issues.apache.org/jira/browse/SPARK-26932
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.4.0
>Reporter: Bo Hai
>Assignee: Bo Hai
>Priority: Minor
> Fix For: 2.4.1, 3.0.0
>
>
> As of Spark 2.3 and Hive 2.3, both support using apache/orc as the ORC writer 
> and reader. Older versions of Hive implement their own ORC reader, which is 
> not forward-compatible.
> So Hive 2.2 and older cannot read ORC tables created by Spark 2.3 and newer, 
> which use apache/orc instead of the Hive ORC implementation.
> I think we should add this information to the Spark 2.4 ORC documentation 
> page: https://spark.apache.org/docs/2.4.0/sql-data-sources-orc.html






[jira] [Updated] (SPARK-26927) Race condition may cause dynamic allocation not working

2019-03-23 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-26927:
--
Fix Version/s: (was: 2.4.2)
   2.4.1

> Race condition may cause dynamic allocation not working
> ---
>
> Key: SPARK-26927
> URL: https://issues.apache.org/jira/browse/SPARK-26927
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0, 2.4.0
>Reporter: liupengcheng
>Assignee: liupengcheng
>Priority: Major
> Fix For: 2.3.4, 2.4.1, 3.0.0
>
> Attachments: Selection_042.jpg, Selection_043.jpg, Selection_044.jpg, 
> Selection_045.jpg, Selection_046.jpg
>
>
> Recently we caught a bug that caused our production Spark Thrift Server to hang:
> There is a race condition in the ExecutorAllocationManager: the 
> `SparkListenerExecutorRemoved` event can be posted before the 
> `SparkListenerTaskStart` event, which corrupts the tracked `executorIds`. 
> Then, when some executor idles, real executors are removed even when the 
> executor count equals `minNumExecutors`, because `newExecutorTotal` is 
> computed incorrectly (it may be greater than `minNumExecutors`). This can 
> eventually leave zero available executors while a wrong set of executorIds 
> is kept in memory.
> What's more, even the `SparkListenerTaskEnd` event cannot release the stale 
> `executorIds`, because later idle events for those fake executors cannot 
> trigger a real removal: the executors have already been removed and no longer 
> exist in the `executorDataMap` of `CoarseGrainedSchedulerBackend`.
> Logs:
> !Selection_042.jpg!
> !Selection_043.jpg!
> !Selection_044.jpg!
> !Selection_045.jpg!
> !Selection_046.jpg!  
> Event logs (showing the out-of-order events):
> {code:java}
> {"Event":"SparkListenerExecutorRemoved","Timestamp":1549936077543,"Executor 
> ID":"131","Removed Reason":"Container 
> container_e28_1547530852233_236191_02_000180 exited from explicit termination 
> request."}
> {"Event":"SparkListenerTaskStart","Stage ID":136689,"Stage Attempt 
> ID":0,"Task Info":{"Task ID":448048,"Index":2,"Attempt":0,"Launch 
> Time":1549936032872,"Executor 
> ID":"131","Host":"mb2-hadoop-prc-st474.awsind","Locality":"RACK_LOCAL", 
> "Speculative":false,"Getting Result Time":0,"Finish 
> Time":1549936032906,"Failed":false,"Killed":false,"Accumulables":[{"ID":12923945,"Name":"internal.metrics.executorDeserializeTime","Update":10,"Value":13,"Internal":true,"Count
>  Faile d 
> Values":true},{"ID":12923946,"Name":"internal.metrics.executorDeserializeCpuTime","Update":2244016,"Value":4286494,"Internal":true,"Count
>  Failed 
> Values":true},{"ID":12923947,"Name":"internal.metrics.executorRunTime","Update":20,"Val
>  ue":39,"Internal":true,"Count Failed 
> Values":true},{"ID":12923948,"Name":"internal.metrics.executorCpuTime","Update":13412614,"Value":26759061,"Internal":true,"Count
>  Failed Values":true},{"ID":12923949,"Name":"internal.metrics.resultS 
> ize","Update":3578,"Value":7156,"Internal":true,"Count Failed 
> Values":true},{"ID":12923954,"Name":"internal.metrics.peakExecutionMemory","Update":33816576,"Value":67633152,"Internal":true,"Count
>  Failed Values":true},{"ID":12923962,"Na 
> me":"internal.metrics.shuffle.write.bytesWritten","Update":1367,"Value":2774,"Internal":true,"Count
>  Failed 
> Values":true},{"ID":12923963,"Name":"internal.metrics.shuffle.write.recordsWritten","Update":23,"Value":45,"Internal":true,"Cou
>  nt Failed 
> Values":true},{"ID":12923964,"Name":"internal.metrics.shuffle.write.writeTime","Update":3259051,"Value":6858121,"Internal":true,"Count
>  Failed Values":true},{"ID":12921550,"Name":"number of output 
> rows","Update":"158","Value" :"289","Internal":true,"Count Failed 
> Values":true,"Metadata":"sql"},{"ID":12921546,"Name":"number of output 
> rows","Update":"23","Value":"45","Internal":true,"Count Failed 
> Values":true,"Metadata":"sql"},{"ID":12921547,"Name":"peak memo ry total 
> (min, med, 
> max)","Update":"33816575","Value":"67633149","Internal":true,"Count Failed 
> Values":true,"Metadata":"sql"},{"ID":12921541,"Name":"data size total (min, 
> med, max)","Update":"551","Value":"1077","Internal":true,"Count Failed 
> Values":true,"Metadata":"sql"}]}}
> {code}
>  






[jira] [Updated] (SPARK-27112) Spark Scheduler encounters two independent Deadlocks when trying to kill executors either due to dynamic allocation or blacklisting

2019-03-23 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-27112:
--
Fix Version/s: (was: 2.4.2)
   2.4.1

> Spark Scheduler encounters two independent Deadlocks when trying to kill 
> executors either due to dynamic allocation or blacklisting 
> 
>
> Key: SPARK-27112
> URL: https://issues.apache.org/jira/browse/SPARK-27112
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler, Spark Core
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Parth Gandhi
>Assignee: Parth Gandhi
>Priority: Major
> Fix For: 2.3.4, 2.4.1, 3.0.0
>
> Attachments: Screen Shot 2019-02-26 at 4.10.26 PM.png, Screen Shot 
> 2019-02-26 at 4.10.48 PM.png, Screen Shot 2019-02-26 at 4.11.11 PM.png, 
> Screen Shot 2019-02-26 at 4.11.26 PM.png
>
>
> Recently, a few Spark users in the organization have reported that their jobs 
> were getting stuck. On further analysis, it was found that there are two 
> independent deadlocks, each occurring under different circumstances. The 
> screenshots for these two deadlocks are attached here. 
> We were able to reproduce the deadlocks with the following piece of code:
>  
> {code:java}
> import org.apache.hadoop.conf.Configuration
> import org.apache.hadoop.fs.{FileSystem, Path}
> import org.apache.spark._
> import org.apache.spark.TaskContext
> // Simple example of Word Count in Scala
> object ScalaWordCount {
> def main(args: Array[String]) {
> if (args.length < 2) {
> System.err.println("Usage: ScalaWordCount  ")
> System.exit(1)
> }
> val conf = new SparkConf().setAppName("Scala Word Count")
> val sc = new SparkContext(conf)
> // get the input file uri
> val inputFilesUri = args(0)
> // get the output file uri
> val outputFilesUri = args(1)
> while (true) {
> val textFile = sc.textFile(inputFilesUri)
> val counts = textFile.flatMap(line => line.split(" "))
> .map(word => {if (TaskContext.get.partitionId == 5 && 
> TaskContext.get.attemptNumber == 0) throw new Exception("Fail for 
> blacklisting") else (word, 1)})
> .reduceByKey(_ + _)
> counts.saveAsTextFile(outputFilesUri)
> val conf: Configuration = new Configuration()
> val path: Path = new Path(outputFilesUri)
> val hdfs: FileSystem = FileSystem.get(conf)
> hdfs.delete(path, true)
> }
> sc.stop()
> }
> }
> {code}
>  
> Additionally, to ensure that the deadlock surfaces up soon enough, I also 
> added a small delay in the Spark code here:
> [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/BlacklistTracker.scala#L256]
>  
> {code:java}
> executorIdToFailureList.remove(exec)
> updateNextExpiryTime()
> Thread.sleep(2000)
> killBlacklistedExecutor(exec)
> {code}
>  
> Also make sure that the following configs are set when launching the above 
> spark job:
> *spark.blacklist.enabled=true*
> *spark.blacklist.killBlacklistedExecutors=true*
> *spark.blacklist.application.maxFailedTasksPerExecutor=1*






[jira] [Updated] (SPARK-27078) Read Hive materialized view throw MatchError

2019-03-23 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-27078:
--
Fix Version/s: (was: 2.4.2)
   2.4.1

> Read Hive materialized view throw MatchError
> 
>
> Key: SPARK-27078
> URL: https://issues.apache.org/jira/browse/SPARK-27078
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 2.4.1, 3.0.0
>
>
> How to reproduce:
> Hive side:
> {code:sql}
> CREATE TABLE materialized_view_tbl (key INT);
> CREATE MATERIALIZED VIEW view_1 AS SELECT * FROM materialized_view_tbl;  -- 
> Hive 3.x
> CREATE MATERIALIZED VIEW view_1 DISABLE REWRITE AS SELECT * FROM 
> materialized_view_tbl;  -- Hive 2.3.x
> {code}
> Spark side(read from Hive 2.3.x):
> {code:java}
> bin/spark-sql --conf spark.sql.hive.metastore.version=2.3.4 --conf 
> spark.sql.hive.metastore.jars=maven
> spark-sql> select * from view_1;
> 19/03/06 16:33:44 ERROR SparkSQLDriver: Failed in [select * from view_1]
> scala.MatchError: MATERIALIZED_VIEW (of class 
> org.apache.hadoop.hive.metastore.TableType)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$getTableOption$3(HiveClientImpl.scala:434)
>   at scala.Option.map(Option.scala:163)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$getTableOption$1(HiveClientImpl.scala:370)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:277)
> {code}
> Spark side(read from Hive 3.1.x):
> {code:java}
> bin/spark-sql --conf spark.sql.hive.metastore.version=3.1.1 --conf 
> spark.sql.hive.metastore.jars=maven
> spark-sql> select * from view_1;
> 19/03/05 19:55:37 ERROR SparkSQLDriver: Failed in [select * from view_1]
> java.lang.NoSuchFieldError: INDEX_TABLE
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$getTableOption$3(HiveClientImpl.scala:438)
>   at scala.Option.map(Option.scala:163)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$getTableOption$1(HiveClientImpl.scala:370)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:277)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:215)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:214)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:260)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.getTableOption(HiveClientImpl.scala:368)
> {code}






[jira] [Updated] (SPARK-27111) A continuous query may fail with InterruptedException when the Kafka consumer temporarily has 0 partitions

2019-03-23 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-27111:
--
Fix Version/s: (was: 2.4.2)
   2.4.1

> A continuous query may fail with InterruptedException when the Kafka consumer 
> temporarily has 0 partitions
> 
>
> Key: SPARK-27111
> URL: https://issues.apache.org/jira/browse/SPARK-27111
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.4.0, 2.4.1
>Reporter: Shixiong Zhu
>Assignee: Shixiong Zhu
>Priority: Major
> Fix For: 2.3.4, 2.4.1, 3.0.0
>
>
> Before a Kafka consumer gets assigned partitions, its offsets will 
> contain 0 partitions. However, runContinuous will still run and launch a 
> Spark job having 0 partitions. In this case, there is a race in which an 
> epoch may interrupt the query execution thread after `lastExecution.toRdd`, 
> and either `epochEndpoint.askSync[Unit](StopContinuousExecutionWrites)` or 
> the next `runContinuous` will get interrupted unintentionally.






[jira] [Updated] (SPARK-27107) Spark SQL Job failing because of Kryo buffer overflow with ORC

2019-03-23 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-27107:
--
Fix Version/s: (was: 2.4.2)
   2.4.1

> Spark SQL Job failing because of Kryo buffer overflow with ORC
> --
>
> Key: SPARK-27107
> URL: https://issues.apache.org/jira/browse/SPARK-27107
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.2, 2.4.0
>Reporter: Dhruve Ashar
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 2.4.1, 3.0.0
>
>
> The issue occurs while trying to read ORC data and setting the SearchArgument.
> {code:java}
>  Caused by: com.esotericsoftware.kryo.KryoException: Buffer overflow. 
> Available: 0, required: 9
> Serialization trace:
> literalList 
> (org.apache.orc.storage.ql.io.sarg.SearchArgumentImpl$PredicateLeafImpl)
> leaves (org.apache.orc.storage.ql.io.sarg.SearchArgumentImpl)
>   at com.esotericsoftware.kryo.io.Output.require(Output.java:163)
>   at com.esotericsoftware.kryo.io.Output.writeVarLong(Output.java:614)
>   at com.esotericsoftware.kryo.io.Output.writeLong(Output.java:538)
>   at 
> com.esotericsoftware.kryo.serializers.DefaultSerializers$LongSerializer.write(DefaultSerializers.java:147)
>   at 
> com.esotericsoftware.kryo.serializers.DefaultSerializers$LongSerializer.write(DefaultSerializers.java:141)
>   at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628)
>   at 
> com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100)
>   at 
> com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40)
>   at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552)
>   at 
> com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80)
>   at 
> com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518)
>   at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628)
>   at 
> com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100)
>   at 
> com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40)
>   at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552)
>   at 
> com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80)
>   at 
> com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518)
>   at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:534)
>   at 
> org.apache.orc.mapred.OrcInputFormat.setSearchArgument(OrcInputFormat.java:96)
>   at 
> org.apache.orc.mapreduce.OrcInputFormat.setSearchArgument(OrcInputFormat.java:57)
>   at 
> org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$$anonfun$buildReaderWithPartitionValues$1.apply(OrcFileFormat.scala:159)
>   at 
> org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$$anonfun$buildReaderWithPartitionValues$1.apply(OrcFileFormat.scala:156)
>   at scala.Option.foreach(Option.scala:257)
>   at 
> org.apache.spark.sql.execution.datasources.orc.OrcFileFormat.buildReaderWithPartitionValues(OrcFileFormat.scala:156)
>   at 
> org.apache.spark.sql.execution.FileSourceScanExec.inputRDD$lzycompute(DataSourceScanExec.scala:297)
>   at 
> org.apache.spark.sql.execution.FileSourceScanExec.inputRDD(DataSourceScanExec.scala:295)
>   at 
> org.apache.spark.sql.execution.FileSourceScanExec.inputRDDs(DataSourceScanExec.scala:315)
>   at 
> org.apache.spark.sql.execution.FilterExec.inputRDDs(basicPhysicalOperators.scala:121)
>   at 
> org.apache.spark.sql.execution.ProjectExec.inputRDDs(basicPhysicalOperators.scala:41)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:605)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>   at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
>   at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
>   at 
> org.apache.spark.sql.execution.python.EvalPythonExec.doExecute(EvalPythonExec.scala:89)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
>   at 
> 

[jira] [Updated] (SPARK-27165) Upgrade Apache ORC to 1.5.5

2019-03-23 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-27165:
--
Fix Version/s: (was: 2.4.2)
   2.4.1

> Upgrade Apache ORC to 1.5.5
> ---
>
> Key: SPARK-27165
> URL: https://issues.apache.org/jira/browse/SPARK-27165
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.4.1, 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 2.4.1, 3.0.0
>
>
> This issue aims to update the Apache ORC dependency to fix SPARK-27107.
> {code:java}
> [ORC-452] Support converting MAP column from JSON to ORC
> Improvement
> [ORC-447] Change the docker scripts to keep a persistent m2 cache
> [ORC-463] Add `version` command
> [ORC-475] ORC reader should lazily get filesystem
> [ORC-476] Make SearchArgument kryo buffer size configurable{code}
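
For context, a minimal Scala sketch of the query shape that can hit the SearchArgument Kryo overflow from SPARK-27107, which ORC-476 addresses by making that buffer size configurable. The path, column name, and literal count are hypothetical; any sufficiently wide IN-list pushed down to the native ORC reader produces a large literalList in the serialized SearchArgument.

{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().appName("sarg-overflow-sketch").getOrCreate()

// A wide IN-list is pushed down to the native ORC reader as a SearchArgument
// whose literalList is Kryo-serialized; with enough literals the default
// buffer overflows (KryoException: Buffer overflow). Path and column name
// below are hypothetical.
val manyIds = (1L to 100000L).toSeq

spark.read.orc("/data/events_orc")
  .where(col("id").isin(manyIds: _*))
  .count()
{code}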



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-27134) array_distinct function does not work correctly with columns containing array of array

2019-03-23 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-27134:
--
Fix Version/s: (was: 2.4.2)
   2.4.1

> array_distinct function does not work correctly with columns containing array 
> of array
> --
>
> Key: SPARK-27134
> URL: https://issues.apache.org/jira/browse/SPARK-27134
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
> Environment: Spark 2.4, scala 2.11.11
>Reporter: Mike Trenaman
>Assignee: Dilip Biswal
>Priority: Major
> Fix For: 2.4.1, 3.0.0
>
>
> The array_distinct function introduced in Spark 2.4 is producing strange 
> results when used on an array column which contains a nested array. The 
> resulting output can still contain duplicate values, and furthermore, 
> previously distinct values may be removed.
> This is easily repeatable, e.g. with this code:
> val df = Seq(
>  Seq(Seq(1, 2), Seq(1, 2), Seq(1, 2), Seq(3, 4), Seq(4, 5))
>  ).toDF("Number_Combinations")
> val dfWithDistinct = df.withColumn("distinct_combinations",
>  array_distinct(col("Number_Combinations")))
>  
> The initial 'df' DataFrame contains one row, where column 
> 'Number_Combinations' contains the following values:
> [[1, 2], [1, 2], [1, 2], [3, 4], [4, 5]]
>  
> The array_distinct function run on this column produces a new column 
> containing the following values:
> [[1, 2], [1, 2], [1, 2]]
>  
> As you can see, this contains three occurrences of the same value (1, 2), and 
> furthermore, the distinct values (3, 4), (4, 5) have been removed.
>  
>  
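
A self-contained version of the reproduction above, assuming a local SparkSession; the imports and session setup are the only additions to the reporter's snippet.

{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{array_distinct, col}

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("array-distinct-repro")
  .getOrCreate()
import spark.implicits._

val df = Seq(
  Seq(Seq(1, 2), Seq(1, 2), Seq(1, 2), Seq(3, 4), Seq(4, 5))
).toDF("Number_Combinations")

val dfWithDistinct = df.withColumn("distinct_combinations",
  array_distinct(col("Number_Combinations")))

// On an affected 2.4.0 build this prints [[1, 2], [1, 2], [1, 2]] instead of
// the expected [[1, 2], [3, 4], [4, 5]].
dfWithDistinct.show(false)
{code}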



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-27178) k8s test failing due to missing nss library in dockerfile

2019-03-23 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-27178:
--
Fix Version/s: (was: 2.4.2)
   2.4.1

> k8s test failing due to missing nss library in dockerfile
> -
>
> Key: SPARK-27178
> URL: https://issues.apache.org/jira/browse/SPARK-27178
> Project: Spark
>  Issue Type: Bug
>  Components: Build, jenkins, Kubernetes
>Affects Versions: 2.4.0, 3.0.0
>Reporter: shane knapp
>Assignee: shane knapp
>Priority: Major
> Fix For: 2.4.1, 3.0.0
>
>
> While performing some tests on our existing minikube and k8s infrastructure, 
> I noticed that the integration tests were failing. I dug in and discovered 
> the following message buried at the end of the stack trace:
> {noformat}
>   Caused by: java.io.FileNotFoundException: /usr/lib/libnss3.so
>   at sun.security.pkcs11.Secmod.initialize(Secmod.java:193)
>   at sun.security.pkcs11.SunPKCS11.<init>(SunPKCS11.java:218)
>   ... 81 more
> {noformat}
> After I added the 'nss' package to 
> resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile, 
> everything worked.
> I will also check and see if this is failing on 2.4...
> To be honest, I have no idea why this started failing today and not earlier. 
> The only recent change to this file that I can find is 
> https://issues.apache.org/jira/browse/SPARK-26995



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-27160) Incorrect Literal Casting of DecimalType in OrcFilters

2019-03-23 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-27160:
--
Fix Version/s: 2.4.1

> Incorrect Literal Casting of DecimalType in OrcFilters
> --
>
> Key: SPARK-27160
> URL: https://issues.apache.org/jira/browse/SPARK-27160
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Darcy Shen
>Assignee: Darcy Shen
>Priority: Major
>  Labels: correctness
> Fix For: 2.4.1, 3.0.0
>
>
> A DecimalType literal should not be cast to Long.
> E.g., for `df.filter("x < 3.14")`, assuming df (with x of DecimalType) reads 
> from an ORC table and uses the native ORC reader with predicate pushdown 
> enabled, the `x < 3.14` predicate is pushed down to the ORC reader via a 
> SearchArgument.
> OrcFilters constructs the SearchArgument but does not handle DecimalType 
> correctly: the previous implementation built `x < 3` from `x < 3.14`.
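
A hedged sketch of the predicate shape described above. The path and schema are hypothetical, and the comments restate the reported pre-fix behaviour rather than the patched code.

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

// Hypothetical ORC data whose column `x` is a DECIMAL; the native ORC reader
// pushes the predicate down as a SearchArgument when
// spark.sql.orc.filterPushdown is enabled.
val df = spark.read.orc("/data/decimals_orc")

// Before the fix, OrcFilters cast the decimal literal to Long, so the pushed
// predicate was effectively `x < 3`; row groups containing only values
// between 3 and 3.14 could then be skipped, silently dropping matching rows.
df.filter("x < 3.14").count()
{code}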



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-27259) CLONE - Processing Compressed HDFS files with spark failing with error: "java.lang.IllegalArgumentException: requirement failed: length (-1) cannot be negative" from spa

2019-03-23 Thread Simon poortman (JIRA)
Simon poortman created SPARK-27259:
--

 Summary: CLONE - Processing Compressed HDFS files with spark 
failing with error: "java.lang.IllegalArgumentException: requirement failed: 
length (-1) cannot be negative" from spark 2.2.X
 Key: SPARK-27259
 URL: https://issues.apache.org/jira/browse/SPARK-27259
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.2.1, 2.2.2, 2.2.3, 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.4.0
Reporter: Simon poortman


 

From Spark 2.2.x onward, when a Spark job processes compressed HDFS files with a 
custom input file format, the job fails with the error 
"java.lang.IllegalArgumentException: requirement failed: length (-1) cannot be 
negative". The custom input file format returns a byte length of -1 for 
compressed file formats because compressed HDFS files are not splittable, so 
such splits have offset 0 and length -1. Spark should treat a length of -1 as a 
valid split for compressed file formats.

 

We observed that earlier versions of Spark do not have this validation; it was 
introduced in Spark 2.2.x in the class InputFileBlockHolder. Spark should 
therefore accept a length of -1 as valid for input splits from Spark 2.2.x 
onward as well.

 

+Below is the stack trace.+

Caused by: java.lang.IllegalArgumentException: requirement failed: length (-1) cannot be negative
  at scala.Predef$.require(Predef.scala:224)
  at org.apache.spark.rdd.InputFileBlockHolder$.set(InputFileBlockHolder.scala:70)
  at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:226)
  at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:214)
  at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:94)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
  at org.apache.spark.scheduler.Task.run(Task.scala:109)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)

 

+Below is the code snippet which caused this issue.+

require(length >= 0, s"length ($length) cannot be negative")  // This validation caused the issue.

 
{code:java}
// org.apache.spark.rdd.InputFileBlockHolder (spark-core)
def set(filePath: String, startOffset: Long, length: Long): Unit = {
  require(filePath != null, "filePath cannot be null")
  require(startOffset >= 0, s"startOffset ($startOffset) cannot be negative")
  require(length >= 0, s"length ($length) cannot be negative")  // the validation that rejects -1
  inputBlock.set(new FileBlock(UTF8String.fromString(filePath), startOffset, length))
}
{code}
 

+Steps to reproduce the issue.+

Please refer to the code below to reproduce the issue.
{code:java}
import org.apache.hadoop.mapred.{FileInputFormat, JobConf}
import org.apache.hadoop.fs.Path

val hadoopConf = new JobConf()
FileInputFormat.setInputPaths(hadoopConf, new Path("/output656/part-r-0.gz"))

val records = sc.hadoopRDD(hadoopConf,
  classOf[com.platform.custom.storagehandler.INFAInputFormat],
  classOf[org.apache.hadoop.io.LongWritable],
  classOf[org.apache.hadoop.io.Writable])

records.count()
{code}
 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-23643) XORShiftRandom.hashSeed allocates unnecessary memory

2019-03-23 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-23643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-23643.
---
   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 20793
[https://github.com/apache/spark/pull/20793]

> XORShiftRandom.hashSeed allocates unnecessary memory
> 
>
> Key: SPARK-23643
> URL: https://issues.apache.org/jira/browse/SPARK-23643
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Trivial
> Fix For: 3.0.0
>
>
> The hashSeed method allocates a 64-byte buffer but puts only 8 bytes of the 
> seed parameter into it. The other bytes are always zero and could easily be 
> excluded from the hash calculation.
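
For illustration, a hedged sketch of the waste described above and of hashing only the meaningful bytes. This is not the merged patch, only the idea behind it; note that the two functions produce different hash values.

{code:scala}
import java.nio.ByteBuffer
import scala.util.hashing.MurmurHash3

// java.lang.Long.SIZE is 64 (bits), so this allocates a 64-byte buffer of
// which only the first 8 bytes carry the seed; the trailing zeros are hashed
// for nothing.
def hashSeedWasteful(seed: Long): Int = {
  val bytes = ByteBuffer.allocate(java.lang.Long.SIZE).putLong(seed).array()
  MurmurHash3.bytesHash(bytes)
}

// Hashing only the 8 meaningful bytes avoids the extra allocation and work,
// at the cost of producing different hash values than before.
def hashSeedCompact(seed: Long): Int = {
  val bytes = ByteBuffer.allocate(java.lang.Long.BYTES).putLong(seed).array()
  MurmurHash3.bytesHash(bytes)
}
{code}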



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-23643) XORShiftRandom.hashSeed allocates unnecessary memory

2019-03-23 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-23643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen reassigned SPARK-23643:
-

Assignee: Maxim Gekk

> XORShiftRandom.hashSeed allocates unnecessary memory
> 
>
> Key: SPARK-23643
> URL: https://issues.apache.org/jira/browse/SPARK-23643
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Trivial
>
> The hashSeed method allocates a 64-byte buffer but puts only 8 bytes of the 
> seed parameter into it. The other bytes are always zero and could easily be 
> excluded from the hash calculation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27258) The value of "spark.app.name" or "--name" starts with number , which causes resourceName does not match regular expression

2019-03-23 Thread hehuiyuan (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16799673#comment-16799673
 ] 

hehuiyuan commented on SPARK-27258:
---

For example, the appName "1min-machinereg-yf" is used as the prefix of the 
resourceName.

When the regular expression "[a-z]([-a-z0-9]*[a-z0-9])?" is used for validation, 
a resourceName that does not start with a letter is rejected as invalid.
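
A minimal Scala sketch of that DNS-1035 check, independent of how Spark itself derives the resource name:

{code:scala}
// The regex Kubernetes applies to DNS-1035 labels such as service names.
val dns1035 = "[a-z]([-a-z0-9]*[a-z0-9])?".r

def isValidResourceName(name: String): Boolean =
  dns1035.pattern.matcher(name).matches()

isValidResourceName("1min-machinereg-yf-1544604108931-driver-svc") // false: starts with a digit
isValidResourceName("min-machinereg-yf-1544604108931-driver-svc")  // true
{code}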

> The value of "spark.app.name" or "--name" starts with number , which causes 
> resourceName does not match regular expression
> --
>
> Key: SPARK-27258
> URL: https://issues.apache.org/jira/browse/SPARK-27258
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: hehuiyuan
>Priority: Minor
> Fix For: 3.0.0
>
>
> {code:java}
> Exception in thread "main" 
> io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: 
> POST at: https://xxx:xxx/api/v1/namespaces/xxx/services. Message: Service 
> "1min-machinereg-yf-1544604108931-driver-svc" is invalid: metadata.name: 
> Invalid value: "1min-machinereg-yf-1544604108931-driver-svc": a DNS-1035 
> label must consist of lower case alphanumeric characters or '-', start with 
> an alphabetic character, and end with an alphanumeric character (e.g. 
> 'my-name',  or 'abc-123', regex used for validation is 
> '[a-z]([-a-z0-9]*[a-z0-9])?'). Received status: Status(apiVersion=v1, 
> code=422, details=StatusDetails(causes=[StatusCause(field=metadata.name, 
> message=Invalid value: "1min-machinereg-yf-1544604108931-driver-svc": a 
> DNS-1035 label must consist of lower case alphanumeric characters or '-', 
> start with an alphabetic character, and end with an alphanumeric character 
> (e.g. 'my-name',  or 'abc-123', regex used for validation is 
> '[a-z]([-a-z0-9]*[a-z0-9])?'), reason=FieldValueInvalid, 
> additionalProperties={})], group=null, kind=Service, 
> name=1min-machinereg-yf-1544604108931-driver-svc, retryAfterSeconds=null, 
> uid=null, additionalProperties={}).
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27255) Aggregate functions should not be allowed in WHERE

2019-03-23 Thread Chakravarthi (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16799671#comment-16799671
 ] 

Chakravarthi commented on SPARK-27255:
--

Thanks for reporting; I will be working on this issue.

> Aggregate functions should not be allowed in WHERE
> --
>
> Key: SPARK-27255
> URL: https://issues.apache.org/jira/browse/SPARK-27255
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Mingcong Han
>Priority: Minor
>
> Aggregate functions should not be allowed in the WHERE clause, but Spark SQL 
> currently throws an exception only when generating code. It should instead 
> throw an exception during parsing or analysis.
> Here is an example:
> {code:scala}
> val df = spark.sql("select * from t where sum(ta) > 0")
> df.explain(true)
> df.show()
> {code}
> Spark SQL explains it as:
> {noformat}
> == Parsed Logical Plan ==
> 'Project [*]
> +- 'Filter ('sum('ta) > 0)
>+- 'UnresolvedRelation `t`
> == Analyzed Logical Plan ==
> ta: int, tb: int
> Project [ta#5, tb#6]
> +- Filter (sum(cast(ta#5 as bigint)) > cast(0 as bigint))
>+- SubqueryAlias `t`
>   +- Project [ta#5, tb#6]
>  +- SubqueryAlias `as`
> +- LocalRelation [ta#5, tb#6]
> == Optimized Logical Plan ==
> Filter (sum(cast(ta#5 as bigint)) > 0)
> +- LocalRelation [ta#5, tb#6]
> == Physical Plan ==
> *(1) Filter (sum(cast(ta#5 as bigint)) > 0)
> +- LocalTableScan [ta#5, tb#6]
> {noformat}
> But when executing `df.show()`:
> {noformat}
> Exception in thread "main" java.lang.UnsupportedOperationException: Cannot 
> generate code for expression: sum(cast(input[0, int, false] as bigint))
>   at 
> org.apache.spark.sql.catalyst.expressions.Unevaluable.doGenCode(Expression.scala:291)
>   at 
> org.apache.spark.sql.catalyst.expressions.Unevaluable.doGenCode$(Expression.scala:290)
>   at 
> org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression.doGenCode(interfaces.scala:87)
>   at 
> org.apache.spark.sql.catalyst.expressions.Expression.$anonfun$genCode$3(Expression.scala:138)
>   at scala.Option.getOrElse(Option.scala:138)
> {noformat}
> I have tried it in PostgreSQL, and it directly throws an error:
> {noformat}
> ERROR: Aggregate functions are not allowed in WHERE. 
> {noformat}
> We'd better throw an AnalysisException here.
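
For illustration, a hedged sketch of the intended contrast, reusing the table `t` from the example above: the WHERE form should fail analysis, while an aggregate predicate expressed through GROUP BY / HAVING remains valid.

{code:scala}
// Should raise an AnalysisException during analysis rather than failing later
// in code generation.
spark.sql("select * from t where sum(ta) > 0")

// A legal way to filter on an aggregate, shown only as a contrast; whether it
// matches the user's intent depends on the query.
spark.sql("select ta from t group by ta having sum(ta) > 0")
{code}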



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-27258) The value of "spark.app.name" or "--name" starts with number , which causes resourceName does not match regular expression

2019-03-23 Thread hehuiyuan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hehuiyuan updated SPARK-27258:
--
Description: 

{code:java}
Exception in thread "main" 
io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST 
at: https://xxx:xxx/api/v1/namespaces/xxx/services. Message: Service 
"1min-machinereg-yf-1544604108931-driver-svc" is invalid: metadata.name: 
Invalid value: "1min-machinereg-yf-1544604108931-driver-svc": a DNS-1035 label 
must consist of lower case alphanumeric characters or '-', start with an 
alphabetic character, and end with an alphanumeric character (e.g. 'my-name',  
or 'abc-123', regex used for validation is '[a-z]([-a-z0-9]*[a-z0-9])?'). 
Received status: Status(apiVersion=v1, code=422, 
details=StatusDetails(causes=[StatusCause(field=metadata.name, message=Invalid 
value: "1min-machinereg-yf-1544604108931-driver-svc": a DNS-1035 label must 
consist of lower case alphanumeric characters or '-', start with an alphabetic 
character, and end with an alphanumeric character (e.g. 'my-name',  or 
'abc-123', regex used for validation is '[a-z]([-a-z0-9]*[a-z0-9])?'), 
reason=FieldValueInvalid, additionalProperties={})], group=null, kind=Service, 
name=1min-machinereg-yf-1544604108931-driver-svc, retryAfterSeconds=null, 
uid=null, additionalProperties={}).
{code}

  was:Exception in thread "main" 
io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST 
at: https://xxx:xxx/api/v1/namespaces/xxx/services. Message: Service 
"1min-machinereg-yf-1544604108931-driver-svc" is invalid: metadata.name: 
Invalid value: "1min-machinereg-yf-1544604108931-driver-svc": a DNS-1035 label 
must consist of lower case alphanumeric characters or '-', start with an 
alphabetic character, and end with an alphanumeric character (e.g. 'my-name',  
or 'abc-123', regex used for validation is '[a-z]([-a-z0-9]*[a-z0-9])?'). 
Received status: Status(apiVersion=v1, code=422, 
details=StatusDetails(causes=[StatusCause(field=metadata.name, message=Invalid 
value: "1min-machinereg-yf-1544604108931-driver-svc": a DNS-1035 label must 
consist of lower case alphanumeric characters or '-', start with an alphabetic 
character, and end with an alphanumeric character (e.g. 'my-name',  or 
'abc-123', regex used for validation is '[a-z]([-a-z0-9]*[a-z0-9])?'), 
reason=FieldValueInvalid, additionalProperties={})], group=null, kind=Service, 
name=1min-machinereg-yf-1544604108931-driver-svc, retryAfterSeconds=null, 
uid=null, additionalProperties={}).


> The value of "spark.app.name" or "--name" starts with number , which causes 
> resourceName does not match regular expression
> --
>
> Key: SPARK-27258
> URL: https://issues.apache.org/jira/browse/SPARK-27258
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: hehuiyuan
>Priority: Minor
> Fix For: 3.0.0
>
>
> {code:java}
> Exception in thread "main" 
> io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: 
> POST at: https://xxx:xxx/api/v1/namespaces/xxx/services. Message: Service 
> "1min-machinereg-yf-1544604108931-driver-svc" is invalid: metadata.name: 
> Invalid value: "1min-machinereg-yf-1544604108931-driver-svc": a DNS-1035 
> label must consist of lower case alphanumeric characters or '-', start with 
> an alphabetic character, and end with an alphanumeric character (e.g. 
> 'my-name',  or 'abc-123', regex used for validation is 
> '[a-z]([-a-z0-9]*[a-z0-9])?'). Received status: Status(apiVersion=v1, 
> code=422, details=StatusDetails(causes=[StatusCause(field=metadata.name, 
> message=Invalid value: "1min-machinereg-yf-1544604108931-driver-svc": a 
> DNS-1035 label must consist of lower case alphanumeric characters or '-', 
> start with an alphabetic character, and end with an alphanumeric character 
> (e.g. 'my-name',  or 'abc-123', regex used for validation is 
> '[a-z]([-a-z0-9]*[a-z0-9])?'), reason=FieldValueInvalid, 
> additionalProperties={})], group=null, kind=Service, 
> name=1min-machinereg-yf-1544604108931-driver-svc, retryAfterSeconds=null, 
> uid=null, additionalProperties={}).
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-27258) The value of "spark.app.name" or "--name" starts with number , which causes resourceName does not match regular expression

2019-03-23 Thread hehuiyuan (JIRA)
hehuiyuan created SPARK-27258:
-

 Summary: The value of "spark.app.name" or "--name" starts with 
number , which causes resourceName does not match regular expression
 Key: SPARK-27258
 URL: https://issues.apache.org/jira/browse/SPARK-27258
 Project: Spark
  Issue Type: Improvement
  Components: Kubernetes
Affects Versions: 2.3.0
Reporter: hehuiyuan
 Fix For: 3.0.0


Exception in thread "main" 
io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST 
at: https://xxx:xxx/api/v1/namespaces/xxx/services. Message: Service 
"1min-machinereg-yf-1544604108931-driver-svc" is invalid: metadata.name: 
Invalid value: "1min-machinereg-yf-1544604108931-driver-svc": a DNS-1035 label 
must consist of lower case alphanumeric characters or '-', start with an 
alphabetic character, and end with an alphanumeric character (e.g. 'my-name',  
or 'abc-123', regex used for validation is '[a-z]([-a-z0-9]*[a-z0-9])?'). 
Received status: Status(apiVersion=v1, code=422, 
details=StatusDetails(causes=[StatusCause(field=metadata.name, message=Invalid 
value: "1min-machinereg-yf-1544604108931-driver-svc": a DNS-1035 label must 
consist of lower case alphanumeric characters or '-', start with an alphabetic 
character, and end with an alphanumeric character (e.g. 'my-name',  or 
'abc-123', regex used for validation is '[a-z]([-a-z0-9]*[a-z0-9])?'), 
reason=FieldValueInvalid, additionalProperties={})], group=null, kind=Service, 
name=1min-machinereg-yf-1544604108931-driver-svc, retryAfterSeconds=null, 
uid=null, additionalProperties={}).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-27257) The value of "spark.app.name" or "--name" starts with number , which causes resourceName does not match regular expression

2019-03-23 Thread hehuiyuan (JIRA)
hehuiyuan created SPARK-27257:
-

 Summary: The value of "spark.app.name" or "--name" starts with 
number , which causes resourceName does not match regular expression
 Key: SPARK-27257
 URL: https://issues.apache.org/jira/browse/SPARK-27257
 Project: Spark
  Issue Type: Improvement
  Components: Kubernetes
Affects Versions: 2.4.0
 Environment: spark2.3.1
Reporter: hehuiyuan
 Fix For: 3.0.0



{code:java}
Exception in thread "main" 
io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST 
at: https://xxx:xxx/api/v1/namespaces/xxx/services. Message: Service 
"1min-machinereg-yf-1544604108931-driver-svc" is invalid: metadata.name: 
Invalid value: "1min-machinereg-yf-1544604108931-driver-svc": a DNS-1035 label 
must consist of lower case alphanumeric characters or '-', start with an 
alphabetic character, and end with an alphanumeric character (e.g. 'my-name',  
or 'abc-123', regex used for validation is '[a-z]([-a-z0-9]*[a-z0-9])?'). 
Received status: Status(apiVersion=v1, code=422, 
details=StatusDetails(causes=[StatusCause(field=metadata.name, message=Invalid 
value: "1min-machinereg-yf-1544604108931-driver-svc": a DNS-1035 label must 
consist of lower case alphanumeric characters or '-', start with an alphabetic 
character, and end with an alphanumeric character (e.g. 'my-name',  or 
'abc-123', regex used for validation is '[a-z]([-a-z0-9]*[a-z0-9])?'), 
reason=FieldValueInvalid, additionalProperties={})], group=null, kind=Service, 
name=1min-machinereg-yf-1544604057363-1544604108931-driver-svc, 
retryAfterSeconds=null, uid=null, additionalProperties={}).
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-27256) If the configuration is used to set the number of bytes, we'd better use `bytesConf`'.

2019-03-23 Thread Shivu Sondur (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shivu Sondur updated SPARK-27256:
-
Comment: was deleted

(was: i am working on it)

> If the configuration is used to set the number of bytes, we'd better use 
> `bytesConf`'.
> --
>
> Key: SPARK-27256
> URL: https://issues.apache.org/jira/browse/SPARK-27256
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 3.0.0
>Reporter: liuxian
>Priority: Minor
>
> Currently, if we want to configure `spark.sql.files.maxPartitionBytes` to 
> 256 megabytes, we must set `spark.sql.files.maxPartitionBytes=268435456`, 
> which is very unfriendly to users.
> And if we set it like this: `spark.sql.files.maxPartitionBytes=256M`, we 
> will encounter this exception:
> _Exception in thread "main" java.lang.IllegalArgumentException: 
> spark.sql.files.maxPartitionBytes should be long, but was 128M_
>     _at 
> org.apache.spark.internal.config.ConfigHelpers$.toNumber(ConfigBuilder.scala:34)_



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27256) If the configuration is used to set the number of bytes, we'd better use `bytesConf`'.

2019-03-23 Thread Shivu Sondur (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16799589#comment-16799589
 ] 

Shivu Sondur commented on SPARK-27256:
--

I am working on it.

> If the configuration is used to set the number of bytes, we'd better use 
> `bytesConf`'.
> --
>
> Key: SPARK-27256
> URL: https://issues.apache.org/jira/browse/SPARK-27256
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 3.0.0
>Reporter: liuxian
>Priority: Minor
>
> Currently, if we want to configure `spark.sql.files.maxPartitionBytes` to 
> 256 megabytes, we must set `spark.sql.files.maxPartitionBytes=268435456`, 
> which is very unfriendly to users.
> And if we set it like this: `spark.sql.files.maxPartitionBytes=256M`, we 
> will encounter this exception:
> _Exception in thread "main" java.lang.IllegalArgumentException: 
> spark.sql.files.maxPartitionBytes should be long, but was 128M_
>     _at 
> org.apache.spark.internal.config.ConfigHelpers$.toNumber(ConfigBuilder.scala:34)_



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27253) SparkSession clone discards SQLConf overrides in favor of SparkConf defaults

2019-03-23 Thread Chakravarthi (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16799574#comment-16799574
 ] 

Chakravarthi commented on SPARK-27253:
--

Thanks for reporting; I will be working on this.

> SparkSession clone discards SQLConf overrides in favor of SparkConf defaults
> 
>
> Key: SPARK-27253
> URL: https://issues.apache.org/jira/browse/SPARK-27253
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.4.0
>Reporter: Jose Torres
>Priority: Major
>
> SparkSession.cloneSession() is normally supposed to create a child session 
> which inherits all the SQLConf values of its parent session. But when a SQL 
> conf is given a global default through the SparkConf, this does not happen; 
> the child session will receive the SparkConf default rather than its parent's 
> SQLConf override.
>  
> This is particularly impactful in structured streaming, as the microbatches 
> run in a cloned child session.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-27256) If the configuration is used to set the number of bytes, we'd better use `bytesConf`'.

2019-03-23 Thread liuxian (JIRA)
liuxian created SPARK-27256:
---

 Summary: If the configuration is used to set the number of bytes, 
we'd better use `bytesConf`'.
 Key: SPARK-27256
 URL: https://issues.apache.org/jira/browse/SPARK-27256
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core, SQL
Affects Versions: 3.0.0
Reporter: liuxian


Currently, if we want to configure `spark.sql.files.maxPartitionBytes` to 
256 megabytes, we must set `spark.sql.files.maxPartitionBytes=268435456`, 
which is very unfriendly to users.

And if we set it like this: `spark.sql.files.maxPartitionBytes=256M`, we will 
encounter this exception:

_Exception in thread "main" java.lang.IllegalArgumentException: 
spark.sql.files.maxPartitionBytes should be long, but was 128M_
    _at 
org.apache.spark.internal.config.ConfigHelpers$.toNumber(ConfigBuilder.scala:34)_
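
For reference, a hedged sketch of the parsing this proposal points at. Spark already ships a byte-string parser (org.apache.spark.network.util.JavaUtils) used by bytes-style config entries; the snippet only illustrates the accepted formats, not the internal ConfigBuilder change itself, and assumes the spark-network-common artifact is on the classpath.

{code:scala}
import org.apache.spark.network.util.JavaUtils

// Both forms denote 256 MiB; a bytes-style config would accept either.
JavaUtils.byteStringAsBytes("268435456") // 268435456L
JavaUtils.byteStringAsBytes("256m")      // 268435456L
{code}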



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org