[jira] [Comment Edited] (SPARK-26128) filter breaks input_file_name
[ https://issues.apache.org/jira/browse/SPARK-26128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16694330#comment-16694330 ] Hyukjin Kwon edited comment on SPARK-26128 at 11/21/18 7:26 AM: I can't reproduce this: {code} scala> spark.range(10).write.parquet("/tmp/newparquet") scala> spark.read.parquet("/tmp/newparquet").where("id > 5").select(input_file_name()).show(5,false) +--+ |input_file_name() | +--+ |file:///tmp/newparquet/part-7-84e98703-bfbb-4781-b3b4-de862f0270b7-c000.snappy.parquet| |file:///tmp/newparquet/part-7-84e98703-bfbb-4781-b3b4-de862f0270b7-c000.snappy.parquet| |file:///tmp/newparquet/part-6-84e98703-bfbb-4781-b3b4-de862f0270b7-c000.snappy.parquet| |file:///tmp/newparquet/part-5-84e98703-bfbb-4781-b3b4-de862f0270b7-c000.snappy.parquet| +--+ scala> spark.read.parquet("/tmp/newparquet").select(input_file_name()).show(5,false) +--+ |input_file_name() | +--+ |file:///tmp/newparquet/part-7-84e98703-bfbb-4781-b3b4-de862f0270b7-c000.snappy.parquet| |file:///tmp/newparquet/part-7-84e98703-bfbb-4781-b3b4-de862f0270b7-c000.snappy.parquet| |file:///tmp/newparquet/part-3-84e98703-bfbb-4781-b3b4-de862f0270b7-c000.snappy.parquet| |file:///tmp/newparquet/part-3-84e98703-bfbb-4781-b3b4-de862f0270b7-c000.snappy.parquet| |file:///tmp/newparquet/part-0-84e98703-bfbb-4781-b3b4-de862f0270b7-c000.snappy.parquet| +--+ only showing top 5 rows {code} mind showing how {{"/tmp/newparquet"}} is made? was (Author: hyukjin.kwon): I can't reproduce this: ``` scala> spark.range(10).write.parquet("/tmp/newparquet") 18/11/21 15:23:16 WARN MemoryManager: Total allocation exceeds 95.00% (906,992,014 bytes) of heap memory Scaling row group sizes to 96.54% for 7 writers 18/11/21 15:23:16 WARN MemoryManager: Total allocation exceeds 95.00% (906,992,014 bytes) of heap memory Scaling row group sizes to 84.47% for 8 writers 18/11/21 15:23:16 WARN MemoryManager: Total allocation exceeds 95.00% (906,992,014 bytes) of heap memory Scaling row group sizes to 96.54% for 7 writers scala> spark.read.parquet("/tmp/newparquet").where("id > 5").select(input_file_name()).show(5,false) +--+ |input_file_name() | +--+ |file:///tmp/newparquet/part-7-84e98703-bfbb-4781-b3b4-de862f0270b7-c000.snappy.parquet| |file:///tmp/newparquet/part-7-84e98703-bfbb-4781-b3b4-de862f0270b7-c000.snappy.parquet| |file:///tmp/newparquet/part-6-84e98703-bfbb-4781-b3b4-de862f0270b7-c000.snappy.parquet| |file:///tmp/newparquet/part-5-84e98703-bfbb-4781-b3b4-de862f0270b7-c000.snappy.parquet| +--+ scala> spark.read.parquet("/tmp/newparquet").select(input_file_name()).show(5,false) +--+ |input_file_name() | +--+ |file:///tmp/newparquet/part-7-84e98703-bfbb-4781-b3b4-de862f0270b7-c000.snappy.parquet| |file:///tmp/newparquet/part-7-84e98703-bfbb-4781-b3b4-de862f0270b7-c000.snappy.parquet| |file:///tmp/newparquet/part-3-84e98703-bfbb-4781-b3b4-de862f0270b7-c000.snappy.parquet| |file:///tmp/newparquet/part-3-84e98703-bfbb-4781-b3b4-de862f0270b7-c000.snappy.parquet| |file:///tmp/newparquet/part-0-84e98703-bfbb-4781-b3b4-de862f0270b7-c000.snappy.parquet| +--+ only showing top 5 rows ``` mind showing how {{"/tmp/newparquet"}} is made? > filter breaks input_file_name > - > > Key: SPARK-26128 > URL: https://issues.apache.org/jira/browse/SPARK-26128 > Project: Spark > Issue Type: Bug > Components: Spark Shell >Affects Versions: 2.3.2 >Reporter: Paul Praet >
[jira] [Commented] (SPARK-26126) Should put scala-library deps into root pom instead of spark-tags module
[ https://issues.apache.org/jira/browse/SPARK-26126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16694333#comment-16694333 ] Hyukjin Kwon commented on SPARK-26126: -- Hi [~liupengcheng], is it a question or an issue? > Should put scala-library deps into root pom instead of spark-tags module > > > Key: SPARK-26126 > URL: https://issues.apache.org/jira/browse/SPARK-26126 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.1.0, 2.3.0, 2.4.0 >Reporter: liupengcheng >Priority: Minor > > While doing some backports in our custom Spark, I noticed some strange code in the > spark-tags module: > {code:java}
> <dependencies>
>   <dependency>
>     <groupId>org.scala-lang</groupId>
>     <artifactId>scala-library</artifactId>
>     <version>${scala.version}</version>
>   </dependency>
> </dependencies>
> {code} > As far as I know, shouldn't spark-tags contain only annotation-related classes > and dependencies? > Should we move the scala-library dependency into the root pom? -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26128) filter breaks input_file_name
[ https://issues.apache.org/jira/browse/SPARK-26128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16694330#comment-16694330 ] Hyukjin Kwon commented on SPARK-26128: -- I can't reproduce this: ``` scala> spark.range(10).write.parquet("/tmp/newparquet") 18/11/21 15:23:16 WARN MemoryManager: Total allocation exceeds 95.00% (906,992,014 bytes) of heap memory Scaling row group sizes to 96.54% for 7 writers 18/11/21 15:23:16 WARN MemoryManager: Total allocation exceeds 95.00% (906,992,014 bytes) of heap memory Scaling row group sizes to 84.47% for 8 writers 18/11/21 15:23:16 WARN MemoryManager: Total allocation exceeds 95.00% (906,992,014 bytes) of heap memory Scaling row group sizes to 96.54% for 7 writers scala> spark.read.parquet("/tmp/newparquet").where("id > 5").select(input_file_name()).show(5,false) +--+ |input_file_name() | +--+ |file:///tmp/newparquet/part-7-84e98703-bfbb-4781-b3b4-de862f0270b7-c000.snappy.parquet| |file:///tmp/newparquet/part-7-84e98703-bfbb-4781-b3b4-de862f0270b7-c000.snappy.parquet| |file:///tmp/newparquet/part-6-84e98703-bfbb-4781-b3b4-de862f0270b7-c000.snappy.parquet| |file:///tmp/newparquet/part-5-84e98703-bfbb-4781-b3b4-de862f0270b7-c000.snappy.parquet| +--+ scala> spark.read.parquet("/tmp/newparquet").select(input_file_name()).show(5,false) +--+ |input_file_name() | +--+ |file:///tmp/newparquet/part-7-84e98703-bfbb-4781-b3b4-de862f0270b7-c000.snappy.parquet| |file:///tmp/newparquet/part-7-84e98703-bfbb-4781-b3b4-de862f0270b7-c000.snappy.parquet| |file:///tmp/newparquet/part-3-84e98703-bfbb-4781-b3b4-de862f0270b7-c000.snappy.parquet| |file:///tmp/newparquet/part-3-84e98703-bfbb-4781-b3b4-de862f0270b7-c000.snappy.parquet| |file:///tmp/newparquet/part-0-84e98703-bfbb-4781-b3b4-de862f0270b7-c000.snappy.parquet| +--+ only showing top 5 rows ``` mind showing how {{"/tmp/newparquet"}} is made? > filter breaks input_file_name > - > > Key: SPARK-26128 > URL: https://issues.apache.org/jira/browse/SPARK-26128 > Project: Spark > Issue Type: Bug > Components: Spark Shell >Affects Versions: 2.3.2 >Reporter: Paul Praet >Priority: Minor > > This works: > {code:java} > scala> > spark.read.parquet("/tmp/newparquet").select(input_file_name).show(5,false) > +-+ > |input_file_name() > | > +-+ > |file:///tmp/newparquet/parquet-5-PT6H/junit/data/tenant=NA/year=2017/month=201704/day=20170406/hour=2017040618/data.eu-west-1b.290.PT6H.FINAL.parquet| > |file:///tmp/newparquet/parquet-5-PT6H/junit/data/tenant=NA/year=2017/month=201704/day=20170406/hour=2017040618/data.eu-west-1b.290.PT6H.FINAL.parquet| > |file:///tmp/newparquet/parquet-5-PT6H/junit/data/tenant=NA/year=2017/month=201704/day=20170406/hour=2017040618/data.eu-west-1b.290.PT6H.FINAL.parquet| > |file:///tmp/newparquet/parquet-5-PT6H/junit/data/tenant=NA/year=2017/month=201704/day=20170406/hour=2017040618/data.eu-west-1b.290.PT6H.FINAL.parquet| > |file:///tmp/newparquet/parquet-5-PT6H/junit/data/tenant=NA/year=2017/month=201704/day=20170406/hour=2017040618/data.eu-west-1b.290.PT6H.FINAL.parquet| > +-+ > {code} > When adding a filter: > {code:java} > scala> > spark.read.parquet("/tmp/newparquet").where("key.station='XYZ'").select(input_file_name()).show(5,false) > +-+ > |input_file_name()| > +-+ > | | > | | > | | > | | > | | > +-+ > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For
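A note on the reproduction gap: the attempt above filters on a flat column ({{id > 5}}), while the report filters on a nested struct field ({{key.station}}). Below is a minimal sketch that mirrors the reporter's query shape, assuming a spark-shell session; the nested schema and the {{/tmp/newparquet_nested}} path are illustrative stand-ins, not taken from the ticket.
{code:java}
// Write a tiny Parquet dataset with a nested struct column, then filter on
// the nested field and ask for input_file_name(), as in the report.
import org.apache.spark.sql.functions.{input_file_name, lit, struct}

val df = spark.range(10).withColumn("key", struct(lit("XYZ").as("station")))
df.write.mode("overwrite").parquet("/tmp/newparquet_nested")

spark.read.parquet("/tmp/newparquet_nested")
  .where("key.station = 'XYZ'")
  .select(input_file_name())
  .show(5, false)
{code}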
[jira] [Commented] (SPARK-26134) Upgrading Hadoop to 2.7.4 to fix java.version problem
[ https://issues.apache.org/jira/browse/SPARK-26134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16694320#comment-16694320 ] Takanobu Asanuma commented on SPARK-26134: -- Hi, [~dongjoon]. I confirmed that spark-shell failed on jdk-11+28 and passed on jdk-11.0.1+13 with the master branch. I use Oracle OpenJDK, but I could not find the old archive. Seems it still can be downloaded from AdoptOpenJDK. https://adoptopenjdk.net/archive.html?variant=openjdk11=hotspot > Upgrading Hadoop to 2.7.4 to fix java.version problem > - > > Key: SPARK-26134 > URL: https://issues.apache.org/jira/browse/SPARK-26134 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Takanobu Asanuma >Priority: Major > > When I ran spark-shell on JDK11+28(2018-09-25), It failed with the error > below. > {noformat} > Exception in thread "main" java.lang.ExceptionInInitializerError > at org.apache.hadoop.util.StringUtils.(StringUtils.java:80) > at > org.apache.hadoop.security.SecurityUtil.getAuthenticationMethod(SecurityUtil.java:611) > at > org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:273) > at > org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:261) > at > org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:791) > at > org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:761) > at > org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:634) > at > org.apache.spark.util.Utils$.$anonfun$getCurrentUserName$1(Utils.scala:2427) > at scala.Option.getOrElse(Option.scala:121) > at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2427) > at org.apache.spark.SecurityManager.(SecurityManager.scala:79) > at > org.apache.spark.deploy.SparkSubmit.secMgr$lzycompute$1(SparkSubmit.scala:359) > at org.apache.spark.deploy.SparkSubmit.secMgr$1(SparkSubmit.scala:359) > at > org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$9(SparkSubmit.scala:367) > at scala.Option.map(Option.scala:146) > at > org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:367) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:143) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:927) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:936) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > Caused by: java.lang.StringIndexOutOfBoundsException: begin 0, end 3, length 2 > at java.base/java.lang.String.checkBoundsBeginEnd(String.java:3319) > at java.base/java.lang.String.substring(String.java:1874) > at org.apache.hadoop.util.Shell.(Shell.java:52) > {noformat} > This is a Hadoop issue that fails to parse some {{java.version}}. It has been > fixed from Hadoop-2.7.4(see HADOOP-14586). > Note, Hadoop-2.7.5 or upper have another problem with Spark (SPARK-25330). So > upgrading to 2.7.4 would be fine for now. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26134) Upgrading Hadoop to 2.7.4 to fix java.version problem
[ https://issues.apache.org/jira/browse/SPARK-26134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16694309#comment-16694309 ] Dongjoon Hyun commented on SPARK-26134: --- Hi, [~tasanuma0829] . Thank you for reporting and sending a PR. If you don't mind, may I ask your environment on JDK 11? > Upgrading Hadoop to 2.7.4 to fix java.version problem > - > > Key: SPARK-26134 > URL: https://issues.apache.org/jira/browse/SPARK-26134 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Takanobu Asanuma >Priority: Major > > When I ran spark-shell on JDK11+28(2018-09-25), It failed with the error > below. > {noformat} > Exception in thread "main" java.lang.ExceptionInInitializerError > at org.apache.hadoop.util.StringUtils.(StringUtils.java:80) > at > org.apache.hadoop.security.SecurityUtil.getAuthenticationMethod(SecurityUtil.java:611) > at > org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:273) > at > org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:261) > at > org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:791) > at > org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:761) > at > org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:634) > at > org.apache.spark.util.Utils$.$anonfun$getCurrentUserName$1(Utils.scala:2427) > at scala.Option.getOrElse(Option.scala:121) > at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2427) > at org.apache.spark.SecurityManager.(SecurityManager.scala:79) > at > org.apache.spark.deploy.SparkSubmit.secMgr$lzycompute$1(SparkSubmit.scala:359) > at org.apache.spark.deploy.SparkSubmit.secMgr$1(SparkSubmit.scala:359) > at > org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$9(SparkSubmit.scala:367) > at scala.Option.map(Option.scala:146) > at > org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:367) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:143) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:927) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:936) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > Caused by: java.lang.StringIndexOutOfBoundsException: begin 0, end 3, length 2 > at java.base/java.lang.String.checkBoundsBeginEnd(String.java:3319) > at java.base/java.lang.String.substring(String.java:1874) > at org.apache.hadoop.util.Shell.(Shell.java:52) > {noformat} > This is a Hadoop issue that fails to parse some {{java.version}}. It has been > fixed from Hadoop-2.7.4(see HADOOP-14586). > Note, Hadoop-2.7.5 or upper have another problem with Spark (SPARK-25330). So > upgrading to 2.7.4 would be fine for now. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26134) Upgrading Hadoop to 2.7.4 to fix java.version problem
[ https://issues.apache.org/jira/browse/SPARK-26134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takanobu Asanuma updated SPARK-26134: - Affects Version/s: (was: 2.4.0) 3.0.0 > Upgrading Hadoop to 2.7.4 to fix java.version problem > - > > Key: SPARK-26134 > URL: https://issues.apache.org/jira/browse/SPARK-26134 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Takanobu Asanuma >Priority: Major > > When I ran spark-shell on JDK11+28(2018-09-25), It failed with the error > below. > {noformat} > Exception in thread "main" java.lang.ExceptionInInitializerError > at org.apache.hadoop.util.StringUtils.(StringUtils.java:80) > at > org.apache.hadoop.security.SecurityUtil.getAuthenticationMethod(SecurityUtil.java:611) > at > org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:273) > at > org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:261) > at > org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:791) > at > org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:761) > at > org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:634) > at > org.apache.spark.util.Utils$.$anonfun$getCurrentUserName$1(Utils.scala:2427) > at scala.Option.getOrElse(Option.scala:121) > at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2427) > at org.apache.spark.SecurityManager.(SecurityManager.scala:79) > at > org.apache.spark.deploy.SparkSubmit.secMgr$lzycompute$1(SparkSubmit.scala:359) > at org.apache.spark.deploy.SparkSubmit.secMgr$1(SparkSubmit.scala:359) > at > org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$9(SparkSubmit.scala:367) > at scala.Option.map(Option.scala:146) > at > org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:367) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:143) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:927) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:936) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > Caused by: java.lang.StringIndexOutOfBoundsException: begin 0, end 3, length 2 > at java.base/java.lang.String.checkBoundsBeginEnd(String.java:3319) > at java.base/java.lang.String.substring(String.java:1874) > at org.apache.hadoop.util.Shell.(Shell.java:52) > {noformat} > This is a Hadoop issue that fails to parse some {{java.version}}. It has been > fixed from Hadoop-2.7.4(see HADOOP-14586). > Note, Hadoop-2.7.5 or upper have another problem with Spark (SPARK-25330). So > upgrading to 2.7.4 would be fine for now. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-26134) Upgrading Hadoop to 2.7.4 to fix java.version problem
[ https://issues.apache.org/jira/browse/SPARK-26134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26134: Assignee: Apache Spark > Upgrading Hadoop to 2.7.4 to fix java.version problem > - > > Key: SPARK-26134 > URL: https://issues.apache.org/jira/browse/SPARK-26134 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.0 >Reporter: Takanobu Asanuma >Assignee: Apache Spark >Priority: Major > > When I ran spark-shell on JDK11+28(2018-09-25), It failed with the error > below. > {noformat} > Exception in thread "main" java.lang.ExceptionInInitializerError > at org.apache.hadoop.util.StringUtils.(StringUtils.java:80) > at > org.apache.hadoop.security.SecurityUtil.getAuthenticationMethod(SecurityUtil.java:611) > at > org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:273) > at > org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:261) > at > org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:791) > at > org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:761) > at > org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:634) > at > org.apache.spark.util.Utils$.$anonfun$getCurrentUserName$1(Utils.scala:2427) > at scala.Option.getOrElse(Option.scala:121) > at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2427) > at org.apache.spark.SecurityManager.(SecurityManager.scala:79) > at > org.apache.spark.deploy.SparkSubmit.secMgr$lzycompute$1(SparkSubmit.scala:359) > at org.apache.spark.deploy.SparkSubmit.secMgr$1(SparkSubmit.scala:359) > at > org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$9(SparkSubmit.scala:367) > at scala.Option.map(Option.scala:146) > at > org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:367) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:143) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:927) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:936) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > Caused by: java.lang.StringIndexOutOfBoundsException: begin 0, end 3, length 2 > at java.base/java.lang.String.checkBoundsBeginEnd(String.java:3319) > at java.base/java.lang.String.substring(String.java:1874) > at org.apache.hadoop.util.Shell.(Shell.java:52) > {noformat} > This is a Hadoop issue that fails to parse some {{java.version}}. It has been > fixed from Hadoop-2.7.4(see HADOOP-14586). > Note, Hadoop-2.7.5 or upper have another problem with Spark (SPARK-25330). So > upgrading to 2.7.4 would be fine for now. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26134) Upgrading Hadoop to 2.7.4 to fix java.version problem
[ https://issues.apache.org/jira/browse/SPARK-26134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16694289#comment-16694289 ] Apache Spark commented on SPARK-26134: -- User 'tasanuma' has created a pull request for this issue: https://github.com/apache/spark/pull/23101 > Upgrading Hadoop to 2.7.4 to fix java.version problem > - > > Key: SPARK-26134 > URL: https://issues.apache.org/jira/browse/SPARK-26134 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.0 >Reporter: Takanobu Asanuma >Priority: Major > > When I ran spark-shell on JDK11+28(2018-09-25), It failed with the error > below. > {noformat} > Exception in thread "main" java.lang.ExceptionInInitializerError > at org.apache.hadoop.util.StringUtils.(StringUtils.java:80) > at > org.apache.hadoop.security.SecurityUtil.getAuthenticationMethod(SecurityUtil.java:611) > at > org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:273) > at > org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:261) > at > org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:791) > at > org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:761) > at > org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:634) > at > org.apache.spark.util.Utils$.$anonfun$getCurrentUserName$1(Utils.scala:2427) > at scala.Option.getOrElse(Option.scala:121) > at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2427) > at org.apache.spark.SecurityManager.(SecurityManager.scala:79) > at > org.apache.spark.deploy.SparkSubmit.secMgr$lzycompute$1(SparkSubmit.scala:359) > at org.apache.spark.deploy.SparkSubmit.secMgr$1(SparkSubmit.scala:359) > at > org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$9(SparkSubmit.scala:367) > at scala.Option.map(Option.scala:146) > at > org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:367) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:143) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:927) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:936) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > Caused by: java.lang.StringIndexOutOfBoundsException: begin 0, end 3, length 2 > at java.base/java.lang.String.checkBoundsBeginEnd(String.java:3319) > at java.base/java.lang.String.substring(String.java:1874) > at org.apache.hadoop.util.Shell.(Shell.java:52) > {noformat} > This is a Hadoop issue that fails to parse some {{java.version}}. It has been > fixed from Hadoop-2.7.4(see HADOOP-14586). > Note, Hadoop-2.7.5 or upper have another problem with Spark (SPARK-25330). So > upgrading to 2.7.4 would be fine for now. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26134) Upgrading Hadoop to 2.7.4 to fix java.version problem
[ https://issues.apache.org/jira/browse/SPARK-26134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16694288#comment-16694288 ] Apache Spark commented on SPARK-26134: -- User 'tasanuma' has created a pull request for this issue: https://github.com/apache/spark/pull/23101 > Upgrading Hadoop to 2.7.4 to fix java.version problem > - > > Key: SPARK-26134 > URL: https://issues.apache.org/jira/browse/SPARK-26134 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.0 >Reporter: Takanobu Asanuma >Priority: Major > > When I ran spark-shell on JDK11+28(2018-09-25), It failed with the error > below. > {noformat} > Exception in thread "main" java.lang.ExceptionInInitializerError > at org.apache.hadoop.util.StringUtils.(StringUtils.java:80) > at > org.apache.hadoop.security.SecurityUtil.getAuthenticationMethod(SecurityUtil.java:611) > at > org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:273) > at > org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:261) > at > org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:791) > at > org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:761) > at > org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:634) > at > org.apache.spark.util.Utils$.$anonfun$getCurrentUserName$1(Utils.scala:2427) > at scala.Option.getOrElse(Option.scala:121) > at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2427) > at org.apache.spark.SecurityManager.(SecurityManager.scala:79) > at > org.apache.spark.deploy.SparkSubmit.secMgr$lzycompute$1(SparkSubmit.scala:359) > at org.apache.spark.deploy.SparkSubmit.secMgr$1(SparkSubmit.scala:359) > at > org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$9(SparkSubmit.scala:367) > at scala.Option.map(Option.scala:146) > at > org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:367) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:143) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:927) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:936) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > Caused by: java.lang.StringIndexOutOfBoundsException: begin 0, end 3, length 2 > at java.base/java.lang.String.checkBoundsBeginEnd(String.java:3319) > at java.base/java.lang.String.substring(String.java:1874) > at org.apache.hadoop.util.Shell.(Shell.java:52) > {noformat} > This is a Hadoop issue that fails to parse some {{java.version}}. It has been > fixed from Hadoop-2.7.4(see HADOOP-14586). > Note, Hadoop-2.7.5 or upper have another problem with Spark (SPARK-25330). So > upgrading to 2.7.4 would be fine for now. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-26134) Upgrading Hadoop to 2.7.4 to fix java.version problem
[ https://issues.apache.org/jira/browse/SPARK-26134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26134: Assignee: (was: Apache Spark) > Upgrading Hadoop to 2.7.4 to fix java.version problem > - > > Key: SPARK-26134 > URL: https://issues.apache.org/jira/browse/SPARK-26134 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.0 >Reporter: Takanobu Asanuma >Priority: Major > > When I ran spark-shell on JDK11+28(2018-09-25), It failed with the error > below. > {noformat} > Exception in thread "main" java.lang.ExceptionInInitializerError > at org.apache.hadoop.util.StringUtils.(StringUtils.java:80) > at > org.apache.hadoop.security.SecurityUtil.getAuthenticationMethod(SecurityUtil.java:611) > at > org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:273) > at > org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:261) > at > org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:791) > at > org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:761) > at > org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:634) > at > org.apache.spark.util.Utils$.$anonfun$getCurrentUserName$1(Utils.scala:2427) > at scala.Option.getOrElse(Option.scala:121) > at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2427) > at org.apache.spark.SecurityManager.(SecurityManager.scala:79) > at > org.apache.spark.deploy.SparkSubmit.secMgr$lzycompute$1(SparkSubmit.scala:359) > at org.apache.spark.deploy.SparkSubmit.secMgr$1(SparkSubmit.scala:359) > at > org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$9(SparkSubmit.scala:367) > at scala.Option.map(Option.scala:146) > at > org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:367) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:143) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:927) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:936) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > Caused by: java.lang.StringIndexOutOfBoundsException: begin 0, end 3, length 2 > at java.base/java.lang.String.checkBoundsBeginEnd(String.java:3319) > at java.base/java.lang.String.substring(String.java:1874) > at org.apache.hadoop.util.Shell.(Shell.java:52) > {noformat} > This is a Hadoop issue that fails to parse some {{java.version}}. It has been > fixed from Hadoop-2.7.4(see HADOOP-14586). > Note, Hadoop-2.7.5 or upper have another problem with Spark (SPARK-25330). So > upgrading to 2.7.4 would be fine for now. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26134) Upgrading Hadoop to 2.7.4 to fix java.version problem
[ https://issues.apache.org/jira/browse/SPARK-26134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takanobu Asanuma updated SPARK-26134: - Component/s: (was: Spark Shell) Spark Core > Upgrading Hadoop to 2.7.4 to fix java.version problem > - > > Key: SPARK-26134 > URL: https://issues.apache.org/jira/browse/SPARK-26134 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.0 >Reporter: Takanobu Asanuma >Priority: Major > > When I ran spark-shell on JDK11+28(2018-09-25), It failed with the error > below. > {noformat} > Exception in thread "main" java.lang.ExceptionInInitializerError > at org.apache.hadoop.util.StringUtils.(StringUtils.java:80) > at > org.apache.hadoop.security.SecurityUtil.getAuthenticationMethod(SecurityUtil.java:611) > at > org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:273) > at > org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:261) > at > org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:791) > at > org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:761) > at > org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:634) > at > org.apache.spark.util.Utils$.$anonfun$getCurrentUserName$1(Utils.scala:2427) > at scala.Option.getOrElse(Option.scala:121) > at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2427) > at org.apache.spark.SecurityManager.(SecurityManager.scala:79) > at > org.apache.spark.deploy.SparkSubmit.secMgr$lzycompute$1(SparkSubmit.scala:359) > at org.apache.spark.deploy.SparkSubmit.secMgr$1(SparkSubmit.scala:359) > at > org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$9(SparkSubmit.scala:367) > at scala.Option.map(Option.scala:146) > at > org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:367) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:143) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:927) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:936) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > Caused by: java.lang.StringIndexOutOfBoundsException: begin 0, end 3, length 2 > at java.base/java.lang.String.checkBoundsBeginEnd(String.java:3319) > at java.base/java.lang.String.substring(String.java:1874) > at org.apache.hadoop.util.Shell.(Shell.java:52) > {noformat} > This is a Hadoop issue that fails to parse some {{java.version}}. It has been > fixed from Hadoop-2.7.4(see HADOOP-14586). > Note, Hadoop-2.7.5 or upper have another problem with Spark (SPARK-25330). So > upgrading to 2.7.4 would be fine for now. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26135) Structured Streaming reporting metrics programmatically using asynchronous APIs can't get all queries metrics
[ https://issues.apache.org/jira/browse/SPARK-26135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] bjkonglu updated SPARK-26135: - Environment: JDK: 1.8.0_151 Scala: 2.11.8 Hadoop: 2.7.1 Spark: 2.3.1 was: h3. > Structured Streaming reporting metrics programmatically using asynchronous > APIs can't get all queries metrics > - > > Key: SPARK-26135 > URL: https://issues.apache.org/jira/browse/SPARK-26135 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 2.3.1 > Environment: JDK: 1.8.0_151 > Scala: 2.11.8 > Hadoop: 2.7.1 > Spark: 2.3.1 > > > >Reporter: bjkonglu >Priority: Major > > h3. Background > When I use Structured Streaming handle real-time data, I also want to know > the streaming application metrics, for example > prcessedRowsPerSecond、inputRowsPerSeconds etc. So I report metrics > programmatically using asynchronous APIs. > {code:java} > val spark: SparkSession = ... > spark.streams.addListener(new StreamingQueryListener() { > override def onQueryStarted(queryStarted: QueryStartedEvent): Unit = { > println("Query started: " + queryStarted.id) > } > override def onQueryTerminated(queryTerminated: QueryTerminatedEvent): > Unit = { > println("Query terminated: " + queryTerminated.id) > } > override def onQueryProgress(queryProgress: QueryProgressEvent): Unit = { > println("Query made progress: " + queryProgress.progress) > } > }) > {code} > h3. Questions > When the streaming application has a single query, asynchronous APIs work > well. But when the streaming application has many queries, asynchronous APIs > can't report metrics exactly, some queries can report well, some queries > report delay and metrics number lower. > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26135) Structured Streaming reporting metrics programmatically using asynchronous APIs can't get all queries metrics
[ https://issues.apache.org/jira/browse/SPARK-26135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] bjkonglu updated SPARK-26135: - Environment: JDK: 1.8.0_151 Scala: 2.11.8 Hadoop: 2.7.1 Spark: 2.3.1 was: JDK: 1.8.0_151 Scala: 2.11.8 Hadoop: 2.7.1 Spark: 2.3.1 > Structured Streaming reporting metrics programmatically using asynchronous > APIs can't get all queries metrics > - > > Key: SPARK-26135 > URL: https://issues.apache.org/jira/browse/SPARK-26135 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 2.3.1 > Environment: JDK: 1.8.0_151 > Scala: 2.11.8 > Hadoop: 2.7.1 > Spark: 2.3.1 >Reporter: bjkonglu >Priority: Major > > h3. Background > When I use Structured Streaming handle real-time data, I also want to know > the streaming application metrics, for example > prcessedRowsPerSecond、inputRowsPerSeconds etc. So I report metrics > programmatically using asynchronous APIs. > {code:java} > val spark: SparkSession = ... > spark.streams.addListener(new StreamingQueryListener() { > override def onQueryStarted(queryStarted: QueryStartedEvent): Unit = { > println("Query started: " + queryStarted.id) > } > override def onQueryTerminated(queryTerminated: QueryTerminatedEvent): > Unit = { > println("Query terminated: " + queryTerminated.id) > } > override def onQueryProgress(queryProgress: QueryProgressEvent): Unit = { > println("Query made progress: " + queryProgress.progress) > } > }) > {code} > h3. Questions > When the streaming application has a single query, asynchronous APIs work > well. But when the streaming application has many queries, asynchronous APIs > can't report metrics exactly, some queries can report well, some queries > report delay and metrics number lower. > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26135) Structured Streaming reporting metrics programmatically using asynchronous APIs can't get all queries metrics
[ https://issues.apache.org/jira/browse/SPARK-26135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] bjkonglu updated SPARK-26135: - Environment: h3. was: h3. > Structured Streaming reporting metrics programmatically using asynchronous > APIs can't get all queries metrics > - > > Key: SPARK-26135 > URL: https://issues.apache.org/jira/browse/SPARK-26135 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 2.3.1 > Environment: h3. > > >Reporter: bjkonglu >Priority: Major > > h3. Background > When I use Structured Streaming handle real-time data, I also want to know > the streaming application metrics, for example > prcessedRowsPerSecond、inputRowsPerSeconds etc. So I report metrics > programmatically using asynchronous APIs. > {code:java} > val spark: SparkSession = ... > spark.streams.addListener(new StreamingQueryListener() { > override def onQueryStarted(queryStarted: QueryStartedEvent): Unit = { > println("Query started: " + queryStarted.id) > } > override def onQueryTerminated(queryTerminated: QueryTerminatedEvent): > Unit = { > println("Query terminated: " + queryTerminated.id) > } > override def onQueryProgress(queryProgress: QueryProgressEvent): Unit = { > println("Query made progress: " + queryProgress.progress) > } > }) > {code} > h3. Questions > When the streaming application has a single query, asynchronous APIs work > well. But when the streaming application has many queries, asynchronous APIs > can't report metrics exactly, some queries can report well, some queries > report delay and metrics number lower. > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-26135) Structured Streaming reporting metrics programmatically using asynchronous APIs can't get all queries metrics
bjkonglu created SPARK-26135: Summary: Structured Streaming reporting metrics programmatically using asynchronous APIs can't get all queries metrics Key: SPARK-26135 URL: https://issues.apache.org/jira/browse/SPARK-26135 Project: Spark Issue Type: Improvement Components: Structured Streaming Affects Versions: 2.3.1 Environment: h3. Reporter: bjkonglu h3. Background When I use Structured Streaming to handle real-time data, I also want to know the streaming application metrics, for example processedRowsPerSecond, inputRowsPerSecond, etc. So I report metrics programmatically using the asynchronous APIs. {code:java} val spark: SparkSession = ... spark.streams.addListener(new StreamingQueryListener() { override def onQueryStarted(queryStarted: QueryStartedEvent): Unit = { println("Query started: " + queryStarted.id) } override def onQueryTerminated(queryTerminated: QueryTerminatedEvent): Unit = { println("Query terminated: " + queryTerminated.id) } override def onQueryProgress(queryProgress: QueryProgressEvent): Unit = { println("Query made progress: " + queryProgress.progress) } }) {code} h3. Questions When the streaming application has a single query, the asynchronous APIs work well. But when the streaming application has many queries, the asynchronous APIs don't report metrics for every query reliably: some queries report correctly, while others report late or with lower metric values. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
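A possible workaround sketch for the multi-query case described above (not from the ticket): key the collected metrics by query id inside the listener, so that concurrent queries do not overwrite or mask each other's progress. Only the public StreamingQueryListener callbacks and StreamingQueryProgress fields are used.
{code:java}
import java.util.UUID
import scala.collection.concurrent.TrieMap
import org.apache.spark.sql.streaming.StreamingQueryListener
import org.apache.spark.sql.streaming.StreamingQueryListener.{QueryProgressEvent, QueryStartedEvent, QueryTerminatedEvent}

// Last observed processing rate per running query, keyed by query id.
val lastRate = TrieMap.empty[UUID, Double]

spark.streams.addListener(new StreamingQueryListener {
  override def onQueryStarted(event: QueryStartedEvent): Unit = ()
  override def onQueryProgress(event: QueryProgressEvent): Unit =
    lastRate.put(event.progress.id, event.progress.processedRowsPerSecond)
  override def onQueryTerminated(event: QueryTerminatedEvent): Unit =
    lastRate.remove(event.id)
})
{code}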
[jira] [Updated] (SPARK-24627) [Spark2.3.0] After HDFS Token expire kinit not able to submit job using beeline
[ https://issues.apache.org/jira/browse/SPARK-24627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ABHISHEK KUMAR GUPTA updated SPARK-24627: - Environment: OS: SUSE11 Spark Version: 2.3.0 was: OS: SUSE11 Spark Version: 2.3.0 Hadoop: 2.8.3 > [Spark2.3.0] After HDFS Token expire kinit not able to submit job using > beeline > --- > > Key: SPARK-24627 > URL: https://issues.apache.org/jira/browse/SPARK-24627 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 > Environment: OS: SUSE11 > Spark Version: 2.3.0 > >Reporter: ABHISHEK KUMAR GUPTA >Priority: Major > > Steps: > beeline session was active. > 1.Launch spark-beeline > 2. create table alt_s1 (time timestamp, name string, isright boolean, > datetoday date, num binary, height double, score float, decimaler > decimal(10,0), id tinyint, age int, license bigint, length smallint) row > format delimited fields terminated by ','; > 3. load data local inpath '/opt/typeddata60.txt' into table alt_s1; > 4. show tables;( Table listed successfully ) > 5. select * from alt_s1; > Throws HDFS_DELEGATION_TOKEN Exception > 0: jdbc:hive2://10.18.18.214:23040/default> select * from alt_s1; > Error: org.apache.spark.SparkException: Job aborted due to stage failure: > Task 1 in stage 22.0 failed 4 times, most recent failure: Lost task 1.3 in > stage 22.0 (TID 106, blr123110, executor 1): > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): > token (HDFS_DELEGATION_TOKEN token 7 for spark) can't be found in cache > at org.apache.hadoop.ipc.Client.call(Client.java:1475) > at org.apache.hadoop.ipc.Client.call(Client.java:1412) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229) > at com.sun.proxy.$Proxy15.getBlockLocations(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:255) > at sun.reflect.GeneratedMethodAccessor44.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) > at com.sun.proxy.$Proxy16.getBlockLocations(Unknown Source) > at > org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1226) > at > org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1213) > at > org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1201) > at > org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:306) > at > org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:272) > at > org.apache.hadoop.hdfs.DFSInputStream.(DFSInputStream.java:264) > at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1526) > at > org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:304) > at > org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:299) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:312) > at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:769) > at > org.apache.hadoop.mapred.LineRecordReader.(LineRecordReader.java:109) > at > 
org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67) > at > org.apache.spark.rdd.HadoopRDD$$anon$1.liftedTree1$1(HadoopRDD.scala:257) > at org.apache.spark.rdd.HadoopRDD$$anon$1.(HadoopRDD.scala:256) > at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:214) > at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:94) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) > **Note: Even after kinit spark/hadoop token is not getting renewed.** > Now Launch spark sql session ( Select * from alt_s1 ) is successful. > 1. Launch spark-sql > 2.spark-sql> select * from alt_s1; > 2018-06-22 14:24:04 INFO HiveMetaStore:746 - 0: get_table : db=test_one > tbl=alt_s1 > 2018-06-22 14:24:04 INFO audit:371 - ugi=spark/had...@hadoop.com > ip=unknown-ip-addr cmd=get_table : db=test_one tbl=alt_s1 > 2018-06-22 14:24:04 INFO
[jira] [Created] (SPARK-26134) Upgrading Hadoop to 2.7.4 to fix java.version problem
Takanobu Asanuma created SPARK-26134: Summary: Upgrading Hadoop to 2.7.4 to fix java.version problem Key: SPARK-26134 URL: https://issues.apache.org/jira/browse/SPARK-26134 Project: Spark Issue Type: Bug Components: Spark Shell Affects Versions: 2.4.0 Reporter: Takanobu Asanuma When I ran spark-shell on JDK11+28(2018-09-25), It failed with the error below. {noformat} Exception in thread "main" java.lang.ExceptionInInitializerError at org.apache.hadoop.util.StringUtils.(StringUtils.java:80) at org.apache.hadoop.security.SecurityUtil.getAuthenticationMethod(SecurityUtil.java:611) at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:273) at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:261) at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:791) at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:761) at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:634) at org.apache.spark.util.Utils$.$anonfun$getCurrentUserName$1(Utils.scala:2427) at scala.Option.getOrElse(Option.scala:121) at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2427) at org.apache.spark.SecurityManager.(SecurityManager.scala:79) at org.apache.spark.deploy.SparkSubmit.secMgr$lzycompute$1(SparkSubmit.scala:359) at org.apache.spark.deploy.SparkSubmit.secMgr$1(SparkSubmit.scala:359) at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$9(SparkSubmit.scala:367) at scala.Option.map(Option.scala:146) at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:367) at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:143) at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86) at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:927) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:936) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) Caused by: java.lang.StringIndexOutOfBoundsException: begin 0, end 3, length 2 at java.base/java.lang.String.checkBoundsBeginEnd(String.java:3319) at java.base/java.lang.String.substring(String.java:1874) at org.apache.hadoop.util.Shell.(Shell.java:52) {noformat} This is a Hadoop issue that fails to parse some {{java.version}}. It has been fixed from Hadoop-2.7.4(see HADOOP-14586). Note, Hadoop-2.7.5 or upper have another problem with Spark (SPARK-25330). So upgrading to 2.7.4 would be fine for now. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
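For context on the stack trace above: pre-2.7.4 Hadoop's {{Shell}} class evidently slices the first three characters of the {{java.version}} property (hence "begin 0, end 3, length 2"), which works for strings like "1.8.0_181" but not on JDK 11, where the property is simply "11". An illustrative two-liner (not Hadoop source) reproducing the exception:
{code:java}
val version = "11"                    // stand-in for System.getProperty("java.version") on JDK 11
val major = version.substring(0, 3)   // throws StringIndexOutOfBoundsException: begin 0, end 3, length 2
{code}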
[jira] [Commented] (SPARK-26019) pyspark/accumulators.py: "TypeError: object of type 'NoneType' has no len()" in authenticate_and_accum_updates()
[ https://issues.apache.org/jira/browse/SPARK-26019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16694188#comment-16694188 ] Hyukjin Kwon commented on SPARK-26019: -- [~irashid], Yup, maybe I rushed to take an action. I don't mind reopening this if it looks like a real issue to you and if it is likely to be resolved in the end. I have to say that ideally an issue should be opened after enough analysis, when we're clear it's a real issue, rather than blaming unrelated changes or asking other people how to test. See how many people and committers spent their time on this issue. Also, I myself spent time checking it - I failed to reproduce it, and I failed to understand the analysis made here. For JIRA management, I have kept it this way because, in my experience here, 99% of such JIRAs are either not resolved in the end or turn out not to be real issues. For this one, it's okay. I'll leave it to you. > pyspark/accumulators.py: "TypeError: object of type 'NoneType' has no len()" > in authenticate_and_accum_updates() > > > Key: SPARK-26019 > URL: https://issues.apache.org/jira/browse/SPARK-26019 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.3.2, 2.4.0 >Reporter: Ruslan Dautkhanov >Priority: Major > > Started happening after 2.3.1 -> 2.3.2 upgrade. > > {code:python} > Exception happened during processing of request from ('127.0.0.1', 43418) > > Traceback (most recent call last): > File "/opt/cloudera/parcels/Anaconda/lib/python2.7/SocketServer.py", line > 290, in _handle_request_noblock > self.process_request(request, client_address) > File "/opt/cloudera/parcels/Anaconda/lib/python2.7/SocketServer.py", line > 318, in process_request > self.finish_request(request, client_address) > File "/opt/cloudera/parcels/Anaconda/lib/python2.7/SocketServer.py", line > 331, in finish_request > self.RequestHandlerClass(request, client_address, self) > File "/opt/cloudera/parcels/Anaconda/lib/python2.7/SocketServer.py", line > 652, in __init__ > self.handle() > File > "/opt/cloudera/parcels/SPARK2-2.3.0.cloudera4-1.cdh5.13.3.p0.611179/lib/spark2/python/lib/pyspark.zip/pyspark/accumulators.py", > line 263, in handle > poll(authenticate_and_accum_updates) > File > "/opt/cloudera/parcels/SPARK2-2.3.0.cloudera4-1.cdh5.13.3.p0.611179/lib/spark2/python/lib/pyspark.zip/pyspark/accumulators.py", > line 238, in poll > if func(): > File > "/opt/cloudera/parcels/SPARK2-2.3.0.cloudera4-1.cdh5.13.3.p0.611179/lib/spark2/python/lib/pyspark.zip/pyspark/accumulators.py", > line 251, in authenticate_and_accum_updates > received_token = self.rfile.read(len(auth_token)) > TypeError: object of type 'NoneType' has no len() > > {code} > > Error happens here: > https://github.com/apache/spark/blob/cb90617f894fd51a092710271823ec7d1cd3a668/python/pyspark/accumulators.py#L254 > The PySpark code was just running a simple pipeline of > binary_rdd = sc.binaryRecords(full_file_path, record_length).map(lambda .. ) > and then converting it to a dataframe and running a count on it. > It seems the error is flaky - on next rerun it didn't happen. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-26133) Remove deprecated OneHotEncoder and rename OneHotEncoderEstimator to OneHotEncoder
[ https://issues.apache.org/jira/browse/SPARK-26133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26133: Assignee: (was: Apache Spark) > Remove deprecated OneHotEncoder and rename OneHotEncoderEstimator to > OneHotEncoder > -- > > Key: SPARK-26133 > URL: https://issues.apache.org/jira/browse/SPARK-26133 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 3.0.0 >Reporter: Liang-Chi Hsieh >Priority: Major > > We have deprecated OneHotEncoder at Spark 2.3.0 and introduced > OneHotEncoderEstimator. At 3.0.0, we remove deprecated OneHotEncoder and > rename OneHotEncoderEstimator to OneHotEncoder. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-26133) Remove deprecated OneHotEncoder and rename OneHotEncoderEstimator to OneHotEncoder
[ https://issues.apache.org/jira/browse/SPARK-26133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26133: Assignee: Apache Spark > Remove deprecated OneHotEncoder and rename OneHotEncoderEstimator to > OneHotEncoder > -- > > Key: SPARK-26133 > URL: https://issues.apache.org/jira/browse/SPARK-26133 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 3.0.0 >Reporter: Liang-Chi Hsieh >Assignee: Apache Spark >Priority: Major > > We have deprecated OneHotEncoder at Spark 2.3.0 and introduced > OneHotEncoderEstimator. At 3.0.0, we remove deprecated OneHotEncoder and > rename OneHotEncoderEstimator to OneHotEncoder. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26133) Remove deprecated OneHotEncoder and rename OneHotEncoderEstimator to OneHotEncoder
[ https://issues.apache.org/jira/browse/SPARK-26133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16694169#comment-16694169 ] Apache Spark commented on SPARK-26133: -- User 'viirya' has created a pull request for this issue: https://github.com/apache/spark/pull/23100 > Remove deprecated OneHotEncoder and rename OneHotEncoderEstimator to > OneHotEncoder > -- > > Key: SPARK-26133 > URL: https://issues.apache.org/jira/browse/SPARK-26133 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 3.0.0 >Reporter: Liang-Chi Hsieh >Priority: Major > > We have deprecated OneHotEncoder at Spark 2.3.0 and introduced > OneHotEncoderEstimator. At 3.0.0, we remove deprecated OneHotEncoder and > rename OneHotEncoderEstimator to OneHotEncoder. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-26133) Remove deprecated OneHotEncoder and rename OneHotEncoderEstimator to OneHotEncoder
Liang-Chi Hsieh created SPARK-26133: --- Summary: Remove deprecated OneHotEncoder and rename OneHotEncoderEstimator to OneHotEncoder Key: SPARK-26133 URL: https://issues.apache.org/jira/browse/SPARK-26133 Project: Spark Issue Type: Improvement Components: ML Affects Versions: 3.0.0 Reporter: Liang-Chi Hsieh We have deprecated OneHotEncoder at Spark 2.3.0 and introduced OneHotEncoderEstimator. At 3.0.0, we remove deprecated OneHotEncoder and rename OneHotEncoderEstimator to OneHotEncoder. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-21121) Set up StorageLevel for CACHE TABLE command
[ https://issues.apache.org/jira/browse/SPARK-21121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang resolved SPARK-21121. - Resolution: Duplicate Fixed by [SPARK-25269|https://issues.apache.org/jira/browse/SPARK-25269]. > Set up StorageLevel for CACHE TABLE command > --- > > Key: SPARK-21121 > URL: https://issues.apache.org/jira/browse/SPARK-21121 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 2.1.1 >Reporter: Oleg Danilov >Priority: Minor > > Currently, "CACHE TABLE" always uses the default MEMORY_AND_DISK storage > level. We can add a possibility to specify it using variable, let say, > spark.sql.inMemoryColumnarStorage.level. It will give user a chance to fit > data into the memory with using MEMORY_AND_DISK_SER storage level. > Going to submit PR for this change. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
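Until the CACHE TABLE syntax accepts a storage level (SPARK-25269), a sketch of the existing workaround through the Dataset API, which already takes an explicit {{StorageLevel}}; the table name below is a placeholder:
{code:java}
import org.apache.spark.storage.StorageLevel

// Cache the table serialized and spilling to disk, instead of the
// default MEMORY_AND_DISK level used by CACHE TABLE.
val df = spark.table("some_table")
df.persist(StorageLevel.MEMORY_AND_DISK_SER)
df.count()  // materialize the cache
{code}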
[jira] [Commented] (SPARK-23178) Kryo Unsafe problems with count distinct from cache
[ https://issues.apache.org/jira/browse/SPARK-23178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16694120#comment-16694120 ] Yuming Wang commented on SPARK-23178: - Could you try Spark 2.4.0 or the master branch? We have upgraded Kryo to 4.0.2. > Kryo Unsafe problems with count distinct from cache > --- > > Key: SPARK-23178 > URL: https://issues.apache.org/jira/browse/SPARK-23178 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.2.0, 2.2.1 >Reporter: KIryl Sultanau >Priority: Minor > Attachments: Unsafe-issue.png, Unsafe-off.png > > > Spark incorrectly processes cached data with the Kryo & Unsafe options. > Distinct count from cache doesn't work correctly. Example available below: > {quote}val spark = SparkSession > .builder > .appName("unsafe-issue") > .master("local[*]") > .config("spark.serializer", > "org.apache.spark.serializer.KryoSerializer") > .config("spark.kryo.unsafe", "true") > .config("spark.kryo.registrationRequired", "false") > .getOrCreate() > val devicesDF = spark.read.format("csv") > .option("header", "true") > .option("delimiter", "\t") > .load("/data/Devices.tsv").cache() > val gatewaysDF = spark.read.format("csv") > .option("header", "true") > .option("delimiter", "\t") > .load("/data/Gateways.tsv").cache() > val devJoinedDF = devicesDF.join(gatewaysDF, Seq("GatewayId"), > "inner").cache() > devJoinedDF.printSchema() > println(devJoinedDF.count()) > println(devJoinedDF.select("DeviceId").distinct().count()) > println(devJoinedDF.groupBy("DeviceId").count().filter("count>1").count()) > println(devJoinedDF.groupBy("DeviceId").count().filter("count=1").count()) > {quote} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
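Until an upgrade to 2.4.0+ is possible, one hedged workaround sketch is to leave {{spark.kryo.unsafe}} at its default of {{false}}; this assumes the misbehaviour is tied to the unsafe Kryo path, as the attached Unsafe-issue/Unsafe-off screenshots suggest.

{code:scala}
import org.apache.spark.sql.SparkSession

// Same session as in the report, but with the unsafe Kryo path disabled (its default value).
val spark = SparkSession
  .builder
  .appName("unsafe-issue")
  .master("local[*]")
  .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .config("spark.kryo.unsafe", "false")
  .config("spark.kryo.registrationRequired", "false")
  .getOrCreate()
{code}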
[jira] [Resolved] (SPARK-24222) [Cache Column level is not supported in 2.3]
[ https://issues.apache.org/jira/browse/SPARK-24222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang resolved SPARK-24222. - Resolution: Invalid > [Cache Column level is not supported in 2.3] > > > Key: SPARK-24222 > URL: https://issues.apache.org/jira/browse/SPARK-24222 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.0 >Reporter: ABHISHEK KUMAR GUPTA >Priority: Minor > > Steps: > # Create table s3; > # Insert data in S3; > # Execute cache column label as below > # cache select name,num,height from s3 where length=8; > # Throws Error as below > > Error in query: > mismatched input 'select' expecting {'TABLE', 'LAZY'}(line 1, pos 6) > == SQL == > cache select name,num,height from s3 where length=8 > Table level caching is supported like > cache table s3; -- Success -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24222) [Cache Column level is not supported in 2.3]
[ https://issues.apache.org/jira/browse/SPARK-24222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16694116#comment-16694116 ] Yuming Wang commented on SPARK-24222: - Please use _{{cache table cache1 as select name,num,height from s3 where length=8}}_ to cache. > [Cache Column level is not supported in 2.3] > > > Key: SPARK-24222 > URL: https://issues.apache.org/jira/browse/SPARK-24222 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.0 >Reporter: ABHISHEK KUMAR GUPTA >Priority: Minor > > Steps: > # Create table s3; > # Insert data in S3; > # Execute cache column label as below > # cache select name,num,height from s3 where length=8; > # Throws Error as below > > Error in query: > mismatched input 'select' expecting {'TABLE', 'LAZY'}(line 1, pos 6) > == SQL == > cache select name,num,height from s3 where length=8 > Table level caching is supported like > cache table s3; -- Success -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
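Spelled out as a spark-shell sketch (the table and column names are the ones from the report; {{cache1}} is just the example name used in the comment):

{code:scala}
// Column-level caching is expressed as CACHE TABLE <name> AS SELECT ..., not CACHE SELECT ...
spark.sql("CACHE TABLE cache1 AS SELECT name, num, height FROM s3 WHERE length = 8")

// Later reads of the cached projection go through the named view.
spark.sql("SELECT * FROM cache1").show()
{code}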
[jira] [Resolved] (SPARK-26122) Support encoding for multiLine in CSV datasource
[ https://issues.apache.org/jira/browse/SPARK-26122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-26122. -- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 23091 [https://github.com/apache/spark/pull/23091] > Support encoding for multiLine in CSV datasource > > > Key: SPARK-26122 > URL: https://issues.apache.org/jira/browse/SPARK-26122 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.0.0 > > > Currently, CSV datasource is not able to read CSV files in different encoding > when multiLine is enabled. The ticket aims to support the encoding CSV > options in the mode. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
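A minimal sketch of the reader combination this ticket fixes; the path and charset below are placeholders, and per the description the {{encoding}} option only takes effect together with {{multiLine}} from 3.0.0 on.

{code:scala}
// Placeholder path and charset; the point is combining both options on a single read.
val df = spark.read
  .option("header", "true")
  .option("multiLine", "true")
  .option("encoding", "UTF-16")
  .csv("/tmp/utf16_multiline.csv")

df.show(false)
{code}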
[jira] [Resolved] (SPARK-26120) Fix a streaming query leak in Structured Streaming R tests
[ https://issues.apache.org/jira/browse/SPARK-26120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-26120. -- Resolution: Fixed Fix Version/s: 3.0.0 2.4.1 fixed in https://github.com/apache/spark/pull/23089 > Fix a streaming query leak in Structured Streaming R tests > -- > > Key: SPARK-26120 > URL: https://issues.apache.org/jira/browse/SPARK-26120 > Project: Spark > Issue Type: Test > Components: SparkR, Structured Streaming, Tests >Affects Versions: 2.4.0 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Minor > Fix For: 2.4.1, 3.0.0 > > > "Specify a schema by using a DDL-formatted string when reading" doesn't stop > the streaming query before stopping Spark. It causes the following annoying > logs. > {code} > Exception in thread "stream execution thread for [id = > 186dad10-e87f-4155-8119-00e0e63bbc1a, runId = > 2c0cc158-410b-442f-ac36-20f80ec429b1]" Exception in thread "stream execution > thread for people3 [id = ffa6136d-fe7b-4777-aa47-b0cb64d07ea4, runId = > 644b888e-9cce-4a09-bb5e-2fb122796c19]" org.apache.spark.SparkException: > Exception thrown in awaitResult: > at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:355) > at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75) > at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:92) > at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:76) > at > org.apache.spark.sql.execution.streaming.state.StateStoreCoordinatorRef.deactivateInstances(StateStoreCoordinator.scala:108) > at > org.apache.spark.sql.streaming.StreamingQueryManager.notifyQueryTermination(StreamingQueryManager.scala:399) > at > org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runStream$2.apply(StreamExecution.scala:342) > at > org.apache.spark.util.UninterruptibleThread.runUninterruptibly(UninterruptibleThread.scala:77) > at > org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:323) > at > org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:204) > Caused by: org.apache.spark.rpc.RpcEnvStoppedException: RpcEnv already > stopped. > at > org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:158) > at > org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:135) > at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:229) > at > org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:523) > at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:91) > ... 
7 more > org.apache.spark.SparkException: Exception thrown in awaitResult: > at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:355) > at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75) > at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:92) > at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:76) > at > org.apache.spark.sql.execution.streaming.state.StateStoreCoordinatorRef.deactivateInstances(StateStoreCoordinator.scala:108) > at > org.apache.spark.sql.streaming.StreamingQueryManager.notifyQueryTermination(StreamingQueryManager.scala:399) > at > org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runStream$2.apply(StreamExecution.scala:342) > at > org.apache.spark.util.UninterruptibleThread.runUninterruptibly(UninterruptibleThread.scala:77) > at > org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:323) > at > org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:204) > Caused by: org.apache.spark.rpc.RpcEnvStoppedException: RpcEnv already > stopped. > at > org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:158) > at > org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:135) > at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:229) > at > org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:523) > at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:91) > ... 7 more > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail:
[jira] [Assigned] (SPARK-26122) Support encoding for multiLine in CSV datasource
[ https://issues.apache.org/jira/browse/SPARK-26122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-26122: Assignee: Maxim Gekk > Support encoding for multiLine in CSV datasource > > > Key: SPARK-26122 > URL: https://issues.apache.org/jira/browse/SPARK-26122 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.0.0 > > > Currently, CSV datasource is not able to read CSV files in different encoding > when multiLine is enabled. The ticket aims to support the encoding CSV > options in the mode. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-26124) Update plugins, including MiMa
[ https://issues.apache.org/jira/browse/SPARK-26124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-26124. --- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 23087 [https://github.com/apache/spark/pull/23087] > Update plugins, including MiMa > -- > > Key: SPARK-26124 > URL: https://issues.apache.org/jira/browse/SPARK-26124 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.0.0 >Reporter: Sean Owen >Assignee: Sean Owen >Priority: Minor > Fix For: 3.0.0 > > > For Spark 3, we should update plugins to their latest version where possible, > to pick up miscellaneous fixes. In particular we can update MiMa to 0.3.0, > though that introduces some new errors on old changes due to some changes in > MiMa. > Most SBT plugins can't really be updated further without updating to SBT 1.x, > and that will require some changes to the build, and it generally seems like > all of these new versions are for Scala 2.12+, including the new zinc. That > will probably be a bigger change but only after deciding to drop Scala 2.11 > support. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26132) Remove support for Scala 2.11 in Spark 3.0.0
[ https://issues.apache.org/jira/browse/SPARK-26132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16693865#comment-16693865 ] Apache Spark commented on SPARK-26132: -- User 'srowen' has created a pull request for this issue: https://github.com/apache/spark/pull/23098 > Remove support for Scala 2.11 in Spark 3.0.0 > > > Key: SPARK-26132 > URL: https://issues.apache.org/jira/browse/SPARK-26132 > Project: Spark > Issue Type: Improvement > Components: Build, Spark Core >Affects Versions: 3.0.0 >Reporter: Sean Owen >Assignee: Sean Owen >Priority: Major > > Per some discussion on the mailing list, we are _considering_ formally not > supporting Scala 2.11 in Spark 3.0. This JIRA tracks that discussion. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26132) Remove support for Scala 2.11 in Spark 3.0.0
[ https://issues.apache.org/jira/browse/SPARK-26132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16693866#comment-16693866 ] Apache Spark commented on SPARK-26132: -- User 'srowen' has created a pull request for this issue: https://github.com/apache/spark/pull/23098 > Remove support for Scala 2.11 in Spark 3.0.0 > > > Key: SPARK-26132 > URL: https://issues.apache.org/jira/browse/SPARK-26132 > Project: Spark > Issue Type: Improvement > Components: Build, Spark Core >Affects Versions: 3.0.0 >Reporter: Sean Owen >Assignee: Sean Owen >Priority: Major > > Per some discussion on the mailing list, we are _considering_ formally not > supporting Scala 2.11 in Spark 3.0. This JIRA tracks that discussion. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-26132) Remove support for Scala 2.11 in Spark 3.0.0
[ https://issues.apache.org/jira/browse/SPARK-26132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26132: Assignee: Apache Spark (was: Sean Owen) > Remove support for Scala 2.11 in Spark 3.0.0 > > > Key: SPARK-26132 > URL: https://issues.apache.org/jira/browse/SPARK-26132 > Project: Spark > Issue Type: Improvement > Components: Build, Spark Core >Affects Versions: 3.0.0 >Reporter: Sean Owen >Assignee: Apache Spark >Priority: Major > > Per some discussion on the mailing list, we are _considering_ formally not > supporting Scala 2.11 in Spark 3.0. This JIRA tracks that discussion. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-25954) Upgrade to Kafka 2.1.0
[ https://issues.apache.org/jira/browse/SPARK-25954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25954: Assignee: Apache Spark > Upgrade to Kafka 2.1.0 > -- > > Key: SPARK-25954 > URL: https://issues.apache.org/jira/browse/SPARK-25954 > Project: Spark > Issue Type: Sub-task > Components: Structured Streaming >Affects Versions: 3.0.0 >Reporter: Dongjoon Hyun >Assignee: Apache Spark >Priority: Major > > Kafka 2.1.0 vote passed. Since this includes official KAFKA-7264 JDK 11 > support, we had better use that. > - > https://lists.apache.org/thread.html/9f487094491e512b556a1c9c3c6034ac642b088e3f797e3d192ebc9d@%3Cdev.kafka.apache.org%3E -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25954) Upgrade to Kafka 2.1.0
[ https://issues.apache.org/jira/browse/SPARK-25954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16693864#comment-16693864 ] Apache Spark commented on SPARK-25954: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/23099 > Upgrade to Kafka 2.1.0 > -- > > Key: SPARK-25954 > URL: https://issues.apache.org/jira/browse/SPARK-25954 > Project: Spark > Issue Type: Sub-task > Components: Structured Streaming >Affects Versions: 3.0.0 >Reporter: Dongjoon Hyun >Priority: Major > > Kafka 2.1.0 vote passed. Since this includes official KAFKA-7264 JDK 11 > support, we had better use that. > - > https://lists.apache.org/thread.html/9f487094491e512b556a1c9c3c6034ac642b088e3f797e3d192ebc9d@%3Cdev.kafka.apache.org%3E -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-26132) Remove support for Scala 2.11 in Spark 3.0.0
[ https://issues.apache.org/jira/browse/SPARK-26132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26132: Assignee: Sean Owen (was: Apache Spark) > Remove support for Scala 2.11 in Spark 3.0.0 > > > Key: SPARK-26132 > URL: https://issues.apache.org/jira/browse/SPARK-26132 > Project: Spark > Issue Type: Improvement > Components: Build, Spark Core >Affects Versions: 3.0.0 >Reporter: Sean Owen >Assignee: Sean Owen >Priority: Major > > Per some discussion on the mailing list, we are _considering_ formally not > supporting Scala 2.11 in Spark 3.0. This JIRA tracks that discussion. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-25954) Upgrade to Kafka 2.1.0
[ https://issues.apache.org/jira/browse/SPARK-25954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25954: Assignee: (was: Apache Spark) > Upgrade to Kafka 2.1.0 > -- > > Key: SPARK-25954 > URL: https://issues.apache.org/jira/browse/SPARK-25954 > Project: Spark > Issue Type: Sub-task > Components: Structured Streaming >Affects Versions: 3.0.0 >Reporter: Dongjoon Hyun >Priority: Major > > Kafka 2.1.0 vote passed. Since this includes official KAFKA-7264 JDK 11 > support, we had better use that. > - > https://lists.apache.org/thread.html/9f487094491e512b556a1c9c3c6034ac642b088e3f797e3d192ebc9d@%3Cdev.kafka.apache.org%3E -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-26132) Remove support for Scala 2.11 in Spark 3.0.0
Sean Owen created SPARK-26132: - Summary: Remove support for Scala 2.11 in Spark 3.0.0 Key: SPARK-26132 URL: https://issues.apache.org/jira/browse/SPARK-26132 Project: Spark Issue Type: Improvement Components: Build, Spark Core Affects Versions: 3.0.0 Reporter: Sean Owen Assignee: Sean Owen Per some discussion on the mailing list, we are _considering_ formally not supporting Scala 2.11 in Spark 3.0. This JIRA tracks that discussion. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25954) Upgrade to Kafka 2.1.0
[ https://issues.apache.org/jira/browse/SPARK-25954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-25954: -- Description: Kafka 2.1.0 vote passed. Since this includes official KAFKA-7264 JDK 11 support, we had better use that. - https://lists.apache.org/thread.html/9f487094491e512b556a1c9c3c6034ac642b088e3f797e3d192ebc9d@%3Cdev.kafka.apache.org%3E was: Kafka 2.1.0 RC0 is started. Since this includes official KAFKA-7264 JDK 11 support, we had better use that. - https://lists.apache.org/thread.html/8288f0afdfed4d329f1a8338320b6e24e7684a0593b4bbd6f1b79101@%3Cdev.kafka.apache.org%3E > Upgrade to Kafka 2.1.0 > -- > > Key: SPARK-25954 > URL: https://issues.apache.org/jira/browse/SPARK-25954 > Project: Spark > Issue Type: Sub-task > Components: Structured Streaming >Affects Versions: 3.0.0 >Reporter: Dongjoon Hyun >Priority: Major > > Kafka 2.1.0 vote passed. Since this includes official KAFKA-7264 JDK 11 > support, we had better use that. > - > https://lists.apache.org/thread.html/9f487094491e512b556a1c9c3c6034ac642b088e3f797e3d192ebc9d@%3Cdev.kafka.apache.org%3E -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25954) Upgrade to Kafka 2.1.0
[ https://issues.apache.org/jira/browse/SPARK-25954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16693849#comment-16693849 ] Dongjoon Hyun commented on SPARK-25954: --- I updated the description because the Kafka 2.1.0 vote passed today. > Upgrade to Kafka 2.1.0 > -- > > Key: SPARK-25954 > URL: https://issues.apache.org/jira/browse/SPARK-25954 > Project: Spark > Issue Type: Sub-task > Components: Structured Streaming >Affects Versions: 3.0.0 >Reporter: Dongjoon Hyun >Priority: Major > > Kafka 2.1.0 vote passed. Since this includes official KAFKA-7264 JDK 11 > support, we had better use that. > - > https://lists.apache.org/thread.html/9f487094491e512b556a1c9c3c6034ac642b088e3f797e3d192ebc9d@%3Cdev.kafka.apache.org%3E -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-26131) Remove sqlContext.conf from Spark SQL physical operators
Herman van Hovell created SPARK-26131: - Summary: Remove sqlContext.conf from Spark SQL physical operators Key: SPARK-26131 URL: https://issues.apache.org/jira/browse/SPARK-26131 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.0 Reporter: Herman van Hovell Assignee: Herman van Hovell -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-26130) Change Event Timeline Display Functionality on the Stages Page to use either REST API or data from other tables
Parth Gandhi created SPARK-26130: Summary: Change Event Timeline Display Functionality on the Stages Page to use either REST API or data from other tables Key: SPARK-26130 URL: https://issues.apache.org/jira/browse/SPARK-26130 Project: Spark Issue Type: Improvement Components: Web UI Affects Versions: 3.0.0 Reporter: Parth Gandhi As per SPARK-21809 (https://issues.apache.org/jira/browse/SPARK-21809), the Stages page will use DataTables for column sorting, searching, pagination, etc. To support those DataTables, we have changed the Stages page to use AJAX calls to access the server APIs. However, the event timeline functionality on the Stages page has not been updated to use the REST API or to use data from the DataTables dynamically to reconstruct the graphs at the client end. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-26084) AggregateExpression.references fails on unresolved expression trees
[ https://issues.apache.org/jira/browse/SPARK-26084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hovell resolved SPARK-26084. --- Resolution: Fixed Assignee: Simeon Simeonov Fix Version/s: 3.0.0 2.4.1 2.3.3 > AggregateExpression.references fails on unresolved expression trees > --- > > Key: SPARK-26084 > URL: https://issues.apache.org/jira/browse/SPARK-26084 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.1 >Reporter: Simeon Simeonov >Assignee: Simeon Simeonov >Priority: Major > Labels: aggregate, regression, sql > Fix For: 2.3.3, 2.4.1, 3.0.0 > > > [SPARK-18394|https://issues.apache.org/jira/browse/SPARK-18394] introduced a > stable ordering in {{AttributeSet.toSeq}} using expression IDs > ([PR-18959|https://github.com/apache/spark/pull/18959/files#diff-75576f0ec7f9d8b5032000245217d233R128]) > without noticing that {{AggregateExpression.references}} used > {{AttributeSet.toSeq}} as a shortcut > ([link|https://github.com/apache/spark/blob/5264164a67df498b73facae207eda12ee133be7d/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala#L132]). > The net result is that {{AggregateExpression.references}} fails for > unresolved aggregate functions. > {code:scala} > org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression( > org.apache.spark.sql.catalyst.expressions.aggregate.Sum(('x + 'y).expr), > mode = org.apache.spark.sql.catalyst.expressions.aggregate.Complete, > isDistinct = false > ).references > {code} > fails with > {code:scala} > org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to > exprId on unresolved object, tree: 'y > at > org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute.exprId(unresolved.scala:104) > at > org.apache.spark.sql.catalyst.expressions.AttributeSet$$anonfun$toSeq$2.apply(AttributeSet.scala:128) > at > org.apache.spark.sql.catalyst.expressions.AttributeSet$$anonfun$toSeq$2.apply(AttributeSet.scala:128) > at scala.math.Ordering$$anon$5.compare(Ordering.scala:122) > at java.util.TimSort.countRunAndMakeAscending(TimSort.java:355) > at java.util.TimSort.sort(TimSort.java:220) > at java.util.Arrays.sort(Arrays.java:1438) > at scala.collection.SeqLike$class.sorted(SeqLike.scala:648) > at scala.collection.AbstractSeq.sorted(Seq.scala:41) > at scala.collection.SeqLike$class.sortBy(SeqLike.scala:623) > at scala.collection.AbstractSeq.sortBy(Seq.scala:41) > at > org.apache.spark.sql.catalyst.expressions.AttributeSet.toSeq(AttributeSet.scala:128) > at > org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression.references(interfaces.scala:201) > {code} > The solution is to avoid calling {{toSeq}} as ordering is not important in > {{references}} and simplify (and speed up) the implementation to something > like > {code:scala} > mode match { > case Partial | Complete => aggregateFunction.references > case PartialMerge | Final => > AttributeSet(aggregateFunction.aggBufferAttributes) > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26043) Make SparkHadoopUtil private to Spark
[ https://issues.apache.org/jira/browse/SPARK-26043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16693748#comment-16693748 ] Apache Spark commented on SPARK-26043: -- User 'srowen' has created a pull request for this issue: https://github.com/apache/spark/pull/23097 > Make SparkHadoopUtil private to Spark > - > > Key: SPARK-26043 > URL: https://issues.apache.org/jira/browse/SPARK-26043 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Marcelo Vanzin >Assignee: Sean Owen >Priority: Minor > Labels: release-notes > Fix For: 3.0.0 > > > This API contains a few small helper methods used internally by Spark, mostly > related to Hadoop configs and kerberos. > It's been historically marked as "DeveloperApi". But in reality it's not very > useful for others, and changes a lot to be considered a stable API. Better to > just make it private to Spark. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26118) Make Jetty's requestHeaderSize configurable in Spark
[ https://issues.apache.org/jira/browse/SPARK-26118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Zsolt Piros updated SPARK-26118: --- Issue Type: New Feature (was: Bug) > Make Jetty's requestHeaderSize configurable in Spark > > > Key: SPARK-26118 > URL: https://issues.apache.org/jira/browse/SPARK-26118 > Project: Spark > Issue Type: New Feature > Components: Web UI >Affects Versions: 3.0.0 >Reporter: Attila Zsolt Piros >Assignee: Attila Zsolt Piros >Priority: Major > Fix For: 2.4.1, 3.0.0 > > > For long authorization fields the request header size could be over the > default limit (8192 bytes) and in this case Jetty replies HTTP 413 (Request > Entity Too Large). > This issue may occur if the user is a member of many Active Directory user > groups. > The HTTP request to the server contains the Kerberos token in the > WWW-Authenticate header. The header size increases together with the number > of user groups. > Currently there is no way in Spark to override this limit. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
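For anyone hitting the HTTP 413 described here, a hedged sketch of the new knob follows. The property name {{spark.ui.requestHeaderSize}} is my assumption of what the fix introduces, so check the merged PR for the final name and default before relying on it.

{code:scala}
import org.apache.spark.sql.SparkSession

// Assumed config name; the value is a byte-size string, e.g. double Jetty's old 8k default.
val spark = SparkSession.builder()
  .appName("large-kerberos-headers")
  .config("spark.ui.requestHeaderSize", "16k")
  .getOrCreate()
{code}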
[jira] [Commented] (SPARK-26019) pyspark/accumulators.py: "TypeError: object of type 'NoneType' has no len()" in authenticate_and_accum_updates()
[ https://issues.apache.org/jira/browse/SPARK-26019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16693664#comment-16693664 ] Imran Rashid commented on SPARK-26019: -- Yeah I agree with [~viirya]'s analysis, my suggestion was from just a quick glance at the code. I don't think swapping those lines is likely to help at all ... but I can't come up with any other explanation for how it does happen. From SPARK-26113, it doesn't seem particular to the Cloudera distribution, but we'll poke at it a bit. SPARK-26113 also makes it sound like a race as it works after the initial failure ... [~Tagar] are you running a pyspark shell, or with spark-submit? the token generation is different in those two cases, so that might matter (though I don't see how yet ...) [~hyukjin.kwon] for errors which appear to be from a race, I don't think we should close immediately because we can't reproduce it, as it can be tricky to reproduce and involve something about the user environment that we don't immediately understand; that doesn't mean it's not a real issue. (I absolutely agree that if it appears to be related to a specific distribution, it doesn't belong as an issue here). > pyspark/accumulators.py: "TypeError: object of type 'NoneType' has no len()" > in authenticate_and_accum_updates() > > > Key: SPARK-26019 > URL: https://issues.apache.org/jira/browse/SPARK-26019 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.3.2, 2.4.0 >Reporter: Ruslan Dautkhanov >Priority: Major > > Started happening after 2.3.1 -> 2.3.2 upgrade. > > {code:python} > Exception happened during processing of request from ('127.0.0.1', 43418) > > Traceback (most recent call last): > File "/opt/cloudera/parcels/Anaconda/lib/python2.7/SocketServer.py", line > 290, in _handle_request_noblock > self.process_request(request, client_address) > File "/opt/cloudera/parcels/Anaconda/lib/python2.7/SocketServer.py", line > 318, in process_request > self.finish_request(request, client_address) > File "/opt/cloudera/parcels/Anaconda/lib/python2.7/SocketServer.py", line > 331, in finish_request > self.RequestHandlerClass(request, client_address, self) > File "/opt/cloudera/parcels/Anaconda/lib/python2.7/SocketServer.py", line > 652, in __init__ > self.handle() > File > "/opt/cloudera/parcels/SPARK2-2.3.0.cloudera4-1.cdh5.13.3.p0.611179/lib/spark2/python/lib/pyspark.zip/pyspark/accumulators.py", > line 263, in handle > poll(authenticate_and_accum_updates) > File > "/opt/cloudera/parcels/SPARK2-2.3.0.cloudera4-1.cdh5.13.3.p0.611179/lib/spark2/python/lib/pyspark.zip/pyspark/accumulators.py", > line 238, in poll > if func(): > File > "/opt/cloudera/parcels/SPARK2-2.3.0.cloudera4-1.cdh5.13.3.p0.611179/lib/spark2/python/lib/pyspark.zip/pyspark/accumulators.py", > line 251, in authenticate_and_accum_updates > received_token = self.rfile.read(len(auth_token)) > TypeError: object of type 'NoneType' has no len() > > {code} > > Error happens here: > https://github.com/apache/spark/blob/cb90617f894fd51a092710271823ec7d1cd3a668/python/pyspark/accumulators.py#L254 > The PySpark code was just running a simple pipeline of > binary_rdd = sc.binaryRecords(full_file_path, record_length).map(lambda .. ) > and then converting it to a dataframe and running a count on it. > It seems error is flaky - on next rerun it didn't happen. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-26079) Flaky test: StreamingQueryListenersConfSuite
[ https://issues.apache.org/jira/browse/SPARK-26079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Imran Rashid resolved SPARK-26079. -- Resolution: Fixed Assignee: Marcelo Vanzin Fix Version/s: 3.0.0 2.4.1 fixed by https://github.com/apache/spark/pull/23050 > Flaky test: StreamingQueryListenersConfSuite > > > Key: SPARK-26079 > URL: https://issues.apache.org/jira/browse/SPARK-26079 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Affects Versions: 2.4.0 >Reporter: Marcelo Vanzin >Assignee: Marcelo Vanzin >Priority: Minor > Fix For: 2.4.1, 3.0.0 > > > We've had this test fail a few times in our builds. > {noformat} > org.scalatest.exceptions.TestFailedException: null equaled null > at > org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:528) > at > org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560) > at > org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:501) > at > org.apache.spark.sql.streaming.StreamingQueryListenersConfSuite$$anonfun$1.apply(StreamingQueryListenersConfSuite.scala:45) > at > org.apache.spark.sql.streaming.StreamingQueryListenersConfSuite$$anonfun$1.apply(StreamingQueryListenersConfSuite.scala:38) > at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) > at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > at org.scalatest.Transformer.apply(Transformer.scala:22) > at org.scalatest.Transformer.apply(Transformer.scala:20) > at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186) > {noformat} > You can reproduce it reliably by adding a sleep in the test listener. Fix > coming up. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26118) Make Jetty's requestHeaderSize configurable in Spark
[ https://issues.apache.org/jira/browse/SPARK-26118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Zsolt Piros updated SPARK-26118: --- Issue Type: Bug (was: Improvement) > Make Jetty's requestHeaderSize configurable in Spark > > > Key: SPARK-26118 > URL: https://issues.apache.org/jira/browse/SPARK-26118 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 3.0.0 >Reporter: Attila Zsolt Piros >Assignee: Attila Zsolt Piros >Priority: Major > Fix For: 2.4.1, 3.0.0 > > > For long authorization fields the request header size could be over the > default limit (8192 bytes) and in this case Jetty replies HTTP 413 (Request > Entity Too Large). > This issue may occur if the user is a member of many Active Directory user > groups. > The HTTP request to the server contains the Kerberos token in the > WWW-Authenticate header. The header size increases together with the number > of user groups. > Currently there is no way in Spark to override this limit. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26129) Instrumentation for query planning time
[ https://issues.apache.org/jira/browse/SPARK-26129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16693445#comment-16693445 ] Apache Spark commented on SPARK-26129: -- User 'rxin' has created a pull request for this issue: https://github.com/apache/spark/pull/23096 > Instrumentation for query planning time > --- > > Key: SPARK-26129 > URL: https://issues.apache.org/jira/browse/SPARK-26129 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 2.4.0 >Reporter: Reynold Xin >Assignee: Reynold Xin >Priority: Major > > We currently don't have good visibility into query planning time (analysis vs > optimization vs physical planning). This patch adds a simple utility to track > the runtime of various rules and various planning phases. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-26129) Instrumentation for query planning time
[ https://issues.apache.org/jira/browse/SPARK-26129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26129: Assignee: Apache Spark (was: Reynold Xin) > Instrumentation for query planning time > --- > > Key: SPARK-26129 > URL: https://issues.apache.org/jira/browse/SPARK-26129 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 2.4.0 >Reporter: Reynold Xin >Assignee: Apache Spark >Priority: Major > > We currently don't have good visibility into query planning time (analysis vs > optimization vs physical planning). This patch adds a simple utility to track > the runtime of various rules and various planning phases. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-26129) Instrumentation for query planning time
[ https://issues.apache.org/jira/browse/SPARK-26129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26129: Assignee: Reynold Xin (was: Apache Spark) > Instrumentation for query planning time > --- > > Key: SPARK-26129 > URL: https://issues.apache.org/jira/browse/SPARK-26129 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 2.4.0 >Reporter: Reynold Xin >Assignee: Reynold Xin >Priority: Major > > We currently don't have good visibility into query planning time (analysis vs > optimization vs physical planning). This patch adds a simple utility to track > the runtime of various rules and various planning phases. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26129) Instrumentation for query planning time
[ https://issues.apache.org/jira/browse/SPARK-26129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16693447#comment-16693447 ] Apache Spark commented on SPARK-26129: -- User 'rxin' has created a pull request for this issue: https://github.com/apache/spark/pull/23096 > Instrumentation for query planning time > --- > > Key: SPARK-26129 > URL: https://issues.apache.org/jira/browse/SPARK-26129 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 2.4.0 >Reporter: Reynold Xin >Assignee: Reynold Xin >Priority: Major > > We currently don't have good visibility into query planning time (analysis vs > optimization vs physical planning). This patch adds a simple utility to track > the runtime of various rules and various planning phases. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-26129) Instrumentation for query planning time
Reynold Xin created SPARK-26129: --- Summary: Instrumentation for query planning time Key: SPARK-26129 URL: https://issues.apache.org/jira/browse/SPARK-26129 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 2.4.0 Reporter: Reynold Xin Assignee: Reynold Xin We currently don't have good visibility into query planning time (analysis vs optimization vs physical planning). This patch adds a simple utility to track the runtime of various rules and various planning phases. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
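As a rough illustration of the idea only (not the utility added by the PR), tracking planning time amounts to bracketing each phase and accumulating per-phase totals:

{code:scala}
import scala.collection.mutable

// Toy phase tracker: just the shape of the idea, not the code from the patch.
object PlanningTimer {
  private val phaseNanos = mutable.Map.empty[String, Long].withDefaultValue(0L)

  def measure[T](phase: String)(body: => T): T = {
    val start = System.nanoTime()
    try body finally phaseNanos(phase) += System.nanoTime() - start
  }

  def report(): Unit =
    phaseNanos.foreach { case (phase, ns) => println(f"$phase%-12s ${ns / 1e6}%.1f ms") }
}

// Hypothetical usage around the phases mentioned in the description:
// val analyzed  = PlanningTimer.measure("analysis")     { analyzer.execute(parsedPlan) }
// val optimized = PlanningTimer.measure("optimization") { optimizer.execute(analyzed) }
// PlanningTimer.report()
{code}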
[jira] [Commented] (SPARK-26116) Spark SQL - Sort when writing partitioned parquet leads to OOM errors
[ https://issues.apache.org/jira/browse/SPARK-26116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16693393#comment-16693393 ] Yuming Wang commented on SPARK-26116: - Please try to set spark.executor.memoryOverhead=6G or spark.executor.extraJavaOptions='-XX:MaxDirectMemorySize=4g'. > Spark SQL - Sort when writing partitioned parquet leads to OOM errors > - > > Key: SPARK-26116 > URL: https://issues.apache.org/jira/browse/SPARK-26116 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.1 >Reporter: Pierre Lienhart >Priority: Major > > When writing partitioned parquet using {{partitionBy}}, it looks like Spark > sorts each partition before writing but this sort consumes a huge amount of > memory compared to the size of the data. The executors can then go OOM and > get killed by YARN. As a consequence, it also forces to provision huge > amounts of memory compared to the data to be written. > Error messages found in the Spark UI are like the following : > {code:java} > Spark UI description of failure : Job aborted due to stage failure: Task 169 > in stage 2.0 failed 1 times, most recent failure: Lost task 169.0 in stage > 2.0 (TID 98, x.xx.x.xx, executor 1): ExecutorLostFailure > (executor 1 exited caused by one of the running tasks) Reason: Container > killed by YARN for exceeding memory limits. 8.1 GB of 8 GB physical memory > used. Consider boosting spark.yarn.executor.memoryOverhead. > {code} > > {code:java} > Job aborted due to stage failure: Task 66 in stage 4.0 failed 1 times, most > recent failure: Lost task 66.0 in stage 4.0 (TID 56, xxx.x.x.xx, > executor 1): org.apache.spark.SparkException: Task failed while writing rows > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:204) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$3.apply(FileFormatWriter.scala:129) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$3.apply(FileFormatWriter.scala:128) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) > at org.apache.spark.scheduler.Task.run(Task.scala:99) > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.OutOfMemoryError: error while calling spill() on > org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter@75194804 : > /app/hadoop/yarn/local/usercache/at053351/appcache/application_1537536072724_17039/blockmgr-a4ba7d59-e780-4385-99b4-a4c4fe95a1ec/25/temp_local_a542a412-5845-45d2-9302-bbf5ee4113ad > (No such file or directory) > at > org.apache.spark.memory.TaskMemoryManager.acquireExecutionMemory(TaskMemoryManager.java:188) > at > org.apache.spark.memory.TaskMemoryManager.allocatePage(TaskMemoryManager.java:254) > at > org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:92) > at > org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.growPointerArrayIfNecessary(UnsafeExternalSorter.java:347) > at > org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.insertKVRecord(UnsafeExternalSorter.java:425) > at > org.apache.spark.sql.execution.UnsafeKVExternalSorter.insertKV(UnsafeKVExternalSorter.java:160) > at > 
org.apache.spark.sql.execution.datasources.FileFormatWriter$DynamicPartitionWriteTask.execute(FileFormatWriter.scala:364) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:190) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:188) > at > org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1353) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:193) > ... 8 more{code} > > In the stderr logs, we can see that huge amount of sort data (the partition > being sorted here is 250 MB when persisted into memory, deserialized) is > being spilled to the disk ({{INFO UnsafeExternalSorter:
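The two settings suggested above, written out as a hedged sketch: the values are the ones from the comment, whether they are large enough depends on the data being sorted, and on 2.1.x the overhead setting goes by its YARN-prefixed name (as the error message in the description indicates).

{code:scala}
import org.apache.spark.sql.SparkSession

// Equivalent to --conf flags on spark-submit; must be in place before executors launch.
// On Spark 2.1.x use spark.yarn.executor.memoryOverhead instead of spark.executor.memoryOverhead.
val spark = SparkSession.builder()
  .appName("partitioned-parquet-write")
  .config("spark.executor.memoryOverhead", "6g")
  .config("spark.executor.extraJavaOptions", "-XX:MaxDirectMemorySize=4g")
  .getOrCreate()
{code}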
[jira] [Assigned] (SPARK-26118) Make Jetty's requestHeaderSize configurable in Spark
[ https://issues.apache.org/jira/browse/SPARK-26118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Imran Rashid reassigned SPARK-26118: Assignee: Attila Zsolt Piros > Make Jetty's requestHeaderSize configurable in Spark > > > Key: SPARK-26118 > URL: https://issues.apache.org/jira/browse/SPARK-26118 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 3.0.0 >Reporter: Attila Zsolt Piros >Assignee: Attila Zsolt Piros >Priority: Major > Fix For: 2.4.1, 3.0.0 > > > For long authorization fields the request header size could be over the > default limit (8192 bytes) and in this case Jetty replies HTTP 413 (Request > Entity Too Large). > This issue may occur if the user is a member of many Active Directory user > groups. > The HTTP request to the server contains the Kerberos token in the > WWW-Authenticate header. The header size increases together with the number > of user groups. > Currently there is no way in Spark to override this limit. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-26118) Make Jetty's requestHeaderSize configurable in Spark
[ https://issues.apache.org/jira/browse/SPARK-26118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Imran Rashid resolved SPARK-26118. -- Resolution: Fixed Fix Version/s: 2.4.1 3.0.0 Issue resolved by pull request 23090 [https://github.com/apache/spark/pull/23090] > Make Jetty's requestHeaderSize configurable in Spark > > > Key: SPARK-26118 > URL: https://issues.apache.org/jira/browse/SPARK-26118 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 3.0.0 >Reporter: Attila Zsolt Piros >Assignee: Attila Zsolt Piros >Priority: Major > Fix For: 3.0.0, 2.4.1 > > > For long authorization fields the request header size could be over the > default limit (8192 bytes) and in this case Jetty replies HTTP 413 (Request > Entity Too Large). > This issue may occur if the user is a member of many Active Directory user > groups. > The HTTP request to the server contains the Kerberos token in the > WWW-Authenticate header. The header size increases together with the number > of user groups. > Currently there is no way in Spark to override this limit. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26127) Remove deprecated setters from tree regression and classification models
[ https://issues.apache.org/jira/browse/SPARK-26127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Gaido updated SPARK-26127: Description: Many {{set***}} methods are present for the models of regression and classification trees. They are useless and deprecated since 2.1 and targeted to be removed in 3.0. So the JIRA tracks its removal. (was: The method {{setImpurity}} introduced in {{TreeRegressorParams}} and {{TreeClassifierParams}} is deprecated since 2.1 and it is targeted to be removed in 3.0. So the JIRA tracks its removal.) > Remove deprecated setters from tree regression and classification models > > > Key: SPARK-26127 > URL: https://issues.apache.org/jira/browse/SPARK-26127 > Project: Spark > Issue Type: Task > Components: ML >Affects Versions: 3.0.0 >Reporter: Marco Gaido >Priority: Trivial > > Many {{set***}} methods are present for the models of regression and > classification trees. They are useless and deprecated since 2.1 and targeted > to be removed in 3.0. So the JIRA tracks its removal. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25959) Difference in featureImportances results on computed vs saved models
[ https://issues.apache.org/jira/browse/SPARK-25959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16693326#comment-16693326 ] Sean Owen commented on SPARK-25959: --- Yes 2.2 is all but EOL. I am worried about the binary incompatibility issue, and that's why I didn't back-port. Even if the incompatibility isn't in the apparent user-visible API, I wonder if it will cause problems at link time nonetheless. I didn't test it. Is it possible to submit a job compiled from master against an older cluster and just check that it doesn't fail? > Difference in featureImportances results on computed vs saved models > > > Key: SPARK-25959 > URL: https://issues.apache.org/jira/browse/SPARK-25959 > Project: Spark > Issue Type: Bug > Components: ML, MLlib >Affects Versions: 2.2.0 >Reporter: Suraj Nayak >Assignee: Marco Gaido >Priority: Major > Fix For: 3.0.0 > > > I tried to implement GBT and found that the feature Importance computed while > the model was fit is different when the same model was saved into a storage > and loaded back. > > I also found that once the persistent model is loaded and saved back again > and loaded, the feature importance remains the same. > > Not sure if its bug while storing and reading the model first time or am > missing some parameter that need to be set before saving the model (thus > model is picking some defaults - causing feature importance to change) > > *Below is the test code:* > val testDF = Seq( > (1, 3, 2, 1, 1), > (3, 2, 1, 2, 0), > (2, 2, 1, 1, 0), > (3, 4, 2, 2, 0), > (2, 2, 1, 3, 1) > ).toDF("a", "b", "c", "d", "e") > val featureColumns = testDF.columns.filter(_ != "e") > // Assemble the features into a vector > val assembler = new > VectorAssembler().setInputCols(featureColumns).setOutputCol("features") > // Transform the data to get the feature data set > val featureDF = assembler.transform(testDF) > // Train a GBT model. > val gbt = new GBTClassifier() > .setLabelCol("e") > .setFeaturesCol("features") > .setMaxDepth(2) > .setMaxBins(5) > .setMaxIter(10) > .setSeed(10) > .fit(featureDF) > gbt.transform(featureDF).show(false) > // Write out the model > featureColumns.zip(gbt.featureImportances.toArray).sortBy(-_._2).take(20).foreach(println) > /* Prints > (d,0.5931875075767403) > (a,0.3747184548362353) > (b,0.03209403758702444) > (c,0.0) > */ > gbt.write.overwrite().save("file:///tmp/test123") > println("Reading model again") > val gbtload = GBTClassificationModel.load("file:///tmp/test123") > featureColumns.zip(gbtload.featureImportances.toArray).sortBy(-_._2).take(20).foreach(println) > /* > Prints > (d,0.6455841215290767) > (a,0.3316126797964181) > (b,0.022803198674505094) > (c,0.0) > */ > gbtload.write.overwrite().save("file:///tmp/test123_rewrite") > val gbtload2 = GBTClassificationModel.load("file:///tmp/test123_rewrite") > featureColumns.zip(gbtload2.featureImportances.toArray).sortBy(-_._2).take(20).foreach(println) > /* prints > (d,0.6455841215290767) > (a,0.3316126797964181) > (b,0.022803198674505094) > (c,0.0) > */ -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26127) Remove deprecated setters from tree regression and classification models
[ https://issues.apache.org/jira/browse/SPARK-26127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Gaido updated SPARK-26127: Summary: Remove deprecated setters from tree regression and classification models (was: Remove deprecated setImpurity from tree regression and classification models) > Remove deprecated setters from tree regression and classification models > > > Key: SPARK-26127 > URL: https://issues.apache.org/jira/browse/SPARK-26127 > Project: Spark > Issue Type: Task > Components: ML >Affects Versions: 3.0.0 >Reporter: Marco Gaido >Priority: Trivial > > The method {{setImpurity}} introduced in {{TreeRegressorParams}} and > {{TreeClassifierParams}} is deprecated since 2.1 and it is targeted to be > removed in 3.0. So the JIRA tracks its removal. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-26076) Revise ambiguous error message from load-spark-env.sh
[ https://issues.apache.org/jira/browse/SPARK-26076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-26076. --- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 23049 [https://github.com/apache/spark/pull/23049] > Revise ambiguous error message from load-spark-env.sh > - > > Key: SPARK-26076 > URL: https://issues.apache.org/jira/browse/SPARK-26076 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.0.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Trivial > Fix For: 3.0.0 > > > When I try to run scripts (e.g. `./sbin/start-history-server.sh -h` in latest > master, I got such error: > ``` > Presence of build for multiple Scala versions detected. > Either clean one of them or, export SPARK_SCALA_VERSION in spark-env.sh. > ``` > The error message is quite confusing and there is no `spark-env.sh` in our > code base. > As now with https://github.com/apache/spark/pull/22967, we can revise the > error message as following: > ``` > Presence of build for both scala versions(SCALA 2.11 and SCALA 2.12) detected. > Either clean one of them or, export SPARK_SCALA_VERSION=2.12 in > load-spark-env.sh. > ``` -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-26076) Revise ambiguous error message from load-spark-env.sh
[ https://issues.apache.org/jira/browse/SPARK-26076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen reassigned SPARK-26076: - Assignee: Gengliang Wang > Revise ambiguous error message from load-spark-env.sh > - > > Key: SPARK-26076 > URL: https://issues.apache.org/jira/browse/SPARK-26076 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.0.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Trivial > Fix For: 3.0.0 > > > When I try to run scripts (e.g. `./sbin/start-history-server.sh -h` in latest > master, I got such error: > ``` > Presence of build for multiple Scala versions detected. > Either clean one of them or, export SPARK_SCALA_VERSION in spark-env.sh. > ``` > The error message is quite confusing and there is no `spark-env.sh` in our > code base. > As now with https://github.com/apache/spark/pull/22967, we can revise the > error message as following: > ``` > Presence of build for both scala versions(SCALA 2.11 and SCALA 2.12) detected. > Either clean one of them or, export SPARK_SCALA_VERSION=2.12 in > load-spark-env.sh. > ``` -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16044) input_file_name() returns empty strings in data sources based on NewHadoopRDD.
[ https://issues.apache.org/jira/browse/SPARK-16044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16693200#comment-16693200 ] Paul Praet commented on SPARK-16044: Still has issues. See https://issues.apache.org/jira/browse/SPARK-26128 > input_file_name() returns empty strings in data sources based on NewHadoopRDD. > -- > > Key: SPARK-16044 > URL: https://issues.apache.org/jira/browse/SPARK-16044 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 1.6.3, 2.0.0 > > > The issue is, {{input_file_name()}} function does not contain file paths when > data sources use {{NewHadoopRDD}}. This is currently only supported for > {{FileScanRDD}} and {{HadoopRDD}}. > To be clear, this does not affect Spark's internal data sources because > currently they all do not use {{NewHadoopRDD}}. > However, there are several datasources using this. For example, > > spark-redshift - > [here|https://github.com/databricks/spark-redshift/blob/cba5eee1ab79ae8f0fa9e668373a54d2b5babf6b/src/main/scala/com/databricks/spark/redshift/RedshiftRelation.scala#L149] > spark-xml - > [here|https://github.com/databricks/spark-xml/blob/master/src/main/scala/com/databricks/spark/xml/util/XmlFile.scala#L39-L47] > Currently, using this functions shows the output below: > {code} > +-+ > |input_file_name()| > +-+ > | | > | | > | | > | | > | | > | | > | | > | | > | | > | | > | | > +-+ > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-26128) filter breaks input_file_name
Paul Praet created SPARK-26128: -- Summary: filter breaks input_file_name Key: SPARK-26128 URL: https://issues.apache.org/jira/browse/SPARK-26128 Project: Spark Issue Type: Bug Components: Spark Shell Affects Versions: 2.3.2 Reporter: Paul Praet This works: {code:java} scala> spark.read.parquet("/tmp/newparquet").select(input_file_name).show(5,false) +-+ |input_file_name() | +-+ |file:///tmp/newparquet/parquet-5-PT6H/junit/data/tenant=NA/year=2017/month=201704/day=20170406/hour=2017040618/data.eu-west-1b.290.PT6H.FINAL.parquet| |file:///tmp/newparquet/parquet-5-PT6H/junit/data/tenant=NA/year=2017/month=201704/day=20170406/hour=2017040618/data.eu-west-1b.290.PT6H.FINAL.parquet| |file:///tmp/newparquet/parquet-5-PT6H/junit/data/tenant=NA/year=2017/month=201704/day=20170406/hour=2017040618/data.eu-west-1b.290.PT6H.FINAL.parquet| |file:///tmp/newparquet/parquet-5-PT6H/junit/data/tenant=NA/year=2017/month=201704/day=20170406/hour=2017040618/data.eu-west-1b.290.PT6H.FINAL.parquet| |file:///tmp/newparquet/parquet-5-PT6H/junit/data/tenant=NA/year=2017/month=201704/day=20170406/hour=2017040618/data.eu-west-1b.290.PT6H.FINAL.parquet| +-+ {code} When adding a filter: {code:java} scala> spark.read.parquet("/tmp/newparquet").where("key.station='XYZ'").select(input_file_name()).show(5,false) +-+ |input_file_name()| +-+ | | | | | | | | | | +-+ {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
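Since {{input_file_name()}} only reflects what the underlying scan exposes, a quick way to narrow this down is to compare the physical plans with and without the filter. The sketch below is purely illustrative and is not taken from the ticket; it assumes the reporter's {{/tmp/newparquet}} data and the nested {{key.station}} column.

{code:scala}
// Hedged diagnostic sketch (not from the ticket): compare the plans with and
// without the filter to see whether the filtered query still reads through a
// file-based scan, which is what supplies the value for input_file_name().
import org.apache.spark.sql.functions.input_file_name

val df = spark.read.parquet("/tmp/newparquet")

df.select(input_file_name()).explain(true)
df.where("key.station = 'XYZ'").select(input_file_name()).explain(true)
{code}

If both plans show the same Parquet file scan, the regression is more likely in how the expression is evaluated after the filter than in the read path itself.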
[jira] [Commented] (SPARK-23886) update query.status
[ https://issues.apache.org/jira/browse/SPARK-23886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16693175#comment-16693175 ] Apache Spark commented on SPARK-23886: -- User 'gaborgsomogyi' has created a pull request for this issue: https://github.com/apache/spark/pull/23095 > update query.status > --- > > Key: SPARK-23886 > URL: https://issues.apache.org/jira/browse/SPARK-23886 > Project: Spark > Issue Type: Sub-task > Components: Structured Streaming >Affects Versions: 2.4.0 >Reporter: Jose Torres >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23897) Guava version
[ https://issues.apache.org/jira/browse/SPARK-23897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16693076#comment-16693076 ] James Grinter commented on SPARK-23897: --- We also just bumped into CVE-2018-10237, as it has now started triggering the OWASP dependency checker in our Spark application builds because of the included Guava dependency. But I'm going to note that the Guava code itself does not use `AtomicDoubleArray` (one of the problematic classes) internally, and instantiates a `CompoundOrdering` object only via its `Ordering` collection class and `compound` method. Spark does not use `AtomicDoubleArray` but it *does* use `Ordering`. It doesn't invoke the `compound` method that would create a `CompoundOrdering` object. Someone else has asked about this specific CVE at https://issues.apache.org/jira/browse/SPARK-25762 > Guava version > - > > Key: SPARK-23897 > URL: https://issues.apache.org/jira/browse/SPARK-23897 > Project: Spark > Issue Type: Dependency upgrade > Components: Spark Core >Affects Versions: 2.3.0 >Reporter: Sercan Karaoglu >Priority: Minor > > Guava dependency version 14 is pretty old and needs to be updated to at least > 16. The Google Cloud Storage connector uses a newer one, which triggers a fairly common > Guava error, "java.lang.NoSuchMethodError: > com.google.common.base.Splitter.splitToList(Ljava/lang/CharSequence;)Ljava/util/List;", > and crashes the app. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
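For readers who don't know the Guava classes named above: CVE-2018-10237 concerns unbounded memory allocation when deserializing {{AtomicDoubleArray}} or {{CompoundOrdering}} instances, and a {{CompoundOrdering}} is only created through {{Ordering.compound}}. The snippet below is just an illustration of those Guava calls; it is not Spark source code.

{code:scala}
// Illustration of the Guava API discussed above; not Spark code.
import com.google.common.collect.Ordering

// Plain Ordering usage -- no CompoundOrdering is ever created.
val byToString: Ordering[Object] = Ordering.usingToString()
val sorted = byToString.sortedCopy(java.util.Arrays.asList("b", "a", "c"))

// compound() is the call that instantiates the CompoundOrdering class named in
// CVE-2018-10237; per the comment above, Spark never makes this call.
val compound: Ordering[Object] = byToString.compound[Object](Ordering.arbitrary())
{code}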
[jira] [Commented] (SPARK-25762) Upgrade guava version in spark dependency lists due to CVE issue
[ https://issues.apache.org/jira/browse/SPARK-25762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16693075#comment-16693075 ] James Grinter commented on SPARK-25762: --- We also just bumped into CVE-2018-10237, as it has now started triggering the OWASP dependency checker in our Spark application builds because of the included Guava dependency. But I'm going to note that the Guava code itself does not use `AtomicDoubleArray` (one of the problematic classes) internally, and instantiates a `CompoundOrdering` object only via its `Ordering` collection class and `compound` method. Spark does not use `AtomicDoubleArray` but it *does* use `Ordering`. It doesn't invoke the `compound` method that would create a `CompoundOrdering` object. > Upgrade guava version in spark dependency lists due to CVE issue > - > > Key: SPARK-25762 > URL: https://issues.apache.org/jira/browse/SPARK-25762 > Project: Spark > Issue Type: Dependency upgrade > Components: Spark Core >Affects Versions: 2.2.1, 2.2.2, 2.3.1, 2.3.2 >Reporter: Debojyoti >Priority: Major > > In the Spark 2.x dependency list we have guava-14.0.1.jar. However, a number of > vulnerabilities exist in this version, e.g. CVE-2018-10237 > [https://www.cvedetails.com/cve/CVE-2018-10237/] > Do we have any solution to resolve it, or is there any plan to upgrade the Guava > version in any of Spark's future releases? -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-26077) Reserved SQL words are not escaped by JDBC writer for table name
[ https://issues.apache.org/jira/browse/SPARK-26077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26077: Assignee: Apache Spark > Reserved SQL words are not escaped by JDBC writer for table name > > > Key: SPARK-26077 > URL: https://issues.apache.org/jira/browse/SPARK-26077 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.2 >Reporter: Eugene Golovan >Assignee: Apache Spark >Priority: Major > > This bug is similar to SPARK-16387 but this time table name is not escaped. > How to reproduce: > 1/ Start spark shell with mysql connector > spark-shell --jars ./mysql-connector-java-8.0.13.jar > > 2/ Execute next code > > import spark.implicits._ > (spark > .createDataset(Seq("a","b","c")) > .toDF("order") > .write > .format("jdbc") > .option("url", s"jdbc:mysql://root@localhost:3306/test") > .option("driver", "com.mysql.cj.jdbc.Driver") > .option("dbtable", "condition") > .save) > > , where condition - is reserved word. > > Error message: > > java.sql.SQLSyntaxErrorException: You have an error in your SQL syntax; check > the manual that corresponds to your MySQL server version for the right syntax > to use near 'condition (`order` TEXT )' at line 1 > at > com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:120) > at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:97) > at > com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:122) > at > com.mysql.cj.jdbc.StatementImpl.executeUpdateInternal(StatementImpl.java:1355) > at > com.mysql.cj.jdbc.StatementImpl.executeLargeUpdate(StatementImpl.java:2128) > at com.mysql.cj.jdbc.StatementImpl.executeUpdate(StatementImpl.java:1264) > at > org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.createTable(JdbcUtils.scala:844) > at > org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:95) > at > org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) > at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) > at > org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80) > at > org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80) > at > org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:656) > at > org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:656) > at > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77) > at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:656) > at > 
org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:273) > at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:267) > ... 59 elided > > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
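Until a fix lands, one possible workaround (untested here, and specific to MySQL's backtick quoting) is to pass the already-quoted identifier as {{dbtable}}, so that the CREATE TABLE and INSERT statements Spark generates stay valid. A proper fix would presumably quote the table name through the JDBC dialect's {{quoteIdentifier}}, the same way SPARK-16387 handled column names.

{code:scala}
// Hedged workaround sketch, not from the ticket: quote the reserved table name
// yourself so the generated SQL remains valid against MySQL.
import spark.implicits._

(spark
  .createDataset(Seq("a", "b", "c"))
  .toDF("order")
  .write
  .format("jdbc")
  .option("url", "jdbc:mysql://root@localhost:3306/test")
  .option("driver", "com.mysql.cj.jdbc.Driver")
  .option("dbtable", "`condition`")   // backtick-quoted reserved word
  .save())
{code}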
[jira] [Commented] (SPARK-26077) Reserved SQL words are not escaped by JDBC writer for table name
[ https://issues.apache.org/jira/browse/SPARK-26077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16693046#comment-16693046 ] Eugene Golovan commented on SPARK-26077: [~maropu] Sure. Have a look please. When running unit tests I noticed one thing. dbtable option may be a subquery as well. I added workaround for this as well but I do not like it too much. In any case if you have suggestions, you are welcome! > Reserved SQL words are not escaped by JDBC writer for table name > > > Key: SPARK-26077 > URL: https://issues.apache.org/jira/browse/SPARK-26077 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.2 >Reporter: Eugene Golovan >Priority: Major > > This bug is similar to SPARK-16387 but this time table name is not escaped. > How to reproduce: > 1/ Start spark shell with mysql connector > spark-shell --jars ./mysql-connector-java-8.0.13.jar > > 2/ Execute next code > > import spark.implicits._ > (spark > .createDataset(Seq("a","b","c")) > .toDF("order") > .write > .format("jdbc") > .option("url", s"jdbc:mysql://root@localhost:3306/test") > .option("driver", "com.mysql.cj.jdbc.Driver") > .option("dbtable", "condition") > .save) > > , where condition - is reserved word. > > Error message: > > java.sql.SQLSyntaxErrorException: You have an error in your SQL syntax; check > the manual that corresponds to your MySQL server version for the right syntax > to use near 'condition (`order` TEXT )' at line 1 > at > com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:120) > at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:97) > at > com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:122) > at > com.mysql.cj.jdbc.StatementImpl.executeUpdateInternal(StatementImpl.java:1355) > at > com.mysql.cj.jdbc.StatementImpl.executeLargeUpdate(StatementImpl.java:2128) > at com.mysql.cj.jdbc.StatementImpl.executeUpdate(StatementImpl.java:1264) > at > org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.createTable(JdbcUtils.scala:844) > at > org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:95) > at > org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) > at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) > at > org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80) > at > org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80) > at > org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:656) > at > org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:656) > at > 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77) > at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:656) > at > org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:273) > at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:267) > ... 59 elided > > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
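For context on the subquery remark above: on the read path, {{dbtable}} also accepts a parenthesised derived table, so a fix cannot simply wrap the whole option value in identifier quotes. A minimal illustration, reusing the same hypothetical MySQL table as in the report:

{code:scala}
// dbtable is not always a bare table name; Spark also accepts a derived table
// here, which must not be quoted as an identifier.
val fromSubquery = spark.read
  .format("jdbc")
  .option("url", "jdbc:mysql://root@localhost:3306/test")
  .option("driver", "com.mysql.cj.jdbc.Driver")
  .option("dbtable", "(SELECT * FROM `condition` WHERE `order` IS NOT NULL) AS t")
  .load()
{code}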
[jira] [Assigned] (SPARK-26077) Reserved SQL words are not escaped by JDBC writer for table name
[ https://issues.apache.org/jira/browse/SPARK-26077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26077: Assignee: (was: Apache Spark) > Reserved SQL words are not escaped by JDBC writer for table name > > > Key: SPARK-26077 > URL: https://issues.apache.org/jira/browse/SPARK-26077 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.2 >Reporter: Eugene Golovan >Priority: Major > > This bug is similar to SPARK-16387 but this time table name is not escaped. > How to reproduce: > 1/ Start spark shell with mysql connector > spark-shell --jars ./mysql-connector-java-8.0.13.jar > > 2/ Execute next code > > import spark.implicits._ > (spark > .createDataset(Seq("a","b","c")) > .toDF("order") > .write > .format("jdbc") > .option("url", s"jdbc:mysql://root@localhost:3306/test") > .option("driver", "com.mysql.cj.jdbc.Driver") > .option("dbtable", "condition") > .save) > > , where condition - is reserved word. > > Error message: > > java.sql.SQLSyntaxErrorException: You have an error in your SQL syntax; check > the manual that corresponds to your MySQL server version for the right syntax > to use near 'condition (`order` TEXT )' at line 1 > at > com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:120) > at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:97) > at > com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:122) > at > com.mysql.cj.jdbc.StatementImpl.executeUpdateInternal(StatementImpl.java:1355) > at > com.mysql.cj.jdbc.StatementImpl.executeLargeUpdate(StatementImpl.java:2128) > at com.mysql.cj.jdbc.StatementImpl.executeUpdate(StatementImpl.java:1264) > at > org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.createTable(JdbcUtils.scala:844) > at > org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:95) > at > org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) > at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) > at > org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80) > at > org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80) > at > org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:656) > at > org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:656) > at > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77) > at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:656) > at > org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:273) > at 
org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:267) > ... 59 elided > > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26077) Reserved SQL words are not escaped by JDBC writer for table name
[ https://issues.apache.org/jira/browse/SPARK-26077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16693037#comment-16693037 ] Apache Spark commented on SPARK-26077: -- User 'golovan' has created a pull request for this issue: https://github.com/apache/spark/pull/23094 > Reserved SQL words are not escaped by JDBC writer for table name > > > Key: SPARK-26077 > URL: https://issues.apache.org/jira/browse/SPARK-26077 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.2 >Reporter: Eugene Golovan >Priority: Major > > This bug is similar to SPARK-16387 but this time table name is not escaped. > How to reproduce: > 1/ Start spark shell with mysql connector > spark-shell --jars ./mysql-connector-java-8.0.13.jar > > 2/ Execute next code > > import spark.implicits._ > (spark > .createDataset(Seq("a","b","c")) > .toDF("order") > .write > .format("jdbc") > .option("url", s"jdbc:mysql://root@localhost:3306/test") > .option("driver", "com.mysql.cj.jdbc.Driver") > .option("dbtable", "condition") > .save) > > , where condition - is reserved word. > > Error message: > > java.sql.SQLSyntaxErrorException: You have an error in your SQL syntax; check > the manual that corresponds to your MySQL server version for the right syntax > to use near 'condition (`order` TEXT )' at line 1 > at > com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:120) > at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:97) > at > com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:122) > at > com.mysql.cj.jdbc.StatementImpl.executeUpdateInternal(StatementImpl.java:1355) > at > com.mysql.cj.jdbc.StatementImpl.executeLargeUpdate(StatementImpl.java:2128) > at com.mysql.cj.jdbc.StatementImpl.executeUpdate(StatementImpl.java:1264) > at > org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.createTable(JdbcUtils.scala:844) > at > org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:95) > at > org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) > at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) > at > org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80) > at > org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80) > at > org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:656) > at > org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:656) > at > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77) > at 
org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:656) > at > org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:273) > at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:267) > ... 59 elided > > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26127) Remove deprecated setImpurity from tree regression and classification models
[ https://issues.apache.org/jira/browse/SPARK-26127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16692951#comment-16692951 ] Apache Spark commented on SPARK-26127: -- User 'mgaido91' has created a pull request for this issue: https://github.com/apache/spark/pull/23093 > Remove deprecated setImpurity from tree regression and classification models > > > Key: SPARK-26127 > URL: https://issues.apache.org/jira/browse/SPARK-26127 > Project: Spark > Issue Type: Task > Components: ML >Affects Versions: 3.0.0 >Reporter: Marco Gaido >Priority: Trivial > > The method {{setImpurity}} introduced in {{TreeRegressorParams}} and > {{TreeClassifierParams}} is deprecated since 2.1 and it is targeted to be > removed in 3.0. So the JIRA tracks its removal. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-26127) Remove deprecated setImpurity from GBTClassificationModel, DecisionTreeRegressionModel, GBTRegressionModel, RandomForestRegressionModel
Marco Gaido created SPARK-26127: --- Summary: Remove deprecated setImpurity from GBTClassificationModel, DecisionTreeRegressionModel, GBTRegressionModel, RandomForestRegressionModel Key: SPARK-26127 URL: https://issues.apache.org/jira/browse/SPARK-26127 Project: Spark Issue Type: Task Components: ML Affects Versions: 3.0.0 Reporter: Marco Gaido The method {{setImpurity}} introduced in {{TreeRegressorParams}} and {{TreeClassifierParams}} is deprecated since 2.1 and it is targeted to be removed in 3.0. So the JIRA tracks its removal. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
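For anyone still calling the deprecated model-level setter, migration is simply a matter of setting the impurity on the estimator before fitting; the fitted model keeps the value as a read-only param. A minimal sketch, assuming a DataFrame {{trainingDF}} with the usual {{label}}/{{features}} columns:

{code:scala}
// Minimal migration sketch (trainingDF is assumed to exist with label/features
// columns): configure impurity on the estimator, not on the fitted model.
import org.apache.spark.ml.regression.RandomForestRegressor

val rf = new RandomForestRegressor()
  .setLabelCol("label")
  .setFeaturesCol("features")
  .setImpurity("variance")          // set before fit()

val model = rf.fit(trainingDF)      // no setImpurity call needed on the model
println(model.getImpurity)          // the value remains readable on the model
{code}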
[jira] [Updated] (SPARK-26127) Remove deprecated setImpurity from tree regression and classification models
[ https://issues.apache.org/jira/browse/SPARK-26127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Gaido updated SPARK-26127: Summary: Remove deprecated setImpurity from tree regression and classification models (was: Remove deprecated setImpurity from GBTClassificationModel, DecisionTreeRegressionModel, GBTRegressionModel, RandomForestRegressionModel) > Remove deprecated setImpurity from tree regression and classification models > > > Key: SPARK-26127 > URL: https://issues.apache.org/jira/browse/SPARK-26127 > Project: Spark > Issue Type: Task > Components: ML >Affects Versions: 3.0.0 >Reporter: Marco Gaido >Priority: Trivial > > The method {{setImpurity}} introduced in {{TreeRegressorParams}} and > {{TreeClassifierParams}} is deprecated since 2.1 and it is targeted to be > removed in 3.0. So the JIRA tracks its removal. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-26127) Remove deprecated setImpurity from tree regression and classification models
[ https://issues.apache.org/jira/browse/SPARK-26127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26127: Assignee: Apache Spark > Remove deprecated setImpurity from tree regression and classification models > > > Key: SPARK-26127 > URL: https://issues.apache.org/jira/browse/SPARK-26127 > Project: Spark > Issue Type: Task > Components: ML >Affects Versions: 3.0.0 >Reporter: Marco Gaido >Assignee: Apache Spark >Priority: Trivial > > The method {{setImpurity}} introduced in {{TreeRegressorParams}} and > {{TreeClassifierParams}} is deprecated since 2.1 and it is targeted to be > removed in 3.0. So the JIRA tracks its removal. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-26127) Remove deprecated setImpurity from tree regression and classification models
[ https://issues.apache.org/jira/browse/SPARK-26127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26127: Assignee: (was: Apache Spark) > Remove deprecated setImpurity from tree regression and classification models > > > Key: SPARK-26127 > URL: https://issues.apache.org/jira/browse/SPARK-26127 > Project: Spark > Issue Type: Task > Components: ML >Affects Versions: 3.0.0 >Reporter: Marco Gaido >Priority: Trivial > > The method {{setImpurity}} introduced in {{TreeRegressorParams}} and > {{TreeClassifierParams}} is deprecated since 2.1 and it is targeted to be > removed in 3.0. So the JIRA tracks its removal. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26127) Remove deprecated setImpurity from tree regression and classification models
[ https://issues.apache.org/jira/browse/SPARK-26127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16692950#comment-16692950 ] Apache Spark commented on SPARK-26127: -- User 'mgaido91' has created a pull request for this issue: https://github.com/apache/spark/pull/23093 > Remove deprecated setImpurity from tree regression and classification models > > > Key: SPARK-26127 > URL: https://issues.apache.org/jira/browse/SPARK-26127 > Project: Spark > Issue Type: Task > Components: ML >Affects Versions: 3.0.0 >Reporter: Marco Gaido >Priority: Trivial > > The method {{setImpurity}} introduced in {{TreeRegressorParams}} and > {{TreeClassifierParams}} is deprecated since 2.1 and it is targeted to be > removed in 3.0. So the JIRA tracks its removal. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26110) If you restart the spark history server, the "Last Update" of incomplete app(had been kill) will be updated to current time
[ https://issues.apache.org/jira/browse/SPARK-26110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16692868#comment-16692868 ] shahid commented on SPARK-26110: Thanks. I am analyzing the issue. > If you restart the spark history server, the "Last Update" of incomplete > app(had been kill) will be updated to current time > --- > > Key: SPARK-26110 > URL: https://issues.apache.org/jira/browse/SPARK-26110 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.3.2 >Reporter: zhouyongjin >Priority: Major > Attachments: 2018-11-19_092114.png, 2018-11-19_092301.png, > 2018-11-19_093402.png > > > !2018-11-19_093402.png!!2018-11-19_092114.png! A Spark application that is > manually killed will remain in an incomplete state, e.g. 0051 and 0050. > !image-2018-11-19-09-34-25-044.png! > In this case, if you restart the spark history server, the "Last Update" of > incomplete apps (that had been killed) will be updated to the current time. > !image-2018-11-19-09-35-06-076.png! > 0051 and 0050 were killed on 2018-11-15. > After restarting the spark history server, "Last Updated" is modified to the current time. > !image-2018-11-19-09-35-13-508.png! -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25959) Difference in featureImportances results on computed vs saved models
[ https://issues.apache.org/jira/browse/SPARK-25959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16692861#comment-16692861 ] Marco Gaido commented on SPARK-25959: - [~srowen] what do you think about backporting this? Maybe 2.2 is a bit too old, I don't know if we are planning any new 2.2 release, but 2.4 - 2.3 branches may be ok. What do you think? > Difference in featureImportances results on computed vs saved models > > > Key: SPARK-25959 > URL: https://issues.apache.org/jira/browse/SPARK-25959 > Project: Spark > Issue Type: Bug > Components: ML, MLlib >Affects Versions: 2.2.0 >Reporter: Suraj Nayak >Assignee: Marco Gaido >Priority: Major > Fix For: 3.0.0 > > > I tried to implement GBT and found that the feature Importance computed while > the model was fit is different when the same model was saved into a storage > and loaded back. > > I also found that once the persistent model is loaded and saved back again > and loaded, the feature importance remains the same. > > Not sure if its bug while storing and reading the model first time or am > missing some parameter that need to be set before saving the model (thus > model is picking some defaults - causing feature importance to change) > > *Below is the test code:* > val testDF = Seq( > (1, 3, 2, 1, 1), > (3, 2, 1, 2, 0), > (2, 2, 1, 1, 0), > (3, 4, 2, 2, 0), > (2, 2, 1, 3, 1) > ).toDF("a", "b", "c", "d", "e") > val featureColumns = testDF.columns.filter(_ != "e") > // Assemble the features into a vector > val assembler = new > VectorAssembler().setInputCols(featureColumns).setOutputCol("features") > // Transform the data to get the feature data set > val featureDF = assembler.transform(testDF) > // Train a GBT model. > val gbt = new GBTClassifier() > .setLabelCol("e") > .setFeaturesCol("features") > .setMaxDepth(2) > .setMaxBins(5) > .setMaxIter(10) > .setSeed(10) > .fit(featureDF) > gbt.transform(featureDF).show(false) > // Write out the model > featureColumns.zip(gbt.featureImportances.toArray).sortBy(-_._2).take(20).foreach(println) > /* Prints > (d,0.5931875075767403) > (a,0.3747184548362353) > (b,0.03209403758702444) > (c,0.0) > */ > gbt.write.overwrite().save("file:///tmp/test123") > println("Reading model again") > val gbtload = GBTClassificationModel.load("file:///tmp/test123") > featureColumns.zip(gbtload.featureImportances.toArray).sortBy(-_._2).take(20).foreach(println) > /* > Prints > (d,0.6455841215290767) > (a,0.3316126797964181) > (b,0.022803198674505094) > (c,0.0) > */ > gbtload.write.overwrite().save("file:///tmp/test123_rewrite") > val gbtload2 = GBTClassificationModel.load("file:///tmp/test123_rewrite") > featureColumns.zip(gbtload2.featureImportances.toArray).sortBy(-_._2).take(20).foreach(println) > /* prints > (d,0.6455841215290767) > (a,0.3316126797964181) > (b,0.022803198674505094) > (c,0.0) > */ -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org