[jira] [Updated] (SPARK-24948) SHS wrongly filters some applications due to permission check

[ https://issues.apache.org/jira/browse/SPARK-24948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Saisai Shao updated SPARK-24948:
--------------------------------
    Fix Version/s: 2.2.3

> SHS wrongly filters some applications due to permission check
> -------------------------------------------------------------
>
>                 Key: SPARK-24948
>                 URL: https://issues.apache.org/jira/browse/SPARK-24948
>             Project: Spark
>          Issue Type: Bug
>          Components: Web UI
>    Affects Versions: 2.3.1
>            Reporter: Marco Gaido
>            Assignee: Marco Gaido
>            Priority: Blocker
>             Fix For: 2.2.3, 2.3.2, 2.4.0
>
> The SHS filters out event logs it does not have permission to read. Unfortunately, this check is quite naive: it takes into account only the base permissions (i.e. the user, group, and other permission bits). For instance, if ACLs are enabled, they are ignored by this check; moreover, each filesystem may have different policies (e.g. it may treat spark as a superuser who can access everything). This results in some applications not being displayed in the SHS, even though the Spark user (or whatever user the SHS is started as) can actually read their event logs.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
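The difference between the naive check and a filesystem-backed one can be sketched in a few lines. This is a hypothetical Python analogy (the real SHS code is Scala against the Hadoop FileSystem API, and the function names below are made up): inspecting only the user/group/other bits reproduces the buggy behaviour, while simply attempting to open the file lets the filesystem apply ACLs and superuser policies itself.

```python
import os
import stat

def naive_can_read(path, uid, gid):
    """Permission-bit-only check (user/group/other), like the SHS check
    described above. It ignores ACLs and filesystem-specific policies,
    so it can report False for files the process can actually read."""
    st = os.stat(path)
    if st.st_uid == uid:
        return bool(st.st_mode & stat.S_IRUSR)
    if st.st_gid == gid:
        return bool(st.st_mode & stat.S_IRGRP)
    return bool(st.st_mode & stat.S_IROTH)

def robust_can_read(path):
    """Ask the filesystem directly: attempt to open the file for reading,
    letting ACLs, superuser rules, etc. take effect."""
    try:
        with open(path, "rb"):
            return True
    except OSError:
        return False
```

With ACLs in play only the try-and-open approach gives the right answer; the event-log scan could apply the same idea per log file instead of interpreting mode bits.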
[jira] [Assigned] (SPARK-24948) SHS wrongly filters some applications due to permission check

[ https://issues.apache.org/jira/browse/SPARK-24948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Saisai Shao reassigned SPARK-24948:
-----------------------------------
    Assignee: Marco Gaido
[jira] [Commented] (SPARK-22634) Update Bouncy Castle dependency

[ https://issues.apache.org/jira/browse/SPARK-22634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16572675#comment-16572675 ]

Steve Loughran commented on SPARK-22634:
----------------------------------------

If nothing else is using it, correct: nothing uses any of the Bouncy Castle APIs directly. But you need to be sure that nothing else is using it through the javax.crypto APIs, especially the stuff in org.apache.spark.network.crypto, or worse, some library which uses those APIs. The NOTICE files certainly hint that it's being used somehow:

bq. This product optionally depends on 'Bouncy Castle Crypto APIs' to generate a temporary self-signed X.509 certificate when the JVM does not provide the equivalent functionality.

There's not enough history in the git logs to line that up with any code that pops up in a quick scan. Safest: update to the later version, while cutting the jets3t dependency (which is provably not used, as it is incompatible with the shipping BC lib). The most thorough due diligence: cut out Bouncy Castle and see what breaks...

> Update Bouncy Castle dependency
> -------------------------------
>
>                 Key: SPARK-22634
>                 URL: https://issues.apache.org/jira/browse/SPARK-22634
>             Project: Spark
>          Issue Type: Task
>          Components: Spark Core, SQL, Structured Streaming
>    Affects Versions: 2.2.0
>            Reporter: Lior Regev
>            Assignee: Sean Owen
>            Priority: Minor
>             Fix For: 2.3.0
>
> Spark's usage of the jets3t library, as well as Spark's own Flume and Kafka streaming, uses Bouncy Castle version 1.51. This is an outdated version; the latest one is 1.58.
> This, in turn, renders packages such as [spark-hadoopcryptoledger-ds|https://github.com/ZuInnoTe/spark-hadoopcryptoledger-ds] unusable, since these require 1.58 while Spark's distributions come with 1.51.
> My own attempt was to run on EMR, and since I automatically get all of Spark's dependencies (Bouncy Castle 1.51 being one of them) on the classpath, using the library to parse blockchain data failed due to missing functionality.
> I have also opened an [issue|https://bitbucket.org/jmurty/jets3t/issues/242/bouncycastle-dependency] with jets3t to update their dependency as well, but along with that Spark would have to update its own, or at least be packaged with a newer version.
[jira] [Assigned] (SPARK-25054) Enable MetricsServlet sink for Executor

[ https://issues.apache.org/jira/browse/SPARK-25054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-25054:
------------------------------------
    Assignee: Apache Spark

> Enable MetricsServlet sink for Executor
> ---------------------------------------
>
>                 Key: SPARK-25054
>                 URL: https://issues.apache.org/jira/browse/SPARK-25054
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.3.1
>            Reporter: Lantao Jin
>            Assignee: Apache Spark
>            Priority: Minor
>
> The MetricsServlet sink is added by default as a sink in the master, but there is no way to query the Executor metrics via a servlet. This ticket offers a way to enable the MetricsServlet sink on the Executor side when spark.executor.ui.enabled is set to true.
[jira] [Commented] (SPARK-25054) Enable MetricsServlet sink for Executor

[ https://issues.apache.org/jira/browse/SPARK-25054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16572662#comment-16572662 ]

Apache Spark commented on SPARK-25054:
--------------------------------------

User 'LantaoJin' has created a pull request for this issue:
https://github.com/apache/spark/pull/22034
[jira] [Assigned] (SPARK-25054) Enable MetricsServlet sink for Executor

[ https://issues.apache.org/jira/browse/SPARK-25054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-25054:
------------------------------------
    Assignee: (was: Apache Spark)
[jira] [Created] (SPARK-25054) Enable MetricsServlet sink for Executor

Lantao Jin created SPARK-25054:
----------------------------------

             Summary: Enable MetricsServlet sink for Executor
                 Key: SPARK-25054
                 URL: https://issues.apache.org/jira/browse/SPARK-25054
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 2.3.1
            Reporter: Lantao Jin
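What the ticket asks for is essentially exposing executor metrics over HTTP as JSON. As a language-agnostic illustration of what a MetricsServlet-style sink does, here is a toy Python sketch (the metric names, endpoint, and registry layout are all made up; Spark's real sink is Scala code served from its internal Jetty server):

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Toy stand-in for a metrics registry; names are illustrative only.
METRICS = {"executor.jvmHeapUsed": 1024, "executor.tasksCompleted": 42}

class MetricsHandler(BaseHTTPRequestHandler):
    """Serve the current metrics snapshot as JSON on any GET."""
    def do_GET(self):
        body = json.dumps(METRICS).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging

def serve(port=0):
    """Start the servlet-style sink on a background thread; port 0 picks
    a free ephemeral port. Returns the server so callers can shut it down."""
    server = HTTPServer(("127.0.0.1", port), MetricsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

A consumer would then poll the endpoint, e.g. `urllib.request.urlopen(f"http://127.0.0.1:{server.server_address[1]}/metrics/json")` and decode the JSON, much as one can already do against the master.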
[jira] [Commented] (SPARK-25052) Is there any possibility that spark structured streaming generate duplicates in the output?

[ https://issues.apache.org/jira/browse/SPARK-25052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16572608#comment-16572608 ]

bharath kumar avusherla commented on SPARK-25052:
-------------------------------------------------

I also thought about that; hence I created it as a question. Anyhow, I will send the question to the mailing list.

> Is there any possibility that spark structured streaming generate duplicates in the output?
> -------------------------------------------------------------------------------------------
>
>                 Key: SPARK-25052
>                 URL: https://issues.apache.org/jira/browse/SPARK-25052
>             Project: Spark
>          Issue Type: Question
>          Components: Spark Core
>    Affects Versions: 2.3.0
>            Reporter: bharath kumar avusherla
>            Priority: Minor
>
> We recently observed that Spark structured streaming generated duplicates in the output when reading from a Kafka topic and storing the output to S3 (and checkpointing in S3). We ran into this issue twice, and it is not reproducible. Has anyone ever faced this kind of issue before? Is this because of S3 eventual consistency?
[jira] [Updated] (SPARK-24948) SHS wrongly filters some applications due to permission check

[ https://issues.apache.org/jira/browse/SPARK-24948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Saisai Shao updated SPARK-24948:
--------------------------------
    Fix Version/s: 2.3.2
[jira] [Resolved] (SPARK-25052) Is there any possibility that spark structured streaming generate duplicates in the output?

[ https://issues.apache.org/jira/browse/SPARK-25052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-25052.
----------------------------------
    Resolution: Invalid

Questions are better directed to the mailing list, https://spark.apache.org/community.html. Let's file an issue once it's clear that this is actually a bug.
[jira] [Commented] (SPARK-25051) where clause on dataset gives AnalysisException

[ https://issues.apache.org/jira/browse/SPARK-25051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16572571#comment-16572571 ]

Hyukjin Kwon commented on SPARK-25051:
--------------------------------------

Can you post the code for df1 and df2 as well?

> where clause on dataset gives AnalysisException
> -----------------------------------------------
>
>                 Key: SPARK-25051
>                 URL: https://issues.apache.org/jira/browse/SPARK-25051
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, SQL
>    Affects Versions: 2.3.0
>            Reporter: MIK
>            Priority: Major
>
> *schemas:*
> df1 => id, ts
> df2 => id, name, country
> *code:*
> val df = df1.join(df2, Seq("id"), "left_outer").where(df2("id").isNull)
> *error:*
> org.apache.spark.sql.AnalysisException: Resolved attribute(s) id#0 missing from xx#15,xx#9L,id#5,xx#6,xx#11,xx#14,xx#13,xx#12,xx#7,xx#16,xx#10,xx#8L in operator !Filter isnull(id#0). Attribute(s) with the same name appear in the operation: id. Please check if the right attribute(s) are used.;;
> at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:41)
> at org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:91)
> at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:289)
> at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:80)
> at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)
> at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:80)
> at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:91)
> at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:104)
> at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:57)
> at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:55)
> at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:47)
> at org.apache.spark.sql.Dataset.<init>(Dataset.scala:172)
> at org.apache.spark.sql.Dataset.<init>(Dataset.scala:178)
> at org.apache.spark.sql.Dataset$.apply(Dataset.scala:65)
> at org.apache.spark.sql.Dataset.withTypedPlan(Dataset.scala:3300)
> at org.apache.spark.sql.Dataset.filter(Dataset.scala:1458)
> at org.apache.spark.sql.Dataset.where(Dataset.scala:1486)
> This works fine in Spark 2.2.2.
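The join-then-filter-on-null pattern in the report is the classic left anti-join ("rows of df1 with no match in df2"). As an aside, the same shape can be sketched in plain SQL with sqlite3 (the sample rows are made up to match the reported schemas); in Spark itself, an equivalent that sidesteps the ambiguous attribute reference is the built-in "left_anti" join type.

```python
import sqlite3

# Hypothetical data matching the reported schemas: df1(id, ts), df2(id, name, country).
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE df1(id INTEGER, ts TEXT);
    CREATE TABLE df2(id INTEGER, name TEXT, country TEXT);
    INSERT INTO df1 VALUES (1, 't1'), (2, 't2'), (3, 't3');
    INSERT INTO df2 VALUES (1, 'a', 'US'), (3, 'c', 'DE');
""")

# Left outer join, then keep only rows with no match on the right --
# the SQL equivalent of .where(df2("id").isNull) after a left_outer join.
rows = con.execute("""
    SELECT df1.id, df1.ts
    FROM df1
    LEFT OUTER JOIN df2 ON df1.id = df2.id
    WHERE df2.id IS NULL
""").fetchall()
print(rows)  # [(2, 't2')]
```

The Spark workaround is `df1.join(df2, Seq("id"), "left_anti")`, which expresses the same intent without referencing df2's coalesced join column after the join.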
[jira] [Commented] (SPARK-25029) Scala 2.12 issues: TaskNotSerializable and Janino "Two non-abstract methods ..." errors

[ https://issues.apache.org/jira/browse/SPARK-25029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16572570#comment-16572570 ]

Sean Owen commented on SPARK-25029:
-----------------------------------

If we _really_ needed to resolve this unilaterally from the Spark side, I think we could get away with forking one class from Janino and patching it lightly, per my pull request. Forking isn't great, especially when it's not clear whether future official releases will have something similar. But it's feasible here, as I believe the patch works, at least w.r.t. Spark.

> Scala 2.12 issues: TaskNotSerializable and Janino "Two non-abstract methods ..." errors
> ---------------------------------------------------------------------------------------
>
>                 Key: SPARK-25029
>                 URL: https://issues.apache.org/jira/browse/SPARK-25029
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Build
>    Affects Versions: 2.4.0
>            Reporter: Sean Owen
>            Priority: Blocker
>
> We actually still have some test failures in the Scala 2.12 build. There seem to be two types. First, some tests fail with "TaskNotSerializable" because some code construct now captures a reference to scalatest's AssertionsHelper. Example:
> {code:java}
> - LegacyAccumulatorWrapper with AccumulatorParam that has no equals/hashCode *** FAILED ***
> java.io.NotSerializableException: org.scalatest.Assertions$AssertionsHelper
> Serialization stack:
> - object not serializable (class: org.scalatest.Assertions$AssertionsHelper, value: org.scalatest.Assertions$AssertionsHelper@3bc5fc8f){code}
> These seem generally easy to fix by tweaking the test code. It's not clear whether something about closure cleaning in 2.12 could be improved to detect this situation automatically; given that only a handful of tests fail for this reason, it's unlikely to be a systemic problem.
>
> The other error is curiouser. Janino fails to compile generated code in many cases, with errors like:
> {code:java}
> - encode/decode for seq of string: List(abc, xyz) *** FAILED ***
> java.lang.RuntimeException: Error while encoding: org.codehaus.janino.InternalCompilerException: failed to compile: org.codehaus.janino.InternalCompilerException: Compiling "GeneratedClass": Two non-abstract methods "public int scala.collection.TraversableOnce.size()" have the same parameter types, declaring type and return type{code}
>
> I include the full generated code that failed in one case below. There is no {{size()}} in the generated code. It's got to be down to some difference in Scala 2.12, potentially even a Janino problem.
>
> {code:java}
> Caused by: org.codehaus.janino.InternalCompilerException: Compiling "GeneratedClass": Two non-abstract methods "public int scala.collection.TraversableOnce.size()" have the same parameter types, declaring type and return type
> at org.codehaus.janino.UnitCompiler.compileUnit(UnitCompiler.java:361)
> at org.codehaus.janino.SimpleCompiler.cook(SimpleCompiler.java:234)
> at org.codehaus.janino.SimpleCompiler.compileToClassLoader(SimpleCompiler.java:446)
> at org.codehaus.janino.ClassBodyEvaluator.compileToClass(ClassBodyEvaluator.java:313)
> at org.codehaus.janino.ClassBodyEvaluator.cook(ClassBodyEvaluator.java:235)
> at org.codehaus.janino.SimpleCompiler.cook(SimpleCompiler.java:204)
> at org.codehaus.commons.compiler.Cookable.cook(Cookable.java:80)
> at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:1342)
> ... 30 more
> Caused by: org.codehaus.janino.InternalCompilerException: Two non-abstract methods "public int scala.collection.TraversableOnce.size()" have the same parameter types, declaring type and return type
> at org.codehaus.janino.UnitCompiler.findMostSpecificIInvocable(UnitCompiler.java:9112)
> at org.codehaus.janino.UnitCompiler.findMostSpecificIInvocable(UnitCompiler.java:)
> at org.codehaus.janino.UnitCompiler.findIMethod(UnitCompiler.java:8770)
> at org.codehaus.janino.UnitCompiler.findIMethod(UnitCompiler.java:8672)
> at org.codehaus.janino.UnitCompiler.compileGet2(UnitCompiler.java:4737)
> at org.codehaus.janino.UnitCompiler.access$8300(UnitCompiler.java:212)
> at org.codehaus.janino.UnitCompiler$12.visitMethodInvocation(UnitCompiler.java:4097)
> at org.codehaus.janino.UnitCompiler$12.visitMethodInvocation(UnitCompiler.java:4070)
> at org.codehaus.janino.Java$MethodInvocation.accept(Java.java:4902)
> at org.codehaus.janino.UnitCompiler.compileGet(UnitCompiler.java:4070)
> at org.codehaus.janino.UnitCompiler.compileGetValue(UnitCompiler.java:5253)
> at org.codehaus.janino.UnitCompiler.compileGet2(UnitCompiler.java:4391)
> at org.codehaus.janino.UnitCompiler.access$8000(UnitCompiler.java:212)
> at ...
[jira] [Commented] (SPARK-24924) Add mapping for built-in Avro data source

[ https://issues.apache.org/jira/browse/SPARK-24924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16572564#comment-16572564 ]

Hyukjin Kwon commented on SPARK-24924:
--------------------------------------

[~cloud_fan], yeah, adding them as implicits does not sound like a good idea. But I think we can still add {{spark.read.avro}} in {{DataFrameReader}}, although it looks a bit weird since Avro is an external package.

> Add mapping for built-in Avro data source
> -----------------------------------------
>
>                 Key: SPARK-24924
>                 URL: https://issues.apache.org/jira/browse/SPARK-24924
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Dongjoon Hyun
>            Assignee: Dongjoon Hyun
>            Priority: Minor
>             Fix For: 2.4.0
>
> This issue aims at the following:
> # Like the `com.databricks.spark.csv` mapping, we had better map `com.databricks.spark.avro` to the built-in Avro data source.
> # Remove the incorrect error message, `Please find an Avro package at ...`.
[jira] [Resolved] (SPARK-24251) DataSourceV2: Add AppendData logical operation

[ https://issues.apache.org/jira/browse/SPARK-24251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-24251.
---------------------------------
    Resolution: Fixed
    Fix Version/s: 2.4.0

Issue resolved by pull request 21305
[https://github.com/apache/spark/pull/21305]

> DataSourceV2: Add AppendData logical operation
> ----------------------------------------------
>
>                 Key: SPARK-24251
>                 URL: https://issues.apache.org/jira/browse/SPARK-24251
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Ryan Blue
>            Assignee: Ryan Blue
>            Priority: Major
>             Fix For: 2.4.0
>
> The SPIP to standardize SQL logical plans (SPARK-23521) proposes AppendData for inserting data in append mode. This is the simplest plan, so it is implemented first.
[jira] [Assigned] (SPARK-24251) DataSourceV2: Add AppendData logical operation

[ https://issues.apache.org/jira/browse/SPARK-24251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan reassigned SPARK-24251:
-----------------------------------
    Assignee: Ryan Blue
[jira] [Commented] (SPARK-24924) Add mapping for built-in Avro data source

[ https://issues.apache.org/jira/browse/SPARK-24924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16572556#comment-16572556 ]

Wenchen Fan commented on SPARK-24924:
-------------------------------------

> I assume we could theoretically also support the spark.read.avro format as well

There was a discussion about why we shouldn't support it: https://github.com/apache/spark/pull/21841

Users always need to do some manual work to use `spark.read.avro`, even with the Databricks Avro package. Now users can still define an implicit class to support `spark.read.avro` if they want to.
[jira] [Commented] (SPARK-22634) Update Bouncy Castle dependency

[ https://issues.apache.org/jira/browse/SPARK-22634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16572551#comment-16572551 ]

Sean Owen commented on SPARK-22634:
-----------------------------------

[~ste...@apache.org] are you saying that this whole issue is moot if SPARK-23654 is resolved? That might be the better resolution. If that's correct, then maybe Bouncy Castle isn't really used here?
[jira] [Commented] (SPARK-22634) Update Bouncy Castle dependency

[ https://issues.apache.org/jira/browse/SPARK-22634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16572542#comment-16572542 ]

Saisai Shao commented on SPARK-22634:
-------------------------------------

[~srowen] I'm wondering if it is possible to upgrade to version 1.60, as this version fixed two CVEs (https://www.bouncycastle.org/latest_releases.html).
[jira] [Commented] (SPARK-23935) High-order function: map_entries(map<K,V>) → array<row<K,V>>

[ https://issues.apache.org/jira/browse/SPARK-23935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16572532#comment-16572532 ]

Apache Spark commented on SPARK-23935:
--------------------------------------

User 'kiszk' has created a pull request for this issue:
https://github.com/apache/spark/pull/22033

> High-order function: map_entries(map<K,V>) → array<row<K,V>>
> ------------------------------------------------------------
>
>                 Key: SPARK-23935
>                 URL: https://issues.apache.org/jira/browse/SPARK-23935
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Xiao Li
>            Assignee: Marek Novotny
>            Priority: Major
>             Fix For: 2.4.0
>
> Ref: https://prestodb.io/docs/current/functions/map.html
> Returns an array of all entries in the given map.
> {noformat}
> SELECT map_entries(MAP(ARRAY[1, 2], ARRAY['x', 'y'])); -- [ROW(1, 'x'), ROW(2, 'y')]
> {noformat}
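The semantics of the function are tiny: turn each map entry into a (key, value) row. A pure-Python sketch (a hypothetical helper; the real Spark implementation is a Catalyst expression generating code for MapData):

```python
def map_entries(m):
    """Return all entries of a map as an array of (key, value) rows,
    in iteration order, mirroring Presto's map_entries."""
    return [(k, v) for k, v in m.items()]

print(map_entries({1: 'x', 2: 'y'}))  # [(1, 'x'), (2, 'y')]
```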
[jira] [Assigned] (SPARK-25047) Can't assign SerializedLambda to scala.Function1 in deserialization of BucketedRandomProjectionLSHModel

[ https://issues.apache.org/jira/browse/SPARK-25047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-25047:
------------------------------------
    Assignee: Apache Spark

> Can't assign SerializedLambda to scala.Function1 in deserialization of BucketedRandomProjectionLSHModel
> -------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-25047
>                 URL: https://issues.apache.org/jira/browse/SPARK-25047
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML
>    Affects Versions: 2.4.0
>            Reporter: Sean Owen
>            Assignee: Apache Spark
>            Priority: Major
>
> Another distinct test failure:
> {code:java}
> - BucketedRandomProjectionLSH: streaming transform *** FAILED ***
> org.apache.spark.sql.streaming.StreamingQueryException: Query [id = 7f34fb07-a718-4488-b644-d27cfd29ff6c, runId = 0bbc0ba2-2952-4504-85d6-8aba877ba01b] terminated with exception: Job aborted due to stage failure: Task 0 in stage 16.0 failed 1 times, most recent failure: Lost task 0.0 in stage 16.0 (TID 16, localhost, executor driver): java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel.hashFunction of type scala.Function1 in instance of org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel
> ...
> Cause: java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel.hashFunction of type scala.Function1 in instance of org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel
> at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2233)
> at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1405)
> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2284)
> ...{code}
> Here the different nature of a Java 8 LMF closure trips up Java serialization/deserialization. I think this can be patched by manually implementing the Java serialization here; I don't see other instances (yet).
> Also wondering if this "val" can be a "def".
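The val-vs-def question at the end has a rough analog in Python's pickle, which makes it quick to see why a function-valued field serializes differently from an ordinary method (class names below are hypothetical; Spark's actual failure involves Java's SerializedLambda, not pickle):

```python
import pickle

class ModelWithLambda:
    """Stores its hash function as a function-valued field (like a val):
    the function object itself must be serialized with the instance."""
    def __init__(self):
        self.hash_function = lambda v: v * 2

class ModelWithMethod:
    """Defines the hash function as an ordinary method (like a def):
    nothing beyond the class reference has to be serialized."""
    def hash_function(self, v):
        return v * 2

try:
    pickle.dumps(ModelWithLambda())
    lambda_field_serializable = True
except Exception:
    lambda_field_serializable = False
print(lambda_field_serializable)  # False: the lambda-valued field breaks pickling

restored = pickle.loads(pickle.dumps(ModelWithMethod()))
print(restored.hash_function(21))  # 42
```

The mechanics differ (pickle refuses lambdas outright, while Java serialization round-trips the lambda into a SerializedLambda it then cannot assign back to a scala.Function1 field), but the underlying point is the same: making the function a method instead of stored state removes it from the serialized form.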
[jira] [Assigned] (SPARK-25047) Can't assign SerializedLambda to scala.Function1 in deserialization of BucketedRandomProjectionLSHModel
[ https://issues.apache.org/jira/browse/SPARK-25047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25047: Assignee: (was: Apache Spark) > Can't assign SerializedLambda to scala.Function1 in deserialization of > BucketedRandomProjectionLSHModel > --- > > Key: SPARK-25047 > URL: https://issues.apache.org/jira/browse/SPARK-25047 > Project: Spark > Issue Type: Sub-task > Components: ML >Affects Versions: 2.4.0 >Reporter: Sean Owen >Priority: Major > > Another distinct test failure: > {code:java} > - BucketedRandomProjectionLSH: streaming transform *** FAILED *** > org.apache.spark.sql.streaming.StreamingQueryException: Query [id = > 7f34fb07-a718-4488-b644-d27cfd29ff6c, runId = > 0bbc0ba2-2952-4504-85d6-8aba877ba01b] terminated with exception: Job aborted > due to stage failure: Task 0 in stage 16.0 failed 1 times, most recent > failure: Lost task 0.0 in stage 16.0 (TID 16, localhost, executor driver): > java.lang.ClassCastException: cannot assign instance of > java.lang.invoke.SerializedLambda to field > org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel.hashFunction of > type scala.Function1 in instance of > org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel > ... > Cause: java.lang.ClassCastException: cannot assign instance of > java.lang.invoke.SerializedLambda to field > org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel.hashFunction of > type scala.Function1 in instance of > org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel > at > java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2233) > at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1405) > at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2284) > ...{code} > Here the different nature of a Java 8 LMF closure trips of Java > serialization/deserialization. 
I think this can be patched by manually > implementing the Java serialization here, and I don't see other instances (yet). > I am also wondering if this "val" can be a "def".
[jira] [Commented] (SPARK-25047) Can't assign SerializedLambda to scala.Function1 in deserialization of BucketedRandomProjectionLSHModel
[ https://issues.apache.org/jira/browse/SPARK-25047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572518#comment-16572518 ] Apache Spark commented on SPARK-25047: -- User 'srowen' has created a pull request for this issue: https://github.com/apache/spark/pull/22032 > Can't assign SerializedLambda to scala.Function1 in deserialization of > BucketedRandomProjectionLSHModel > --- > > Key: SPARK-25047 > URL: https://issues.apache.org/jira/browse/SPARK-25047 > Project: Spark > Issue Type: Sub-task > Components: ML >Affects Versions: 2.4.0 >Reporter: Sean Owen >Priority: Major > > Another distinct test failure: > {code:java} > - BucketedRandomProjectionLSH: streaming transform *** FAILED *** > org.apache.spark.sql.streaming.StreamingQueryException: Query [id = > 7f34fb07-a718-4488-b644-d27cfd29ff6c, runId = > 0bbc0ba2-2952-4504-85d6-8aba877ba01b] terminated with exception: Job aborted > due to stage failure: Task 0 in stage 16.0 failed 1 times, most recent > failure: Lost task 0.0 in stage 16.0 (TID 16, localhost, executor driver): > java.lang.ClassCastException: cannot assign instance of > java.lang.invoke.SerializedLambda to field > org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel.hashFunction of > type scala.Function1 in instance of > org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel > ... 
> Cause: java.lang.ClassCastException: cannot assign instance of > java.lang.invoke.SerializedLambda to field > org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel.hashFunction of > type scala.Function1 in instance of > org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel > at > java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2233) > at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1405) > at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2284) > ...{code} > Here the different nature of a Java 8 LMF closure trips up Java > serialization/deserialization. I think this can be patched by manually > implementing the Java serialization here, and I don't see other instances (yet). > I am also wondering if this "val" can be a "def".
[jira] [Resolved] (SPARK-25045) Make `RDDBarrier.mapParititions` similar to `RDD.mapPartitions`
[ https://issues.apache.org/jira/browse/SPARK-25045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng resolved SPARK-25045. --- Resolution: Fixed Fix Version/s: 2.4.0 Issue resolved by pull request 22026 [https://github.com/apache/spark/pull/22026] > Make `RDDBarrier.mapParititions` similar to `RDD.mapPartitions` > --- > > Key: SPARK-25045 > URL: https://issues.apache.org/jira/browse/SPARK-25045 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.4.0 >Reporter: Jiang Xingbo >Assignee: Jiang Xingbo >Priority: Major > Fix For: 2.4.0 > > > Signature of the function passed to `RDDBarrier.mapPartitions()` is different > from that of `RDD.mapPartitions`. The latter doesn’t take a TaskContext. We > shall make the function signature the same to avoid confusion and misusage. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-25045) Make `RDDBarrier.mapPartitions` similar to `RDD.mapPartitions`
[ https://issues.apache.org/jira/browse/SPARK-25045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng reassigned SPARK-25045: - Assignee: Jiang Xingbo > Make `RDDBarrier.mapParititions` similar to `RDD.mapPartitions` > --- > > Key: SPARK-25045 > URL: https://issues.apache.org/jira/browse/SPARK-25045 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.4.0 >Reporter: Jiang Xingbo >Assignee: Jiang Xingbo >Priority: Major > Fix For: 2.4.0 > > > Signature of the function passed to `RDDBarrier.mapPartitions()` is different > from that of `RDD.mapPartitions`. The latter doesn’t take a TaskContext. We > shall make the function signature the same to avoid confusion and misusage. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-25053) Allow additional port forwarding on Spark on K8S as needed
holdenk created SPARK-25053: --- Summary: Allow additional port forwarding on Spark on K8S as needed Key: SPARK-25053 URL: https://issues.apache.org/jira/browse/SPARK-25053 Project: Spark Issue Type: Improvement Components: Kubernetes Affects Versions: 2.4.0 Reporter: holdenk In some cases, like setting up remote debuggers, adding additional ports to be forwarded would be useful. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-25052) Is there any possibility that Spark Structured Streaming generates duplicates in the output?
bharath kumar avusherla created SPARK-25052: --- Summary: Is there any possibility that Spark Structured Streaming generates duplicates in the output? Key: SPARK-25052 URL: https://issues.apache.org/jira/browse/SPARK-25052 Project: Spark Issue Type: Question Components: Spark Core Affects Versions: 2.3.0 Reporter: bharath kumar avusherla We recently observed that Spark Structured Streaming generated duplicates in the output when reading from a Kafka topic and storing the output to S3 (and checkpointing in S3). We ran into this issue twice. It is not reproducible. Has anyone ever faced this kind of issue before? Is this because of S3's eventual consistency?
[jira] [Commented] (SPARK-25047) Can't assign SerializedLambda to scala.Function1 in deserialization of BucketedRandomProjectionLSHModel
[ https://issues.apache.org/jira/browse/SPARK-25047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572410#comment-16572410 ] Sean Owen commented on SPARK-25047: --- More notes.These two SO answers shed a little light: [https://stackoverflow.com/a/28367602/64174] [https://stackoverflow.com/questions/28079307/unable-to-deserialize-lambda/28084460#28084460] It suggests the problem is that the SerializedLambda instance that is deserialized should provide a readResolve() method to, I assume, resolve it back into a scala.Function1. And that should actually be implemented by a {{$deserializeLambda$(SerializedLambda)}} function in the capturing class. It seems like something isn't turning it back from a SerializedLambda to something else. The method is in the byte code of BucketedRandomProjectionLSH and decompiles as {code:java} private static /* synthetic */ Object $deserializeLambda$(SerializedLambda serializedLambda) { return LambdaDeserialize.bootstrap(new MethodHandle[]{$anonfun$hashDistance$1$adapted(scala.Tuple2 ), $anonfun$hashFunction$2$adapted(org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel org.apache.spark.ml.linalg.Vector org.apache.spark.ml.linalg.Vector ), $anonfun$hashFunction$3$adapted(java.lang.Object ), $anonfun$hashFunction$1(org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel org.apache.spark.ml.linalg.Vector )}, serializedLambda); }{code} While I traced through this for a while, I couldn't make sense of it. However, nothing actually failed around here. The ultimate error was a bit later, and as in the StackOverflow post above. It goes without saying that there are plenty of fields of type scala.Function1 in Spark and this is the only problem one, and I can't see why. Is it because it involves an array type? grepping suggests that could be unique. However I tried to create a repro in a simple class file and all worked as expected too. 
Something is odd about this case, and I don't know if it is in fact triggering some odd corner case issue in scala or Java 8, or whether the Spark code could be tweaked to dodge it. > Can't assign SerializedLambda to scala.Function1 in deserialization of > BucketedRandomProjectionLSHModel > --- > > Key: SPARK-25047 > URL: https://issues.apache.org/jira/browse/SPARK-25047 > Project: Spark > Issue Type: Sub-task > Components: ML >Affects Versions: 2.4.0 >Reporter: Sean Owen >Priority: Major > > Another distinct test failure: > {code:java} > - BucketedRandomProjectionLSH: streaming transform *** FAILED *** > org.apache.spark.sql.streaming.StreamingQueryException: Query [id = > 7f34fb07-a718-4488-b644-d27cfd29ff6c, runId = > 0bbc0ba2-2952-4504-85d6-8aba877ba01b] terminated with exception: Job aborted > due to stage failure: Task 0 in stage 16.0 failed 1 times, most recent > failure: Lost task 0.0 in stage 16.0 (TID 16, localhost, executor driver): > java.lang.ClassCastException: cannot assign instance of > java.lang.invoke.SerializedLambda to field > org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel.hashFunction of > type scala.Function1 in instance of > org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel > ... > Cause: java.lang.ClassCastException: cannot assign instance of > java.lang.invoke.SerializedLambda to field > org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel.hashFunction of > type scala.Function1 in instance of > org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel > at > java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2233) > at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1405) > at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2284) > ...{code} > Here the different nature of a Java 8 LMF closure trips of Java > serialization/deserialization. 
I think this can be patched by manually > implementing the Java serialization here, and I don't see other instances (yet). > I am also wondering if this "val" can be a "def".
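The machinery the comment above describes can be observed in plain Java, where the same LambdaMetafactory/SerializedLambda pipeline exists: the capturing class gets a synthetic `$deserializeLambda$(SerializedLambda)` method, and a successful round trip hands back a working function. When the readResolve step is not applied, the raw `SerializedLambda` is what gets assigned to the field, which is exactly the ClassCastException in this issue. Class and method names below are illustrative, not Spark's.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.lang.reflect.Method;
import java.util.function.Function;

public class LambdaMachinery {
    // A serializable lambda: the compiler emits a writeReplace() returning a
    // java.lang.invoke.SerializedLambda, plus a synthetic
    // $deserializeLambda$(SerializedLambda) in this (capturing) class that
    // readResolve uses to turn the SerializedLambda back into a function.
    static final Function<Integer, Integer> F =
        (Function<Integer, Integer> & Serializable) x -> x + 1;

    // Verify via reflection that the synthetic deserializer really exists.
    static boolean capturingClassHasDeserializer() {
        for (Method m : LambdaMachinery.class.getDeclaredMethods()) {
            if (m.getName().equals("$deserializeLambda$")) return true;
        }
        return false;
    }

    @SuppressWarnings("unchecked")
    static Function<Integer, Integer> roundTrip() {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                oos.writeObject(F);
            }
            try (ObjectInputStream ois = new ObjectInputStream(
                    new ByteArrayInputStream(bos.toByteArray()))) {
                // If readResolve were skipped, this cast would fail with the
                // same "cannot assign SerializedLambda" error as the issue.
                return (Function<Integer, Integer>) ois.readObject();
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```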
[jira] [Assigned] (SPARK-25046) ALTER VIEW can execute SQL like "ALTER VIEW ... AS INSERT INTO"
[ https://issues.apache.org/jira/browse/SPARK-25046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li reassigned SPARK-25046: --- Assignee: SongXun > Alter View can excute sql like "ALTER VIEW ... AS INSERT INTO" > - > > Key: SPARK-25046 > URL: https://issues.apache.org/jira/browse/SPARK-25046 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 >Reporter: SongXun >Assignee: SongXun >Priority: Minor > Labels: pull-request-available > Fix For: 2.4.0 > > > Alter View can excute sql like "ALTER VIEW ... AS INSERT INTO" . We should > throw > ParseException(s"Operation not allowed: $message", ctx) as Create View does. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-25046) ALTER VIEW can execute SQL like "ALTER VIEW ... AS INSERT INTO"
[ https://issues.apache.org/jira/browse/SPARK-25046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-25046. - Resolution: Fixed Fix Version/s: 2.4.0 > Alter View can excute sql like "ALTER VIEW ... AS INSERT INTO" > - > > Key: SPARK-25046 > URL: https://issues.apache.org/jira/browse/SPARK-25046 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 >Reporter: SongXun >Assignee: SongXun >Priority: Minor > Labels: pull-request-available > Fix For: 2.4.0 > > > Alter View can excute sql like "ALTER VIEW ... AS INSERT INTO" . We should > throw > ParseException(s"Operation not allowed: $message", ctx) as Create View does. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23207) Shuffle+Repartition on a DataFrame could lead to incorrect answers
[ https://issues.apache.org/jira/browse/SPARK-23207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572232#comment-16572232 ] Thomas Graves commented on SPARK-23207: --- does this affect spark 2.2 and < ? from the description it sounds like it, in which case we should backport. > Shuffle+Repartition on an DataFrame could lead to incorrect answers > --- > > Key: SPARK-23207 > URL: https://issues.apache.org/jira/browse/SPARK-23207 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 >Reporter: Jiang Xingbo >Assignee: Jiang Xingbo >Priority: Blocker > Labels: correctness > Fix For: 2.3.0 > > > Currently shuffle repartition uses RoundRobinPartitioning, the generated > result is nondeterministic since the sequence of input rows are not > determined. > The bug can be triggered when there is a repartition call following a shuffle > (which would lead to non-deterministic row ordering), as the pattern shows > below: > upstream stage -> repartition stage -> result stage > (-> indicate a shuffle) > When one of the executors process goes down, some tasks on the repartition > stage will be retried and generate inconsistent ordering, and some tasks of > the result stage will be retried generating different data. > The following code returns 931532, instead of 100: > {code} > import scala.sys.process._ > import org.apache.spark.TaskContext > val res = spark.range(0, 1000 * 1000, 1).repartition(200).map { x => > x > }.repartition(200).map { x => > if (TaskContext.get.attemptNumber == 0 && TaskContext.get.partitionId < 2) { > throw new Exception("pkill -f java".!!) > } > x > } > res.distinct().count() > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
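Why RoundRobinPartitioning is order-sensitive can be shown with a minimal model (plain Java sketch with illustrative names, not Spark's implementation): row i of the input goes to partition i % n, so if a retried upstream task replays the same rows in a different order, rows move between partitions even though the data set is unchanged.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RoundRobinSketch {
    // Round-robin partitioning: input row i goes to partition i % n. The
    // assignment depends only on arrival order, which is the source of the
    // nondeterminism described in this issue.
    public static Map<Integer, List<Integer>> partition(List<Integer> rows, int n) {
        Map<Integer, List<Integer>> parts = new HashMap<>();
        for (int i = 0; i < rows.size(); i++) {
            parts.computeIfAbsent(i % n, k -> new ArrayList<>()).add(rows.get(i));
        }
        return parts;
    }
}
```

Feeding the same four rows in two different orders produces different partition contents, even though the union of rows is identical; combined with a partial retry of the downstream stage, that yields the missing/duplicated rows the repro in the description demonstrates.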
[jira] [Updated] (SPARK-25029) Scala 2.12 issues: TaskNotSerializable and Janino "Two non-abstract methods ..." errors
[ https://issues.apache.org/jira/browse/SPARK-25029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-25029: -- Issue Type: Sub-task (was: Bug) Parent: SPARK-14220 > Scala 2.12 issues: TaskNotSerializable and Janino "Two non-abstract methods > ..." errors > --- > > Key: SPARK-25029 > URL: https://issues.apache.org/jira/browse/SPARK-25029 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 2.4.0 >Reporter: Sean Owen >Priority: Blocker > > We actually still have some test failures in the Scala 2.12 build. There seem > to be two types. First are that some tests fail with "TaskNotSerializable" > because some code construct now captures a reference to scalatest's > AssertionHelper. Example: > {code:java} > - LegacyAccumulatorWrapper with AccumulatorParam that has no equals/hashCode > *** FAILED *** java.io.NotSerializableException: > org.scalatest.Assertions$AssertionsHelper Serialization stack: - object not > serializable (class: org.scalatest.Assertions$AssertionsHelper, value: > org.scalatest.Assertions$AssertionsHelper@3bc5fc8f){code} > These seem generally easy to fix by tweaking the test code. It's not clear if > something about closure cleaning in 2.12 could be improved to detect this > situation automatically; given that yet only a handful of tests fail for this > reason, it's unlikely to be a systemic problem. > > The other error is curioser. 
Janino fails to compile generate code in many > cases with errors like: > {code:java} > - encode/decode for seq of string: List(abc, xyz) *** FAILED *** > java.lang.RuntimeException: Error while encoding: > org.codehaus.janino.InternalCompilerException: failed to compile: > org.codehaus.janino.InternalCompilerException: Compiling "GeneratedClass": > Two non-abstract methods "public int scala.collection.TraversableOnce.size()" > have the same parameter types, declaring type and return type{code} > > I include the full generated code that failed in one case below. There is no > {{size()}} in the generated code. It's got to be down to some difference in > Scala 2.12, potentially even a Janino problem. > > {code:java} > Caused by: org.codehaus.janino.InternalCompilerException: Compiling > "GeneratedClass": Two non-abstract methods "public int > scala.collection.TraversableOnce.size()" have the same parameter types, > declaring type and return type > at org.codehaus.janino.UnitCompiler.compileUnit(UnitCompiler.java:361) > at org.codehaus.janino.SimpleCompiler.cook(SimpleCompiler.java:234) > at > org.codehaus.janino.SimpleCompiler.compileToClassLoader(SimpleCompiler.java:446) > at > org.codehaus.janino.ClassBodyEvaluator.compileToClass(ClassBodyEvaluator.java:313) > at org.codehaus.janino.ClassBodyEvaluator.cook(ClassBodyEvaluator.java:235) > at org.codehaus.janino.SimpleCompiler.cook(SimpleCompiler.java:204) > at org.codehaus.commons.compiler.Cookable.cook(Cookable.java:80) > at > org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:1342) > ... 
30 more > Caused by: org.codehaus.janino.InternalCompilerException: Two non-abstract > methods "public int scala.collection.TraversableOnce.size()" have the same > parameter types, declaring type and return type > at > org.codehaus.janino.UnitCompiler.findMostSpecificIInvocable(UnitCompiler.java:9112) > at > org.codehaus.janino.UnitCompiler.findMostSpecificIInvocable(UnitCompiler.java:) > at org.codehaus.janino.UnitCompiler.findIMethod(UnitCompiler.java:8770) > at org.codehaus.janino.UnitCompiler.findIMethod(UnitCompiler.java:8672) > at org.codehaus.janino.UnitCompiler.compileGet2(UnitCompiler.java:4737) > at org.codehaus.janino.UnitCompiler.access$8300(UnitCompiler.java:212) > at > org.codehaus.janino.UnitCompiler$12.visitMethodInvocation(UnitCompiler.java:4097) > at > org.codehaus.janino.UnitCompiler$12.visitMethodInvocation(UnitCompiler.java:4070) > at org.codehaus.janino.Java$MethodInvocation.accept(Java.java:4902) > at org.codehaus.janino.UnitCompiler.compileGet(UnitCompiler.java:4070) > at org.codehaus.janino.UnitCompiler.compileGetValue(UnitCompiler.java:5253) > at org.codehaus.janino.UnitCompiler.compileGet2(UnitCompiler.java:4391) > at org.codehaus.janino.UnitCompiler.access$8000(UnitCompiler.java:212) > at > org.codehaus.janino.UnitCompiler$12.visitConditionalExpression(UnitCompiler.java:4094) > at > org.codehaus.janino.UnitCompiler$12.visitConditionalExpression(UnitCompiler.java:4070) > at org.codehaus.janino.Java$ConditionalExpression.accept(Java.java:4344) > at org.codehaus.janino.UnitCompiler.compileGet(UnitCompiler.java:4070) > at
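The first class of failures above, a closure capturing scalatest's non-serializable AssertionsHelper, reproduces outside Spark with any non-serializable capture. A plain-Java sketch with illustrative names:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Supplier;

public class CaptureDemo {
    // Stand-in for scalatest's AssertionsHelper: an ordinary object that is
    // not Serializable.
    static class Helper { }

    // A lambda that closes over the helper drags it into the serialized
    // closure, so serialization fails just like the TaskNotSerializable
    // test failures quoted above.
    static Supplier<String> capturing() {
        Helper h = new Helper();
        return (Supplier<String> & Serializable) () -> "saw " + h;
    }

    // The kind of fix applied to the tests: rewrite the closure so it
    // captures nothing non-serializable.
    static Supplier<String> clean() {
        return (Supplier<String> & Serializable) () -> "ok";
    }

    static boolean serializes(Object o) {
        try (ObjectOutputStream oos =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            oos.writeObject(o);
            return true;
        } catch (IOException e) { // NotSerializableException lands here
            return false;
        }
    }
}
```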
[jira] [Updated] (SPARK-25044) Address translation of LMF closure primitive args to Object in Scala 2.12
[ https://issues.apache.org/jira/browse/SPARK-25044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-25044: -- Description: A few SQL-related tests fail in Scala 2.12, such as UDFSuite's "SPARK-24891 Fix HandleNullInputsForUDF rule": {code:java} - SPARK-24891 Fix HandleNullInputsForUDF rule *** FAILED *** Results do not match for query: ... == Results == == Results == !== Correct Answer - 3 == == Spark Answer - 3 == !struct<> struct ![0,10,null] [0,10,0] ![1,12,null] [1,12,1] ![2,14,null] [2,14,2] (QueryTest.scala:163){code} You can kind of get what's going on reading the test: {code:java} test("SPARK-24891 Fix HandleNullInputsForUDF rule") { // assume(!ClosureCleanerSuite2.supportsLMFs) // This test won't test what it intends to in 2.12, as lambda metafactory closures // have arg types that are not primitive, but Object val udf1 = udf({(x: Int, y: Int) => x + y}) val df = spark.range(0, 3).toDF("a") .withColumn("b", udf1($"a", udf1($"a", lit(10 .withColumn("c", udf1($"a", lit(null))) val plan = spark.sessionState.executePlan(df.logicalPlan).analyzed comparePlans(df.logicalPlan, plan) checkAnswer( df, Seq( Row(0, 10, null), Row(1, 12, null), Row(2, 14, null))) }{code} It seems that the closure that is fed in as a UDF changes behavior, in a way that primitive-type arguments are handled differently. For example an Int argument, when fed 'null', acts like 0. I'm sure it's a difference in the LMF closure and how its types are understood, but not exactly sure of the cause yet. was: A few SQL-related tests fail in Scala 2.12, such as UDFSuite's "SPARK-24891 Fix HandleNullInputsForUDF rule". (Details in a sec when I can copy-paste them.) It seems that the closure that is fed in as a UDF changes behavior, in a way that primitive-type arguments are handled differently. For example an Int argument, when fed 'null', acts like 0. 
I'm sure it's a difference in the LMF closure and how its types are understood, but not exactly sure of the cause yet. > Address translation of LMF closure primitive args to Object in Scala 2.12 > - > > Key: SPARK-25044 > URL: https://issues.apache.org/jira/browse/SPARK-25044 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 2.4.0 >Reporter: Sean Owen >Priority: Major > > A few SQL-related tests fail in Scala 2.12, such as UDFSuite's "SPARK-24891 > Fix HandleNullInputsForUDF rule": > {code:java} > - SPARK-24891 Fix HandleNullInputsForUDF rule *** FAILED *** > Results do not match for query: > ... > == Results == > == Results == > !== Correct Answer - 3 == == Spark Answer - 3 == > !struct<> struct > ![0,10,null] [0,10,0] > ![1,12,null] [1,12,1] > ![2,14,null] [2,14,2] (QueryTest.scala:163){code} > You can kind of get what's going on reading the test: > {code:java} > test("SPARK-24891 Fix HandleNullInputsForUDF rule") { > // assume(!ClosureCleanerSuite2.supportsLMFs) > // This test won't test what it intends to in 2.12, as lambda metafactory > closures > // have arg types that are not primitive, but Object > val udf1 = udf({(x: Int, y: Int) => x + y}) > val df = spark.range(0, 3).toDF("a") > .withColumn("b", udf1($"a", udf1($"a", lit(10 > .withColumn("c", udf1($"a", lit(null))) > val plan = spark.sessionState.executePlan(df.logicalPlan).analyzed > comparePlans(df.logicalPlan, plan) > checkAnswer( > df, > Seq( > Row(0, 10, null), > Row(1, 12, null), > Row(2, 14, null))) > }{code} > > It seems that the closure that is fed in as a UDF changes behavior, in a way > that primitive-type arguments are handled differently. For example an Int > argument, when fed 'null', acts like 0. > I'm sure it's a difference in the LMF closure and how its types are > understood, but not exactly sure of the cause yet. 
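A plain-Java model of the suspected mechanism (illustrative, not Spark or Scala library code): Scala's runtime unboxing, `BoxesRunTime.unboxToInt`, converts null to 0 rather than throwing, so once the 2.12 closure's apply takes Object instead of a primitive Int, a null argument that reaches that unboxing silently becomes 0, matching the `[0,10,0]` rows the test observes in place of `[0,10,null]`.

```java
public class NullUnboxModel {
    // Models Scala's BoxesRunTime.unboxToInt convention: null unboxes to 0
    // (plain Java auto-unboxing would instead throw NullPointerException).
    static int scalaStyleUnboxToInt(Object o) {
        return (o == null) ? 0 : ((Integer) o).intValue();
    }

    // Models the erased (Object, Object) -> Object apply of a 2.12 LMF
    // closure whose body was written against primitive Int arguments.
    static Object genericApply(Object x, Object y) {
        return scalaStyleUnboxToInt(x) + scalaStyleUnboxToInt(y);
    }
}
```

Under the 2.11 scheme, where apply still had primitive parameter types, a null argument would have been rejected (or handled) before unboxing, which is why the analyzer's null-handling rule behaved differently there.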
[jira] [Created] (SPARK-25051) where clause on dataset gives AnalysisException
MIK created SPARK-25051: --- Summary: where clause on dataset gives AnalysisException Key: SPARK-25051 URL: https://issues.apache.org/jira/browse/SPARK-25051 Project: Spark Issue Type: Bug Components: Spark Core, SQL Affects Versions: 2.3.0 Reporter: MIK *schemas :* df1 => id ts df2 => id name country *code:* val df = df1.join(df2, Seq("id"), "left_outer").where(df2("id").isNull) *error*: org.apache.spark.sql.AnalysisException:Resolved attribute(s) id#0 missing from xx#15,xx#9L,id#5,xx#6,xx#11,xx#14,xx#13,xx#12,xx#7,xx#16,xx#10,xx#8L in operator !Filter isnull(id#0). Attribute(s) with the same name appear in the operation: id. Please check if the right attribute(s) are used.;; at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:41) at org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:91) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:289) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:80) at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:80) at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:91) at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:104) at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:57) at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:55) at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:47) at org.apache.spark.sql.Dataset.(Dataset.scala:172) at org.apache.spark.sql.Dataset.(Dataset.scala:178) at org.apache.spark.sql.Dataset$.apply(Dataset.scala:65) at org.apache.spark.sql.Dataset.withTypedPlan(Dataset.scala:3300) at org.apache.spark.sql.Dataset.filter(Dataset.scala:1458) at 
org.apache.spark.sql.Dataset.where(Dataset.scala:1486) This works fine in Spark 2.2.2.
[jira] [Assigned] (SPARK-23932) High-order function: zip_with(array, array, function) → array
[ https://issues.apache.org/jira/browse/SPARK-23932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23932: Assignee: (was: Apache Spark) > High-order function: zip_with(array, array, function) → > array > --- > > Key: SPARK-23932 > URL: https://issues.apache.org/jira/browse/SPARK-23932 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.0 >Reporter: Xiao Li >Priority: Major > > Ref: https://prestodb.io/docs/current/functions/array.html > Merges the two given arrays, element-wise, into a single array using > function. Both arrays must be the same length. > {noformat} > SELECT zip_with(ARRAY[1, 3, 5], ARRAY['a', 'b', 'c'], (x, y) -> (y, x)); -- > [ROW('a', 1), ROW('b', 3), ROW('c', 5)] > SELECT zip_with(ARRAY[1, 2], ARRAY[3, 4], (x, y) -> x + y); -- [4, 6] > SELECT zip_with(ARRAY['a', 'b', 'c'], ARRAY['d', 'e', 'f'], (x, y) -> > concat(x, y)); -- ['ad', 'be', 'cf'] > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-23932) High-order function: zip_with(array, array, function) → array
[ https://issues.apache.org/jira/browse/SPARK-23932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23932: Assignee: Apache Spark > High-order function: zip_with(array, array, function) → > array > --- > > Key: SPARK-23932 > URL: https://issues.apache.org/jira/browse/SPARK-23932 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.0 >Reporter: Xiao Li >Assignee: Apache Spark >Priority: Major > > Ref: https://prestodb.io/docs/current/functions/array.html > Merges the two given arrays, element-wise, into a single array using > function. Both arrays must be the same length. > {noformat} > SELECT zip_with(ARRAY[1, 3, 5], ARRAY['a', 'b', 'c'], (x, y) -> (y, x)); -- > [ROW('a', 1), ROW('b', 3), ROW('c', 5)] > SELECT zip_with(ARRAY[1, 2], ARRAY[3, 4], (x, y) -> x + y); -- [4, 6] > SELECT zip_with(ARRAY['a', 'b', 'c'], ARRAY['d', 'e', 'f'], (x, y) -> > concat(x, y)); -- ['ad', 'be', 'cf'] > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23932) High-order function: zip_with(array, array, function) → array
[ https://issues.apache.org/jira/browse/SPARK-23932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572140#comment-16572140 ] Apache Spark commented on SPARK-23932: -- User 'techaddict' has created a pull request for this issue: https://github.com/apache/spark/pull/22031 > High-order function: zip_with(array, array, function) → > array > --- > > Key: SPARK-23932 > URL: https://issues.apache.org/jira/browse/SPARK-23932 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.0 >Reporter: Xiao Li >Priority: Major > > Ref: https://prestodb.io/docs/current/functions/array.html > Merges the two given arrays, element-wise, into a single array using > function. Both arrays must be the same length. > {noformat} > SELECT zip_with(ARRAY[1, 3, 5], ARRAY['a', 'b', 'c'], (x, y) -> (y, x)); -- > [ROW('a', 1), ROW('b', 3), ROW('c', 5)] > SELECT zip_with(ARRAY[1, 2], ARRAY[3, 4], (x, y) -> x + y); -- [4, 6] > SELECT zip_with(ARRAY['a', 'b', 'c'], ARRAY['d', 'e', 'f'], (x, y) -> > concat(x, y)); -- ['ad', 'be', 'cf'] > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
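The Presto semantics quoted in the description can be pinned down with a small reference implementation (plain Java, illustrative only; the eventual Spark SQL implementation may differ in details such as null and length handling):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

public class ZipWithSketch {
    // zip_with per the Presto docs quoted above: merge the two arrays
    // element-wise with the given function; both must be the same length.
    public static <A, B, C> List<C> zipWith(List<A> a, List<B> b,
                                            BiFunction<A, B, C> f) {
        if (a.size() != b.size()) {
            throw new IllegalArgumentException("Both arrays must be the same length");
        }
        List<C> out = new ArrayList<>(a.size());
        for (int i = 0; i < a.size(); i++) {
            out.add(f.apply(a.get(i), b.get(i)));
        }
        return out;
    }
}
```

With the inputs from the second Presto example, `zipWith(asList(1, 2), asList(3, 4), Integer::sum)` yields `[4, 6]`.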
[jira] [Comment Edited] (SPARK-25044) Address translation of LMF closure primitive args to Object in Scala 2.12
[ https://issues.apache.org/jira/browse/SPARK-25044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572095#comment-16572095 ] Stavros Kontopoulos edited comment on SPARK-25044 at 8/7/18 6:35 PM: - [~lrytz] any insight? was (Author: skonto): @lrytz any insight? > Address translation of LMF closure primitive args to Object in Scala 2.12 > - > > Key: SPARK-25044 > URL: https://issues.apache.org/jira/browse/SPARK-25044 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 2.4.0 >Reporter: Sean Owen >Priority: Major > > A few SQL-related tests fail in Scala 2.12, such as UDFSuite's "SPARK-24891 > Fix HandleNullInputsForUDF rule". (Details in a sec when I can copy-paste > them.) > It seems that the closure that is fed in as a UDF changes behavior, in a way > that primitive-type arguments are handled differently. For example an Int > argument, when fed 'null', acts like 0. > I'm sure it's a difference in the LMF closure and how its types are > understood, but not exactly sure of the cause yet. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25044) Address translation of LMF closure primitive args to Object in Scala 2.12
[ https://issues.apache.org/jira/browse/SPARK-25044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572115#comment-16572115 ] Sean Owen commented on SPARK-25044: --- More specific info. In Scala 2.12: {code:java} scala> val f = (i: Int, j: Long) => "x" f: (Int, Long) => String = $$Lambda$1045/927369095@51ec2856 scala> val methods = f.getClass.getMethods.filter(m => m.getName == "apply" && !m.isBridge) methods: Array[java.lang.reflect.Method] = Array(public java.lang.Object $$Lambda$1045/927369095.apply(java.lang.Object,java.lang.Object)) scala> methods.head.getParameterTypes res0: Array[Class[_]] = Array(class java.lang.Object, class java.lang.Object) {code} Whereas in Scala 2.11 the result is: {code:java} ... scala> res0: Array[Class[_]] = Array(int, long){code} I guess one question for folks like [~lrytz] is, is that 'correct' as far as Scala is concerned? From reading [https://docs.oracle.com/javase/8/docs/api/java/lang/invoke/LambdaMetafactory.html] I got some sense that compilers had some latitude in how the lambda is implemented, but am just wondering if it makes sense that the {{apply}} method's signature doesn't seem to match what's expected. Here is the full list of methods that {{f}}'s class implements; that first one is the only logical candidate to look for, I think, as it's the only one returning String. 
{code:java} public java.lang.String $line3.$read$$iw$$iw$$$Lambda$1045/927369095.apply(java.lang.Object,java.lang.Object) public java.lang.Object $line3.$read$$iw$$iw$$$Lambda$1045/927369095.apply(java.lang.Object,java.lang.Object) public final void java.lang.Object.wait(long,int) throws java.lang.InterruptedException public final native void java.lang.Object.wait(long) throws java.lang.InterruptedException public final void java.lang.Object.wait() throws java.lang.InterruptedException public boolean java.lang.Object.equals(java.lang.Object) public java.lang.String java.lang.Object.toString() public native int java.lang.Object.hashCode() public final native java.lang.Class java.lang.Object.getClass() public final native void java.lang.Object.notify() public final native void java.lang.Object.notifyAll() public default scala.Function1 scala.Function2.curried() public default scala.Function1 scala.Function2.tupled() public default boolean scala.Function2.apply$mcZDD$sp(double,double) public default double scala.Function2.apply$mcDDD$sp(double,double) public default float scala.Function2.apply$mcFDD$sp(double,double) public default int scala.Function2.apply$mcIDD$sp(double,double) public default long scala.Function2.apply$mcJDD$sp(double,double) public default void scala.Function2.apply$mcVDD$sp(double,double) public default boolean scala.Function2.apply$mcZDI$sp(double,int) public default double scala.Function2.apply$mcDDI$sp(double,int) public default float scala.Function2.apply$mcFDI$sp(double,int) public default int scala.Function2.apply$mcIDI$sp(double,int) public default long scala.Function2.apply$mcJDI$sp(double,int) public default void scala.Function2.apply$mcVDI$sp(double,int) public default boolean scala.Function2.apply$mcZDJ$sp(double,long) public default double scala.Function2.apply$mcDDJ$sp(double,long) public default float scala.Function2.apply$mcFDJ$sp(double,long) public default int scala.Function2.apply$mcIDJ$sp(double,long) public default long 
scala.Function2.apply$mcJDJ$sp(double,long) public default void scala.Function2.apply$mcVDJ$sp(double,long) public default boolean scala.Function2.apply$mcZID$sp(int,double) public default double scala.Function2.apply$mcDID$sp(int,double) public default float scala.Function2.apply$mcFID$sp(int,double) public default int scala.Function2.apply$mcIID$sp(int,double) public default long scala.Function2.apply$mcJID$sp(int,double) public default void scala.Function2.apply$mcVID$sp(int,double) public default boolean scala.Function2.apply$mcZII$sp(int,int) public default double scala.Function2.apply$mcDII$sp(int,int) public default float scala.Function2.apply$mcFII$sp(int,int) public default int scala.Function2.apply$mcIII$sp(int,int) public default long scala.Function2.apply$mcJII$sp(int,int) public default void scala.Function2.apply$mcVII$sp(int,int) public default boolean scala.Function2.apply$mcZIJ$sp(int,long) public default double scala.Function2.apply$mcDIJ$sp(int,long) public default float scala.Function2.apply$mcFIJ$sp(int,long) public default int scala.Function2.apply$mcIIJ$sp(int,long) public default long scala.Function2.apply$mcJIJ$sp(int,long) public default void scala.Function2.apply$mcVIJ$sp(int,long) public default boolean scala.Function2.apply$mcZJD$sp(long,double) public default double scala.Function2.apply$mcDJD$sp(long,double) public default float scala.Function2.apply$mcFJD$sp(long,double) public default int scala.Function2.apply$mcIJD$sp(long,double) public default long scala.Function2.apply$mcJJD$sp(long,double) public default void
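As a cross-check on the erasure question above, the same shape reproduces in plain Java, where generic lambdas have always been spun by LambdaMetafactory and there are no specialized `apply$mcXXX$sp` variants to fall back on. A small reproduction (class and method names here are illustrative, not from Spark):

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.function.BiFunction;

public class LambdaErasure {
    // Finds the non-bridge apply method on a LambdaMetafactory-generated
    // BiFunction class and returns its parameter types.
    static Class<?>[] applyParamTypes() {
        BiFunction<Integer, Long, String> f = (i, j) -> "x";
        Method apply = Arrays.stream(f.getClass().getMethods())
                .filter(m -> m.getName().equals("apply") && !m.isBridge())
                .findFirst()
                .get();
        return apply.getParameterTypes();
    }

    public static void main(String[] args) {
        // As in the Scala 2.12 REPL session above, the apply method takes
        // Object parameters: the Integer/Long types are erased on the class.
        System.out.println(Arrays.toString(applyParamTypes()));
        // [class java.lang.Object, class java.lang.Object]
    }
}
```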
[jira] [Comment Edited] (SPARK-25047) Can't assign SerializedLambda to scala.Function1 in deserialization of BucketedRandomProjectionLSHModel
[ https://issues.apache.org/jira/browse/SPARK-25047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572094#comment-16572094 ] Stavros Kontopoulos edited comment on SPARK-25047 at 8/7/18 6:18 PM: - [~lrytz] thoughts? was (Author: skonto): [~lrytz] ideas? > Can't assign SerializedLambda to scala.Function1 in deserialization of > BucketedRandomProjectionLSHModel > --- > > Key: SPARK-25047 > URL: https://issues.apache.org/jira/browse/SPARK-25047 > Project: Spark > Issue Type: Sub-task > Components: ML >Affects Versions: 2.4.0 >Reporter: Sean Owen >Priority: Major > > Another distinct test failure: > {code:java} > - BucketedRandomProjectionLSH: streaming transform *** FAILED *** > org.apache.spark.sql.streaming.StreamingQueryException: Query [id = > 7f34fb07-a718-4488-b644-d27cfd29ff6c, runId = > 0bbc0ba2-2952-4504-85d6-8aba877ba01b] terminated with exception: Job aborted > due to stage failure: Task 0 in stage 16.0 failed 1 times, most recent > failure: Lost task 0.0 in stage 16.0 (TID 16, localhost, executor driver): > java.lang.ClassCastException: cannot assign instance of > java.lang.invoke.SerializedLambda to field > org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel.hashFunction of > type scala.Function1 in instance of > org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel > ... > Cause: java.lang.ClassCastException: cannot assign instance of > java.lang.invoke.SerializedLambda to field > org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel.hashFunction of > type scala.Function1 in instance of > org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel > at > java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2233) > at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1405) > at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2284) > ...{code} > Here the different nature of a Java 8 LMF closure trips up Java > serialization/deserialization. 
I think this can be patched by manually > implementing the Java serialization here, and don't see other instances (yet). > Also wondering if this "val" can be a "def".
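The "implement the Java serialization manually" idea above can be sketched outside Spark: serialize only the state the closure needs, keep the function field transient, and rebuild it on read so no lambda (and hence no SerializedLambda) ever goes through the stream. All names here (`Model`, `scale`, `hashFunction`) are illustrative stand-ins, with `java.util.function.UnaryOperator` standing in for `scala.Function1`:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.UnaryOperator;

public class Model implements Serializable {
    private static final long serialVersionUID = 1L;

    private final double scale;                            // the state the closure captures
    private transient UnaryOperator<Double> hashFunction;  // never serialized

    public Model(double scale) {
        this.scale = scale;
        this.hashFunction = x -> x * scale;
    }

    // Rebuild the function instead of deserializing it.
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        hashFunction = x -> x * scale;
    }

    public double apply(double x) { return hashFunction.apply(x); }

    // Round-trip helper so the sketch is self-checking.
    static Model roundTrip(Model m) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            new ObjectOutputStream(bos).writeObject(m);
            return (Model) new ObjectInputStream(
                    new ByteArrayInputStream(bos.toByteArray())).readObject();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(new Model(2.0)).apply(3.0)); // 6.0
    }
}
```

The "val vs. def" question in the comment is the same trade-off: a def recomputes the function on each access, which likewise keeps it out of the serialized form.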
[jira] [Comment Edited] (SPARK-25044) Address translation of LMF closure primitive args to Object in Scala 2.12
[ https://issues.apache.org/jira/browse/SPARK-25044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572095#comment-16572095 ] Stavros Kontopoulos edited comment on SPARK-25044 at 8/7/18 6:18 PM: - @lrytz any insight? was (Author: skonto): @lrytz thoughts? > Address translation of LMF closure primitive args to Object in Scala 2.12 > - > > Key: SPARK-25044 > URL: https://issues.apache.org/jira/browse/SPARK-25044 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 2.4.0 >Reporter: Sean Owen >Priority: Major > > A few SQL-related tests fail in Scala 2.12, such as UDFSuite's "SPARK-24891 > Fix HandleNullInputsForUDF rule". (Details in a sec when I can copy-paste > them.) > It seems that the closure that is fed in as a UDF changes behavior, in a way > that primitive-type arguments are handled differently. For example an Int > argument, when fed 'null', acts like 0. > I'm sure it's a difference in the LMF closure and how its types are > understood, but not exactly sure of the cause yet.
[jira] [Commented] (SPARK-25044) Address translation of LMF closure primitive args to Object in Scala 2.12
[ https://issues.apache.org/jira/browse/SPARK-25044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572095#comment-16572095 ] Stavros Kontopoulos commented on SPARK-25044: - @lrytz thoughts? > Address translation of LMF closure primitive args to Object in Scala 2.12 > - > > Key: SPARK-25044 > URL: https://issues.apache.org/jira/browse/SPARK-25044 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 2.4.0 >Reporter: Sean Owen >Priority: Major > > A few SQL-related tests fail in Scala 2.12, such as UDFSuite's "SPARK-24891 > Fix HandleNullInputsForUDF rule". (Details in a sec when I can copy-paste > them.) > It seems that the closure that is fed in as a UDF changes behavior, in a way > that primitive-type arguments are handled differently. For example an Int > argument, when fed 'null', acts like 0. > I'm sure it's a difference in the LMF closure and how its types are > understood, but not exactly sure of the cause yet.
[jira] [Commented] (SPARK-25047) Can't assign SerializedLambda to scala.Function1 in deserialization of BucketedRandomProjectionLSHModel
[ https://issues.apache.org/jira/browse/SPARK-25047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572094#comment-16572094 ] Stavros Kontopoulos commented on SPARK-25047: - [~lrytz] ideas? > Can't assign SerializedLambda to scala.Function1 in deserialization of > BucketedRandomProjectionLSHModel > --- > > Key: SPARK-25047 > URL: https://issues.apache.org/jira/browse/SPARK-25047 > Project: Spark > Issue Type: Sub-task > Components: ML >Affects Versions: 2.4.0 >Reporter: Sean Owen >Priority: Major > > Another distinct test failure: > {code:java} > - BucketedRandomProjectionLSH: streaming transform *** FAILED *** > org.apache.spark.sql.streaming.StreamingQueryException: Query [id = > 7f34fb07-a718-4488-b644-d27cfd29ff6c, runId = > 0bbc0ba2-2952-4504-85d6-8aba877ba01b] terminated with exception: Job aborted > due to stage failure: Task 0 in stage 16.0 failed 1 times, most recent > failure: Lost task 0.0 in stage 16.0 (TID 16, localhost, executor driver): > java.lang.ClassCastException: cannot assign instance of > java.lang.invoke.SerializedLambda to field > org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel.hashFunction of > type scala.Function1 in instance of > org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel > ... > Cause: java.lang.ClassCastException: cannot assign instance of > java.lang.invoke.SerializedLambda to field > org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel.hashFunction of > type scala.Function1 in instance of > org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel > at > java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2233) > at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1405) > at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2284) > ...{code} > Here the different nature of a Java 8 LMF closure trips up Java > serialization/deserialization. 
I think this can be patched by manually > implementing the Java serialization here, and don't see other instances (yet). > Also wondering if this "val" can be a "def".
[jira] [Created] (SPARK-25050) Handle more than two types in avro union types
DB Tsai created SPARK-25050: --- Summary: Handle more than two types in avro union types Key: SPARK-25050 URL: https://issues.apache.org/jira/browse/SPARK-25050 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.4.0 Reporter: DB Tsai
[jira] [Created] (SPARK-25049) Support custom schema in `to_avro`
DB Tsai created SPARK-25049: --- Summary: Support custom schema in `to_avro` Key: SPARK-25049 URL: https://issues.apache.org/jira/browse/SPARK-25049 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.4.0 Reporter: DB Tsai Assignee: DB Tsai
[jira] [Commented] (SPARK-25029) Scala 2.12 issues: TaskNotSerializable and Janino "Two non-abstract methods ..." errors
[ https://issues.apache.org/jira/browse/SPARK-25029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572020#comment-16572020 ] shane knapp commented on SPARK-25029: - updated the build so concurrent runs can happen, albeit restricted to one build per ubuntu node. this should help build throughput significantly. > Scala 2.12 issues: TaskNotSerializable and Janino "Two non-abstract methods > ..." errors > --- > > Key: SPARK-25029 > URL: https://issues.apache.org/jira/browse/SPARK-25029 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 2.4.0 >Reporter: Sean Owen >Priority: Blocker > > We actually still have some test failures in the Scala 2.12 build. There seem > to be two types. First are that some tests fail with "TaskNotSerializable" > because some code construct now captures a reference to scalatest's > AssertionHelper. Example: > {code:java} > - LegacyAccumulatorWrapper with AccumulatorParam that has no equals/hashCode > *** FAILED *** java.io.NotSerializableException: > org.scalatest.Assertions$AssertionsHelper Serialization stack: - object not > serializable (class: org.scalatest.Assertions$AssertionsHelper, value: > org.scalatest.Assertions$AssertionsHelper@3bc5fc8f){code} > These seem generally easy to fix by tweaking the test code. It's not clear if > something about closure cleaning in 2.12 could be improved to detect this > situation automatically; given that only a handful of tests fail for this > reason, it's unlikely to be a systemic problem. > > The other error is curiouser. 
Janino fails to compile generated code in many > cases with errors like: > {code:java} > - encode/decode for seq of string: List(abc, xyz) *** FAILED *** > java.lang.RuntimeException: Error while encoding: > org.codehaus.janino.InternalCompilerException: failed to compile: > org.codehaus.janino.InternalCompilerException: Compiling "GeneratedClass": > Two non-abstract methods "public int scala.collection.TraversableOnce.size()" > have the same parameter types, declaring type and return type{code} > > I include the full generated code that failed in one case below. There is no > {{size()}} in the generated code. It's got to be down to some difference in > Scala 2.12, potentially even a Janino problem. > > {code:java} > Caused by: org.codehaus.janino.InternalCompilerException: Compiling > "GeneratedClass": Two non-abstract methods "public int > scala.collection.TraversableOnce.size()" have the same parameter types, > declaring type and return type > at org.codehaus.janino.UnitCompiler.compileUnit(UnitCompiler.java:361) > at org.codehaus.janino.SimpleCompiler.cook(SimpleCompiler.java:234) > at > org.codehaus.janino.SimpleCompiler.compileToClassLoader(SimpleCompiler.java:446) > at > org.codehaus.janino.ClassBodyEvaluator.compileToClass(ClassBodyEvaluator.java:313) > at org.codehaus.janino.ClassBodyEvaluator.cook(ClassBodyEvaluator.java:235) > at org.codehaus.janino.SimpleCompiler.cook(SimpleCompiler.java:204) > at org.codehaus.commons.compiler.Cookable.cook(Cookable.java:80) > at > org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:1342) > ... 
30 more > Caused by: org.codehaus.janino.InternalCompilerException: Two non-abstract > methods "public int scala.collection.TraversableOnce.size()" have the same > parameter types, declaring type and return type > at > org.codehaus.janino.UnitCompiler.findMostSpecificIInvocable(UnitCompiler.java:9112) > at > org.codehaus.janino.UnitCompiler.findMostSpecificIInvocable(UnitCompiler.java:) > at org.codehaus.janino.UnitCompiler.findIMethod(UnitCompiler.java:8770) > at org.codehaus.janino.UnitCompiler.findIMethod(UnitCompiler.java:8672) > at org.codehaus.janino.UnitCompiler.compileGet2(UnitCompiler.java:4737) > at org.codehaus.janino.UnitCompiler.access$8300(UnitCompiler.java:212) > at > org.codehaus.janino.UnitCompiler$12.visitMethodInvocation(UnitCompiler.java:4097) > at > org.codehaus.janino.UnitCompiler$12.visitMethodInvocation(UnitCompiler.java:4070) > at org.codehaus.janino.Java$MethodInvocation.accept(Java.java:4902) > at org.codehaus.janino.UnitCompiler.compileGet(UnitCompiler.java:4070) > at org.codehaus.janino.UnitCompiler.compileGetValue(UnitCompiler.java:5253) > at org.codehaus.janino.UnitCompiler.compileGet2(UnitCompiler.java:4391) > at org.codehaus.janino.UnitCompiler.access$8000(UnitCompiler.java:212) > at > org.codehaus.janino.UnitCompiler$12.visitConditionalExpression(UnitCompiler.java:4094) > at > org.codehaus.janino.UnitCompiler$12.visitConditionalExpression(UnitCompiler.java:4070) > at org.codehaus.janino.Java$ConditionalExpression.accept(Java.java:4344) >
[jira] [Commented] (SPARK-24598) SPARK SQL:Datatype overflow conditions gives incorrect result
[ https://issues.apache.org/jira/browse/SPARK-24598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572011#comment-16572011 ] Thomas Graves commented on SPARK-24598: --- At the very least we should file a separate Jira to track it going into 3.0 if you plan on fixing it there > SPARK SQL:Datatype overflow conditions gives incorrect result > - > > Key: SPARK-24598 > URL: https://issues.apache.org/jira/browse/SPARK-24598 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0 >Reporter: navya >Assignee: Marco Gaido >Priority: Major > Fix For: 2.4.0 > > > Execute an sql query, so that it results in overflow conditions. > EX - SELECT 9223372036854775807 + 1 result = -9223372036854776000 > > Expected result - Error should be thrown like mysql. > mysql> SELECT 9223372036854775807 + 1; > ERROR 1690 (22003): BIGINT value is out of range in '(9223372036854775807 + > 1)'
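The wrap-around described in the report is ordinary two's-complement long arithmetic; the exact wrapped value is Long.MIN_VALUE (the -9223372036854776000 shown in the report looks like a double-rounded rendering of it). A small Java sketch contrasting the silent wrap with a checked operation that fails fast, as MySQL does:

```java
public class OverflowDemo {
    public static void main(String[] args) {
        long max = Long.MAX_VALUE;       // 9223372036854775807
        // Unchecked long arithmetic silently wraps around:
        System.out.println(max + 1);     // -9223372036854775808 (Long.MIN_VALUE)
        // A checked alternative throws instead of wrapping:
        try {
            Math.addExact(max, 1L);
        } catch (ArithmeticException e) {
            System.out.println(e.getMessage()); // "long overflow"
        }
    }
}
```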
[jira] [Resolved] (SPARK-23937) High-order function: map_filter(map, function) → MAP
[ https://issues.apache.org/jira/browse/SPARK-23937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin resolved SPARK-23937. --- Resolution: Fixed Assignee: Marco Gaido Fix Version/s: 2.4.0 Issue resolved by pull request 21986 https://github.com/apache/spark/pull/21986 > High-order function: map_filter(map, function) → MAP > -- > > Key: SPARK-23937 > URL: https://issues.apache.org/jira/browse/SPARK-23937 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.0 >Reporter: Xiao Li >Assignee: Marco Gaido >Priority: Major > Fix For: 2.4.0 > > > Constructs a map from those entries of map for which function returns true: > {noformat} > SELECT map_filter(MAP(ARRAY[], ARRAY[]), (k, v) -> true); -- {} > SELECT map_filter(MAP(ARRAY[10, 20, 30], ARRAY['a', NULL, 'c']), (k, v) -> v > IS NOT NULL); -- {10 -> a, 30 -> c} > SELECT map_filter(MAP(ARRAY['k1', 'k2', 'k3'], ARRAY[20, 3, 15]), (k, v) -> v > > 10); -- {k1 -> 20, k3 -> 15} > {noformat}
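The map_filter semantics quoted above can be sketched in plain Java; `mapFilter` and `MapFilterSketch` are illustrative names, not Spark's implementation:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiPredicate;

public class MapFilterSketch {
    // Hypothetical helper mirroring map_filter: keep the entries for which
    // the (key, value) predicate returns true, preserving entry order.
    static <K, V> Map<K, V> mapFilter(Map<K, V> m, BiPredicate<K, V> p) {
        Map<K, V> out = new LinkedHashMap<>();
        for (Map.Entry<K, V> e : m.entrySet()) {
            if (p.test(e.getKey(), e.getValue())) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Integer> m = new LinkedHashMap<>();
        m.put("k1", 20);
        m.put("k2", 3);
        m.put("k3", 15);
        // Mirrors: map_filter(..., (k, v) -> v > 10); -- {k1 -> 20, k3 -> 15}
        System.out.println(mapFilter(m, (k, v) -> v > 10)); // {k1=20, k3=15}
    }
}
```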
[jira] [Resolved] (SPARK-25041) genjavadoc-plugin_0.10 is not found with sbt in scala-2.12
[ https://issues.apache.org/jira/browse/SPARK-25041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-25041. --- Resolution: Fixed Fix Version/s: 2.4.0 Issue resolved by pull request 22020 [https://github.com/apache/spark/pull/22020] > genjavadoc-plugin_0.10 is not found with sbt in scala-2.12 > -- > > Key: SPARK-25041 > URL: https://issues.apache.org/jira/browse/SPARK-25041 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 2.4.0 >Reporter: Kazuaki Ishizaki >Assignee: Kazuaki Ishizaki >Priority: Major > Fix For: 2.4.0 > > > When the master is built with sbt in scala-2.12, the following error occurs: > {code} > [warn]module not found: > com.typesafe.genjavadoc#genjavadoc-plugin_2.12.6;0.10 > [warn] public: tried > [warn] > https://repo1.maven.org/maven2/com/typesafe/genjavadoc/genjavadoc-plugin_2.12.6/0.10/genjavadoc-plugin_2.12.6-0.10.pom > [warn] Maven2 Local: tried > [warn] > file:/gsa/jpngsa/home/i/s/ishizaki/.m2/repository/com/typesafe/genjavadoc/genjavadoc-plugin_2.12.6/0.10/genjavadoc-plugin_2.12.6-0.10.pom > [warn] local: tried > [warn] > /gsa/jpngsa/home/i/s/ishizaki/.ivy2/local/com.typesafe.genjavadoc/genjavadoc-plugin_2.12.6/0.10/ivys/ivy.xml > [info] Resolving jline#jline;2.14.3 ... 
> [warn]:: > [warn]:: UNRESOLVED DEPENDENCIES :: > [warn]:: > [warn]:: com.typesafe.genjavadoc#genjavadoc-plugin_2.12.6;0.10: not > found > [warn]:: > [warn] > [warn]Note: Unresolved dependencies path: > [warn]com.typesafe.genjavadoc:genjavadoc-plugin_2.12.6:0.10 > (/home/ishizaki/Spark/PR/scala212/spark/project/SparkBuild.scala#L118) > [warn] +- org.apache.spark:spark-tags_2.12:2.4.0-SNAPSHOT > sbt.ResolveException: unresolved dependency: > com.typesafe.genjavadoc#genjavadoc-plugin_2.12.6;0.10: not found > at sbt.IvyActions$.sbt$IvyActions$$resolve(IvyActions.scala:320) > at sbt.IvyActions$$anonfun$updateEither$1.apply(IvyActions.scala:191) > at sbt.IvyActions$$anonfun$updateEither$1.apply(IvyActions.scala:168) > at sbt.IvySbt$Module$$anonfun$withModule$1.apply(Ivy.scala:156) > at sbt.IvySbt$Module$$anonfun$withModule$1.apply(Ivy.scala:156) > at sbt.IvySbt$$anonfun$withIvy$1.apply(Ivy.scala:133) > at sbt.IvySbt.sbt$IvySbt$$action$1(Ivy.scala:57) > at sbt.IvySbt$$anon$4.call(Ivy.scala:65) > at xsbt.boot.Locks$GlobalLock.withChannel$1(Locks.scala:93) > at > xsbt.boot.Locks$GlobalLock.xsbt$boot$Locks$GlobalLock$$withChannelRetries$1(Locks.scala:78) > at > xsbt.boot.Locks$GlobalLock$$anonfun$withFileLock$1.apply(Locks.scala:97) > at xsbt.boot.Using$.withResource(Using.scala:10) > at xsbt.boot.Using$.apply(Using.scala:9) > at xsbt.boot.Locks$GlobalLock.ignoringDeadlockAvoided(Locks.scala:58) > at xsbt.boot.Locks$GlobalLock.withLock(Locks.scala:48) > at xsbt.boot.Locks$.apply0(Locks.scala:31) > at xsbt.boot.Locks$.apply(Locks.scala:28) > at sbt.IvySbt.withDefaultLogger(Ivy.scala:65) > at sbt.IvySbt.withIvy(Ivy.scala:128) > at sbt.IvySbt.withIvy(Ivy.scala:125) > at sbt.IvySbt$Module.withModule(Ivy.scala:156) > at sbt.IvyActions$.updateEither(IvyActions.scala:168) > at > sbt.Classpaths$$anonfun$sbt$Classpaths$$work$1$1.apply(Defaults.scala:1555) > at > sbt.Classpaths$$anonfun$sbt$Classpaths$$work$1$1.apply(Defaults.scala:1551) > at > 
sbt.Classpaths$$anonfun$doWork$1$1$$anonfun$122.apply(Defaults.scala:1586) > at > sbt.Classpaths$$anonfun$doWork$1$1$$anonfun$122.apply(Defaults.scala:1584) > at sbt.Tracked$$anonfun$lastOutput$1.apply(Tracked.scala:37) > at sbt.Classpaths$$anonfun$doWork$1$1.apply(Defaults.scala:1589) > at sbt.Classpaths$$anonfun$doWork$1$1.apply(Defaults.scala:1583) > at sbt.Tracked$$anonfun$inputChanged$1.apply(Tracked.scala:60) > at sbt.Classpaths$.cachedUpdate(Defaults.scala:1606) > at sbt.Classpaths$$anonfun$updateTask$1.apply(Defaults.scala:1533) > at sbt.Classpaths$$anonfun$updateTask$1.apply(Defaults.scala:1485) > at scala.Function1$$anonfun$compose$1.apply(Function1.scala:47) > at sbt.$tilde$greater$$anonfun$$u2219$1.apply(TypeFunctions.scala:40) > at sbt.std.Transform$$anon$4.work(System.scala:63) > at > sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:228) > at > sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:228) > at sbt.ErrorHandling$.wideConvert(ErrorHandling.scala:17) >
[jira] [Assigned] (SPARK-25041) genjavadoc-plugin_0.10 is not found with sbt in scala-2.12
[ https://issues.apache.org/jira/browse/SPARK-25041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen reassigned SPARK-25041: - Assignee: Kazuaki Ishizaki > genjavadoc-plugin_0.10 is not found with sbt in scala-2.12 > -- > > Key: SPARK-25041 > URL: https://issues.apache.org/jira/browse/SPARK-25041 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 2.4.0 >Reporter: Kazuaki Ishizaki >Assignee: Kazuaki Ishizaki >Priority: Major > Fix For: 2.4.0 > > > When the master is built with sbt in scala-2.12, the following error occurs: > {code} > [warn]module not found: > com.typesafe.genjavadoc#genjavadoc-plugin_2.12.6;0.10 > [warn] public: tried > [warn] > https://repo1.maven.org/maven2/com/typesafe/genjavadoc/genjavadoc-plugin_2.12.6/0.10/genjavadoc-plugin_2.12.6-0.10.pom > [warn] Maven2 Local: tried > [warn] > file:/gsa/jpngsa/home/i/s/ishizaki/.m2/repository/com/typesafe/genjavadoc/genjavadoc-plugin_2.12.6/0.10/genjavadoc-plugin_2.12.6-0.10.pom > [warn] local: tried > [warn] > /gsa/jpngsa/home/i/s/ishizaki/.ivy2/local/com.typesafe.genjavadoc/genjavadoc-plugin_2.12.6/0.10/ivys/ivy.xml > [info] Resolving jline#jline;2.14.3 ... 
> [warn]:: > [warn]:: UNRESOLVED DEPENDENCIES :: > [warn]:: > [warn]:: com.typesafe.genjavadoc#genjavadoc-plugin_2.12.6;0.10: not > found > [warn]:: > [warn] > [warn]Note: Unresolved dependencies path: > [warn]com.typesafe.genjavadoc:genjavadoc-plugin_2.12.6:0.10 > (/home/ishizaki/Spark/PR/scala212/spark/project/SparkBuild.scala#L118) > [warn] +- org.apache.spark:spark-tags_2.12:2.4.0-SNAPSHOT > sbt.ResolveException: unresolved dependency: > com.typesafe.genjavadoc#genjavadoc-plugin_2.12.6;0.10: not found > at sbt.IvyActions$.sbt$IvyActions$$resolve(IvyActions.scala:320) > at sbt.IvyActions$$anonfun$updateEither$1.apply(IvyActions.scala:191) > at sbt.IvyActions$$anonfun$updateEither$1.apply(IvyActions.scala:168) > at sbt.IvySbt$Module$$anonfun$withModule$1.apply(Ivy.scala:156) > at sbt.IvySbt$Module$$anonfun$withModule$1.apply(Ivy.scala:156) > at sbt.IvySbt$$anonfun$withIvy$1.apply(Ivy.scala:133) > at sbt.IvySbt.sbt$IvySbt$$action$1(Ivy.scala:57) > at sbt.IvySbt$$anon$4.call(Ivy.scala:65) > at xsbt.boot.Locks$GlobalLock.withChannel$1(Locks.scala:93) > at > xsbt.boot.Locks$GlobalLock.xsbt$boot$Locks$GlobalLock$$withChannelRetries$1(Locks.scala:78) > at > xsbt.boot.Locks$GlobalLock$$anonfun$withFileLock$1.apply(Locks.scala:97) > at xsbt.boot.Using$.withResource(Using.scala:10) > at xsbt.boot.Using$.apply(Using.scala:9) > at xsbt.boot.Locks$GlobalLock.ignoringDeadlockAvoided(Locks.scala:58) > at xsbt.boot.Locks$GlobalLock.withLock(Locks.scala:48) > at xsbt.boot.Locks$.apply0(Locks.scala:31) > at xsbt.boot.Locks$.apply(Locks.scala:28) > at sbt.IvySbt.withDefaultLogger(Ivy.scala:65) > at sbt.IvySbt.withIvy(Ivy.scala:128) > at sbt.IvySbt.withIvy(Ivy.scala:125) > at sbt.IvySbt$Module.withModule(Ivy.scala:156) > at sbt.IvyActions$.updateEither(IvyActions.scala:168) > at > sbt.Classpaths$$anonfun$sbt$Classpaths$$work$1$1.apply(Defaults.scala:1555) > at > sbt.Classpaths$$anonfun$sbt$Classpaths$$work$1$1.apply(Defaults.scala:1551) > at > 
sbt.Classpaths$$anonfun$doWork$1$1$$anonfun$122.apply(Defaults.scala:1586) > at > sbt.Classpaths$$anonfun$doWork$1$1$$anonfun$122.apply(Defaults.scala:1584) > at sbt.Tracked$$anonfun$lastOutput$1.apply(Tracked.scala:37) > at sbt.Classpaths$$anonfun$doWork$1$1.apply(Defaults.scala:1589) > at sbt.Classpaths$$anonfun$doWork$1$1.apply(Defaults.scala:1583) > at sbt.Tracked$$anonfun$inputChanged$1.apply(Tracked.scala:60) > at sbt.Classpaths$.cachedUpdate(Defaults.scala:1606) > at sbt.Classpaths$$anonfun$updateTask$1.apply(Defaults.scala:1533) > at sbt.Classpaths$$anonfun$updateTask$1.apply(Defaults.scala:1485) > at scala.Function1$$anonfun$compose$1.apply(Function1.scala:47) > at sbt.$tilde$greater$$anonfun$$u2219$1.apply(TypeFunctions.scala:40) > at sbt.std.Transform$$anon$4.work(System.scala:63) > at > sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:228) > at > sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:228) > at sbt.ErrorHandling$.wideConvert(ErrorHandling.scala:17) > at sbt.Execute.work(Execute.scala:237) > at
[jira] [Updated] (SPARK-25048) Pivoting by multiple columns in Scala/Java
[ https://issues.apache.org/jira/browse/SPARK-25048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-25048: --- Summary: Pivoting by multiple columns in Scala/Java (was: Pivoting by multiple columns) > Pivoting by multiple columns in Scala/Java > -- > > Key: SPARK-25048 > URL: https://issues.apache.org/jira/browse/SPARK-25048 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.1 >Reporter: Maxim Gekk >Priority: Minor > > Need to change or extend existing API to make pivoting by multiple columns > possible. Users should be able to use many columns and values like in the > example: > {code:scala} > trainingSales > .groupBy($"sales.year") > .pivot(struct(lower($"sales.course"), $"training"), Seq( > struct(lit("dotnet"), lit("Experts")), > struct(lit("java"), lit("Dummies"))) > ).agg(sum($"sales.earnings")) > {code}
[jira] [Commented] (SPARK-25048) Pivoting by multiple columns
[ https://issues.apache.org/jira/browse/SPARK-25048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571952#comment-16571952 ] Apache Spark commented on SPARK-25048: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/22030 > Pivoting by multiple columns > > > Key: SPARK-25048 > URL: https://issues.apache.org/jira/browse/SPARK-25048 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.1 >Reporter: Maxim Gekk >Priority: Minor > > Need to change or extend existing API to make pivoting by multiple columns > possible. Users should be able to use many columns and values like in the > example: > {code:scala} > trainingSales > .groupBy($"sales.year") > .pivot(struct(lower($"sales.course"), $"training"), Seq( > struct(lit("dotnet"), lit("Experts")), > struct(lit("java"), lit("Dummies"))) > ).agg(sum($"sales.earnings")) > {code}
[jira] [Assigned] (SPARK-25048) Pivoting by multiple columns
[ https://issues.apache.org/jira/browse/SPARK-25048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25048: Assignee: Apache Spark > Pivoting by multiple columns > > > Key: SPARK-25048 > URL: https://issues.apache.org/jira/browse/SPARK-25048 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.1 >Reporter: Maxim Gekk >Assignee: Apache Spark >Priority: Minor > > Need to change or extend existing API to make pivoting by multiple columns > possible. Users should be able to use many columns and values like in the > example: > {code:scala} > trainingSales > .groupBy($"sales.year") > .pivot(struct(lower($"sales.course"), $"training"), Seq( > struct(lit("dotnet"), lit("Experts")), > struct(lit("java"), lit("Dummies"))) > ).agg(sum($"sales.earnings")) > {code}
[jira] [Assigned] (SPARK-25048) Pivoting by multiple columns
[ https://issues.apache.org/jira/browse/SPARK-25048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25048: Assignee: (was: Apache Spark) > Pivoting by multiple columns > > > Key: SPARK-25048 > URL: https://issues.apache.org/jira/browse/SPARK-25048 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.1 >Reporter: Maxim Gekk >Priority: Minor > > Need to change or extend existing API to make pivoting by multiple columns > possible. Users should be able to use many columns and values like in the > example: > {code:scala} > trainingSales > .groupBy($"sales.year") > .pivot(struct(lower($"sales.course"), $"training"), Seq( > struct(lit("dotnet"), lit("Experts")), > struct(lit("java"), lit("Dummies"))) > ).agg(sum($"sales.earnings")) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-25048) Pivoting by multiple columns
Maxim Gekk created SPARK-25048: -- Summary: Pivoting by multiple columns Key: SPARK-25048 URL: https://issues.apache.org/jira/browse/SPARK-25048 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.3.1 Reporter: Maxim Gekk Need to change or extend existing API to make pivoting by multiple columns possible. Users should be able to use many columns and values like in the example: {code:scala} trainingSales .groupBy($"sales.year") .pivot(struct(lower($"sales.course"), $"training"), Seq( struct(lit("dotnet"), lit("Experts")), struct(lit("java"), lit("Dummies"))) ).agg(sum($"sales.earnings")) {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
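For reference, the semantics the proposal asks for -- grouping by one column while pivoting on a tuple of columns restricted to an explicit list of tuple values -- can be sketched in plain Python. The `pivot_multi` helper and the toy data below are illustrative only, not Spark API:

```python
from collections import defaultdict

def pivot_multi(rows, group_key, pivot_keys, pivot_values, agg_field):
    # Group rows by group_key, pivot on the tuple of pivot_keys, and sum
    # agg_field. Only tuples listed in pivot_values become output columns,
    # mirroring the proposed pivot(struct(...), Seq(struct(...), ...)) call.
    out = defaultdict(lambda: {v: 0 for v in pivot_values})
    for row in rows:
        key = tuple(row[k] for k in pivot_keys)
        if key in pivot_values:
            out[row[group_key]][key] += row[agg_field]
    return dict(out)

training_sales = [
    {"year": 2012, "course": "dotnet", "training": "Experts", "earnings": 10000},
    {"year": 2012, "course": "java", "training": "Dummies", "earnings": 20000},
    {"year": 2013, "course": "dotnet", "training": "Experts", "earnings": 5000},
    {"year": 2012, "course": "java", "training": "Experts", "earnings": 999},  # not in pivot_values
]

result = pivot_multi(
    training_sales,
    group_key="year",
    pivot_keys=("course", "training"),
    pivot_values=[("dotnet", "Experts"), ("java", "Dummies")],
    agg_field="earnings",
)
```

Rows whose (course, training) tuple is not in the requested value list simply contribute nothing, just as values outside the `Seq` would not become columns.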
[jira] [Commented] (SPARK-24924) Add mapping for built-in Avro data source
[ https://issues.apache.org/jira/browse/SPARK-24924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571908#comment-16571908 ] Thomas Graves commented on SPARK-24924: --- So originally, when I started on this, I didn't know about the side effects on the Hive table here, so this isn't as straightforward as I originally thought. I still personally don't like remapping this because users get something other than what they explicitly asked for, but if we want to keep this compatibility we either have to do that or actually ship a com.databricks.avro class that just maps into our internal Avro. That would give the benefit that they could eclipse it with their own jar if they wanted to keep using their custom version, and I assume we could theoretically also support the spark.read.avro format. Or I guess the third option is to just break compatibility and require users to change the table property, but then they can't read it with older versions of Spark. It also seems bad to me that we aren't supporting spark.read.avro, so it's an API compatibility issue: we magically keep their tables compatible by remapping them, but we don't support the old API and they have to update their code. This feels like an inconsistent story to me, and I'm not sure how it fits with our versioning policy since it's a third-party thing. Not sure I like any of these options. Seems like these are the options:
1) Actually add the class com.databricks.avro into the Spark source so it does the remap, support spark.read/write.avro for a couple of releases for compatibility, then remove it and tell people to change the table property (or provide an API to do that).
2) Make the mapping of com.databricks.avro => internal Avro configurable, which would allow them to continue using their version of com.databricks.avro until they can update the API.
3) Do nothing and leave this as-is with this JIRA; users have to deal with losing the spark.read.avro API, and with possible confusion and breakage if they are using a modified version of com.databricks.avro.
Thoughts from others? > Add mapping for built-in Avro data source > - > > Key: SPARK-24924 > URL: https://issues.apache.org/jira/browse/SPARK-24924 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.4.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor > Fix For: 2.4.0 > > > This issue aims at the following: > # Like the `com.databricks.spark.csv` mapping, we had better map > `com.databricks.spark.avro` to the built-in Avro data source. > # Remove the incorrect error message, `Please find an Avro package at ...`. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
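As a rough illustration of option 2 above, the remap could be a simple alias table guarded by a configuration flag, so users shipping their own com.databricks.spark.avro jar can opt out. The flag name and target package below are assumptions for illustration, not confirmed Spark configuration:

```python
# Hypothetical sketch of a configurable data-source alias table. The config
# key and the built-in provider name are invented for illustration.
BUILTIN_ALIASES = {
    "com.databricks.spark.avro": "org.apache.spark.sql.avro",
}

def resolve_data_source(name, conf):
    # When the (assumed) legacy flag is on, rewrite known third-party names
    # to the built-in provider; otherwise pass the requested name through.
    if conf.get("spark.sql.legacy.replaceDatabricksSparkAvro.enabled", True):
        return BUILTIN_ALIASES.get(name, name)
    return name
```

With the flag off, a user-provided com.databricks.spark.avro class on the classpath would be loaded unchanged, which is exactly the escape hatch option 2 asks for.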
[jira] [Assigned] (SPARK-24395) Fix Behavior of NOT IN with Literals Containing NULL
[ https://issues.apache.org/jira/browse/SPARK-24395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24395: Assignee: Apache Spark > Fix Behavior of NOT IN with Literals Containing NULL > > > Key: SPARK-24395 > URL: https://issues.apache.org/jira/browse/SPARK-24395 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.2 >Reporter: Miles Yucht >Assignee: Apache Spark >Priority: Major > > Spark does not return the correct answer when evaluating NOT IN in some > cases. For example: > {code:java} > CREATE TEMPORARY VIEW m AS SELECT * FROM VALUES > (null, null) > AS m(a, b); > SELECT * > FROM m > WHERE a IS NULL AND b IS NULL >AND (a, b) NOT IN ((0, 1.0), (2, 3.0), (4, CAST(null AS DECIMAL(2, 1))));{code} > According to the semantics of null-aware anti-join, this should return no > rows. However, it actually returns the row {{NULL NULL}}. This was found by > inspecting the unit tests added for SPARK-24381 > ([https://github.com/apache/spark/pull/21425#pullrequestreview-123421822]).
> *Acceptance Criteria*: > * We should be able to add the following test cases back to > {{subquery/in-subquery/not-in-unit-test-multi-column-literal.sql}}: > {code:java} > -- Case 2 > -- (subquery contains a row with null in all columns -> row not returned) > SELECT * > FROM m > WHERE (a, b) NOT IN ((CAST (null AS INT), CAST (null AS DECIMAL(2, 1)))); > -- Case 3 > -- (probe-side columns are all null -> row not returned) > SELECT * > FROM m > WHERE a IS NULL AND b IS NULL -- Matches only (null, null) >AND (a, b) NOT IN ((0, 1.0), (2, 3.0), (4, CAST(null AS DECIMAL(2, 1)))); > -- Case 4 > -- (one column null, other column matches a row in the subquery result -> > row not returned) > SELECT * > FROM m > WHERE b = 1.0 -- Matches (null, 1.0) >AND (a, b) NOT IN ((0, 1.0), (2, 3.0), (4, CAST(null AS DECIMAL(2, 1)))); > {code} > > cc [~smilegator] [~juliuszsompolski] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24395) Fix Behavior of NOT IN with Literals Containing NULL
[ https://issues.apache.org/jira/browse/SPARK-24395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571892#comment-16571892 ] Apache Spark commented on SPARK-24395: -- User 'mgaido91' has created a pull request for this issue: https://github.com/apache/spark/pull/22029 > Fix Behavior of NOT IN with Literals Containing NULL > > > Key: SPARK-24395 > URL: https://issues.apache.org/jira/browse/SPARK-24395 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.2 >Reporter: Miles Yucht >Priority: Major > > Spark does not return the correct answer when evaluating NOT IN in some > cases. For example: > {code:java} > CREATE TEMPORARY VIEW m AS SELECT * FROM VALUES > (null, null) > AS m(a, b); > SELECT * > FROM m > WHERE a IS NULL AND b IS NULL >AND (a, b) NOT IN ((0, 1.0), (2, 3.0), (4, CAST(null AS DECIMAL(2, 1))));{code} > According to the semantics of null-aware anti-join, this should return no > rows. However, it actually returns the row {{NULL NULL}}. This was found by > inspecting the unit tests added for SPARK-24381 > ([https://github.com/apache/spark/pull/21425#pullrequestreview-123421822]).
> *Acceptance Criteria*: > * We should be able to add the following test cases back to > {{subquery/in-subquery/not-in-unit-test-multi-column-literal.sql}}: > {code:java} > -- Case 2 > -- (subquery contains a row with null in all columns -> row not returned) > SELECT * > FROM m > WHERE (a, b) NOT IN ((CAST (null AS INT), CAST (null AS DECIMAL(2, 1)))); > -- Case 3 > -- (probe-side columns are all null -> row not returned) > SELECT * > FROM m > WHERE a IS NULL AND b IS NULL -- Matches only (null, null) >AND (a, b) NOT IN ((0, 1.0), (2, 3.0), (4, CAST(null AS DECIMAL(2, 1)))); > -- Case 4 > -- (one column null, other column matches a row in the subquery result -> > row not returned) > SELECT * > FROM m > WHERE b = 1.0 -- Matches (null, 1.0) >AND (a, b) NOT IN ((0, 1.0), (2, 3.0), (4, CAST(null AS DECIMAL(2, 1)))); > {code} > > cc [~smilegator] [~juliuszsompolski] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-24395) Fix Behavior of NOT IN with Literals Containing NULL
[ https://issues.apache.org/jira/browse/SPARK-24395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24395: Assignee: (was: Apache Spark) > Fix Behavior of NOT IN with Literals Containing NULL > > > Key: SPARK-24395 > URL: https://issues.apache.org/jira/browse/SPARK-24395 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.2 >Reporter: Miles Yucht >Priority: Major > > Spark does not return the correct answer when evaluating NOT IN in some > cases. For example: > {code:java} > CREATE TEMPORARY VIEW m AS SELECT * FROM VALUES > (null, null) > AS m(a, b); > SELECT * > FROM m > WHERE a IS NULL AND b IS NULL >AND (a, b) NOT IN ((0, 1.0), (2, 3.0), (4, CAST(null AS DECIMAL(2, 1))));{code} > According to the semantics of null-aware anti-join, this should return no > rows. However, it actually returns the row {{NULL NULL}}. This was found by > inspecting the unit tests added for SPARK-24381 > ([https://github.com/apache/spark/pull/21425#pullrequestreview-123421822]).
> *Acceptance Criteria*: > * We should be able to add the following test cases back to > {{subquery/in-subquery/not-in-unit-test-multi-column-literal.sql}}: > {code:java} > -- Case 2 > -- (subquery contains a row with null in all columns -> row not returned) > SELECT * > FROM m > WHERE (a, b) NOT IN ((CAST (null AS INT), CAST (null AS DECIMAL(2, 1)))); > -- Case 3 > -- (probe-side columns are all null -> row not returned) > SELECT * > FROM m > WHERE a IS NULL AND b IS NULL -- Matches only (null, null) >AND (a, b) NOT IN ((0, 1.0), (2, 3.0), (4, CAST(null AS DECIMAL(2, 1)))); > -- Case 4 > -- (one column null, other column matches a row in the subquery result -> > row not returned) > SELECT * > FROM m > WHERE b = 1.0 -- Matches (null, 1.0) >AND (a, b) NOT IN ((0, 1.0), (2, 3.0), (4, CAST(null AS DECIMAL(2, 1)))); > {code} > > cc [~smilegator] [~juliuszsompolski] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
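For reference, the three-valued logic that makes these NOT IN queries return no rows can be sketched in plain Python. `tuple_eq` and `not_in` are toy helpers, not Spark code; `None` stands for SQL UNKNOWN:

```python
def tuple_eq(left, right):
    # SQL tuple equality with three-valued logic: True, False, or None (UNKNOWN).
    unknown = False
    for l, r in zip(left, right):
        if l is None or r is None:
            unknown = True
        elif l != r:
            return False  # one definite mismatch makes the whole tuple unequal
    return None if unknown else True

def not_in(row, in_list):
    # row NOT IN (v1, v2, ...) is NOT (row = v1 OR row = v2 OR ...).
    # Any definite match makes it False; otherwise any UNKNOWN comparison
    # (from a NULL) poisons the result to UNKNOWN, and a WHERE clause only
    # keeps rows whose predicate is definitely True.
    results = [tuple_eq(row, v) for v in in_list]
    if True in results:
        return False
    if None in results:
        return None
    return True
```

For the probe row (NULL, NULL) against ((0, 1.0), (2, 3.0), (4, NULL)), every comparison is UNKNOWN, so the predicate is UNKNOWN and the row must be filtered out -- which is why the query in the description should return no rows.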
[jira] [Created] (SPARK-25047) Can't assign SerializedLambda to scala.Function1 in deserialization of BucketedRandomProjectionLSHModel
Sean Owen created SPARK-25047: - Summary: Can't assign SerializedLambda to scala.Function1 in deserialization of BucketedRandomProjectionLSHModel Key: SPARK-25047 URL: https://issues.apache.org/jira/browse/SPARK-25047 Project: Spark Issue Type: Sub-task Components: ML Affects Versions: 2.4.0 Reporter: Sean Owen Another distinct test failure: {code:java} - BucketedRandomProjectionLSH: streaming transform *** FAILED *** org.apache.spark.sql.streaming.StreamingQueryException: Query [id = 7f34fb07-a718-4488-b644-d27cfd29ff6c, runId = 0bbc0ba2-2952-4504-85d6-8aba877ba01b] terminated with exception: Job aborted due to stage failure: Task 0 in stage 16.0 failed 1 times, most recent failure: Lost task 0.0 in stage 16.0 (TID 16, localhost, executor driver): java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel.hashFunction of type scala.Function1 in instance of org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel ... Cause: java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel.hashFunction of type scala.Function1 in instance of org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2233) at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1405) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2284) ...{code} Here the different nature of a Java 8 LMF closure trips up Java serialization/deserialization. I think this can be patched by manually implementing the Java serialization here, and I don't see other instances (yet). Also wondering if this "val" can be a "def".
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
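The suggested fix above -- manually implementing the serialization so the closure field is rebuilt on deserialization rather than shipped -- can be illustrated with a Python pickling analogue. `LSHModel` below is a toy stand-in, not the Spark class, and `__getstate__`/`__setstate__` play the role Java's custom `writeObject`/`readObject` would:

```python
import pickle

class LSHModel:
    # Toy analogue of a model whose hash function is a closure over plain
    # data: exclude the closure from the serialized state and rebuild it
    # from that data on deserialization.
    def __init__(self, rand_vectors):
        self.rand_vectors = rand_vectors
        self.hash_function = self._make_hash_function()

    def _make_hash_function(self):
        vecs = self.rand_vectors
        return lambda x: [sum(a * b for a, b in zip(x, v)) for v in vecs]

    def __getstate__(self):
        state = self.__dict__.copy()
        del state["hash_function"]  # closures don't round-trip safely
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.hash_function = self._make_hash_function()
```

Making the field a `def` (recomputed on access) would achieve the same thing by never storing the closure in an instance field at all.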
[jira] [Resolved] (SPARK-24979) add AnalysisHelper#resolveOperatorsUp
[ https://issues.apache.org/jira/browse/SPARK-24979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-24979. - Resolution: Fixed Fix Version/s: 2.4.0 > add AnalysisHelper#resolveOperatorsUp > - > > Key: SPARK-24979 > URL: https://issues.apache.org/jira/browse/SPARK-24979 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Fix For: 2.4.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25037) plan.transformAllExpressions() doesn't transform expressions in nested SubqueryExpression plans
[ https://issues.apache.org/jira/browse/SPARK-25037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571855#comment-16571855 ] Dilip Biswal commented on SPARK-25037: -- [~chriso] [~hyukjin.kwon] Subquery plans are not traversed by the parent plan's transformAllExpressions. It's been like this since the beginning of subquery support in Spark. Just an FYI. > plan.transformAllExpressions() doesn't transform expressions in nested > SubqueryExpression plans > --- > > Key: SPARK-25037 > URL: https://issues.apache.org/jira/browse/SPARK-25037 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.1 >Reporter: Chris O'Hara >Priority: Minor > > Given the following LogicalPlan: > {code:java} > scala> val plan = spark.sql("SELECT 1 bar FROM (SELECT 1 foo) WHERE foo IN > (SELECT 1 foo)").queryExecution.logical > plan: org.apache.spark.sql.catalyst.plans.logical.LogicalPlan = > 'Project [1 AS bar#29] > +- 'Filter 'foo IN (list#31 []) > : +- Project [1 AS foo#30] > : +- OneRowRelation > +- SubqueryAlias __auto_generated_subquery_name > +- Project [1 AS foo#28] > +- OneRowRelation > {code} > the following transformation should replace all instances of lit(1) with > lit(2): > {code:java} > scala> plan.transformAllExpressions { case l @ Literal(1, _) => l.copy(value > = 2) } > res0: plan.type = > 'Project [2 AS bar#29] > +- 'Filter 'foo IN (list#31 []) > : +- Project [1 AS foo#30] > : +- OneRowRelation > +- SubqueryAlias __auto_generated_subquery_name > +- Project [2 AS foo#28] > +- OneRowRelation > {code} > Instead, the nested SubqueryExpression plan is not transformed.
> The expected output is: > {code:java} > 'Project [2 AS bar#29] > +- 'Filter 'foo IN (list#31 []) > : +- Project [2 AS foo#30] > : +- OneRowRelation > +- SubqueryAlias __auto_generated_subquery_name > +- Project [2 AS foo#28] > +- OneRowRelation > {code} > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
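The behavior difference can be sketched with a toy plan tree in Python. `Node` and the `into_subqueries` flag are illustrative only; Catalyst's actual API differs:

```python
class Node:
    # Minimal plan node: besides child plans, it may hold nested subplans,
    # like Catalyst's SubqueryExpression embedding a whole LogicalPlan.
    def __init__(self, exprs=(), children=(), subplans=()):
        self.exprs = list(exprs)
        self.children = list(children)
        self.subplans = list(subplans)

def transform_all_expressions(node, f, into_subqueries=False):
    # Rewrite every expression in the tree. Without into_subqueries, nested
    # subplans are skipped -- the behavior SPARK-25037 reports.
    node.exprs = [f(e) for e in node.exprs]
    for child in node.children:
        transform_all_expressions(child, f, into_subqueries)
    if into_subqueries:
        for sub in node.subplans:
            transform_all_expressions(sub, f, into_subqueries)
    return node
```

With the flag off, a literal inside the nested subplan survives the rewrite, matching the `Project [1 AS foo#30]` left untouched in the example output.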
[jira] [Commented] (SPARK-24924) Add mapping for built-in Avro data source
[ https://issues.apache.org/jira/browse/SPARK-24924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571852#comment-16571852 ] Thomas Graves commented on SPARK-24924: --- Thanks, I missed it in the output from Spark, as I was just looking at table properties. So what you are saying is that without this change mapping Databricks Avro to our internal Avro, the only way to update Hive tables to use the internal Avro version is to have users manually set the table properties? Do you know offhand whether you are able to write to a Hive table with datasource "com.databricks.spark.avro" using the internal Avro version, or does it error? > Add mapping for built-in Avro data source > - > > Key: SPARK-24924 > URL: https://issues.apache.org/jira/browse/SPARK-24924 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.4.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor > Fix For: 2.4.0 > > > This issue aims at the following: > # Like the `com.databricks.spark.csv` mapping, we had better map > `com.databricks.spark.avro` to the built-in Avro data source. > # Remove the incorrect error message, `Please find an Avro package at ...`. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-25046) Alter View can execute SQL like "ALTER VIEW ... AS INSERT INTO"
[ https://issues.apache.org/jira/browse/SPARK-25046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25046: Assignee: Apache Spark > Alter View can execute SQL like "ALTER VIEW ... AS INSERT INTO" > - > > Key: SPARK-25046 > URL: https://issues.apache.org/jira/browse/SPARK-25046 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 >Reporter: SongXun >Assignee: Apache Spark >Priority: Minor > Labels: pull-request-available > > ALTER VIEW can execute SQL like "ALTER VIEW ... AS INSERT INTO". We should > throw ParseException(s"Operation not allowed: $message", ctx) as CREATE VIEW does. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-25046) Alter View can execute SQL like "ALTER VIEW ... AS INSERT INTO"
[ https://issues.apache.org/jira/browse/SPARK-25046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25046: Assignee: (was: Apache Spark) > Alter View can execute SQL like "ALTER VIEW ... AS INSERT INTO" > - > > Key: SPARK-25046 > URL: https://issues.apache.org/jira/browse/SPARK-25046 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 >Reporter: SongXun >Priority: Minor > Labels: pull-request-available > > ALTER VIEW can execute SQL like "ALTER VIEW ... AS INSERT INTO". We should > throw ParseException(s"Operation not allowed: $message", ctx) as CREATE VIEW does. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25046) Alter View can execute SQL like "ALTER VIEW ... AS INSERT INTO"
[ https://issues.apache.org/jira/browse/SPARK-25046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571842#comment-16571842 ] Apache Spark commented on SPARK-25046: -- User 'sddyljsx' has created a pull request for this issue: https://github.com/apache/spark/pull/22028 > Alter View can execute SQL like "ALTER VIEW ... AS INSERT INTO" > - > > Key: SPARK-25046 > URL: https://issues.apache.org/jira/browse/SPARK-25046 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 >Reporter: SongXun >Priority: Minor > Labels: pull-request-available > > ALTER VIEW can execute SQL like "ALTER VIEW ... AS INSERT INTO". We should > throw ParseException(s"Operation not allowed: $message", ctx) as CREATE VIEW does. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25010) Rand/Randn should produce different values for each execution in streaming query
[ https://issues.apache.org/jira/browse/SPARK-25010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571834#comment-16571834 ] Apache Spark commented on SPARK-25010: -- User 'ueshin' has created a pull request for this issue: https://github.com/apache/spark/pull/22027 > Rand/Randn should produce different values for each execution in streaming > query > > > Key: SPARK-25010 > URL: https://issues.apache.org/jira/browse/SPARK-25010 > Project: Spark > Issue Type: Bug > Components: SQL, Structured Streaming >Affects Versions: 2.4.0 >Reporter: Liang-Chi Hsieh >Assignee: Liang-Chi Hsieh >Priority: Major > Fix For: 2.4.0 > > > Like Uuid in SPARK-24896, Rand and Randn expressions now produce the same > results for each execution in streaming query. It doesn't make too much sense > for streaming queries. We should make them produce different results as Uuid. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-25046) Alter View can execute SQL like "ALTER VIEW ... AS INSERT INTO"
SongXun created SPARK-25046: --- Summary: Alter View can execute SQL like "ALTER VIEW ... AS INSERT INTO" Key: SPARK-25046 URL: https://issues.apache.org/jira/browse/SPARK-25046 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.4.0 Reporter: SongXun ALTER VIEW can execute SQL like "ALTER VIEW ... AS INSERT INTO". We should throw ParseException(s"Operation not allowed: $message", ctx) as CREATE VIEW does. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
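A minimal sketch of the requested check, in Python rather than Spark's parser -- the regex and the ParseException class are toy stand-ins for the AstBuilder logic, which would inspect the parsed query node instead of matching text:

```python
import re

class ParseException(Exception):
    pass

def check_alter_view_query(sql):
    # Toy validator: the query after "ALTER VIEW ... AS" must start with
    # SELECT, mirroring the guard CREATE VIEW already performs.
    m = re.match(r"\s*ALTER\s+VIEW\s+.+?\s+AS\s+(\w+)", sql, re.IGNORECASE)
    if m and m.group(1).upper() != "SELECT":
        raise ParseException(
            "Operation not allowed: ALTER VIEW ... AS %s" % m.group(1).upper())
    return sql
```

A statement like `ALTER VIEW v AS INSERT INTO t VALUES (1)` would then fail at parse time instead of silently rewriting the view.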
[jira] [Assigned] (SPARK-25045) Make `RDDBarrier.mapParititions` similar to `RDD.mapPartitions`
[ https://issues.apache.org/jira/browse/SPARK-25045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25045: Assignee: (was: Apache Spark) > Make `RDDBarrier.mapParititions` similar to `RDD.mapPartitions` > --- > > Key: SPARK-25045 > URL: https://issues.apache.org/jira/browse/SPARK-25045 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.4.0 >Reporter: Jiang Xingbo >Priority: Major > > Signature of the function passed to `RDDBarrier.mapPartitions()` is different > from that of `RDD.mapPartitions`. The latter doesn’t take a TaskContext. We > shall make the function signature the same to avoid confusion and misusage. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-25045) Make `RDDBarrier.mapParititions` similar to `RDD.mapPartitions`
[ https://issues.apache.org/jira/browse/SPARK-25045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25045: Assignee: Apache Spark > Make `RDDBarrier.mapParititions` similar to `RDD.mapPartitions` > --- > > Key: SPARK-25045 > URL: https://issues.apache.org/jira/browse/SPARK-25045 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.4.0 >Reporter: Jiang Xingbo >Assignee: Apache Spark >Priority: Major > > Signature of the function passed to `RDDBarrier.mapPartitions()` is different > from that of `RDD.mapPartitions`. The latter doesn’t take a TaskContext. We > shall make the function signature the same to avoid confusion and misusage. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25045) Make `RDDBarrier.mapParititions` similar to `RDD.mapPartitions`
[ https://issues.apache.org/jira/browse/SPARK-25045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571804#comment-16571804 ] Apache Spark commented on SPARK-25045: -- User 'jiangxb1987' has created a pull request for this issue: https://github.com/apache/spark/pull/22026 > Make `RDDBarrier.mapParititions` similar to `RDD.mapPartitions` > --- > > Key: SPARK-25045 > URL: https://issues.apache.org/jira/browse/SPARK-25045 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.4.0 >Reporter: Jiang Xingbo >Priority: Major > > Signature of the function passed to `RDDBarrier.mapPartitions()` is different > from that of `RDD.mapPartitions`. The latter doesn’t take a TaskContext. We > shall make the function signature the same to avoid confusion and misusage. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24918) Executor Plugin API
[ https://issues.apache.org/jira/browse/SPARK-24918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571793#comment-16571793 ] Imran Rashid commented on SPARK-24918: -- [~lucacanali] you could certainly sample stack traces, but the current proposal doesn't cover communication with the driver at all. IMO that is too much complexity for v1. Did you have a design in mind for that? You could use the executor plugin to build your own communication between the driver and executors, but depending on what you want, that might be tricky. Do you think you could set up the configuration you need statically, when the application starts? E.g., I once ran a test that took stack traces whenever a task ran longer than some configurable time -- then I just needed task start and end events in my plugin. > Executor Plugin API > --- > > Key: SPARK-24918 > URL: https://issues.apache.org/jira/browse/SPARK-24918 > Project: Spark > Issue Type: New Feature > Components: Spark Core >Affects Versions: 2.4.0 >Reporter: Imran Rashid >Priority: Major > Labels: SPIP, memory-analysis > > It would be nice if we could specify an arbitrary class to run within each > executor for debugging and instrumentation. It's hard to do this currently > because: > a) you have no idea when executors will come and go with DynamicAllocation, > so you don't have a chance to run custom code before the first task > b) even with static allocation, you'd have to change the code of your Spark > app itself to run a special task to "install" the plugin, which is often > tough in production cases when those maintaining regularly running > applications might not even know how to make changes to the application. > For example, https://github.com/squito/spark-memory could be used in a > debugging context to understand memory use, just by re-running an application > with extra command line arguments (as opposed to rebuilding Spark). > I think one tricky part here is just deciding the API, and how it's versioned.
> Does it just get created when the executor starts, and that's it? Or does it > get more specific events, like task start, task end, etc? Would we ever add > more events? It should definitely be a {{DeveloperApi}}, so breaking > compatibility would be allowed ... but still should be avoided. We could > create a base class that has no-op implementations, or explicitly version > everything. > Note that this is not needed in the driver as we already have SparkListeners > (even if you don't care about the SparkListenerEvents and just want to > inspect objects in the JVM, it's still good enough). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
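The no-op base class idea from the discussion might look like the following sketch, here in Python for brevity; the names are hypothetical, and the real proposal is a Java/Scala DeveloperApi:

```python
import time

class ExecutorPlugin:
    # No-op base class: plugins override only the hooks they care about, so
    # new events can be added later without breaking existing plugins.
    def init(self):
        pass  # called once when the executor starts

    def on_task_start(self):
        pass

    def on_task_end(self):
        pass

    def shutdown(self):
        pass

class SlowTaskLogger(ExecutorPlugin):
    # Example plugin: count tasks running longer than a threshold, the kind
    # of statically configured probe mentioned in the comment above.
    def __init__(self, threshold_s):
        self.threshold_s = threshold_s
        self.slow_tasks = 0
        self._start = None

    def on_task_start(self):
        self._start = time.monotonic()

    def on_task_end(self):
        if self._start is not None and time.monotonic() - self._start > self.threshold_s:
            self.slow_tasks += 1
```

Because unimplemented hooks default to no-ops, adding a new event to the base class stays backward compatible with already-deployed plugins.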
[jira] [Created] (SPARK-25045) Make `RDDBarrier.mapParititions` similar to `RDD.mapPartitions`
Jiang Xingbo created SPARK-25045: Summary: Make `RDDBarrier.mapParititions` similar to `RDD.mapPartitions` Key: SPARK-25045 URL: https://issues.apache.org/jira/browse/SPARK-25045 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 2.4.0 Reporter: Jiang Xingbo Signature of the function passed to `RDDBarrier.mapPartitions()` is different from that of `RDD.mapPartitions`. The latter doesn’t take a TaskContext. We shall make the function signature the same to avoid confusion and misusage. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
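The signature alignment can be illustrated with a toy Python version, where both the plain and barrier variants take the same `f(iterator)` and user code fetches the context itself; `get_barrier_context` is a stand-in for `BarrierTaskContext.get()` in the actual proposal:

```python
# Framework-managed context, set before calling the user function, so the
# function signature stays f(iterator) for both barrier and non-barrier RDDs.
_current_context = {"partition_id": None}

def get_barrier_context():
    return _current_context

def map_partitions(partitions, f):
    results = []
    for pid, part in enumerate(partitions):
        _current_context["partition_id"] = pid  # set by the framework, not via f's arguments
        results.append(list(f(iter(part))))
    return results
```

Functions that don't need the context are written identically for both variants, which is the confusion the ticket wants to remove.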
[jira] [Updated] (SPARK-25044) Address translation of LMF closure primitive args to Object in Scala 2.12
[ https://issues.apache.org/jira/browse/SPARK-25044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-25044: -- Description: A few SQL-related tests fail in Scala 2.12, such as UDFSuite's "SPARK-24891 Fix HandleNullInputsForUDF rule". (Details in a sec when I can copy-paste them.) It seems that the closure that is fed in as a UDF changes behavior, in a way that primitive-type arguments are handled differently. For example an Int argument, when fed 'null', acts like 0. I'm sure it's a difference in the LMF closure and how its types are understood, but not exactly sure of the cause yet. > Address translation of LMF closure primitive args to Object in Scala 2.12 > - > > Key: SPARK-25044 > URL: https://issues.apache.org/jira/browse/SPARK-25044 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 2.4.0 >Reporter: Sean Owen >Priority: Major > > A few SQL-related tests fail in Scala 2.12, such as UDFSuite's "SPARK-24891 > Fix HandleNullInputsForUDF rule". (Details in a sec when I can copy-paste > them.) > It seems that the closure that is fed in as a UDF changes behavior, in a way > that primitive-type arguments are handled differently. For example an Int > argument, when fed 'null', acts like 0. > I'm sure it's a difference in the LMF closure and how its types are > understood, but not exactly sure of the cause yet. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
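The "null acts like 0" symptom is essentially primitive unboxing with a default value. A Python simulation of what a UDF adapter over such a closure would do -- `unbox_int` and `typed_udf` are illustrative, not Spark internals:

```python
def unbox_int(value):
    # JVM-style unboxing of a boxed value into a primitive int slot, where
    # null silently becomes 0 instead of failing -- the surprising behavior
    # described above for Scala 2.12 lambdas taking an Int.
    return 0 if value is None else int(value)

def typed_udf(f, unboxers):
    # Toy adapter: coerce possibly-null inputs to the closure's primitive
    # parameter types before invoking it.
    return lambda *args: f(*(u(a) for u, a in zip(unboxers, args)))
```

With this adaptation, feeding `None` to an increment UDF yields 1, exactly the kind of semantic drift the HandleNullInputsForUDF tests catch.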
[jira] [Created] (SPARK-25044) Address translation of LMF closure primitive args to Object in Scala 2.12
Sean Owen created SPARK-25044: - Summary: Address translation of LMF closure primitive args to Object in Scala 2.12 Key: SPARK-25044 URL: https://issues.apache.org/jira/browse/SPARK-25044 Project: Spark Issue Type: Sub-task Components: Spark Core, SQL Affects Versions: 2.4.0 Reporter: Sean Owen -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-14220) Build and test Spark against Scala 2.12
[ https://issues.apache.org/jira/browse/SPARK-14220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen reopened SPARK-14220: --- Assignee: Sean Owen OK, bad news, I think we still have several non-trivial issues with Scala 2.12 support: at least the janino compiler issue and a new one about how lambda metafactory closures seem to implement primitive args as reference types, which makes some SQL operations change semantics. I'm going to organize some open JIRAs here accordingly. > Build and test Spark against Scala 2.12 > --- > > Key: SPARK-14220 > URL: https://issues.apache.org/jira/browse/SPARK-14220 > Project: Spark > Issue Type: Umbrella > Components: Build, Project Infra >Affects Versions: 2.1.0 >Reporter: Josh Rosen >Assignee: Sean Owen >Priority: Blocker > Labels: release-notes > > This umbrella JIRA tracks the requirements for building and testing Spark > against the current Scala 2.12 milestone. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14220) Build and test Spark against Scala 2.12
[ https://issues.apache.org/jira/browse/SPARK-14220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-14220: -- Target Version/s: 2.4.0 Fix Version/s: (was: 2.4.0) > Build and test Spark against Scala 2.12 > --- > > Key: SPARK-14220 > URL: https://issues.apache.org/jira/browse/SPARK-14220 > Project: Spark > Issue Type: Umbrella > Components: Build, Project Infra >Affects Versions: 2.1.0 >Reporter: Josh Rosen >Assignee: Sean Owen >Priority: Blocker > Labels: release-notes > > This umbrella JIRA tracks the requirements for building and testing Spark > against the current Scala 2.12 milestone. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24918) Executor Plugin API
[ https://issues.apache.org/jira/browse/SPARK-24918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571748#comment-16571748 ] Luca Canali commented on SPARK-24918: - I have a use case where I would like to sample stack traces of the Spark executors across the cluster and later aggregate the data into a Flame Graph. I may want to do data collection only for a short duration (due to the overhead) and possibly be able to start and stop data collection at will from the driver. Similar use cases would be to deploy "probes" using tools for dynamic tracing to measure specific details of the workload. I think the executor plugin would be useful for this. In addition, it would be nice to have a mechanism to send and receive commands/data between the Spark driver and the plugin process. Would this proposal make sense in the context of this SPIP or would it add too much complexity to the original proposal? > Executor Plugin API > --- > > Key: SPARK-24918 > URL: https://issues.apache.org/jira/browse/SPARK-24918 > Project: Spark > Issue Type: New Feature > Components: Spark Core >Affects Versions: 2.4.0 >Reporter: Imran Rashid >Priority: Major > Labels: SPIP, memory-analysis > > It would be nice if we could specify an arbitrary class to run within each > executor for debugging and instrumentation. It's hard to do this currently > because: > a) you have no idea when executors will come and go with DynamicAllocation, > so you don't have a chance to run custom code before the first task > b) even with static allocation, you'd have to change the code of your spark > app itself to run a special task to "install" the plugin, which is often > tough in production cases when those maintaining regularly running > applications might not even know how to make changes to the application. 
> For example, https://github.com/squito/spark-memory could be used in a > debugging context to understand memory use, just by re-running an application > with extra command line arguments (as opposed to rebuilding spark). > I think one tricky part here is just deciding the API, and how it's versioned. > Does it just get created when the executor starts, and that's it? Or does it > get more specific events, like task start, task end, etc? Would we ever add > more events? It should definitely be a {{DeveloperApi}}, so breaking > compatibility would be allowed ... but still should be avoided. We could > create a base class that has no-op implementations, or explicitly version > everything. > Note that this is not needed in the driver as we already have SparkListeners > (even if you don't care about the SparkListenerEvents and just want to > inspect objects in the JVM, it's still good enough). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
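The API questions raised above (creation-only hook vs. lifecycle events, and versioning via a base class with no-op implementations) can be sketched as follows. The trait and method names here are illustrative assumptions, not Spark's actual API.

{code:scala}
// Hypothetical sketch of the plugin surface under discussion.
trait ExecutorPluginSketch {
  // Called once when the executor starts.
  def init(): Unit = {}
  // Optional lifecycle hooks with no-op defaults, so events added
  // later do not break plugins compiled against an older version.
  def onTaskStart(): Unit = {}
  def onTaskEnd(): Unit = {}
  // Called when the executor shuts down.
  def shutdown(): Unit = {}
}
{code}

With no-op defaults, a profiling plugin like the stack-trace sampler described in the comment would only override `init()` and `shutdown()`.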
[jira] [Assigned] (SPARK-25043) spark-sql should print the appId and master on startup
[ https://issues.apache.org/jira/browse/SPARK-25043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25043: Assignee: Apache Spark > spark-sql should print the appId and master on startup > -- > > Key: SPARK-25043 > URL: https://issues.apache.org/jira/browse/SPARK-25043 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.1 >Reporter: Alessandro Bellina >Assignee: Apache Spark >Priority: Trivial > > In spark-sql, if logging is turned down all the way, it's not possible to > find out what appId is running at the moment. This small change adds a print to > stdout containing the master type and the appId to have that on screen. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-25043) spark-sql should print the appId and master on startup
[ https://issues.apache.org/jira/browse/SPARK-25043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25043: Assignee: (was: Apache Spark) > spark-sql should print the appId and master on startup > -- > > Key: SPARK-25043 > URL: https://issues.apache.org/jira/browse/SPARK-25043 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.1 >Reporter: Alessandro Bellina >Priority: Trivial > > In spark-sql, if logging is turned down all the way, it's not possible to > find out what appId is running at the moment. This small change adds a print to > stdout containing the master type and the appId to have that on screen. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25043) spark-sql should print the appId and master on startup
[ https://issues.apache.org/jira/browse/SPARK-25043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571736#comment-16571736 ] Apache Spark commented on SPARK-25043: -- User 'abellina' has created a pull request for this issue: https://github.com/apache/spark/pull/22025 > spark-sql should print the appId and master on startup > -- > > Key: SPARK-25043 > URL: https://issues.apache.org/jira/browse/SPARK-25043 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.1 >Reporter: Alessandro Bellina >Priority: Trivial > > In spark-sql, if logging is turned down all the way, it's not possible to > find out what appId is running at the moment. This small change adds a print to > stdout containing the master type and the appId to have that on screen. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-25043) spark-sql should print the appId and master on startup
Alessandro Bellina created SPARK-25043: -- Summary: spark-sql should print the appId and master on startup Key: SPARK-25043 URL: https://issues.apache.org/jira/browse/SPARK-25043 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.3.1 Reporter: Alessandro Bellina In spark-sql, if logging is turned down all the way, it's not possible to find out what appId is running at the moment. This small change adds a print to stdout containing the master type and the appId to have that on screen. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
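A minimal sketch of the proposed print follows; the exact wording and placement in the spark-sql startup path are assumptions, not the merged patch.

{code:scala}
// Hedged sketch: unconditionally print the master and appId so they are
// visible even with logging fully turned down.
// Assumes `sparkContext` is the context backing the spark-sql session;
// `master` and `applicationId` are real SparkContext members.
println(s"Spark master: ${sparkContext.master}, Application Id: ${sparkContext.applicationId}")
{code}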
[jira] [Resolved] (SPARK-19602) Unable to query using the fully qualified column name of the form ( ..)
[ https://issues.apache.org/jira/browse/SPARK-19602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-19602. - Resolution: Fixed Fix Version/s: 2.4.0 Issue resolved by pull request 17185 [https://github.com/apache/spark/pull/17185] > Unable to query using the fully qualified column name of the form ( > ..) > -- > > Key: SPARK-19602 > URL: https://issues.apache.org/jira/browse/SPARK-19602 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.1.0 >Reporter: Sunitha Kambhampati >Assignee: Sunitha Kambhampati >Priority: Major > Fix For: 2.4.0 > > Attachments: Design_ColResolution_JIRA19602.pdf > > > 1) Spark SQL fails to analyze this query: select db1.t1.i1 from db1.t1, > db2.t1 > Most of the other database systems support this ( e.g DB2, Oracle, MySQL). > Note: In DB2, Oracle, the notion is of .. > 2) Another scenario where this fully qualified name is useful is as follows: > // current database is db1. > select t1.i1 from t1, db2.t1 > If the i1 column exists in both tables: db1.t1 and db2.t1, this will throw an > error during column resolution in the analyzer, as it is ambiguous. > Lets say the user intended to retrieve i1 from db1.t1 but in the example > db2.t1 only has i1 column. The query would still succeed instead of throwing > an error. > One way to avoid confusion would be to explicitly specify using the fully > qualified name db1.t1.i1 > For e.g: select db1.t1.i1 from t1, db2.t1 > Workarounds: > There is a workaround for these issues, which is to use an alias. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25003) Pyspark Does not use Spark Sql Extensions
[ https://issues.apache.org/jira/browse/SPARK-25003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571630#comment-16571630 ] Apache Spark commented on SPARK-25003: -- User 'RussellSpitzer' has created a pull request for this issue: https://github.com/apache/spark/pull/21988 > Pyspark Does not use Spark Sql Extensions > - > > Key: SPARK-25003 > URL: https://issues.apache.org/jira/browse/SPARK-25003 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.2.2, 2.3.1 >Reporter: Russell Spitzer >Priority: Major > > When creating a SparkSession here > [https://github.com/apache/spark/blob/v2.2.2/python/pyspark/sql/session.py#L216] > {code:python} > if jsparkSession is None: > jsparkSession = self._jvm.SparkSession(self._jsc.sc()) > self._jsparkSession = jsparkSession > {code} > I believe it ends up calling the constructor here > https://github.com/apache/spark/blob/v2.2.2/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala#L85-L87 > {code:scala} > private[sql] def this(sc: SparkContext) { > this(sc, None, None, new SparkSessionExtensions) > } > {code} > Which creates a new SparkSessionExtensions object and does not pick up new > extensions that could have been set in the config like the companion > getOrCreate does. > https://github.com/apache/spark/blob/v2.2.2/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala#L928-L944 > {code:scala} > //in getOrCreate > // Initialize extensions if the user has defined a configurator class. 
> val extensionConfOption = > sparkContext.conf.get(StaticSQLConf.SPARK_SESSION_EXTENSIONS) > if (extensionConfOption.isDefined) { > val extensionConfClassName = extensionConfOption.get > try { > val extensionConfClass = > Utils.classForName(extensionConfClassName) > val extensionConf = extensionConfClass.newInstance() > .asInstanceOf[SparkSessionExtensions => Unit] > extensionConf(extensions) > } catch { > // Ignore the error if we cannot find the class or when the class > has the wrong type. > case e @ (_: ClassCastException | > _: ClassNotFoundException | > _: NoClassDefFoundError) => > logWarning(s"Cannot use $extensionConfClassName to configure > session extensions.", e) > } > } > {code} > I think a quick fix would be to use the getOrCreate method from the companion > object instead of calling the constructor from the SparkContext. Or we could > fix this by ensuring that all constructors attempt to pick up custom > extensions if they are set. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25003) Pyspark Does not use Spark Sql Extensions
[ https://issues.apache.org/jira/browse/SPARK-25003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571631#comment-16571631 ] Apache Spark commented on SPARK-25003: -- User 'RussellSpitzer' has created a pull request for this issue: https://github.com/apache/spark/pull/21989 > Pyspark Does not use Spark Sql Extensions > - > > Key: SPARK-25003 > URL: https://issues.apache.org/jira/browse/SPARK-25003 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.2.2, 2.3.1 >Reporter: Russell Spitzer >Priority: Major > > When creating a SparkSession here > [https://github.com/apache/spark/blob/v2.2.2/python/pyspark/sql/session.py#L216] > {code:python} > if jsparkSession is None: > jsparkSession = self._jvm.SparkSession(self._jsc.sc()) > self._jsparkSession = jsparkSession > {code} > I believe it ends up calling the constructor here > https://github.com/apache/spark/blob/v2.2.2/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala#L85-L87 > {code:scala} > private[sql] def this(sc: SparkContext) { > this(sc, None, None, new SparkSessionExtensions) > } > {code} > Which creates a new SparkSessionExtensions object and does not pick up new > extensions that could have been set in the config like the companion > getOrCreate does. > https://github.com/apache/spark/blob/v2.2.2/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala#L928-L944 > {code:scala} > //in getOrCreate > // Initialize extensions if the user has defined a configurator class. 
> val extensionConfOption = > sparkContext.conf.get(StaticSQLConf.SPARK_SESSION_EXTENSIONS) > if (extensionConfOption.isDefined) { > val extensionConfClassName = extensionConfOption.get > try { > val extensionConfClass = > Utils.classForName(extensionConfClassName) > val extensionConf = extensionConfClass.newInstance() > .asInstanceOf[SparkSessionExtensions => Unit] > extensionConf(extensions) > } catch { > // Ignore the error if we cannot find the class or when the class > has the wrong type. > case e @ (_: ClassCastException | > _: ClassNotFoundException | > _: NoClassDefFoundError) => > logWarning(s"Cannot use $extensionConfClassName to configure > session extensions.", e) > } > } > {code} > I think a quick fix would be to use the getOrCreate method from the companion > object instead of calling the constructor from the SparkContext. Or we could > fix this by ensuring that all constructors attempt to pick up custom > extensions if they are set. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
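The "quick fix" direction described above — route session construction through the companion object so the configurator logic quoted above actually runs — can be sketched as follows. The extension class name is a hypothetical placeholder.

{code:scala}
// Hedged sketch: SparkSession.builder().getOrCreate() runs the
// spark.sql.extensions configurator, unlike calling the private
// constructor directly as the PySpark path does.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .config("spark.sql.extensions", "com.example.MyExtensions") // hypothetical configurator class
  .getOrCreate()
{code}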
[jira] [Assigned] (SPARK-25034) possible triple memory consumption in fetchBlockSync()
[ https://issues.apache.org/jira/browse/SPARK-25034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25034: Assignee: Apache Spark > possible triple memory consumption in fetchBlockSync() > -- > > Key: SPARK-25034 > URL: https://issues.apache.org/jira/browse/SPARK-25034 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.2.2, 2.3.0, 2.4.0 >Reporter: Vincent >Assignee: Apache Spark >Priority: Major > > Hello > in the code of _fetchBlockSync_() in _blockTransferService_, we have: > > {code:java} > val ret = ByteBuffer.allocate(data.size.toInt) > ret.put(data.nioByteBuffer()) > ret.flip() > result.success(new NioManagedBuffer(ret)) > {code} > In some cases, the _data_ variable is a _NettyManagedBuffer_, whose > underlying netty representation is a _CompositeByteBuffer_. > Going through the code above in this configuration, assuming that the > variable _data_ holds N bytes: > 1) we allocate a full buffer of N bytes in _ret_ > 2) calling _data.nioByteBuffer()_ on a _CompositeByteBuffer_ will trigger a > full merge of all the composite buffers, which will allocate *again* a full > buffer of N bytes > 3) we copy to _ret_ the data byte by byte > This means that at some point the N bytes of data are located 3 times in > memory. > Is this really necessary? > It seems unclear to me why we have to process at all the data, given that we > receive a _ManagedBuffer_ and we want to return a _ManagedBuffer_ > Is there something I'm missing here? It seems this whole operation could be > done with 0 copies. > The only upside here is that the new buffer will have merged all the > composite buffer's arrays, but it is really not clear if this is intended. In > any case this could be done with peak memory of 2N and not 3N > Cheers! 
> -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25034) possible triple memory consumption in fetchBlockSync()
[ https://issues.apache.org/jira/browse/SPARK-25034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571626#comment-16571626 ] Apache Spark commented on SPARK-25034: -- User 'vincent-grosbois' has created a pull request for this issue: https://github.com/apache/spark/pull/22024 > possible triple memory consumption in fetchBlockSync() > -- > > Key: SPARK-25034 > URL: https://issues.apache.org/jira/browse/SPARK-25034 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.2.2, 2.3.0, 2.4.0 >Reporter: Vincent >Priority: Major > > Hello > in the code of _fetchBlockSync_() in _blockTransferService_, we have: > > {code:java} > val ret = ByteBuffer.allocate(data.size.toInt) > ret.put(data.nioByteBuffer()) > ret.flip() > result.success(new NioManagedBuffer(ret)) > {code} > In some cases, the _data_ variable is a _NettyManagedBuffer_, whose > underlying netty representation is a _CompositeByteBuffer_. > Going through the code above in this configuration, assuming that the > variable _data_ holds N bytes: > 1) we allocate a full buffer of N bytes in _ret_ > 2) calling _data.nioByteBuffer()_ on a _CompositeByteBuffer_ will trigger a > full merge of all the composite buffers, which will allocate *again* a full > buffer of N bytes > 3) we copy to _ret_ the data byte by byte > This means that at some point the N bytes of data are located 3 times in > memory. > Is this really necessary? > It seems unclear to me why we have to process at all the data, given that we > receive a _ManagedBuffer_ and we want to return a _ManagedBuffer_ > Is there something I'm missing here? It seems this whole operation could be > done with 0 copies. > The only upside here is that the new buffer will have merged all the > composite buffer's arrays, but it is really not clear if this is intended. In > any case this could be done with peak memory of 2N and not 3N > Cheers! 
> -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-25034) possible triple memory consumption in fetchBlockSync()
[ https://issues.apache.org/jira/browse/SPARK-25034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25034: Assignee: (was: Apache Spark) > possible triple memory consumption in fetchBlockSync() > -- > > Key: SPARK-25034 > URL: https://issues.apache.org/jira/browse/SPARK-25034 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.2.2, 2.3.0, 2.4.0 >Reporter: Vincent >Priority: Major > > Hello > in the code of _fetchBlockSync_() in _blockTransferService_, we have: > > {code:java} > val ret = ByteBuffer.allocate(data.size.toInt) > ret.put(data.nioByteBuffer()) > ret.flip() > result.success(new NioManagedBuffer(ret)) > {code} > In some cases, the _data_ variable is a _NettyManagedBuffer_, whose > underlying netty representation is a _CompositeByteBuffer_. > Going through the code above in this configuration, assuming that the > variable _data_ holds N bytes: > 1) we allocate a full buffer of N bytes in _ret_ > 2) calling _data.nioByteBuffer()_ on a _CompositeByteBuffer_ will trigger a > full merge of all the composite buffers, which will allocate *again* a full > buffer of N bytes > 3) we copy to _ret_ the data byte by byte > This means that at some point the N bytes of data are located 3 times in > memory. > Is this really necessary? > It seems unclear to me why we have to process at all the data, given that we > receive a _ManagedBuffer_ and we want to return a _ManagedBuffer_ > Is there something I'm missing here? It seems this whole operation could be > done with 0 copies. > The only upside here is that the new buffer will have merged all the > composite buffer's arrays, but it is really not clear if this is intended. In > any case this could be done with peak memory of 2N and not 3N > Cheers! > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
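The 2N variant mentioned at the end of the description — keep the single allocation that merging the composite buffer already makes, but skip the second allocate-and-copy — might look like this. This is a sketch against the snippet quoted above, not the actual patch.

{code:scala}
// Hedged sketch: nioByteBuffer() on a composite buffer already merges
// the parts into one N-byte buffer; wrapping that result directly
// avoids the extra ByteBuffer.allocate + put copy, so peak memory is
// 2N rather than 3N.
val merged = data.nioByteBuffer()
result.success(new NioManagedBuffer(merged))
{code}

A true zero-copy version would return the incoming ManagedBuffer as-is, but as the reporter notes, it is unclear whether callers rely on the merged, contiguous representation.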
[jira] [Commented] (SPARK-23928) High-order function: shuffle(x) → array
[ https://issues.apache.org/jira/browse/SPARK-23928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571421#comment-16571421 ] Apache Spark commented on SPARK-23928: -- User 'mgaido91' has created a pull request for this issue: https://github.com/apache/spark/pull/22023 > High-order function: shuffle(x) → array > --- > > Key: SPARK-23928 > URL: https://issues.apache.org/jira/browse/SPARK-23928 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.0 >Reporter: Xiao Li >Assignee: H Lu >Priority: Major > Fix For: 2.4.0 > > > Ref: https://prestodb.io/docs/current/functions/array.html > Generate a random permutation of the given array x. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
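For reference, once this sub-task lands the function is exercised from SQL like the Presto function it mirrors; a hedged sketch, assuming an active `spark` session on Spark 2.4+:

{code:scala}
// Illustrative only: shuffle() returns a random permutation, so the
// output order is nondeterministic, e.g. [2, 4, 1, 3].
spark.sql("SELECT shuffle(array(1, 2, 3, 4)) AS shuffled").show(false)
{code}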
[jira] [Created] (SPARK-25042) Flaky test: org.apache.spark.streaming.kafka010.KafkaRDDSuite.compacted topic
Marco Gaido created SPARK-25042: --- Summary: Flaky test: org.apache.spark.streaming.kafka010.KafkaRDDSuite.compacted topic Key: SPARK-25042 URL: https://issues.apache.org/jira/browse/SPARK-25042 Project: Spark Issue Type: Bug Components: Tests Affects Versions: 2.4.0 Reporter: Marco Gaido The test {{compacted topic}} in {{org.apache.spark.streaming.kafka010.KafkaRDDSuite}} is flaky: it failed in an unrelated PR: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94293/testReport/. And it passes locally on the same branch. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-24772) support reading AVRO logical types - Decimal
[ https://issues.apache.org/jira/browse/SPARK-24772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-24772: --- Assignee: Gengliang Wang > support reading AVRO logical types - Decimal > > > Key: SPARK-24772 > URL: https://issues.apache.org/jira/browse/SPARK-24772 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.4.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > Fix For: 2.4.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-24772) support reading AVRO logical types - Decimal
[ https://issues.apache.org/jira/browse/SPARK-24772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-24772. - Resolution: Fixed Fix Version/s: 2.4.0 Issue resolved by pull request 21984 [https://github.com/apache/spark/pull/21984] > support reading AVRO logical types - Decimal > > > Key: SPARK-24772 > URL: https://issues.apache.org/jira/browse/SPARK-24772 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.4.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > Fix For: 2.4.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-24005) Remove usage of Scala’s parallel collection
[ https://issues.apache.org/jira/browse/SPARK-24005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-24005: --- Assignee: Maxim Gekk > Remove usage of Scala’s parallel collection > --- > > Key: SPARK-24005 > URL: https://issues.apache.org/jira/browse/SPARK-24005 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 2.3.0 >Reporter: Xiao Li >Assignee: Maxim Gekk >Priority: Major > Labels: starter > Fix For: 2.4.0 > > > {noformat} > val par = (1 to 100).par.flatMap { i => > Thread.sleep(1000) > 1 to 1000 > }.toSeq > {noformat} > We are unable to interrupt the execution of parallel collections. We need to > create a common utility function to do it, instead of using Scala parallel > collections -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-24005) Remove usage of Scala’s parallel collection
[ https://issues.apache.org/jira/browse/SPARK-24005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-24005. - Resolution: Fixed Fix Version/s: 2.4.0 Issue resolved by pull request 21913 [https://github.com/apache/spark/pull/21913] > Remove usage of Scala’s parallel collection > --- > > Key: SPARK-24005 > URL: https://issues.apache.org/jira/browse/SPARK-24005 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 2.3.0 >Reporter: Xiao Li >Assignee: Maxim Gekk >Priority: Major > Labels: starter > Fix For: 2.4.0 > > > {noformat} > val par = (1 to 100).par.flatMap { i => > Thread.sleep(1000) > 1 to 1000 > }.toSeq > {noformat} > We are unable to interrupt the execution of parallel collections. We need to > create a common utility function to do it, instead of using Scala parallel > collections -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
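The problem with the `.par` snippet above is that its sleeping worker threads cannot be cancelled. A hedged sketch of the same computation on an explicit, shutdown-able pool (one possible shape of the "common utility function", not the merged change):

{code:scala}
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

// Hedged sketch: Futures on an explicit pool instead of .par; the pool
// can be shut down, interrupting sleeping workers, to cancel the job.
val pool = Executors.newFixedThreadPool(8)
implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
val par = Await.result(
  Future.traverse((1 to 100).toList) { i =>
    Future { Thread.sleep(1000); (1 to 1000).toList }
  }, Duration.Inf).flatten
pool.shutdown() // or shutdownNow() earlier to interrupt in-flight work
{code}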
[jira] [Commented] (SPARK-24948) SHS filters wrongly some applications due to permission check
[ https://issues.apache.org/jira/browse/SPARK-24948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571294#comment-16571294 ] Apache Spark commented on SPARK-24948: -- User 'mgaido91' has created a pull request for this issue: https://github.com/apache/spark/pull/22022 > SHS filters wrongly some applications due to permission check > - > > Key: SPARK-24948 > URL: https://issues.apache.org/jira/browse/SPARK-24948 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.3.1 >Reporter: Marco Gaido >Priority: Blocker > Fix For: 2.4.0 > > > SHS filters the event logs it doesn't have permissions to read. > Unfortunately, this check is quite naive, as it takes into account only the > base permissions (i.e. user, group, other permissions). For instance, if ACLs > are enabled, they are ignored in this check; moreover, each filesystem may > have different policies (e.g. they can consider spark as a superuser who can > access everything). > This results in some applications not being displayed in the SHS, even though the > Spark user (or whatever user the SHS is started with) can actually read their > event logs. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24948) SHS filters wrongly some applications due to permission check
[ https://issues.apache.org/jira/browse/SPARK-24948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571260#comment-16571260 ] Apache Spark commented on SPARK-24948: -- User 'mgaido91' has created a pull request for this issue: https://github.com/apache/spark/pull/22021 > SHS filters wrongly some applications due to permission check > - > > Key: SPARK-24948 > URL: https://issues.apache.org/jira/browse/SPARK-24948 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.3.1 >Reporter: Marco Gaido >Priority: Blocker > Fix For: 2.4.0 > > > SHS filters the event logs it doesn't have permissions to read. > Unfortunately, this check is quite naive, as it takes into account only the > base permissions (i.e. user, group, other permissions). For instance, if ACLs > are enabled, they are ignored in this check; moreover, each filesystem may > have different policies (e.g. they can consider spark as a superuser who can > access everything). > This results in some applications not being displayed in the SHS, even though the > Spark user (or whatever user the SHS is started with) can actually read their > event logs. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
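One hedged way to make the check honor ACLs and filesystem-specific policies is to delegate the decision to the filesystem itself via Hadoop's `FileSystem.access` (available since Hadoop 2.6), rather than re-deriving readability from mode bits. This is a sketch of the idea, not necessarily the merged fix.

{code:scala}
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.fs.permission.FsAction
import org.apache.hadoop.security.AccessControlException

// Hedged sketch: ask the filesystem whether the SHS user can read the
// event log; this respects ACLs and per-filesystem superuser policies.
def canRead(fs: FileSystem, path: Path): Boolean =
  try { fs.access(path, FsAction.READ); true }
  catch { case _: AccessControlException => false }
{code}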
[jira] [Updated] (SPARK-25041) genjavadoc-plugin_0.10 is not found with sbt in scala-2.12
[ https://issues.apache.org/jira/browse/SPARK-25041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kazuaki Ishizaki updated SPARK-25041: - Summary: genjavadoc-plugin_0.10 is not found with sbt in scala-2.12 (was: genjavadoc-plugin_2.12.6 is not found with sbt in scala-2.12) > genjavadoc-plugin_0.10 is not found with sbt in scala-2.12 > -- > > Key: SPARK-25041 > URL: https://issues.apache.org/jira/browse/SPARK-25041 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 2.4.0 >Reporter: Kazuaki Ishizaki >Priority: Major > > When the master is built with sbt in scala-2.12, the following error occurs: > {code} > [warn]module not found: > com.typesafe.genjavadoc#genjavadoc-plugin_2.12.6;0.10 > [warn] public: tried > [warn] > https://repo1.maven.org/maven2/com/typesafe/genjavadoc/genjavadoc-plugin_2.12.6/0.10/genjavadoc-plugin_2.12.6-0.10.pom > [warn] Maven2 Local: tried > [warn] > file:/gsa/jpngsa/home/i/s/ishizaki/.m2/repository/com/typesafe/genjavadoc/genjavadoc-plugin_2.12.6/0.10/genjavadoc-plugin_2.12.6-0.10.pom > [warn] local: tried > [warn] > /gsa/jpngsa/home/i/s/ishizaki/.ivy2/local/com.typesafe.genjavadoc/genjavadoc-plugin_2.12.6/0.10/ivys/ivy.xml > [info] Resolving jline#jline;2.14.3 ... 
> [warn]:: > [warn]:: UNRESOLVED DEPENDENCIES :: > [warn]:: > [warn]:: com.typesafe.genjavadoc#genjavadoc-plugin_2.12.6;0.10: not > found > [warn]:: > [warn] > [warn]Note: Unresolved dependencies path: > [warn]com.typesafe.genjavadoc:genjavadoc-plugin_2.12.6:0.10 > (/home/ishizaki/Spark/PR/scala212/spark/project/SparkBuild.scala#L118) > [warn] +- org.apache.spark:spark-tags_2.12:2.4.0-SNAPSHOT > sbt.ResolveException: unresolved dependency: > com.typesafe.genjavadoc#genjavadoc-plugin_2.12.6;0.10: not found > at sbt.IvyActions$.sbt$IvyActions$$resolve(IvyActions.scala:320) > at sbt.IvyActions$$anonfun$updateEither$1.apply(IvyActions.scala:191) > at sbt.IvyActions$$anonfun$updateEither$1.apply(IvyActions.scala:168) > at sbt.IvySbt$Module$$anonfun$withModule$1.apply(Ivy.scala:156) > at sbt.IvySbt$Module$$anonfun$withModule$1.apply(Ivy.scala:156) > at sbt.IvySbt$$anonfun$withIvy$1.apply(Ivy.scala:133) > at sbt.IvySbt.sbt$IvySbt$$action$1(Ivy.scala:57) > at sbt.IvySbt$$anon$4.call(Ivy.scala:65) > at xsbt.boot.Locks$GlobalLock.withChannel$1(Locks.scala:93) > at > xsbt.boot.Locks$GlobalLock.xsbt$boot$Locks$GlobalLock$$withChannelRetries$1(Locks.scala:78) > at > xsbt.boot.Locks$GlobalLock$$anonfun$withFileLock$1.apply(Locks.scala:97) > at xsbt.boot.Using$.withResource(Using.scala:10) > at xsbt.boot.Using$.apply(Using.scala:9) > at xsbt.boot.Locks$GlobalLock.ignoringDeadlockAvoided(Locks.scala:58) > at xsbt.boot.Locks$GlobalLock.withLock(Locks.scala:48) > at xsbt.boot.Locks$.apply0(Locks.scala:31) > at xsbt.boot.Locks$.apply(Locks.scala:28) > at sbt.IvySbt.withDefaultLogger(Ivy.scala:65) > at sbt.IvySbt.withIvy(Ivy.scala:128) > at sbt.IvySbt.withIvy(Ivy.scala:125) > at sbt.IvySbt$Module.withModule(Ivy.scala:156) > at sbt.IvyActions$.updateEither(IvyActions.scala:168) > at > sbt.Classpaths$$anonfun$sbt$Classpaths$$work$1$1.apply(Defaults.scala:1555) > at > sbt.Classpaths$$anonfun$sbt$Classpaths$$work$1$1.apply(Defaults.scala:1551) > at > 
sbt.Classpaths$$anonfun$doWork$1$1$$anonfun$122.apply(Defaults.scala:1586) > at > sbt.Classpaths$$anonfun$doWork$1$1$$anonfun$122.apply(Defaults.scala:1584) > at sbt.Tracked$$anonfun$lastOutput$1.apply(Tracked.scala:37) > at sbt.Classpaths$$anonfun$doWork$1$1.apply(Defaults.scala:1589) > at sbt.Classpaths$$anonfun$doWork$1$1.apply(Defaults.scala:1583) > at sbt.Tracked$$anonfun$inputChanged$1.apply(Tracked.scala:60) > at sbt.Classpaths$.cachedUpdate(Defaults.scala:1606) > at sbt.Classpaths$$anonfun$updateTask$1.apply(Defaults.scala:1533) > at sbt.Classpaths$$anonfun$updateTask$1.apply(Defaults.scala:1485) > at scala.Function1$$anonfun$compose$1.apply(Function1.scala:47) > at sbt.$tilde$greater$$anonfun$$u2219$1.apply(TypeFunctions.scala:40) > at sbt.std.Transform$$anon$4.work(System.scala:63) > at > sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:228) > at > sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:228) > at sbt.ErrorHandling$.wideConvert(ErrorHandling.scala:17) > at sbt.Execute.work(Execute.scala:237) >
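The resolution failure above happens because genjavadoc is a compiler plugin, so it is cross-published against the full Scala version (hence the `_2.12.6` suffix rather than `_2.12`), and genjavadoc 0.10 was never published for Scala 2.12.6. A sketch of the kind of sbt wiring involved and the fix direction, bumping to a genjavadoc release built for 2.12.6; the exact version and coordinates here are an illustration, not the merged patch:

```scala
// build.sbt sketch -- an assumption for illustration, not Spark's actual SparkBuild.scala.
// Compiler plugins resolve against the FULL Scala version (CrossVersion.full),
// so the artifact name becomes genjavadoc-plugin_2.12.6, not genjavadoc-plugin_2.12.
libraryDependencies += compilerPlugin(
  "com.typesafe.genjavadoc" % "genjavadoc-plugin" % "0.11" cross CrossVersion.full)
```

Because every new full Scala version (2.12.5 -> 2.12.6) needs a matching genjavadoc release, upgrading the Scala patch version in the build silently breaks resolution until the plugin version is bumped as well.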
[jira] [Assigned] (SPARK-25041) genjavadoc-plugin_2.12.6 is not found with sbt in scala-2.12
[ https://issues.apache.org/jira/browse/SPARK-25041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25041: Assignee: (was: Apache Spark)
[jira] [Assigned] (SPARK-25041) genjavadoc-plugin_2.12.6 is not found with sbt in scala-2.12
[ https://issues.apache.org/jira/browse/SPARK-25041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25041: Assignee: Apache Spark
[jira] [Commented] (SPARK-25041) genjavadoc-plugin_2.12.6 is not found with sbt in scala-2.12
[ https://issues.apache.org/jira/browse/SPARK-25041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571240#comment-16571240 ] Apache Spark commented on SPARK-25041: User 'kiszk' has created a pull request for this issue: https://github.com/apache/spark/pull/22020
[jira] [Resolved] (SPARK-24341) Codegen compile error from predicate subquery
[ https://issues.apache.org/jira/browse/SPARK-24341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-24341.
-
Resolution: Fixed
Fix Version/s: 2.4.0

Issue resolved by pull request 21403 [https://github.com/apache/spark/pull/21403]

> Codegen compile error from predicate subquery
> ---------------------------------------------
>
> Key: SPARK-24341
> URL: https://issues.apache.org/jira/browse/SPARK-24341
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.3.1
> Reporter: Juliusz Sompolski
> Assignee: Marco Gaido
> Priority: Minor
> Fix For: 2.4.0
>
> Ran on master:
> {code}
> drop table if exists juleka;
> drop table if exists julekb;
> create table juleka (a integer, b integer);
> create table julekb (na integer, nb integer);
> insert into juleka values (1,1);
> insert into julekb values (1,1);
> select * from juleka where (a, b) not in (select (na, nb) from julekb);
> {code}
> Results in:
> {code}
> java.util.concurrent.ExecutionException: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 27, Column 29: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 27, Column 29: Cannot compare types "int" and "org.apache.spark.sql.catalyst.InternalRow"
> at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299)
> at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286)
> at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
> at com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:135)
> at com.google.common.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2344)
> at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2316)
> at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2278)
> at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2193)
> at com.google.common.cache.LocalCache.get(LocalCache.java:3932)
> at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3936)
> at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4806)
> at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.compile(CodeGenerator.scala:1415)
> at org.apache.spark.sql.catalyst.expressions.codegen.GeneratePredicate$.create(GeneratePredicate.scala:92)
> at org.apache.spark.sql.catalyst.expressions.codegen.GeneratePredicate$.generate(GeneratePredicate.scala:46)
> at org.apache.spark.sql.execution.SparkPlan.newPredicate(SparkPlan.scala:380)
> at org.apache.spark.sql.execution.joins.BroadcastNestedLoopJoinExec.org$apache$spark$sql$execution$joins$BroadcastNestedLoopJoinExec$$boundCondition$lzycompute(BroadcastNestedLoopJoinExec.scala:99)
> at org.apache.spark.sql.execution.joins.BroadcastNestedLoopJoinExec.org$apache$spark$sql$execution$joins$BroadcastNestedLoopJoinExec$$boundCondition(BroadcastNestedLoopJoinExec.scala:97)
> at org.apache.spark.sql.execution.joins.BroadcastNestedLoopJoinExec$$anonfun$4$$anonfun$apply$2$$anonfun$apply$3.apply(BroadcastNestedLoopJoinExec.scala:203)
> at org.apache.spark.sql.execution.joins.BroadcastNestedLoopJoinExec$$anonfun$4$$anonfun$apply$2$$anonfun$apply$3.apply(BroadcastNestedLoopJoinExec.scala:203)
> at scala.collection.IndexedSeqOptimized$class.prefixLengthImpl(IndexedSeqOptimized.scala:38)
> at scala.collection.IndexedSeqOptimized$class.exists(IndexedSeqOptimized.scala:46)
> at scala.collection.mutable.ArrayOps$ofRef.exists(ArrayOps.scala:186)
> at org.apache.spark.sql.execution.joins.BroadcastNestedLoopJoinExec$$anonfun$4$$anonfun$apply$2.apply(BroadcastNestedLoopJoinExec.scala:203)
> at org.apache.spark.sql.execution.joins.BroadcastNestedLoopJoinExec$$anonfun$4$$anonfun$apply$2.apply(BroadcastNestedLoopJoinExec.scala:202)
> at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:463)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:389)
> at org.apache.spark.sql.execution.collect.UnsafeRowBatchUtils$.encodeUnsafeRows(UnsafeRowBatchUtils.scala:49)
> at org.apache.spark.sql.execution.collect.Collector$$anonfun$2.apply(Collector.scala:126)
> at org.apache.spark.sql.execution.collect.Collector$$anonfun$2.apply(Collector.scala:125)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
> at org.apache.spark.scheduler.Task.run(Task.scala:111)
> at
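The error message points at the shape of the query rather than the data: `(na, nb)` in the subquery's select list is parsed as a single struct-typed column, so codegen ends up comparing an `int` on the left against an `InternalRow` on the right. As a workaround sketch (my reading of the error, not a recommendation taken from the ticket), listing the columns directly keeps both sides as plain row values:

```sql
-- Hypothetical rewrite: drop the parentheses in the subquery select list
-- so the NOT IN compares (a, b) column-by-column instead of against a struct.
select * from juleka where (a, b) not in (select na, nb from julekb);
```

The resolved fix (pull request 21403 above) addresses the codegen path itself, so on fixed versions the original struct form should no longer fail to compile.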