[jira] [Updated] (SPARK-24948) SHS filters wrongly some applications due to permission check

2018-08-07 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated SPARK-24948:

Fix Version/s: 2.2.3

> SHS filters wrongly some applications due to permission check
> -
>
> Key: SPARK-24948
> URL: https://issues.apache.org/jira/browse/SPARK-24948
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.3.1
>Reporter: Marco Gaido
>Assignee: Marco Gaido
>Priority: Blocker
> Fix For: 2.2.3, 2.3.2, 2.4.0
>
>
> SHS filters out the event logs it doesn't have permission to read. 
> Unfortunately, this check is quite naive, as it takes into account only the 
> base permissions (i.e. user, group, and other permissions). For instance, if 
> ACLs are enabled, they are ignored in this check; moreover, each filesystem may 
> have different policies (e.g. it may consider the spark user a superuser who can 
> access everything).
> This results in some applications not being displayed in the SHS, even though 
> the Spark user (or whatever user the SHS is started with) can actually read 
> their event logs.
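
For context, a hedged sketch (Hadoop client API; the filesystem and path are hypothetical) of the kind of access check that honours ACLs and per-filesystem policies instead of inspecting only the owner/group/other permission bits:

{code:scala}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.permission.FsAction
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.security.AccessControlException

val fs = FileSystem.get(new Configuration())
val log = new Path("/spark-history/app-20180807000000-0001")  // hypothetical event log

// FileSystem.access() asks the filesystem itself whether the current user may
// read the file, so ACLs, superuser rules, etc. are taken into account.
val readable =
  try { fs.access(log, FsAction.READ); true }
  catch { case _: AccessControlException => false }
{code}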



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-24948) SHS filters wrongly some applications due to permission check

2018-08-07 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao reassigned SPARK-24948:
---

Assignee: Marco Gaido

> SHS filters wrongly some applications due to permission check
> -
>
> Key: SPARK-24948
> URL: https://issues.apache.org/jira/browse/SPARK-24948
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.3.1
>Reporter: Marco Gaido
>Assignee: Marco Gaido
>Priority: Blocker
> Fix For: 2.3.2, 2.4.0
>
>
> SHS filters the event logs it doesn't have permissions to read. 
> Unfortunately, this check is quite naive, as it takes into account only the 
> base permissions (ie. user, group, other permissions). For instance, if ACL 
> are enabled, they are ignored in this check; moreover, each filesystem may 
> have different policies (eg. they can consider spark as a superuser who can 
> access everything).
> This results in some applications not being displayed in the SHS, despite the 
> Spark user (or whatever user the SHS is started with) can actually read their 
> ent logs.






[jira] [Commented] (SPARK-22634) Update Bouncy castle dependency

2018-08-07 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-22634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572675#comment-16572675
 ] 

Steve Loughran commented on SPARK-22634:


If nothing else is using it, correct. And nothing is using any of the bouncy 
castle APIs directly.

But: you need to be sure that nothing else is using it through the javax.crypto 
APIs, especially the stuff in org.apache.spark.network.crypto, or worse: some 
library which uses those APIs.

The NOTICE files certainly hint that it's being used somehow:

bq. This product optionally depends on 'Bouncy Castle Crypto APIs' to generate 
a temporary self-signed X.509 certificate when the JVM does not provide the 
equivalent functionality. 

There's not enough history in the git logs to line that up with any code that 
pops up with a quick scan.

Safest to update to the later version while cutting the jets3t dependency 
(which is provably not used, it being incompatible with the shipped BC lib). 
The most thorough due diligence: cut out Bouncy Castle and see what breaks...
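
One hedged way to check the javax.crypto angle (plain JDK API, no Spark specifics; note this only catches the registered-provider path, not direct use of BC classes):

{code:scala}
import java.security.Security

// List security providers; Bouncy Castle registers itself under names starting
// with "BC" (e.g. "BC", "BCJSSE", "BCFIPS") when it is installed as a provider.
val bc = Security.getProviders.filter(_.getName.startsWith("BC"))
if (bc.isEmpty) println("Bouncy Castle is not registered as a JCE provider")
else bc.foreach(p => println(s"${p.getName} ${p.getVersion}: ${p.getInfo}"))
{code}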

> Update Bouncy castle dependency
> ---
>
> Key: SPARK-22634
> URL: https://issues.apache.org/jira/browse/SPARK-22634
> Project: Spark
>  Issue Type: Task
>  Components: Spark Core, SQL, Structured Streaming
>Affects Versions: 2.2.0
>Reporter: Lior Regev
>Assignee: Sean Owen
>Priority: Minor
> Fix For: 2.3.0
>
>
> Spark's usage of the jets3t library, as well as Spark's own Flume and Kafka 
> streaming, uses Bouncy Castle version 1.51.
> This is an outdated version, as the latest one is 1.58.
> This, in turn, renders packages such as 
> [spark-hadoopcryptoledger-ds|https://github.com/ZuInnoTe/spark-hadoopcryptoledger-ds]
>  unusable, since these require 1.58 and Spark's distributions come with 1.51.
> My own attempt was to run on EMR, and since I automatically get all of 
> Spark's dependencies (Bouncy Castle 1.51 being one of them) on the 
> classpath, using the library to parse blockchain data failed due to missing 
> functionality.
> I have also opened an 
> [issue|https://bitbucket.org/jmurty/jets3t/issues/242/bouncycastle-dependency]
>  with jets3t to update their dependency as well, but along with that Spark 
> would have to update its own or at least be packaged with a newer version.






[jira] [Assigned] (SPARK-25054) Enable MetricsServlet sink for Executor

2018-08-07 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25054:


Assignee: Apache Spark

> Enable MetricsServlet sink for Executor
> ---
>
> Key: SPARK-25054
> URL: https://issues.apache.org/jira/browse/SPARK-25054
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.3.1
>Reporter: Lantao Jin
>Assignee: Apache Spark
>Priority: Minor
>
> The MetricsServlet sink is added by default as a sink in the master, but 
> there is no way to query the Executor metrics via the servlet. This ticket 
> offers a way to enable the MetricsServlet sink on the Executor side when 
> spark.executor.ui.enabled is set to true.
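
To illustrate the gap, a hedged sketch (assumes a local driver UI on port 4040 with the default servlet sink path /metrics/json): the driver already exposes a metrics snapshot over HTTP, but there is no equivalent endpoint per executor today.

{code:scala}
import scala.io.Source

// Driver-side MetricsServlet snapshot; nothing comparable exists on executors,
// which is what this ticket proposes to add behind spark.executor.ui.enabled.
val driverMetrics = Source.fromURL("http://localhost:4040/metrics/json").mkString
println(driverMetrics.take(200))
{code}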






[jira] [Commented] (SPARK-25054) Enable MetricsServlet sink for Executor

2018-08-07 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572662#comment-16572662
 ] 

Apache Spark commented on SPARK-25054:
--

User 'LantaoJin' has created a pull request for this issue:
https://github.com/apache/spark/pull/22034

> Enable MetricsServlet sink for Executor
> ---
>
> Key: SPARK-25054
> URL: https://issues.apache.org/jira/browse/SPARK-25054
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.3.1
>Reporter: Lantao Jin
>Priority: Minor
>
> The MetricsServlet sink is added by default as a sink in the master, but 
> there is no way to query the Executor metrics via the servlet. This ticket 
> offers a way to enable the MetricsServlet sink on the Executor side when 
> spark.executor.ui.enabled is set to true.






[jira] [Assigned] (SPARK-25054) Enable MetricsServlet sink for Executor

2018-08-07 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25054:


Assignee: (was: Apache Spark)

> Enable MetricsServlet sink for Executor
> ---
>
> Key: SPARK-25054
> URL: https://issues.apache.org/jira/browse/SPARK-25054
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.3.1
>Reporter: Lantao Jin
>Priority: Minor
>
> The MetricsServlet sink is added by default as a sink in the master, but 
> there is no way to query the Executor metrics via the servlet. This ticket 
> offers a way to enable the MetricsServlet sink on the Executor side when 
> spark.executor.ui.enabled is set to true.






[jira] [Created] (SPARK-25054) Enable MetricsServlet sink for Executor

2018-08-07 Thread Lantao Jin (JIRA)
Lantao Jin created SPARK-25054:
--

 Summary: Enable MetricsServlet sink for Executor
 Key: SPARK-25054
 URL: https://issues.apache.org/jira/browse/SPARK-25054
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 2.3.1
Reporter: Lantao Jin


The MetricsServlet sink is added by default as a sink in the master, but there 
is no way to query the Executor metrics via the servlet. This ticket offers a way 
to enable the MetricsServlet sink on the Executor side when 
spark.executor.ui.enabled is set to true.






[jira] [Commented] (SPARK-25052) Is there any possibility that spark structured streaming generate duplicates in the output?

2018-08-07 Thread bharath kumar avusherla (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572608#comment-16572608
 ] 

bharath kumar avusherla commented on SPARK-25052:
-

I also thought about it, hence I created it as a question. Anyhow, I will send 
the question to the mailing list.

> Is there any possibility that spark structured streaming generate duplicates 
> in the output?
> ---
>
> Key: SPARK-25052
> URL: https://issues.apache.org/jira/browse/SPARK-25052
> Project: Spark
>  Issue Type: Question
>  Components: Spark Core
>Affects Versions: 2.3.0
>Reporter: bharath kumar avusherla
>Priority: Minor
>
> We recently observed that Spark Structured Streaming generated duplicates 
> in the output when reading from a Kafka topic and storing the output to S3 
> (and checkpointing in S3). We ran into this issue twice. This is not 
> reproducible. Has anyone ever faced this kind of issue before? Is 
> this because of S3 eventual consistency?
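
As a hedged mitigation sketch (the broker, topic, and eventId/eventTime columns are hypothetical): Spark's end-to-end exactly-once story assumes an idempotent or transactional sink, so with a plain S3 sink an explicit watermark-bounded deduplication can absorb replays after failures.

{code:scala}
val events = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")   // hypothetical broker
  .option("subscribe", "events")                      // hypothetical topic
  .load()
  .selectExpr("CAST(key AS STRING) AS eventId", "timestamp AS eventTime")

// Drop records already seen within the watermark window before writing to S3.
val deduped = events
  .withWatermark("eventTime", "10 minutes")
  .dropDuplicates("eventId", "eventTime")
{code}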






[jira] [Updated] (SPARK-24948) SHS filters wrongly some applications due to permission check

2018-08-07 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated SPARK-24948:

Fix Version/s: 2.3.2

> SHS filters wrongly some applications due to permission check
> -
>
> Key: SPARK-24948
> URL: https://issues.apache.org/jira/browse/SPARK-24948
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.3.1
>Reporter: Marco Gaido
>Priority: Blocker
> Fix For: 2.3.2, 2.4.0
>
>
> SHS filters out the event logs it doesn't have permission to read. 
> Unfortunately, this check is quite naive, as it takes into account only the 
> base permissions (i.e. user, group, and other permissions). For instance, if 
> ACLs are enabled, they are ignored in this check; moreover, each filesystem may 
> have different policies (e.g. it may consider the spark user a superuser who can 
> access everything).
> This results in some applications not being displayed in the SHS, even though 
> the Spark user (or whatever user the SHS is started with) can actually read 
> their event logs.






[jira] [Resolved] (SPARK-25052) Is there any possibility that spark structured streaming generate duplicates in the output?

2018-08-07 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-25052.
--
Resolution: Invalid

Questions are better directed to the mailing list, 
https://spark.apache.org/community.html. Let's file an issue once it's 
clear that this is actually an issue.

> Is there any possibility that spark structured streaming generate duplicates 
> in the output?
> ---
>
> Key: SPARK-25052
> URL: https://issues.apache.org/jira/browse/SPARK-25052
> Project: Spark
>  Issue Type: Question
>  Components: Spark Core
>Affects Versions: 2.3.0
>Reporter: bharath kumar avusherla
>Priority: Minor
>
> We recently observed that Spark Structured Streaming generated duplicates 
> in the output when reading from a Kafka topic and storing the output to S3 
> (and checkpointing in S3). We ran into this issue twice. This is not 
> reproducible. Has anyone ever faced this kind of issue before? Is 
> this because of S3 eventual consistency?






[jira] [Commented] (SPARK-25051) where clause on dataset gives AnalysisException

2018-08-07 Thread Hyukjin Kwon (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572571#comment-16572571
 ] 

Hyukjin Kwon commented on SPARK-25051:
--

Can you post some code for df1 and df2 as well?

> where clause on dataset gives AnalysisException
> ---
>
> Key: SPARK-25051
> URL: https://issues.apache.org/jira/browse/SPARK-25051
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 2.3.0
>Reporter: MIK
>Priority: Major
>
> *schemas :*
> df1
> => id ts
> df2
> => id name country
> *code:*
> val df = df1.join(df2, Seq("id"), "left_outer").where(df2("id").isNull)
> *error*:
> org.apache.spark.sql.AnalysisException:Resolved attribute(s) id#0 missing 
> from xx#15,xx#9L,id#5,xx#6,xx#11,xx#14,xx#13,xx#12,xx#7,xx#16,xx#10,xx#8L in 
> operator !Filter isnull(id#0). Attribute(s) with the same name appear in the 
> operation: id. Please check if the right attribute(s) are used.;;
>  at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:41)
>     at 
> org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:91)
>     at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:289)
>     at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:80)
>     at 
> org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)
>     at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:80)
>     at 
> org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:91)
>     at 
> org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:104)
>     at 
> org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:57)
>     at 
> org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:55)
>     at 
> org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:47)
>     at org.apache.spark.sql.Dataset.<init>(Dataset.scala:172)
>     at org.apache.spark.sql.Dataset.<init>(Dataset.scala:178)
>     at org.apache.spark.sql.Dataset$.apply(Dataset.scala:65)
>     at org.apache.spark.sql.Dataset.withTypedPlan(Dataset.scala:3300)
>     at org.apache.spark.sql.Dataset.filter(Dataset.scala:1458)
>     at org.apache.spark.sql.Dataset.where(Dataset.scala:1486)
> This works fine in Spark 2.2.2.
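
For what it's worth, a hedged workaround sketch (using the hypothetical df1/df2 above): alias both sides so the post-join filter references an unambiguous column, or express the same intent as an anti join.

{code:scala}
import org.apache.spark.sql.functions.col

// Aliased join keeps both id columns distinguishable after the join.
val noMatch = df1.as("a")
  .join(df2.as("b"), col("a.id") === col("b.id"), "left_outer")
  .where(col("b.id").isNull)

// Equivalent intent, without the null check.
val noMatchAnti = df1.join(df2, Seq("id"), "left_anti")
{code}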






[jira] [Commented] (SPARK-25029) Scala 2.12 issues: TaskNotSerializable and Janino "Two non-abstract methods ..." errors

2018-08-07 Thread Sean Owen (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572570#comment-16572570
 ] 

Sean Owen commented on SPARK-25029:
---

If we _really_ needed to resolve this unilaterally from the Spark side, I think 
we could get away with forking one class from Janino and patching it lightly per 
my pull request. Forking isn't great, especially when it's not clear whether 
future official releases will have something similar. But it's feasible here, as 
I believe the patch works, at least w.r.t. Spark.

> Scala 2.12 issues: TaskNotSerializable and Janino "Two non-abstract methods 
> ..." errors
> ---
>
> Key: SPARK-25029
> URL: https://issues.apache.org/jira/browse/SPARK-25029
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 2.4.0
>Reporter: Sean Owen
>Priority: Blocker
>
> We actually still have some test failures in the Scala 2.12 build. There seem 
> to be two types. The first is that some tests fail with "TaskNotSerializable" 
> because some code construct now captures a reference to scalatest's 
> AssertionsHelper. Example:
> {code:java}
> - LegacyAccumulatorWrapper with AccumulatorParam that has no equals/hashCode 
> *** FAILED *** java.io.NotSerializableException: 
> org.scalatest.Assertions$AssertionsHelper Serialization stack: - object not 
> serializable (class: org.scalatest.Assertions$AssertionsHelper, value: 
> org.scalatest.Assertions$AssertionsHelper@3bc5fc8f){code}
> These seem generally easy to fix by tweaking the test code. It's not clear if 
> something about closure cleaning in 2.12 could be improved to detect this 
> situation automatically; given that only a handful of tests fail for this 
> reason, it's unlikely to be a systemic problem.
>  
> The other error is curiouser. Janino fails to compile generated code in many 
> cases with errors like:
> {code:java}
> - encode/decode for seq of string: List(abc, xyz) *** FAILED ***
> java.lang.RuntimeException: Error while encoding: 
> org.codehaus.janino.InternalCompilerException: failed to compile: 
> org.codehaus.janino.InternalCompilerException: Compiling "GeneratedClass": 
> Two non-abstract methods "public int scala.collection.TraversableOnce.size()" 
> have the same parameter types, declaring type and return type{code}
>  
> I include the full generated code that failed in one case below. There is no 
> {{size()}} in the generated code. It's got to be down to some difference in 
> Scala 2.12, potentially even a Janino problem.
>  
> {code:java}
> Caused by: org.codehaus.janino.InternalCompilerException: Compiling 
> "GeneratedClass": Two non-abstract methods "public int 
> scala.collection.TraversableOnce.size()" have the same parameter types, 
> declaring type and return type
> at org.codehaus.janino.UnitCompiler.compileUnit(UnitCompiler.java:361)
> at org.codehaus.janino.SimpleCompiler.cook(SimpleCompiler.java:234)
> at 
> org.codehaus.janino.SimpleCompiler.compileToClassLoader(SimpleCompiler.java:446)
> at 
> org.codehaus.janino.ClassBodyEvaluator.compileToClass(ClassBodyEvaluator.java:313)
> at org.codehaus.janino.ClassBodyEvaluator.cook(ClassBodyEvaluator.java:235)
> at org.codehaus.janino.SimpleCompiler.cook(SimpleCompiler.java:204)
> at org.codehaus.commons.compiler.Cookable.cook(Cookable.java:80)
> at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:1342)
> ... 30 more
> Caused by: org.codehaus.janino.InternalCompilerException: Two non-abstract 
> methods "public int scala.collection.TraversableOnce.size()" have the same 
> parameter types, declaring type and return type
> at 
> org.codehaus.janino.UnitCompiler.findMostSpecificIInvocable(UnitCompiler.java:9112)
> at 
> org.codehaus.janino.UnitCompiler.findMostSpecificIInvocable(UnitCompiler.java:)
> at org.codehaus.janino.UnitCompiler.findIMethod(UnitCompiler.java:8770)
> at org.codehaus.janino.UnitCompiler.findIMethod(UnitCompiler.java:8672)
> at org.codehaus.janino.UnitCompiler.compileGet2(UnitCompiler.java:4737)
> at org.codehaus.janino.UnitCompiler.access$8300(UnitCompiler.java:212)
> at 
> org.codehaus.janino.UnitCompiler$12.visitMethodInvocation(UnitCompiler.java:4097)
> at 
> org.codehaus.janino.UnitCompiler$12.visitMethodInvocation(UnitCompiler.java:4070)
> at org.codehaus.janino.Java$MethodInvocation.accept(Java.java:4902)
> at org.codehaus.janino.UnitCompiler.compileGet(UnitCompiler.java:4070)
> at org.codehaus.janino.UnitCompiler.compileGetValue(UnitCompiler.java:5253)
> at org.codehaus.janino.UnitCompiler.compileGet2(UnitCompiler.java:4391)
> at org.codehaus.janino.UnitCompiler.access$8000(UnitCompiler.java:212)
> at 
> 
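
For the first class of failures quoted above (test closures capturing the scalatest suite and its AssertionsHelper), a hedged sketch of the usual kind of tweak, assuming a FunSuite-style test with a SparkContext `sc` in scope: copy what the closure needs into a local val and keep assertions outside the serialized code.

{code:scala}
test("closure does not capture the enclosing suite") {
  val expected = 42                                     // local val, serializable on its own
  val results = sc.parallelize(1 to 10).map(_ => expected).collect()
  assert(results.forall(_ == expected))                 // assertion stays on the driver side
}
{code}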

[jira] [Commented] (SPARK-24924) Add mapping for built-in Avro data source

2018-08-07 Thread Hyukjin Kwon (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572564#comment-16572564
 ] 

Hyukjin Kwon commented on SPARK-24924:
--

[~cloud_fan], yea, adding them as implicits doesn't sound like a good idea. But I 
think we can still add {{spark.read.avro}} to {{DataFrameReader}}, although it 
looks a bit weird since Avro is an external package. 

> Add mapping for built-in Avro data source
> -
>
> Key: SPARK-24924
> URL: https://issues.apache.org/jira/browse/SPARK-24924
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
> Fix For: 2.4.0
>
>
> This issue aims at the following:
>  # Like the `com.databricks.spark.csv` mapping, we had better map 
> `com.databricks.spark.avro` to the built-in Avro data source.
>  # Remove the incorrect error message, `Please find an Avro package at ...`.






[jira] [Resolved] (SPARK-24251) DataSourceV2: Add AppendData logical operation

2018-08-07 Thread Wenchen Fan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-24251.
-
   Resolution: Fixed
Fix Version/s: 2.4.0

Issue resolved by pull request 21305
[https://github.com/apache/spark/pull/21305]

> DataSourceV2: Add AppendData logical operation
> --
>
> Key: SPARK-24251
> URL: https://issues.apache.org/jira/browse/SPARK-24251
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Ryan Blue
>Assignee: Ryan Blue
>Priority: Major
> Fix For: 2.4.0
>
>
> The SPIP to standardize SQL logical plans (SPARK-23521) proposes AppendData 
> for inserting data in append mode. This is the simplest plan to implement 
> first.






[jira] [Assigned] (SPARK-24251) DataSourceV2: Add AppendData logical operation

2018-08-07 Thread Wenchen Fan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-24251:
---

Assignee: Ryan Blue

> DataSourceV2: Add AppendData logical operation
> --
>
> Key: SPARK-24251
> URL: https://issues.apache.org/jira/browse/SPARK-24251
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Ryan Blue
>Assignee: Ryan Blue
>Priority: Major
> Fix For: 2.4.0
>
>
> The SPIP to standardize SQL logical plans (SPARK-23521) proposes AppendData 
> for inserting data in append mode. This is the simplest plan to implement 
> first.






[jira] [Commented] (SPARK-24924) Add mapping for built-in Avro data source

2018-08-07 Thread Wenchen Fan (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572556#comment-16572556
 ] 

Wenchen Fan commented on SPARK-24924:
-

>  I assume we could theoretically also support the spark.read.avro format as 
> well

There was a discussion about why we shouldn't support it: 
https://github.com/apache/spark/pull/21841

Users always need to do some manual work to use `spark.read.avro`, even with 
the Databricks Avro package. Users can still define an implicit class to 
support `spark.read.avro` themselves if they want to.
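
A hedged sketch of such a user-side implicit (the AvroImplicits/AvroDataFrameReader names are made up; assumes the built-in "avro" format is on the classpath):

{code:scala}
import org.apache.spark.sql.{DataFrame, DataFrameReader}

object AvroImplicits {
  // Adds spark.read.avro(path) by delegating to the short-name "avro" format.
  implicit class AvroDataFrameReader(reader: DataFrameReader) {
    def avro(path: String): DataFrame = reader.format("avro").load(path)
  }
}

// usage: import AvroImplicits._; spark.read.avro("/path/to/data.avro")
{code}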

> Add mapping for built-in Avro data source
> -
>
> Key: SPARK-24924
> URL: https://issues.apache.org/jira/browse/SPARK-24924
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
> Fix For: 2.4.0
>
>
> This issue aims at the following:
>  # Like the `com.databricks.spark.csv` mapping, we had better map 
> `com.databricks.spark.avro` to the built-in Avro data source.
>  # Remove the incorrect error message, `Please find an Avro package at ...`.






[jira] [Commented] (SPARK-22634) Update Bouncy castle dependency

2018-08-07 Thread Sean Owen (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-22634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572551#comment-16572551
 ] 

Sean Owen commented on SPARK-22634:
---

[~ste...@apache.org] are you saying that this whole issue is moot if 
SPARK-23654 is resolved? That might be the better resolution. If that's 
correct, then maybe Bouncy Castle isn't really used here?

> Update Bouncy castle dependency
> ---
>
> Key: SPARK-22634
> URL: https://issues.apache.org/jira/browse/SPARK-22634
> Project: Spark
>  Issue Type: Task
>  Components: Spark Core, SQL, Structured Streaming
>Affects Versions: 2.2.0
>Reporter: Lior Regev
>Assignee: Sean Owen
>Priority: Minor
> Fix For: 2.3.0
>
>
> Spark's usage of the jets3t library, as well as Spark's own Flume and Kafka 
> streaming, uses Bouncy Castle version 1.51.
> This is an outdated version, as the latest one is 1.58.
> This, in turn, renders packages such as 
> [spark-hadoopcryptoledger-ds|https://github.com/ZuInnoTe/spark-hadoopcryptoledger-ds]
>  unusable, since these require 1.58 and Spark's distributions come with 1.51.
> My own attempt was to run on EMR, and since I automatically get all of 
> Spark's dependencies (Bouncy Castle 1.51 being one of them) on the 
> classpath, using the library to parse blockchain data failed due to missing 
> functionality.
> I have also opened an 
> [issue|https://bitbucket.org/jmurty/jets3t/issues/242/bouncycastle-dependency]
>  with jets3t to update their dependency as well, but along with that Spark 
> would have to update its own or at least be packaged with a newer version.






[jira] [Commented] (SPARK-22634) Update Bouncy castle dependency

2018-08-07 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-22634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572542#comment-16572542
 ] 

Saisai Shao commented on SPARK-22634:
-

[~srowen] I'm wondering if it is possible to upgrade to version 1.60, as that 
version fixed two CVEs (https://www.bouncycastle.org/latest_releases.html).

> Update Bouncy castle dependency
> ---
>
> Key: SPARK-22634
> URL: https://issues.apache.org/jira/browse/SPARK-22634
> Project: Spark
>  Issue Type: Task
>  Components: Spark Core, SQL, Structured Streaming
>Affects Versions: 2.2.0
>Reporter: Lior Regev
>Assignee: Sean Owen
>Priority: Minor
> Fix For: 2.3.0
>
>
> Spark's usage of the jets3t library, as well as Spark's own Flume and Kafka 
> streaming, uses Bouncy Castle version 1.51.
> This is an outdated version, as the latest one is 1.58.
> This, in turn, renders packages such as 
> [spark-hadoopcryptoledger-ds|https://github.com/ZuInnoTe/spark-hadoopcryptoledger-ds]
>  unusable, since these require 1.58 and Spark's distributions come with 1.51.
> My own attempt was to run on EMR, and since I automatically get all of 
> Spark's dependencies (Bouncy Castle 1.51 being one of them) on the 
> classpath, using the library to parse blockchain data failed due to missing 
> functionality.
> I have also opened an 
> [issue|https://bitbucket.org/jmurty/jets3t/issues/242/bouncycastle-dependency]
>  with jets3t to update their dependency as well, but along with that Spark 
> would have to update its own or at least be packaged with a newer version.






[jira] [Commented] (SPARK-23935) High-order function: map_entries(map<K, V>) → array<row<K, V>>

2018-08-07 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-23935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572532#comment-16572532
 ] 

Apache Spark commented on SPARK-23935:
--

User 'kiszk' has created a pull request for this issue:
https://github.com/apache/spark/pull/22033

> High-order function: map_entries(map<K, V>) → array<row<K, V>>
> -
>
> Key: SPARK-23935
> URL: https://issues.apache.org/jira/browse/SPARK-23935
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Xiao Li
>Assignee: Marek Novotny
>Priority: Major
> Fix For: 2.4.0
>
>
> Ref: https://prestodb.io/docs/current/functions/map.html
> Returns an array of all entries in the given map.
> {noformat}
> SELECT map_entries(MAP(ARRAY[1, 2], ARRAY['x', 'y'])); -- [ROW(1, 'x'), 
> ROW(2, 'y')]
> {noformat}
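
A hedged Spark-side usage sketch of the same function (assumes Spark 2.4+, where map_entries is exposed in org.apache.spark.sql.functions):

{code:scala}
import org.apache.spark.sql.functions.{col, lit, map, map_entries}

val df = spark.range(1).select(map(lit(1), lit("x"), lit(2), lit("y")).as("m"))
df.select(map_entries(col("m"))).show(false)   // [[1, x], [2, y]]
{code}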






[jira] [Assigned] (SPARK-25047) Can't assign SerializedLambda to scala.Function1 in deserialization of BucketedRandomProjectionLSHModel

2018-08-07 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25047:


Assignee: Apache Spark

> Can't assign SerializedLambda to scala.Function1 in deserialization of 
> BucketedRandomProjectionLSHModel
> ---
>
> Key: SPARK-25047
> URL: https://issues.apache.org/jira/browse/SPARK-25047
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Affects Versions: 2.4.0
>Reporter: Sean Owen
>Assignee: Apache Spark
>Priority: Major
>
> Another distinct test failure:
> {code:java}
> - BucketedRandomProjectionLSH: streaming transform *** FAILED ***
>   org.apache.spark.sql.streaming.StreamingQueryException: Query [id = 
> 7f34fb07-a718-4488-b644-d27cfd29ff6c, runId = 
> 0bbc0ba2-2952-4504-85d6-8aba877ba01b] terminated with exception: Job aborted 
> due to stage failure: Task 0 in stage 16.0 failed 1 times, most recent 
> failure: Lost task 0.0 in stage 16.0 (TID 16, localhost, executor driver): 
> java.lang.ClassCastException: cannot assign instance of 
> java.lang.invoke.SerializedLambda to field 
> org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel.hashFunction of 
> type scala.Function1 in instance of 
> org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel
> ...
>   Cause: java.lang.ClassCastException: cannot assign instance of 
> java.lang.invoke.SerializedLambda to field 
> org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel.hashFunction of 
> type scala.Function1 in instance of 
> org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel
>   at 
> java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2233)
>   at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1405)
>   at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2284)
> ...{code}
> Here the different nature of a Java 8 LMF closure trips up Java 
> serialization/deserialization. I think this can be patched by manually 
> implementing the Java serialization here, and I don't see other instances (yet).
> Also wondering if this "val" can be a "def".
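
A minimal sketch of the kind of changes hinted at above (a made-up Model class, not Spark's actual code): make the function a def so no Function1 field is serialized at all, or make the field transient so it is rebuilt after deserialization rather than serialized.

{code:scala}
class Model(val scale: Double) extends Serializable {
  // Option 1: a def builds the closure on demand, so nothing is serialized for it.
  def hashFunction: Double => Double = x => x * scale

  // Option 2: a transient lazy val is skipped by Java serialization and
  // re-evaluated lazily on the deserialized instance.
  @transient lazy val hashFn: Double => Double = x => x * scale
}
{code}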






[jira] [Assigned] (SPARK-25047) Can't assign SerializedLambda to scala.Function1 in deserialization of BucketedRandomProjectionLSHModel

2018-08-07 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25047:


Assignee: (was: Apache Spark)

> Can't assign SerializedLambda to scala.Function1 in deserialization of 
> BucketedRandomProjectionLSHModel
> ---
>
> Key: SPARK-25047
> URL: https://issues.apache.org/jira/browse/SPARK-25047
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Affects Versions: 2.4.0
>Reporter: Sean Owen
>Priority: Major
>
> Another distinct test failure:
> {code:java}
> - BucketedRandomProjectionLSH: streaming transform *** FAILED ***
>   org.apache.spark.sql.streaming.StreamingQueryException: Query [id = 
> 7f34fb07-a718-4488-b644-d27cfd29ff6c, runId = 
> 0bbc0ba2-2952-4504-85d6-8aba877ba01b] terminated with exception: Job aborted 
> due to stage failure: Task 0 in stage 16.0 failed 1 times, most recent 
> failure: Lost task 0.0 in stage 16.0 (TID 16, localhost, executor driver): 
> java.lang.ClassCastException: cannot assign instance of 
> java.lang.invoke.SerializedLambda to field 
> org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel.hashFunction of 
> type scala.Function1 in instance of 
> org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel
> ...
>   Cause: java.lang.ClassCastException: cannot assign instance of 
> java.lang.invoke.SerializedLambda to field 
> org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel.hashFunction of 
> type scala.Function1 in instance of 
> org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel
>   at 
> java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2233)
>   at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1405)
>   at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2284)
> ...{code}
> Here the different nature of a Java 8 LMF closure trips up Java 
> serialization/deserialization. I think this can be patched by manually 
> implementing the Java serialization here, and I don't see other instances (yet).
> Also wondering if this "val" can be a "def".






[jira] [Commented] (SPARK-25047) Can't assign SerializedLambda to scala.Function1 in deserialization of BucketedRandomProjectionLSHModel

2018-08-07 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572518#comment-16572518
 ] 

Apache Spark commented on SPARK-25047:
--

User 'srowen' has created a pull request for this issue:
https://github.com/apache/spark/pull/22032

> Can't assign SerializedLambda to scala.Function1 in deserialization of 
> BucketedRandomProjectionLSHModel
> ---
>
> Key: SPARK-25047
> URL: https://issues.apache.org/jira/browse/SPARK-25047
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Affects Versions: 2.4.0
>Reporter: Sean Owen
>Priority: Major
>
> Another distinct test failure:
> {code:java}
> - BucketedRandomProjectionLSH: streaming transform *** FAILED ***
>   org.apache.spark.sql.streaming.StreamingQueryException: Query [id = 
> 7f34fb07-a718-4488-b644-d27cfd29ff6c, runId = 
> 0bbc0ba2-2952-4504-85d6-8aba877ba01b] terminated with exception: Job aborted 
> due to stage failure: Task 0 in stage 16.0 failed 1 times, most recent 
> failure: Lost task 0.0 in stage 16.0 (TID 16, localhost, executor driver): 
> java.lang.ClassCastException: cannot assign instance of 
> java.lang.invoke.SerializedLambda to field 
> org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel.hashFunction of 
> type scala.Function1 in instance of 
> org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel
> ...
>   Cause: java.lang.ClassCastException: cannot assign instance of 
> java.lang.invoke.SerializedLambda to field 
> org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel.hashFunction of 
> type scala.Function1 in instance of 
> org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel
>   at 
> java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2233)
>   at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1405)
>   at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2284)
> ...{code}
> Here the different nature of a Java 8 LMF closure trips up Java 
> serialization/deserialization. I think this can be patched by manually 
> implementing the Java serialization here, and I don't see other instances (yet).
> Also wondering if this "val" can be a "def".






[jira] [Resolved] (SPARK-25045) Make `RDDBarrier.mapParititions` similar to `RDD.mapPartitions`

2018-08-07 Thread Xiangrui Meng (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng resolved SPARK-25045.
---
   Resolution: Fixed
Fix Version/s: 2.4.0

Issue resolved by pull request 22026
[https://github.com/apache/spark/pull/22026]

> Make `RDDBarrier.mapParititions` similar to `RDD.mapPartitions`
> ---
>
> Key: SPARK-25045
> URL: https://issues.apache.org/jira/browse/SPARK-25045
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: Jiang Xingbo
>Assignee: Jiang Xingbo
>Priority: Major
> Fix For: 2.4.0
>
>
> The signature of the function passed to `RDDBarrier.mapPartitions()` is different 
> from that of `RDD.mapPartitions`: the latter doesn’t take a TaskContext. We 
> should make the function signatures the same to avoid confusion and misuse.
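
A hedged sketch of the aligned signature (assumes the Spark 2.4 barrier API and a SparkContext `sc`): the function takes only an Iterator, and the task context is obtained inside the closure instead of being a parameter.

{code:scala}
import org.apache.spark.BarrierTaskContext

val rdd = sc.parallelize(1 to 100, 4)
val doubled = rdd.barrier().mapPartitions { iter =>
  val ctx = BarrierTaskContext.get()   // fetched inside, no TaskContext argument
  ctx.barrier()                        // global barrier across all tasks in the stage
  iter.map(_ * 2)
}
{code}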






[jira] [Assigned] (SPARK-25045) Make `RDDBarrier.mapParititions` similar to `RDD.mapPartitions`

2018-08-07 Thread Xiangrui Meng (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng reassigned SPARK-25045:
-

Assignee: Jiang Xingbo

> Make `RDDBarrier.mapParititions` similar to `RDD.mapPartitions`
> ---
>
> Key: SPARK-25045
> URL: https://issues.apache.org/jira/browse/SPARK-25045
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: Jiang Xingbo
>Assignee: Jiang Xingbo
>Priority: Major
> Fix For: 2.4.0
>
>
> The signature of the function passed to `RDDBarrier.mapPartitions()` is different 
> from that of `RDD.mapPartitions`: the latter doesn’t take a TaskContext. We 
> should make the function signatures the same to avoid confusion and misuse.






[jira] [Created] (SPARK-25053) Allow additional port forwarding on Spark on K8S as needed

2018-08-07 Thread holdenk (JIRA)
holdenk created SPARK-25053:
---

 Summary: Allow additional port forwarding on Spark on K8S as needed
 Key: SPARK-25053
 URL: https://issues.apache.org/jira/browse/SPARK-25053
 Project: Spark
  Issue Type: Improvement
  Components: Kubernetes
Affects Versions: 2.4.0
Reporter: holdenk


In some cases, like setting up remote debuggers, adding additional ports to be 
forwarded would be useful.






[jira] [Created] (SPARK-25052) Is there any possibility that spark structured streaming generate duplicates in the output?

2018-08-07 Thread bharath kumar avusherla (JIRA)
bharath kumar avusherla created SPARK-25052:
---

 Summary: Is there any possibility that spark structured streaming 
generate duplicates in the output?
 Key: SPARK-25052
 URL: https://issues.apache.org/jira/browse/SPARK-25052
 Project: Spark
  Issue Type: Question
  Components: Spark Core
Affects Versions: 2.3.0
Reporter: bharath kumar avusherla


We recently observed that Spark Structured Streaming generated duplicates 
in the output when reading from a Kafka topic and storing the output to S3 
(and checkpointing in S3). We ran into this issue twice. This is not 
reproducible. Has anyone ever faced this kind of issue before? Is this 
because of S3 eventual consistency?






[jira] [Commented] (SPARK-25047) Can't assign SerializedLambda to scala.Function1 in deserialization of BucketedRandomProjectionLSHModel

2018-08-07 Thread Sean Owen (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572410#comment-16572410
 ] 

Sean Owen commented on SPARK-25047:
---

More notes. These two SO answers shed a little light:

[https://stackoverflow.com/a/28367602/64174]

[https://stackoverflow.com/questions/28079307/unable-to-deserialize-lambda/28084460#28084460]

It suggests the problem is that the SerializedLambda instance that is 
deserialized should provide a readResolve() method to, I assume, resolve it 
back into a scala.Function1. And that should actually be implemented by a 
{{$deserializeLambda$(SerializedLambda)}} function in the capturing class. It 
seems like something isn't turning it back from a SerializedLambda to something 
else.

The method is in the byte code of BucketedRandomProjectionLSH and decompiles as
{code:java}
private static /* synthetic */ Object $deserializeLambda$(SerializedLambda 
serializedLambda) {
    return LambdaDeserialize.bootstrap(new 
MethodHandle[]{$anonfun$hashDistance$1$adapted(scala.Tuple2 ), 
$anonfun$hashFunction$2$adapted(org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel
 org.apache.spark.ml.linalg.Vector org.apache.spark.ml.linalg.Vector ), 
$anonfun$hashFunction$3$adapted(java.lang.Object ), 
$anonfun$hashFunction$1(org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel
 org.apache.spark.ml.linalg.Vector )}, serializedLambda);

}{code}
I traced through this for a while but couldn't make sense of it. However, 
nothing actually failed around here; the ultimate error came a bit later, as 
in the StackOverflow post above.

It goes without saying that there are plenty of fields of type scala.Function1 
in Spark and this is the only problematic one, and I can't see why. Is it because 
it involves an array type? Grepping suggests that could be unique. However, I 
tried to create a repro in a simple class file and everything worked as expected 
there too.

Something is odd about this case, and I don't know if it is in fact triggering 
some odd corner-case issue in Scala or Java 8, or whether the Spark code could 
be tweaked to dodge it.

 

> Can't assign SerializedLambda to scala.Function1 in deserialization of 
> BucketedRandomProjectionLSHModel
> ---
>
> Key: SPARK-25047
> URL: https://issues.apache.org/jira/browse/SPARK-25047
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Affects Versions: 2.4.0
>Reporter: Sean Owen
>Priority: Major
>
> Another distinct test failure:
> {code:java}
> - BucketedRandomProjectionLSH: streaming transform *** FAILED ***
>   org.apache.spark.sql.streaming.StreamingQueryException: Query [id = 
> 7f34fb07-a718-4488-b644-d27cfd29ff6c, runId = 
> 0bbc0ba2-2952-4504-85d6-8aba877ba01b] terminated with exception: Job aborted 
> due to stage failure: Task 0 in stage 16.0 failed 1 times, most recent 
> failure: Lost task 0.0 in stage 16.0 (TID 16, localhost, executor driver): 
> java.lang.ClassCastException: cannot assign instance of 
> java.lang.invoke.SerializedLambda to field 
> org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel.hashFunction of 
> type scala.Function1 in instance of 
> org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel
> ...
>   Cause: java.lang.ClassCastException: cannot assign instance of 
> java.lang.invoke.SerializedLambda to field 
> org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel.hashFunction of 
> type scala.Function1 in instance of 
> org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel
>   at 
> java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2233)
>   at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1405)
>   at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2284)
> ...{code}
> Here the different nature of a Java 8 LMF closure trips up Java 
> serialization/deserialization. I think this can be patched by manually 
> implementing the Java serialization here, and I don't see other instances (yet).
> Also wondering if this "val" can be a "def".






[jira] [Assigned] (SPARK-25046) Alter View can excute sql like "ALTER VIEW ... AS INSERT INTO"

2018-08-07 Thread Xiao Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li reassigned SPARK-25046:
---

Assignee: SongXun

> Alter View  can excute sql  like "ALTER VIEW ... AS INSERT INTO" 
> -
>
> Key: SPARK-25046
> URL: https://issues.apache.org/jira/browse/SPARK-25046
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: SongXun
>Assignee: SongXun
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 2.4.0
>
>
> Alter View can execute SQL like "ALTER VIEW ... AS INSERT INTO". We should 
> throw 
> ParseException(s"Operation not allowed: $message", ctx) as Create View does.






[jira] [Resolved] (SPARK-25046) Alter View can excute sql like "ALTER VIEW ... AS INSERT INTO"

2018-08-07 Thread Xiao Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-25046.
-
   Resolution: Fixed
Fix Version/s: 2.4.0

> Alter View  can excute sql  like "ALTER VIEW ... AS INSERT INTO" 
> -
>
> Key: SPARK-25046
> URL: https://issues.apache.org/jira/browse/SPARK-25046
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: SongXun
>Assignee: SongXun
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 2.4.0
>
>
> Alter View can execute SQL like "ALTER VIEW ... AS INSERT INTO". We should 
> throw 
> ParseException(s"Operation not allowed: $message", ctx) as Create View does.






[jira] [Commented] (SPARK-23207) Shuffle+Repartition on an DataFrame could lead to incorrect answers

2018-08-07 Thread Thomas Graves (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-23207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572232#comment-16572232
 ] 

Thomas Graves commented on SPARK-23207:
---

Does this affect Spark 2.2 and earlier? From the description it sounds like it, 
in which case we should backport.

> Shuffle+Repartition on an DataFrame could lead to incorrect answers
> ---
>
> Key: SPARK-23207
> URL: https://issues.apache.org/jira/browse/SPARK-23207
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Jiang Xingbo
>Assignee: Jiang Xingbo
>Priority: Blocker
>  Labels: correctness
> Fix For: 2.3.0
>
>
> Currently, shuffle repartition uses RoundRobinPartitioning; the generated 
> result is nondeterministic since the order of the input rows is not 
> determined.
> The bug can be triggered when there is a repartition call following a shuffle 
> (which would lead to non-deterministic row ordering), as the pattern below 
> shows:
> upstream stage -> repartition stage -> result stage
> (-> indicates a shuffle)
> When one of the executor processes goes down, some tasks of the repartition 
> stage will be retried and generate an inconsistent ordering, and some tasks of 
> the result stage will be retried, generating different data.
> The following code returns 931532 instead of 1000000:
> {code}
> import scala.sys.process._
> import org.apache.spark.TaskContext
> val res = spark.range(0, 1000 * 1000, 1).repartition(200).map { x =>
>   x
> }.repartition(200).map { x =>
>   if (TaskContext.get.attemptNumber == 0 && TaskContext.get.partitionId < 2) {
> throw new Exception("pkill -f java".!!)
>   }
>   x
> }
> res.distinct().count()
> {code}
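
A hedged user-side mitigation sketch (not the upstream fix): hash partitioning on a key column is deterministic per row, unlike the round-robin repartition above, so retried tasks regenerate the same partitions.

{code:scala}
import org.apache.spark.sql.functions.col

val deterministic = spark.range(0, 1000 * 1000, 1).toDF("id")
  .repartition(200, col("id"))   // hash-partition by "id" instead of round-robin
{code}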






[jira] [Updated] (SPARK-25029) Scala 2.12 issues: TaskNotSerializable and Janino "Two non-abstract methods ..." errors

2018-08-07 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-25029:
--
Issue Type: Sub-task  (was: Bug)
Parent: SPARK-14220

> Scala 2.12 issues: TaskNotSerializable and Janino "Two non-abstract methods 
> ..." errors
> ---
>
> Key: SPARK-25029
> URL: https://issues.apache.org/jira/browse/SPARK-25029
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 2.4.0
>Reporter: Sean Owen
>Priority: Blocker
>
> We actually still have some test failures in the Scala 2.12 build. There seem 
> to be two types. The first is that some tests fail with "TaskNotSerializable" 
> because some code construct now captures a reference to scalatest's 
> AssertionsHelper. Example:
> {code:java}
> - LegacyAccumulatorWrapper with AccumulatorParam that has no equals/hashCode 
> *** FAILED *** java.io.NotSerializableException: 
> org.scalatest.Assertions$AssertionsHelper Serialization stack: - object not 
> serializable (class: org.scalatest.Assertions$AssertionsHelper, value: 
> org.scalatest.Assertions$AssertionsHelper@3bc5fc8f){code}
> These seem generally easy to fix by tweaking the test code. It's not clear if 
> something about closure cleaning in 2.12 could be improved to detect this 
> situation automatically; given that only a handful of tests fail for this 
> reason, it's unlikely to be a systemic problem.
>  
> The other error is curiouser. Janino fails to compile generated code in many 
> cases with errors like:
> {code:java}
> - encode/decode for seq of string: List(abc, xyz) *** FAILED ***
> java.lang.RuntimeException: Error while encoding: 
> org.codehaus.janino.InternalCompilerException: failed to compile: 
> org.codehaus.janino.InternalCompilerException: Compiling "GeneratedClass": 
> Two non-abstract methods "public int scala.collection.TraversableOnce.size()" 
> have the same parameter types, declaring type and return type{code}
>  
> I include the full generated code that failed in one case below. There is no 
> {{size()}} in the generated code. It's got to be down to some difference in 
> Scala 2.12, potentially even a Janino problem.
>  
> {code:java}
> Caused by: org.codehaus.janino.InternalCompilerException: Compiling 
> "GeneratedClass": Two non-abstract methods "public int 
> scala.collection.TraversableOnce.size()" have the same parameter types, 
> declaring type and return type
> at org.codehaus.janino.UnitCompiler.compileUnit(UnitCompiler.java:361)
> at org.codehaus.janino.SimpleCompiler.cook(SimpleCompiler.java:234)
> at 
> org.codehaus.janino.SimpleCompiler.compileToClassLoader(SimpleCompiler.java:446)
> at 
> org.codehaus.janino.ClassBodyEvaluator.compileToClass(ClassBodyEvaluator.java:313)
> at org.codehaus.janino.ClassBodyEvaluator.cook(ClassBodyEvaluator.java:235)
> at org.codehaus.janino.SimpleCompiler.cook(SimpleCompiler.java:204)
> at org.codehaus.commons.compiler.Cookable.cook(Cookable.java:80)
> at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:1342)
> ... 30 more
> Caused by: org.codehaus.janino.InternalCompilerException: Two non-abstract 
> methods "public int scala.collection.TraversableOnce.size()" have the same 
> parameter types, declaring type and return type
> at 
> org.codehaus.janino.UnitCompiler.findMostSpecificIInvocable(UnitCompiler.java:9112)
> at 
> org.codehaus.janino.UnitCompiler.findMostSpecificIInvocable(UnitCompiler.java:)
> at org.codehaus.janino.UnitCompiler.findIMethod(UnitCompiler.java:8770)
> at org.codehaus.janino.UnitCompiler.findIMethod(UnitCompiler.java:8672)
> at org.codehaus.janino.UnitCompiler.compileGet2(UnitCompiler.java:4737)
> at org.codehaus.janino.UnitCompiler.access$8300(UnitCompiler.java:212)
> at 
> org.codehaus.janino.UnitCompiler$12.visitMethodInvocation(UnitCompiler.java:4097)
> at 
> org.codehaus.janino.UnitCompiler$12.visitMethodInvocation(UnitCompiler.java:4070)
> at org.codehaus.janino.Java$MethodInvocation.accept(Java.java:4902)
> at org.codehaus.janino.UnitCompiler.compileGet(UnitCompiler.java:4070)
> at org.codehaus.janino.UnitCompiler.compileGetValue(UnitCompiler.java:5253)
> at org.codehaus.janino.UnitCompiler.compileGet2(UnitCompiler.java:4391)
> at org.codehaus.janino.UnitCompiler.access$8000(UnitCompiler.java:212)
> at 
> org.codehaus.janino.UnitCompiler$12.visitConditionalExpression(UnitCompiler.java:4094)
> at 
> org.codehaus.janino.UnitCompiler$12.visitConditionalExpression(UnitCompiler.java:4070)
> at org.codehaus.janino.Java$ConditionalExpression.accept(Java.java:4344)
> at org.codehaus.janino.UnitCompiler.compileGet(UnitCompiler.java:4070)
> at 

[jira] [Updated] (SPARK-25044) Address translation of LMF closure primitive args to Object in Scala 2.12

2018-08-07 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-25044:
--
Description: 
A few SQL-related tests fail in Scala 2.12, such as UDFSuite's "SPARK-24891 Fix 
HandleNullInputsForUDF rule":
{code:java}
- SPARK-24891 Fix HandleNullInputsForUDF rule *** FAILED ***
Results do not match for query:
...
== Results ==

== Results ==
!== Correct Answer - 3 == == Spark Answer - 3 ==
!struct<> struct
![0,10,null] [0,10,0]
![1,12,null] [1,12,1]
![2,14,null] [2,14,2] (QueryTest.scala:163){code}
You can kind of get what's going on reading the test:
{code:java}
test("SPARK-24891 Fix HandleNullInputsForUDF rule") {
// assume(!ClosureCleanerSuite2.supportsLMFs)
// This test won't test what it intends to in 2.12, as lambda metafactory closures
// have arg types that are not primitive, but Object
val udf1 = udf({(x: Int, y: Int) => x + y})
val df = spark.range(0, 3).toDF("a")
.withColumn("b", udf1($"a", udf1($"a", lit(10
.withColumn("c", udf1($"a", lit(null)))
val plan = spark.sessionState.executePlan(df.logicalPlan).analyzed

comparePlans(df.logicalPlan, plan)
checkAnswer(
df,
Seq(
Row(0, 10, null),
Row(1, 12, null),
Row(2, 14, null)))
}{code}
 

It seems that the closure that is fed in as a UDF changes behavior, such that 
primitive-type arguments are handled differently. For example, an Int 
argument, when fed 'null', acts like 0.

I'm sure it's a difference in the LMF closure and how its types are understood, 
but not exactly sure of the cause yet.
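
A minimal sketch of the kind of primitive-type probe this breaks (the names here are illustrative, not the exact analyzer internals); it mirrors the reflection shown in the comments below:
{code:scala}
// Hedged sketch, Scala 2.12 assumed: the reflective probe that null handling keys off
// only sees Object parameters on the LMF-generated apply, so no null guard is added
// and the null Int surfaces as 0 downstream. Under 2.11 the same probe reports (int, int).
val udfBody = (x: Int, y: Int) => x + y
val applyParams = udfBody.getClass.getMethods
  .filter(m => m.getName == "apply" && !m.isBridge)
  .head.getParameterTypes
val hasPrimitiveArgs = applyParams.exists(_.isPrimitive)
// 2.11: hasPrimitiveArgs == true; 2.12: false (both parameters are java.lang.Object)
{code}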

  was:
A few SQL-related tests fail in Scala 2.12, such as UDFSuite's "SPARK-24891 Fix 
HandleNullInputsForUDF rule". (Details in a sec when I can copy-paste them.)

It seems that the closure that is fed in as a UDF changes behavior, in a way 
that primitive-type arguments are handled differently. For example an Int 
argument, when fed 'null', acts like 0.

I'm sure it's a difference in the LMF closure and how its types are understood, 
but not exactly sure of the cause yet.


> Address translation of LMF closure primitive args to Object in Scala 2.12
> -
>
> Key: SPARK-25044
> URL: https://issues.apache.org/jira/browse/SPARK-25044
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL
>Affects Versions: 2.4.0
>Reporter: Sean Owen
>Priority: Major
>
> A few SQL-related tests fail in Scala 2.12, such as UDFSuite's "SPARK-24891 
> Fix HandleNullInputsForUDF rule":
> {code:java}
> - SPARK-24891 Fix HandleNullInputsForUDF rule *** FAILED ***
> Results do not match for query:
> ...
> == Results ==
> == Results ==
> !== Correct Answer - 3 == == Spark Answer - 3 ==
> !struct<> struct
> ![0,10,null] [0,10,0]
> ![1,12,null] [1,12,1]
> ![2,14,null] [2,14,2] (QueryTest.scala:163){code}
> You can kind of get what's going on by reading the test:
> {code:java}
> test("SPARK-24891 Fix HandleNullInputsForUDF rule") {
> // assume(!ClosureCleanerSuite2.supportsLMFs)
> // This test won't test what it intends to in 2.12, as lambda metafactory closures
> // have arg types that are not primitive, but Object
> val udf1 = udf({(x: Int, y: Int) => x + y})
> val df = spark.range(0, 3).toDF("a")
> .withColumn("b", udf1($"a", udf1($"a", lit(10
> .withColumn("c", udf1($"a", lit(null)))
> val plan = spark.sessionState.executePlan(df.logicalPlan).analyzed
> comparePlans(df.logicalPlan, plan)
> checkAnswer(
> df,
> Seq(
> Row(0, 10, null),
> Row(1, 12, null),
> Row(2, 14, null)))
> }{code}
>  
> It seems that the closure that is fed in as a UDF changes behavior, in a way 
> that primitive-type arguments are handled differently. For example an Int 
> argument, when fed 'null', acts like 0.
> I'm sure it's a difference in the LMF closure and how its types are 
> understood, but not exactly sure of the cause yet.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-25051) where clause on dataset gives AnalysisException

2018-08-07 Thread MIK (JIRA)
MIK created SPARK-25051:
---

 Summary: where clause on dataset gives AnalysisException
 Key: SPARK-25051
 URL: https://issues.apache.org/jira/browse/SPARK-25051
 Project: Spark
  Issue Type: Bug
  Components: Spark Core, SQL
Affects Versions: 2.3.0
Reporter: MIK


*schemas:*
df1
=> id ts
df2
=> id name country

*code:*

val df = df1.join(df2, Seq("id"), "left_outer").where(df2("id").isNull)

*error*:

org.apache.spark.sql.AnalysisException:Resolved attribute(s) id#0 missing from 
xx#15,xx#9L,id#5,xx#6,xx#11,xx#14,xx#13,xx#12,xx#7,xx#16,xx#10,xx#8L in 
operator !Filter isnull(id#0). Attribute(s) with the same name appear in the 
operation: id. Please check if the right attribute(s) are used.;;

 at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:41)
    at 
org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:91)
    at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:289)
    at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:80)
    at 
org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)
    at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:80)
    at 
org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:91)
    at 
org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:104)
    at 
org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:57)
    at 
org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:55)
    at 
org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:47)
    at org.apache.spark.sql.Dataset.(Dataset.scala:172)
    at org.apache.spark.sql.Dataset.(Dataset.scala:178)
    at org.apache.spark.sql.Dataset$.apply(Dataset.scala:65)
    at org.apache.spark.sql.Dataset.withTypedPlan(Dataset.scala:3300)
    at org.apache.spark.sql.Dataset.filter(Dataset.scala:1458)
    at org.apache.spark.sql.Dataset.where(Dataset.scala:1486)

This works fine in Spark 2.2.2.
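
A hedged workaround sketch (not part of the report), assuming the intent is the usual one of finding df1 rows with no match in df2, and using the column names from the schemas above:
{code:scala}
// Hedged sketch: probe a df2-only column on the joined result, or express the intent
// directly as an anti join, instead of resolving df2("id") against the join output.
import org.apache.spark.sql.functions.col
val joined = df1.join(df2, Seq("id"), "left_outer")
val noMatchViaNullProbe = joined.where(col("name").isNull)   // "name" comes only from df2
val noMatchViaAntiJoin  = df1.join(df2, Seq("id"), "left_anti")
{code}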



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-23932) High-order function: zip_with(array, array, function) → array

2018-08-07 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-23932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-23932:


Assignee: (was: Apache Spark)

> High-order function: zip_with(array, array, function) → 
> array
> ---
>
> Key: SPARK-23932
> URL: https://issues.apache.org/jira/browse/SPARK-23932
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Xiao Li
>Priority: Major
>
> Ref: https://prestodb.io/docs/current/functions/array.html
> Merges the two given arrays, element-wise, into a single array using 
> function. Both arrays must be the same length.
> {noformat}
> SELECT zip_with(ARRAY[1, 3, 5], ARRAY['a', 'b', 'c'], (x, y) -> (y, x)); -- 
> [ROW('a', 1), ROW('b', 3), ROW('c', 5)]
> SELECT zip_with(ARRAY[1, 2], ARRAY[3, 4], (x, y) -> x + y); -- [4, 6]
> SELECT zip_with(ARRAY['a', 'b', 'c'], ARRAY['d', 'e', 'f'], (x, y) -> 
> concat(x, y)); -- ['ad', 'be', 'cf']
> {noformat}
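
For the DataFrame API side, a hedged sketch (assuming a spark-shell session) of the UDF-based stand-in that zip_with would replace:
{code:scala}
// Hedged sketch: today's UDF equivalent of the second Presto example above.
import org.apache.spark.sql.functions.{array, lit, udf}
val zipWithSum = udf((xs: Seq[Int], ys: Seq[Int]) => xs.zip(ys).map { case (x, y) => x + y })
spark.range(1)
  .select(zipWithSum(array(lit(1), lit(2)), array(lit(3), lit(4))).as("zipped"))
  .show()   // [4, 6]
{code}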



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-23932) High-order function: zip_with(array, array, function) → array

2018-08-07 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-23932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-23932:


Assignee: Apache Spark

> High-order function: zip_with(array, array, function) → 
> array
> ---
>
> Key: SPARK-23932
> URL: https://issues.apache.org/jira/browse/SPARK-23932
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Xiao Li
>Assignee: Apache Spark
>Priority: Major
>
> Ref: https://prestodb.io/docs/current/functions/array.html
> Merges the two given arrays, element-wise, into a single array using 
> function. Both arrays must be the same length.
> {noformat}
> SELECT zip_with(ARRAY[1, 3, 5], ARRAY['a', 'b', 'c'], (x, y) -> (y, x)); -- 
> [ROW('a', 1), ROW('b', 3), ROW('c', 5)]
> SELECT zip_with(ARRAY[1, 2], ARRAY[3, 4], (x, y) -> x + y); -- [4, 6]
> SELECT zip_with(ARRAY['a', 'b', 'c'], ARRAY['d', 'e', 'f'], (x, y) -> 
> concat(x, y)); -- ['ad', 'be', 'cf']
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23932) High-order function: zip_with(array, array, function) → array

2018-08-07 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-23932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572140#comment-16572140
 ] 

Apache Spark commented on SPARK-23932:
--

User 'techaddict' has created a pull request for this issue:
https://github.com/apache/spark/pull/22031

> High-order function: zip_with(array, array, function) → 
> array
> ---
>
> Key: SPARK-23932
> URL: https://issues.apache.org/jira/browse/SPARK-23932
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Xiao Li
>Priority: Major
>
> Ref: https://prestodb.io/docs/current/functions/array.html
> Merges the two given arrays, element-wise, into a single array using 
> function. Both arrays must be the same length.
> {noformat}
> SELECT zip_with(ARRAY[1, 3, 5], ARRAY['a', 'b', 'c'], (x, y) -> (y, x)); -- 
> [ROW('a', 1), ROW('b', 3), ROW('c', 5)]
> SELECT zip_with(ARRAY[1, 2], ARRAY[3, 4], (x, y) -> x + y); -- [4, 6]
> SELECT zip_with(ARRAY['a', 'b', 'c'], ARRAY['d', 'e', 'f'], (x, y) -> 
> concat(x, y)); -- ['ad', 'be', 'cf']
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-25044) Address translation of LMF closure primitive args to Object in Scala 2.12

2018-08-07 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572095#comment-16572095
 ] 

Stavros Kontopoulos edited comment on SPARK-25044 at 8/7/18 6:35 PM:
-

[~lrytz] any insight?


was (Author: skonto):
@lrytz any insight?

> Address translation of LMF closure primitive args to Object in Scala 2.12
> -
>
> Key: SPARK-25044
> URL: https://issues.apache.org/jira/browse/SPARK-25044
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL
>Affects Versions: 2.4.0
>Reporter: Sean Owen
>Priority: Major
>
> A few SQL-related tests fail in Scala 2.12, such as UDFSuite's "SPARK-24891 
> Fix HandleNullInputsForUDF rule". (Details in a sec when I can copy-paste 
> them.)
> It seems that the closure that is fed in as a UDF changes behavior, in a way 
> that primitive-type arguments are handled differently. For example an Int 
> argument, when fed 'null', acts like 0.
> I'm sure it's a difference in the LMF closure and how its types are 
> understood, but not exactly sure of the cause yet.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25044) Address translation of LMF closure primitive args to Object in Scala 2.12

2018-08-07 Thread Sean Owen (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572115#comment-16572115
 ] 

Sean Owen commented on SPARK-25044:
---

More specific info. In Scala 2.12:
{code:java}
scala> val f = (i: Int, j: Long) => "x"
f: (Int, Long) => String = $$Lambda$1045/927369095@51ec2856

scala> val methods = f.getClass.getMethods.filter(m => m.getName == "apply" && 
!m.isBridge)
methods: Array[java.lang.reflect.Method] = Array(public java.lang.Object 
$$Lambda$1045/927369095.apply(java.lang.Object,java.lang.Object))

scala> methods.head.getParameterTypes
res0: Array[Class[_]] = Array(class java.lang.Object, class java.lang.Object)
{code}
 

Whereas in Scala 2.11 the result is:
{code:java}
...
scala> res0: Array[Class[_]] = Array(int, long){code}
 

I guess one question for folks like [~lrytz] is, is that 'correct' as far as 
Scala is concerned?  From reading 
[https://docs.oracle.com/javase/8/docs/api/java/lang/invoke/LambdaMetafactory.html]
 I got some sense that compilers had some latitude in how the lambda is 
implemented, but am just wondering if it makes sense that the {{apply}} 
method's signature doesn't seem to match what's expected.

Here is the full list of methods that {{f}}'s class implements; that first one 
is the only logical candidate to look for, I think, as it's the only one 
returning String.
{code:java}
public java.lang.String 
$line3.$read$$iw$$iw$$$Lambda$1045/927369095.apply(java.lang.Object,java.lang.Object)
public java.lang.Object 
$line3.$read$$iw$$iw$$$Lambda$1045/927369095.apply(java.lang.Object,java.lang.Object)
public final void java.lang.Object.wait(long,int) throws 
java.lang.InterruptedException
public final native void java.lang.Object.wait(long) throws 
java.lang.InterruptedException
public final void java.lang.Object.wait() throws java.lang.InterruptedException
public boolean java.lang.Object.equals(java.lang.Object)
public java.lang.String java.lang.Object.toString()
public native int java.lang.Object.hashCode()
public final native java.lang.Class java.lang.Object.getClass()
public final native void java.lang.Object.notify()
public final native void java.lang.Object.notifyAll()
public default scala.Function1 scala.Function2.curried()
public default scala.Function1 scala.Function2.tupled()
public default boolean scala.Function2.apply$mcZDD$sp(double,double)
public default double scala.Function2.apply$mcDDD$sp(double,double)
public default float scala.Function2.apply$mcFDD$sp(double,double)
public default int scala.Function2.apply$mcIDD$sp(double,double)
public default long scala.Function2.apply$mcJDD$sp(double,double)
public default void scala.Function2.apply$mcVDD$sp(double,double)
public default boolean scala.Function2.apply$mcZDI$sp(double,int)
public default double scala.Function2.apply$mcDDI$sp(double,int)
public default float scala.Function2.apply$mcFDI$sp(double,int)
public default int scala.Function2.apply$mcIDI$sp(double,int)
public default long scala.Function2.apply$mcJDI$sp(double,int)
public default void scala.Function2.apply$mcVDI$sp(double,int)
public default boolean scala.Function2.apply$mcZDJ$sp(double,long)
public default double scala.Function2.apply$mcDDJ$sp(double,long)
public default float scala.Function2.apply$mcFDJ$sp(double,long)
public default int scala.Function2.apply$mcIDJ$sp(double,long)
public default long scala.Function2.apply$mcJDJ$sp(double,long)
public default void scala.Function2.apply$mcVDJ$sp(double,long)
public default boolean scala.Function2.apply$mcZID$sp(int,double)
public default double scala.Function2.apply$mcDID$sp(int,double)
public default float scala.Function2.apply$mcFID$sp(int,double)
public default int scala.Function2.apply$mcIID$sp(int,double)
public default long scala.Function2.apply$mcJID$sp(int,double)
public default void scala.Function2.apply$mcVID$sp(int,double)
public default boolean scala.Function2.apply$mcZII$sp(int,int)
public default double scala.Function2.apply$mcDII$sp(int,int)
public default float scala.Function2.apply$mcFII$sp(int,int)
public default int scala.Function2.apply$mcIII$sp(int,int)
public default long scala.Function2.apply$mcJII$sp(int,int)
public default void scala.Function2.apply$mcVII$sp(int,int)
public default boolean scala.Function2.apply$mcZIJ$sp(int,long)
public default double scala.Function2.apply$mcDIJ$sp(int,long)
public default float scala.Function2.apply$mcFIJ$sp(int,long)
public default int scala.Function2.apply$mcIIJ$sp(int,long)
public default long scala.Function2.apply$mcJIJ$sp(int,long)
public default void scala.Function2.apply$mcVIJ$sp(int,long)
public default boolean scala.Function2.apply$mcZJD$sp(long,double)
public default double scala.Function2.apply$mcDJD$sp(long,double)
public default float scala.Function2.apply$mcFJD$sp(long,double)
public default int scala.Function2.apply$mcIJD$sp(long,double)
public default long scala.Function2.apply$mcJJD$sp(long,double)
public default void 

[jira] [Comment Edited] (SPARK-25047) Can't assign SerializedLambda to scala.Function1 in deserialization of BucketedRandomProjectionLSHModel

2018-08-07 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572094#comment-16572094
 ] 

Stavros Kontopoulos edited comment on SPARK-25047 at 8/7/18 6:18 PM:
-

[~lrytz] thoughts?


was (Author: skonto):
[~lrytz] ideas?

> Can't assign SerializedLambda to scala.Function1 in deserialization of 
> BucketedRandomProjectionLSHModel
> ---
>
> Key: SPARK-25047
> URL: https://issues.apache.org/jira/browse/SPARK-25047
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Affects Versions: 2.4.0
>Reporter: Sean Owen
>Priority: Major
>
> Another distinct test failure:
> {code:java}
> - BucketedRandomProjectionLSH: streaming transform *** FAILED ***
>   org.apache.spark.sql.streaming.StreamingQueryException: Query [id = 
> 7f34fb07-a718-4488-b644-d27cfd29ff6c, runId = 
> 0bbc0ba2-2952-4504-85d6-8aba877ba01b] terminated with exception: Job aborted 
> due to stage failure: Task 0 in stage 16.0 failed 1 times, most recent 
> failure: Lost task 0.0 in stage 16.0 (TID 16, localhost, executor driver): 
> java.lang.ClassCastException: cannot assign instance of 
> java.lang.invoke.SerializedLambda to field 
> org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel.hashFunction of 
> type scala.Function1 in instance of 
> org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel
> ...
>   Cause: java.lang.ClassCastException: cannot assign instance of 
> java.lang.invoke.SerializedLambda to field 
> org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel.hashFunction of 
> type scala.Function1 in instance of 
> org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel
>   at 
> java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2233)
>   at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1405)
>   at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2284)
> ...{code}
> Here the different nature of a Java 8 LMF closure trips up Java 
> serialization/deserialization. I think this can be patched by manually 
> implementing the Java serialization here, and I don't see other instances (yet).
> Also wondering if this "val" can be a "def".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-25044) Address translation of LMF closure primitive args to Object in Scala 2.12

2018-08-07 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572095#comment-16572095
 ] 

Stavros Kontopoulos edited comment on SPARK-25044 at 8/7/18 6:18 PM:
-

@lrytz any insight?


was (Author: skonto):
@lrytz thoughts?

> Address translation of LMF closure primitive args to Object in Scala 2.12
> -
>
> Key: SPARK-25044
> URL: https://issues.apache.org/jira/browse/SPARK-25044
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL
>Affects Versions: 2.4.0
>Reporter: Sean Owen
>Priority: Major
>
> A few SQL-related tests fail in Scala 2.12, such as UDFSuite's "SPARK-24891 
> Fix HandleNullInputsForUDF rule". (Details in a sec when I can copy-paste 
> them.)
> It seems that the closure that is fed in as a UDF changes behavior, in a way 
> that primitive-type arguments are handled differently. For example an Int 
> argument, when fed 'null', acts like 0.
> I'm sure it's a difference in the LMF closure and how its types are 
> understood, but not exactly sure of the cause yet.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25044) Address translation of LMF closure primitive args to Object in Scala 2.12

2018-08-07 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572095#comment-16572095
 ] 

Stavros Kontopoulos commented on SPARK-25044:
-

@lrytz thoughts?

> Address translation of LMF closure primitive args to Object in Scala 2.12
> -
>
> Key: SPARK-25044
> URL: https://issues.apache.org/jira/browse/SPARK-25044
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL
>Affects Versions: 2.4.0
>Reporter: Sean Owen
>Priority: Major
>
> A few SQL-related tests fail in Scala 2.12, such as UDFSuite's "SPARK-24891 
> Fix HandleNullInputsForUDF rule". (Details in a sec when I can copy-paste 
> them.)
> It seems that the closure that is fed in as a UDF changes behavior, in a way 
> that primitive-type arguments are handled differently. For example an Int 
> argument, when fed 'null', acts like 0.
> I'm sure it's a difference in the LMF closure and how its types are 
> understood, but not exactly sure of the cause yet.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25047) Can't assign SerializedLambda to scala.Function1 in deserialization of BucketedRandomProjectionLSHModel

2018-08-07 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572094#comment-16572094
 ] 

Stavros Kontopoulos commented on SPARK-25047:
-

[~lrytz] ideas?

> Can't assign SerializedLambda to scala.Function1 in deserialization of 
> BucketedRandomProjectionLSHModel
> ---
>
> Key: SPARK-25047
> URL: https://issues.apache.org/jira/browse/SPARK-25047
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Affects Versions: 2.4.0
>Reporter: Sean Owen
>Priority: Major
>
> Another distinct test failure:
> {code:java}
> - BucketedRandomProjectionLSH: streaming transform *** FAILED ***
>   org.apache.spark.sql.streaming.StreamingQueryException: Query [id = 
> 7f34fb07-a718-4488-b644-d27cfd29ff6c, runId = 
> 0bbc0ba2-2952-4504-85d6-8aba877ba01b] terminated with exception: Job aborted 
> due to stage failure: Task 0 in stage 16.0 failed 1 times, most recent 
> failure: Lost task 0.0 in stage 16.0 (TID 16, localhost, executor driver): 
> java.lang.ClassCastException: cannot assign instance of 
> java.lang.invoke.SerializedLambda to field 
> org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel.hashFunction of 
> type scala.Function1 in instance of 
> org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel
> ...
>   Cause: java.lang.ClassCastException: cannot assign instance of 
> java.lang.invoke.SerializedLambda to field 
> org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel.hashFunction of 
> type scala.Function1 in instance of 
> org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel
>   at 
> java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2233)
>   at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1405)
>   at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2284)
> ...{code}
> Here the different nature of a Java 8 LMF closure trips up Java 
> serialization/deserialization. I think this can be patched by manually 
> implementing the Java serialization here, and I don't see other instances (yet).
> Also wondering if this "val" can be a "def".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-25050) Handle more than two types in avro union types

2018-08-07 Thread DB Tsai (JIRA)
DB Tsai created SPARK-25050:
---

 Summary: Handle more than two types in avro union types
 Key: SPARK-25050
 URL: https://issues.apache.org/jira/browse/SPARK-25050
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.4.0
Reporter: DB Tsai






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-25049) Support custom schema in `to_avro`

2018-08-07 Thread DB Tsai (JIRA)
DB Tsai created SPARK-25049:
---

 Summary: Support custom schema in `to_avro`
 Key: SPARK-25049
 URL: https://issues.apache.org/jira/browse/SPARK-25049
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.4.0
Reporter: DB Tsai
Assignee: DB Tsai






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25029) Scala 2.12 issues: TaskNotSerializable and Janino "Two non-abstract methods ..." errors

2018-08-07 Thread shane knapp (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572020#comment-16572020
 ] 

shane knapp commented on SPARK-25029:
-

updated the build so concurrent runs can happen, albeit restricted to one build 
per ubuntu node.  this should help build throughput significantly.

> Scala 2.12 issues: TaskNotSerializable and Janino "Two non-abstract methods 
> ..." errors
> ---
>
> Key: SPARK-25029
> URL: https://issues.apache.org/jira/browse/SPARK-25029
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.4.0
>Reporter: Sean Owen
>Priority: Blocker
>
> We actually still have some test failures in the Scala 2.12 build. There seem 
> to be two types. First are that some tests fail with "TaskNotSerializable" 
> because some code construct now captures a reference to scalatest's 
> AssertionHelper. Example:
> {code:java}
> - LegacyAccumulatorWrapper with AccumulatorParam that has no equals/hashCode 
> *** FAILED *** java.io.NotSerializableException: 
> org.scalatest.Assertions$AssertionsHelper Serialization stack: - object not 
> serializable (class: org.scalatest.Assertions$AssertionsHelper, value: 
> org.scalatest.Assertions$AssertionsHelper@3bc5fc8f){code}
> These seem generally easy to fix by tweaking the test code. It's not clear if 
> something about closure cleaning in 2.12 could be improved to detect this 
> situation automatically; given that only a handful of tests fail for this 
> reason, it's unlikely to be a systemic problem.
>  
> The other error is curiouser. Janino fails to compile generated code in many 
> cases with errors like:
> {code:java}
> - encode/decode for seq of string: List(abc, xyz) *** FAILED ***
> java.lang.RuntimeException: Error while encoding: 
> org.codehaus.janino.InternalCompilerException: failed to compile: 
> org.codehaus.janino.InternalCompilerException: Compiling "GeneratedClass": 
> Two non-abstract methods "public int scala.collection.TraversableOnce.size()" 
> have the same parameter types, declaring type and return type{code}
>  
> I include the full generated code that failed in one case below. There is no 
> {{size()}} in the generated code. It's got to be down to some difference in 
> Scala 2.12, potentially even a Janino problem.
>  
> {code:java}
> Caused by: org.codehaus.janino.InternalCompilerException: Compiling 
> "GeneratedClass": Two non-abstract methods "public int 
> scala.collection.TraversableOnce.size()" have the same parameter types, 
> declaring type and return type
> at org.codehaus.janino.UnitCompiler.compileUnit(UnitCompiler.java:361)
> at org.codehaus.janino.SimpleCompiler.cook(SimpleCompiler.java:234)
> at 
> org.codehaus.janino.SimpleCompiler.compileToClassLoader(SimpleCompiler.java:446)
> at 
> org.codehaus.janino.ClassBodyEvaluator.compileToClass(ClassBodyEvaluator.java:313)
> at org.codehaus.janino.ClassBodyEvaluator.cook(ClassBodyEvaluator.java:235)
> at org.codehaus.janino.SimpleCompiler.cook(SimpleCompiler.java:204)
> at org.codehaus.commons.compiler.Cookable.cook(Cookable.java:80)
> at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:1342)
> ... 30 more
> Caused by: org.codehaus.janino.InternalCompilerException: Two non-abstract 
> methods "public int scala.collection.TraversableOnce.size()" have the same 
> parameter types, declaring type and return type
> at 
> org.codehaus.janino.UnitCompiler.findMostSpecificIInvocable(UnitCompiler.java:9112)
> at 
> org.codehaus.janino.UnitCompiler.findMostSpecificIInvocable(UnitCompiler.java:)
> at org.codehaus.janino.UnitCompiler.findIMethod(UnitCompiler.java:8770)
> at org.codehaus.janino.UnitCompiler.findIMethod(UnitCompiler.java:8672)
> at org.codehaus.janino.UnitCompiler.compileGet2(UnitCompiler.java:4737)
> at org.codehaus.janino.UnitCompiler.access$8300(UnitCompiler.java:212)
> at 
> org.codehaus.janino.UnitCompiler$12.visitMethodInvocation(UnitCompiler.java:4097)
> at 
> org.codehaus.janino.UnitCompiler$12.visitMethodInvocation(UnitCompiler.java:4070)
> at org.codehaus.janino.Java$MethodInvocation.accept(Java.java:4902)
> at org.codehaus.janino.UnitCompiler.compileGet(UnitCompiler.java:4070)
> at org.codehaus.janino.UnitCompiler.compileGetValue(UnitCompiler.java:5253)
> at org.codehaus.janino.UnitCompiler.compileGet2(UnitCompiler.java:4391)
> at org.codehaus.janino.UnitCompiler.access$8000(UnitCompiler.java:212)
> at 
> org.codehaus.janino.UnitCompiler$12.visitConditionalExpression(UnitCompiler.java:4094)
> at 
> org.codehaus.janino.UnitCompiler$12.visitConditionalExpression(UnitCompiler.java:4070)
> at org.codehaus.janino.Java$ConditionalExpression.accept(Java.java:4344)
> 
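
A hedged sketch of the kind of test tweak meant above (illustrative only, not a real suite): aggregate a serializable result on the executors and assert on the driver, so the task closure never captures the enclosing test class or scalatest's AssertionsHelper.
{code:scala}
// Hedged, illustrative sketch of the pattern; the object and data are made up.
import org.apache.spark.sql.SparkSession

object DriverSideAssertSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("sketch").getOrCreate()
    val rdd = spark.sparkContext.parallelize(Seq(1, 2, 3))
    // was: rdd.foreach(x => assert(x >= 0))  -- asserting inside the task closure
    val allNonNegative = rdd.map(_ >= 0).reduce(_ && _)
    assert(allNonNegative)
    spark.stop()
  }
}
{code}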

[jira] [Commented] (SPARK-24598) SPARK SQL:Datatype overflow conditions gives incorrect result

2018-08-07 Thread Thomas Graves (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572011#comment-16572011
 ] 

Thomas Graves commented on SPARK-24598:
---

At the very least we should file a separate JIRA to track it going into 3.0, if 
you plan on fixing it there.

> SPARK SQL:Datatype overflow conditions gives incorrect result
> -
>
> Key: SPARK-24598
> URL: https://issues.apache.org/jira/browse/SPARK-24598
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: navya
>Assignee: Marco Gaido
>Priority: Major
> Fix For: 2.4.0
>
>
> Execute an SQL query so that it results in an overflow condition. 
> EX - SELECT 9223372036854775807 + 1 result = -9223372036854776000
>  
> Expected result - An error should be thrown, as MySQL does: 
> mysql> SELECT 9223372036854775807 + 1;
> ERROR 1690 (22003): BIGINT value is out of range in '(9223372036854775807 + 
> 1)'
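
Until overflow checking exists, a hedged workaround sketch (assuming a spark-shell session) is to push the arithmetic into DecimalType so the sum is not silently wrapped:
{code:scala}
// Hedged sketch: Long arithmetic wraps silently, Decimal arithmetic widens instead.
spark.sql("SELECT CAST(9223372036854775807 AS DECIMAL(20, 0)) + 1 AS total").show(truncate = false)
// total = 9223372036854775808 rather than a wrapped negative value
{code}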



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-23937) High-order function: map_filter(map, function) → MAP

2018-08-07 Thread Takuya Ueshin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-23937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-23937.
---
   Resolution: Fixed
 Assignee: Marco Gaido
Fix Version/s: 2.4.0

Issue resolved by pull request 21986
https://github.com/apache/spark/pull/21986

> High-order function: map_filter(map, function) → MAP
> --
>
> Key: SPARK-23937
> URL: https://issues.apache.org/jira/browse/SPARK-23937
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Xiao Li
>Assignee: Marco Gaido
>Priority: Major
> Fix For: 2.4.0
>
>
> Constructs a map from those entries of map for which function returns true:
> {noformat}
> SELECT map_filter(MAP(ARRAY[], ARRAY[]), (k, v) -> true); -- {}
> SELECT map_filter(MAP(ARRAY[10, 20, 30], ARRAY['a', NULL, 'c']), (k, v) -> v 
> IS NOT NULL); -- {10 -> a, 30 -> c}
> SELECT map_filter(MAP(ARRAY['k1', 'k2', 'k3'], ARRAY[20, 3, 15]), (k, v) -> v 
> > 10); -- {k1 -> 20, k3 -> 15}
> {noformat}
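
On the DataFrame side, a hedged sketch (assuming a spark-shell session) of the UDF-based equivalent that map_filter replaces:
{code:scala}
// Hedged sketch: filtering a MapType column with a plain UDF, mirroring the second
// Presto example above.
import org.apache.spark.sql.functions.udf
import spark.implicits._
val dropNullValues = udf((m: Map[Int, String]) => m.filter { case (_, v) => v != null })
Seq(Map(10 -> "a", 20 -> null, 30 -> "c")).toDF("m")
  .select(dropNullValues($"m").as("filtered"))
  .show(truncate = false)   // keeps 10 -> a and 30 -> c
{code}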



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-25041) genjavadoc-plugin_0.10 is not found with sbt in scala-2.12

2018-08-07 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-25041.
---
   Resolution: Fixed
Fix Version/s: 2.4.0

Issue resolved by pull request 22020
[https://github.com/apache/spark/pull/22020]

> genjavadoc-plugin_0.10 is not found with sbt in scala-2.12
> --
>
> Key: SPARK-25041
> URL: https://issues.apache.org/jira/browse/SPARK-25041
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.4.0
>Reporter: Kazuaki Ishizaki
>Assignee: Kazuaki Ishizaki
>Priority: Major
> Fix For: 2.4.0
>
>
> When the master is built with sbt in scala-2.12, the following error occurs:
> {code}
> [warn]module not found: 
> com.typesafe.genjavadoc#genjavadoc-plugin_2.12.6;0.10
> [warn]  public: tried
> [warn]   
> https://repo1.maven.org/maven2/com/typesafe/genjavadoc/genjavadoc-plugin_2.12.6/0.10/genjavadoc-plugin_2.12.6-0.10.pom
> [warn]  Maven2 Local: tried
> [warn]   
> file:/gsa/jpngsa/home/i/s/ishizaki/.m2/repository/com/typesafe/genjavadoc/genjavadoc-plugin_2.12.6/0.10/genjavadoc-plugin_2.12.6-0.10.pom
> [warn]  local: tried
> [warn]   
> /gsa/jpngsa/home/i/s/ishizaki/.ivy2/local/com.typesafe.genjavadoc/genjavadoc-plugin_2.12.6/0.10/ivys/ivy.xml
> [info] Resolving jline#jline;2.14.3 ...
> [warn]::
> [warn]::  UNRESOLVED DEPENDENCIES ::
> [warn]::
> [warn]:: com.typesafe.genjavadoc#genjavadoc-plugin_2.12.6;0.10: not 
> found
> [warn]::
> [warn] 
> [warn]Note: Unresolved dependencies path:
> [warn]com.typesafe.genjavadoc:genjavadoc-plugin_2.12.6:0.10 
> (/home/ishizaki/Spark/PR/scala212/spark/project/SparkBuild.scala#L118)
> [warn]  +- org.apache.spark:spark-tags_2.12:2.4.0-SNAPSHOT
> sbt.ResolveException: unresolved dependency: 
> com.typesafe.genjavadoc#genjavadoc-plugin_2.12.6;0.10: not found
>   at sbt.IvyActions$.sbt$IvyActions$$resolve(IvyActions.scala:320)
>   at sbt.IvyActions$$anonfun$updateEither$1.apply(IvyActions.scala:191)
>   at sbt.IvyActions$$anonfun$updateEither$1.apply(IvyActions.scala:168)
>   at sbt.IvySbt$Module$$anonfun$withModule$1.apply(Ivy.scala:156)
>   at sbt.IvySbt$Module$$anonfun$withModule$1.apply(Ivy.scala:156)
>   at sbt.IvySbt$$anonfun$withIvy$1.apply(Ivy.scala:133)
>   at sbt.IvySbt.sbt$IvySbt$$action$1(Ivy.scala:57)
>   at sbt.IvySbt$$anon$4.call(Ivy.scala:65)
>   at xsbt.boot.Locks$GlobalLock.withChannel$1(Locks.scala:93)
>   at 
> xsbt.boot.Locks$GlobalLock.xsbt$boot$Locks$GlobalLock$$withChannelRetries$1(Locks.scala:78)
>   at 
> xsbt.boot.Locks$GlobalLock$$anonfun$withFileLock$1.apply(Locks.scala:97)
>   at xsbt.boot.Using$.withResource(Using.scala:10)
>   at xsbt.boot.Using$.apply(Using.scala:9)
>   at xsbt.boot.Locks$GlobalLock.ignoringDeadlockAvoided(Locks.scala:58)
>   at xsbt.boot.Locks$GlobalLock.withLock(Locks.scala:48)
>   at xsbt.boot.Locks$.apply0(Locks.scala:31)
>   at xsbt.boot.Locks$.apply(Locks.scala:28)
>   at sbt.IvySbt.withDefaultLogger(Ivy.scala:65)
>   at sbt.IvySbt.withIvy(Ivy.scala:128)
>   at sbt.IvySbt.withIvy(Ivy.scala:125)
>   at sbt.IvySbt$Module.withModule(Ivy.scala:156)
>   at sbt.IvyActions$.updateEither(IvyActions.scala:168)
>   at 
> sbt.Classpaths$$anonfun$sbt$Classpaths$$work$1$1.apply(Defaults.scala:1555)
>   at 
> sbt.Classpaths$$anonfun$sbt$Classpaths$$work$1$1.apply(Defaults.scala:1551)
>   at 
> sbt.Classpaths$$anonfun$doWork$1$1$$anonfun$122.apply(Defaults.scala:1586)
>   at 
> sbt.Classpaths$$anonfun$doWork$1$1$$anonfun$122.apply(Defaults.scala:1584)
>   at sbt.Tracked$$anonfun$lastOutput$1.apply(Tracked.scala:37)
>   at sbt.Classpaths$$anonfun$doWork$1$1.apply(Defaults.scala:1589)
>   at sbt.Classpaths$$anonfun$doWork$1$1.apply(Defaults.scala:1583)
>   at sbt.Tracked$$anonfun$inputChanged$1.apply(Tracked.scala:60)
>   at sbt.Classpaths$.cachedUpdate(Defaults.scala:1606)
>   at sbt.Classpaths$$anonfun$updateTask$1.apply(Defaults.scala:1533)
>   at sbt.Classpaths$$anonfun$updateTask$1.apply(Defaults.scala:1485)
>   at scala.Function1$$anonfun$compose$1.apply(Function1.scala:47)
>   at sbt.$tilde$greater$$anonfun$$u2219$1.apply(TypeFunctions.scala:40)
>   at sbt.std.Transform$$anon$4.work(System.scala:63)
>   at 
> sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:228)
>   at 
> sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:228)
>   at sbt.ErrorHandling$.wideConvert(ErrorHandling.scala:17)
>  

[jira] [Assigned] (SPARK-25041) genjavadoc-plugin_0.10 is not found with sbt in scala-2.12

2018-08-07 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen reassigned SPARK-25041:
-

Assignee: Kazuaki Ishizaki

> genjavadoc-plugin_0.10 is not found with sbt in scala-2.12
> --
>
> Key: SPARK-25041
> URL: https://issues.apache.org/jira/browse/SPARK-25041
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.4.0
>Reporter: Kazuaki Ishizaki
>Assignee: Kazuaki Ishizaki
>Priority: Major
> Fix For: 2.4.0
>
>
> When the master is built with sbt in scala-2.12, the following error occurs:
> {code}
> [warn]module not found: 
> com.typesafe.genjavadoc#genjavadoc-plugin_2.12.6;0.10
> [warn]  public: tried
> [warn]   
> https://repo1.maven.org/maven2/com/typesafe/genjavadoc/genjavadoc-plugin_2.12.6/0.10/genjavadoc-plugin_2.12.6-0.10.pom
> [warn]  Maven2 Local: tried
> [warn]   
> file:/gsa/jpngsa/home/i/s/ishizaki/.m2/repository/com/typesafe/genjavadoc/genjavadoc-plugin_2.12.6/0.10/genjavadoc-plugin_2.12.6-0.10.pom
> [warn]  local: tried
> [warn]   
> /gsa/jpngsa/home/i/s/ishizaki/.ivy2/local/com.typesafe.genjavadoc/genjavadoc-plugin_2.12.6/0.10/ivys/ivy.xml
> [info] Resolving jline#jline;2.14.3 ...
> [warn]::
> [warn]::  UNRESOLVED DEPENDENCIES ::
> [warn]::
> [warn]:: com.typesafe.genjavadoc#genjavadoc-plugin_2.12.6;0.10: not 
> found
> [warn]::
> [warn] 
> [warn]Note: Unresolved dependencies path:
> [warn]com.typesafe.genjavadoc:genjavadoc-plugin_2.12.6:0.10 
> (/home/ishizaki/Spark/PR/scala212/spark/project/SparkBuild.scala#L118)
> [warn]  +- org.apache.spark:spark-tags_2.12:2.4.0-SNAPSHOT
> sbt.ResolveException: unresolved dependency: 
> com.typesafe.genjavadoc#genjavadoc-plugin_2.12.6;0.10: not found
>   at sbt.IvyActions$.sbt$IvyActions$$resolve(IvyActions.scala:320)
>   at sbt.IvyActions$$anonfun$updateEither$1.apply(IvyActions.scala:191)
>   at sbt.IvyActions$$anonfun$updateEither$1.apply(IvyActions.scala:168)
>   at sbt.IvySbt$Module$$anonfun$withModule$1.apply(Ivy.scala:156)
>   at sbt.IvySbt$Module$$anonfun$withModule$1.apply(Ivy.scala:156)
>   at sbt.IvySbt$$anonfun$withIvy$1.apply(Ivy.scala:133)
>   at sbt.IvySbt.sbt$IvySbt$$action$1(Ivy.scala:57)
>   at sbt.IvySbt$$anon$4.call(Ivy.scala:65)
>   at xsbt.boot.Locks$GlobalLock.withChannel$1(Locks.scala:93)
>   at 
> xsbt.boot.Locks$GlobalLock.xsbt$boot$Locks$GlobalLock$$withChannelRetries$1(Locks.scala:78)
>   at 
> xsbt.boot.Locks$GlobalLock$$anonfun$withFileLock$1.apply(Locks.scala:97)
>   at xsbt.boot.Using$.withResource(Using.scala:10)
>   at xsbt.boot.Using$.apply(Using.scala:9)
>   at xsbt.boot.Locks$GlobalLock.ignoringDeadlockAvoided(Locks.scala:58)
>   at xsbt.boot.Locks$GlobalLock.withLock(Locks.scala:48)
>   at xsbt.boot.Locks$.apply0(Locks.scala:31)
>   at xsbt.boot.Locks$.apply(Locks.scala:28)
>   at sbt.IvySbt.withDefaultLogger(Ivy.scala:65)
>   at sbt.IvySbt.withIvy(Ivy.scala:128)
>   at sbt.IvySbt.withIvy(Ivy.scala:125)
>   at sbt.IvySbt$Module.withModule(Ivy.scala:156)
>   at sbt.IvyActions$.updateEither(IvyActions.scala:168)
>   at 
> sbt.Classpaths$$anonfun$sbt$Classpaths$$work$1$1.apply(Defaults.scala:1555)
>   at 
> sbt.Classpaths$$anonfun$sbt$Classpaths$$work$1$1.apply(Defaults.scala:1551)
>   at 
> sbt.Classpaths$$anonfun$doWork$1$1$$anonfun$122.apply(Defaults.scala:1586)
>   at 
> sbt.Classpaths$$anonfun$doWork$1$1$$anonfun$122.apply(Defaults.scala:1584)
>   at sbt.Tracked$$anonfun$lastOutput$1.apply(Tracked.scala:37)
>   at sbt.Classpaths$$anonfun$doWork$1$1.apply(Defaults.scala:1589)
>   at sbt.Classpaths$$anonfun$doWork$1$1.apply(Defaults.scala:1583)
>   at sbt.Tracked$$anonfun$inputChanged$1.apply(Tracked.scala:60)
>   at sbt.Classpaths$.cachedUpdate(Defaults.scala:1606)
>   at sbt.Classpaths$$anonfun$updateTask$1.apply(Defaults.scala:1533)
>   at sbt.Classpaths$$anonfun$updateTask$1.apply(Defaults.scala:1485)
>   at scala.Function1$$anonfun$compose$1.apply(Function1.scala:47)
>   at sbt.$tilde$greater$$anonfun$$u2219$1.apply(TypeFunctions.scala:40)
>   at sbt.std.Transform$$anon$4.work(System.scala:63)
>   at 
> sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:228)
>   at 
> sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:228)
>   at sbt.ErrorHandling$.wideConvert(ErrorHandling.scala:17)
>   at sbt.Execute.work(Execute.scala:237)
>   at 

[jira] [Updated] (SPARK-25048) Pivoting by multiple columns in Scala/Java

2018-08-07 Thread Maxim Gekk (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk updated SPARK-25048:
---
Summary: Pivoting by multiple columns in Scala/Java  (was: Pivoting by 
multiple columns)

> Pivoting by multiple columns in Scala/Java
> --
>
> Key: SPARK-25048
> URL: https://issues.apache.org/jira/browse/SPARK-25048
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Maxim Gekk
>Priority: Minor
>
> Need to change or extend existing API to make pivoting by multiple columns 
> possible. Users should be able to use many columns and values like in the 
> example:
> {code:scala}
> trainingSales
>   .groupBy($"sales.year")
>   .pivot(struct(lower($"sales.course"), $"training"), Seq(
> struct(lit("dotnet"), lit("Experts")),
> struct(lit("java"), lit("Dummies")))
>   ).agg(sum($"sales.earnings"))
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25048) Pivoting by multiple columns

2018-08-07 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571952#comment-16571952
 ] 

Apache Spark commented on SPARK-25048:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/22030

> Pivoting by multiple columns
> 
>
> Key: SPARK-25048
> URL: https://issues.apache.org/jira/browse/SPARK-25048
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Maxim Gekk
>Priority: Minor
>
> Need to change or extend existing API to make pivoting by multiple columns 
> possible. Users should be able to use many columns and values like in the 
> example:
> {code:scala}
> trainingSales
>   .groupBy($"sales.year")
>   .pivot(struct(lower($"sales.course"), $"training"), Seq(
> struct(lit("dotnet"), lit("Experts")),
> struct(lit("java"), lit("Dummies")))
>   ).agg(sum($"sales.earnings"))
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25048) Pivoting by multiple columns

2018-08-07 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25048:


Assignee: Apache Spark

> Pivoting by multiple columns
> 
>
> Key: SPARK-25048
> URL: https://issues.apache.org/jira/browse/SPARK-25048
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Maxim Gekk
>Assignee: Apache Spark
>Priority: Minor
>
> Need to change or extend existing API to make pivoting by multiple columns 
> possible. Users should be able to use many columns and values like in the 
> example:
> {code:scala}
> trainingSales
>   .groupBy($"sales.year")
>   .pivot(struct(lower($"sales.course"), $"training"), Seq(
> struct(lit("dotnet"), lit("Experts")),
> struct(lit("java"), lit("Dummies")))
>   ).agg(sum($"sales.earnings"))
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25048) Pivoting by multiple columns

2018-08-07 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25048:


Assignee: (was: Apache Spark)

> Pivoting by multiple columns
> 
>
> Key: SPARK-25048
> URL: https://issues.apache.org/jira/browse/SPARK-25048
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Maxim Gekk
>Priority: Minor
>
> Need to change or extend existing API to make pivoting by multiple columns 
> possible. Users should be able to use many columns and values like in the 
> example:
> {code:scala}
> trainingSales
>   .groupBy($"sales.year")
>   .pivot(struct(lower($"sales.course"), $"training"), Seq(
> struct(lit("dotnet"), lit("Experts")),
> struct(lit("java"), lit("Dummies")))
>   ).agg(sum($"sales.earnings"))
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-25048) Pivoting by multiple columns

2018-08-07 Thread Maxim Gekk (JIRA)
Maxim Gekk created SPARK-25048:
--

 Summary: Pivoting by multiple columns
 Key: SPARK-25048
 URL: https://issues.apache.org/jira/browse/SPARK-25048
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.3.1
Reporter: Maxim Gekk


Need to change or extend the existing API to make pivoting by multiple columns 
possible. Users should be able to use multiple columns and values, as in the 
example:
{code:scala}
trainingSales
  .groupBy($"sales.year")
  .pivot(struct(lower($"sales.course"), $"training"), Seq(
struct(lit("dotnet"), lit("Experts")),
struct(lit("java"), lit("Dummies")))
  ).agg(sum($"sales.earnings"))
{code}
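
Until the API change lands, a hedged workaround sketch under the existing single-column pivot (same hypothetical trainingSales dataset as above, implicits assumed in scope for $):
{code:scala}
// Hedged sketch: collapse the two pivot columns into one derived key and pivot on
// that, which the existing pivot(String, Seq[Any]) API already supports.
import org.apache.spark.sql.functions.{concat_ws, lower, sum}
trainingSales
  .withColumn("course_training", concat_ws("_", lower($"sales.course"), $"training"))
  .groupBy($"sales.year")
  .pivot("course_training", Seq("dotnet_Experts", "java_Dummies"))
  .agg(sum($"sales.earnings"))
{code}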



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24924) Add mapping for built-in Avro data source

2018-08-07 Thread Thomas Graves (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571908#comment-16571908
 ] 

Thomas Graves commented on SPARK-24924:
---

So, originally when I started on this I didn't know about the side effects of 
the Hive table here.

So this isn't as straightforward as I originally thought. I still personally 
don't like remapping this, because users get something other than what they 
explicitly asked for. But if we want to keep this compatibility, we either have 
to do that or actually have a com.databricks.spark.avro class that just maps 
into our internal Avro. That would have the benefit that users could eclipse it 
with their own jar if they wanted to keep using their custom version, and I 
assume we could theoretically also support the spark.read.avro API. Or the 
third option is to just break compatibility and require users to change the 
table property, but then they can't read the table with older versions of Spark.

It also seems bad to me that we aren't supporting spark.read.avro, so it's an 
API compatibility issue. We magically help users with compatibility for their 
tables by mapping them, but we don't support the old API and they have to update 
their code. This feels like an inconsistent story to me, and I'm not sure how it 
fits with our versioning policy, since it's a third-party thing.

Not sure I like any of these. Seems like these are the options:

1) Actually add a com.databricks.spark.avro class to the Spark source that does 
the remap and supports spark.read/write.avro for a couple of releases for 
compatibility, then remove it and tell people to change the table property, or 
provide an API to do that.

2) Make the mapping of com.databricks.spark.avro => internal Avro configurable; 
that would allow users to keep using their own version of 
com.databricks.spark.avro until they can update their API usage.

3) Do nothing, leave this as is with this JIRA, and users have to deal with 
losing the spark.read.avro API and possible confusion and breakage if they are 
using a modified version of com.databricks.spark.avro.

thoughts from others?
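
For concreteness, a hedged sketch of the user-facing spellings being weighed here (paths hypothetical, Spark 2.4 with the built-in Avro module assumed):
{code:scala}
// Hedged sketch: the format-string spellings go through the DataSource lookup, so the
// com.databricks.spark.avro -> built-in mapping covers them; spark.read.avro(...) is an
// implicit from the external package and is not provided by Spark itself.
val byBuiltIn    = spark.read.format("avro").load("/tmp/events.avro")
val byLegacyName = spark.read.format("com.databricks.spark.avro").load("/tmp/events.avro")
// import com.databricks.spark.avro._      // only with the external package on the classpath
// val byImplicit = spark.read.avro("/tmp/events.avro")
{code}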

 

> Add mapping for built-in Avro data source
> -
>
> Key: SPARK-24924
> URL: https://issues.apache.org/jira/browse/SPARK-24924
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
> Fix For: 2.4.0
>
>
> This issue aims to do the following:
>  # Like the `com.databricks.spark.csv` mapping, map 
> `com.databricks.spark.avro` to the built-in Avro data source.
>  # Remove incorrect error message, `Please find an Avro package at ...`.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-24395) Fix Behavior of NOT IN with Literals Containing NULL

2018-08-07 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-24395:


Assignee: Apache Spark

> Fix Behavior of NOT IN with Literals Containing NULL
> 
>
> Key: SPARK-24395
> URL: https://issues.apache.org/jira/browse/SPARK-24395
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.2
>Reporter: Miles Yucht
>Assignee: Apache Spark
>Priority: Major
>
> Spark does not return the correct answer when evaluating NOT IN in some 
> cases. For example:
> {code:java}
> CREATE TEMPORARY VIEW m AS SELECT * FROM VALUES
>   (null, null)
>   AS m(a, b);
> SELECT *
> FROM   m
> WHERE  a IS NULL AND b IS NULL
>AND (a, b) NOT IN ((0, 1.0), (2, 3.0), (4, CAST(null AS DECIMAL(2, 
> 1;{code}
> According to the semantics of null-aware anti-join, this should return no 
> rows. However, it actually returns the row {{NULL NULL}}. This was found by 
> inspecting the unit tests added for SPARK-24381 
> ([https://github.com/apache/spark/pull/21425#pullrequestreview-123421822).]
> *Acceptance Criteria*:
>  * We should be able to add the following test cases back to 
> {{subquery/in-subquery/not-in-unit-test-multi-column-literal.sql}}:
> {code:java}
>   -- Case 2
>   -- (subquery contains a row with null in all columns -> row not returned)
> SELECT *
> FROM   m
> WHERE  (a, b) NOT IN ((CAST (null AS INT), CAST (null AS DECIMAL(2, 1;
>   -- Case 3
>   -- (probe-side columns are all null -> row not returned)
> SELECT *
> FROM   m
> WHERE  a IS NULL AND b IS NULL -- Matches only (null, null)
>AND (a, b) NOT IN ((0, 1.0), (2, 3.0), (4, CAST(null AS DECIMAL(2, 
> 1;
>   -- Case 4
>   -- (one column null, other column matches a row in the subquery result -> 
> row not returned)
> SELECT *
> FROM   m
> WHERE  b = 1.0 -- Matches (null, 1.0)
>AND (a, b) NOT IN ((0, 1.0), (2, 3.0), (4, CAST(null AS DECIMAL(2, 
> 1; 
> {code}
>  
> cc [~smilegator] [~juliuszsompolski]
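
For the intuition behind Case 3, a hedged single-column sketch of the three-valued logic involved (assuming a spark-shell session):
{code:scala}
// Hedged sketch: NOT IN against a list containing NULL can only be FALSE or NULL
// (unknown), never TRUE, and WHERE keeps only TRUE rows -- hence "row not returned".
spark.sql("SELECT 1 NOT IN (2, CAST(NULL AS INT)) AS p").show()
// p is NULL: 1 <> 2 is true, but 1 <> NULL is unknown, so the conjunction is unknown
{code}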



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24395) Fix Behavior of NOT IN with Literals Containing NULL

2018-08-07 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571892#comment-16571892
 ] 

Apache Spark commented on SPARK-24395:
--

User 'mgaido91' has created a pull request for this issue:
https://github.com/apache/spark/pull/22029

> Fix Behavior of NOT IN with Literals Containing NULL
> 
>
> Key: SPARK-24395
> URL: https://issues.apache.org/jira/browse/SPARK-24395
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.2
>Reporter: Miles Yucht
>Priority: Major
>
> Spark does not return the correct answer when evaluating NOT IN in some 
> cases. For example:
> {code:java}
> CREATE TEMPORARY VIEW m AS SELECT * FROM VALUES
>   (null, null)
>   AS m(a, b);
> SELECT *
> FROM   m
> WHERE  a IS NULL AND b IS NULL
>AND (a, b) NOT IN ((0, 1.0), (2, 3.0), (4, CAST(null AS DECIMAL(2, 
> 1;{code}
> According to the semantics of null-aware anti-join, this should return no 
> rows. However, it actually returns the row {{NULL NULL}}. This was found by 
> inspecting the unit tests added for SPARK-24381 
> ([https://github.com/apache/spark/pull/21425#pullrequestreview-123421822).]
> *Acceptance Criteria*:
>  * We should be able to add the following test cases back to 
> {{subquery/in-subquery/not-in-unit-test-multi-column-literal.sql}}:
> {code:java}
>   -- Case 2
>   -- (subquery contains a row with null in all columns -> row not returned)
> SELECT *
> FROM   m
> WHERE  (a, b) NOT IN ((CAST (null AS INT), CAST (null AS DECIMAL(2, 1;
>   -- Case 3
>   -- (probe-side columns are all null -> row not returned)
> SELECT *
> FROM   m
> WHERE  a IS NULL AND b IS NULL -- Matches only (null, null)
>AND (a, b) NOT IN ((0, 1.0), (2, 3.0), (4, CAST(null AS DECIMAL(2, 
> 1;
>   -- Case 4
>   -- (one column null, other column matches a row in the subquery result -> 
> row not returned)
> SELECT *
> FROM   m
> WHERE  b = 1.0 -- Matches (null, 1.0)
>AND (a, b) NOT IN ((0, 1.0), (2, 3.0), (4, CAST(null AS DECIMAL(2, 
> 1; 
> {code}
>  
> cc [~smilegator] [~juliuszsompolski]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-24395) Fix Behavior of NOT IN with Literals Containing NULL

2018-08-07 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-24395:


Assignee: (was: Apache Spark)

> Fix Behavior of NOT IN with Literals Containing NULL
> 
>
> Key: SPARK-24395
> URL: https://issues.apache.org/jira/browse/SPARK-24395
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.2
>Reporter: Miles Yucht
>Priority: Major
>
> Spark does not return the correct answer when evaluating NOT IN in some 
> cases. For example:
> {code:java}
> CREATE TEMPORARY VIEW m AS SELECT * FROM VALUES
>   (null, null)
>   AS m(a, b);
> SELECT *
> FROM   m
> WHERE  a IS NULL AND b IS NULL
>AND (a, b) NOT IN ((0, 1.0), (2, 3.0), (4, CAST(null AS DECIMAL(2, 
> 1;{code}
> According to the semantics of null-aware anti-join, this should return no 
> rows. However, it actually returns the row {{NULL NULL}}. This was found by 
> inspecting the unit tests added for SPARK-24381 
> ([https://github.com/apache/spark/pull/21425#pullrequestreview-123421822).]
> *Acceptance Criteria*:
>  * We should be able to add the following test cases back to 
> {{subquery/in-subquery/not-in-unit-test-multi-column-literal.sql}}:
> {code:java}
>   -- Case 2
>   -- (subquery contains a row with null in all columns -> row not returned)
> SELECT *
> FROM   m
> WHERE  (a, b) NOT IN ((CAST (null AS INT), CAST (null AS DECIMAL(2, 1;
>   -- Case 3
>   -- (probe-side columns are all null -> row not returned)
> SELECT *
> FROM   m
> WHERE  a IS NULL AND b IS NULL -- Matches only (null, null)
>AND (a, b) NOT IN ((0, 1.0), (2, 3.0), (4, CAST(null AS DECIMAL(2, 
> 1;
>   -- Case 4
>   -- (one column null, other column matches a row in the subquery result -> 
> row not returned)
> SELECT *
> FROM   m
> WHERE  b = 1.0 -- Matches (null, 1.0)
>AND (a, b) NOT IN ((0, 1.0), (2, 3.0), (4, CAST(null AS DECIMAL(2, 
> 1; 
> {code}
>  
> cc [~smilegator] [~juliuszsompolski]
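
For reference, a quick way to sanity-check the expected semantics (a sketch, assuming the temporary view {{m}} defined above and an active {{spark}} session): a null-aware anti-join has to drop the (null, null) probe row whenever the NOT IN list may contain a null.

{code:scala}
// Expected behaviour once this is fixed: the query returns no rows, because
// (null, null) NOT IN (..., (4, null)) evaluates to null rather than true.
spark.sql("""
  SELECT *
  FROM   m
  WHERE  a IS NULL AND b IS NULL
     AND (a, b) NOT IN ((0, 1.0), (2, 3.0), (4, CAST(null AS DECIMAL(2, 1))))
""").show()  // expected: empty result
{code}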



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-25047) Can't assign SerializedLambda to scala.Function1 in deserialization of BucketedRandomProjectionLSHModel

2018-08-07 Thread Sean Owen (JIRA)
Sean Owen created SPARK-25047:
-

 Summary: Can't assign SerializedLambda to scala.Function1 in 
deserialization of BucketedRandomProjectionLSHModel
 Key: SPARK-25047
 URL: https://issues.apache.org/jira/browse/SPARK-25047
 Project: Spark
  Issue Type: Sub-task
  Components: ML
Affects Versions: 2.4.0
Reporter: Sean Owen


Another distinct test failure:
{code:java}
- BucketedRandomProjectionLSH: streaming transform *** FAILED ***

  org.apache.spark.sql.streaming.StreamingQueryException: Query [id = 
7f34fb07-a718-4488-b644-d27cfd29ff6c, runId = 
0bbc0ba2-2952-4504-85d6-8aba877ba01b] terminated with exception: Job aborted 
due to stage failure: Task 0 in stage 16.0 failed 1 times, most recent failure: 
Lost task 0.0 in stage 16.0 (TID 16, localhost, executor driver): 
java.lang.ClassCastException: cannot assign instance of 
java.lang.invoke.SerializedLambda to field 
org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel.hashFunction of 
type scala.Function1 in instance of 
org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel

...

  Cause: java.lang.ClassCastException: cannot assign instance of 
java.lang.invoke.SerializedLambda to field 
org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel.hashFunction of 
type scala.Function1 in instance of 
org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel

  at 
java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2233)

  at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1405)

  at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2284)
...{code}
Here the different nature of a Java 8 LMF closure trips up Java 
serialization/deserialization. I think this can be patched by manually 
implementing the Java serialization here, and I don't see other instances (yet).

Also wondering if this "val" can be a "def".
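
A minimal illustration of the "val" vs "def" point, outside of Spark (the classes below are made up for the example): a function-typed val becomes a field that Java serialization has to write out, and under Scala 2.12 that field is backed by a SerializedLambda, whereas an equivalent def leaves no Function1 field to serialize at all.

{code:scala}
// Illustrative only; not the MLlib code.
class WithVal(factor: Double) extends Serializable {
  // stored as a scala.Function1 field; under Scala 2.12 this is what ends up as a SerializedLambda
  val scale: Double => Double = _ * factor
}

class WithDef(factor: Double) extends Serializable {
  // recomputed on each call; nothing lambda-shaped needs to be serialized
  def scale(x: Double): Double = x * factor
}
{code}

The alternative mentioned above, custom Java serialization for the model, would keep the val but take the lambda off the serialized form.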



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-24979) add AnalysisHelper#resolveOperatorsUp

2018-08-07 Thread Xiao Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-24979.
-
   Resolution: Fixed
Fix Version/s: 2.4.0

> add AnalysisHelper#resolveOperatorsUp
> -
>
> Key: SPARK-24979
> URL: https://issues.apache.org/jira/browse/SPARK-24979
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
> Fix For: 2.4.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25037) plan.transformAllExpressions() doesn't transform expressions in nested SubqueryExpression plans

2018-08-07 Thread Dilip Biswal (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571855#comment-16571855
 ] 

Dilip Biswal commented on SPARK-25037:
--

[~chriso] [~hyukjin.kwon] Subquery plans are not visited by the parent plan's 
transformAllExpressions. It's been like this since the beginning of subquery 
support in Spark. Just an FYI.
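
For anyone hitting this, one way to reach the expressions inside subqueries as well is to recurse into SubqueryExpression manually. A sketch built on the Catalyst API (it assumes SubqueryExpression's plan/withNewPlan accessors; this is not something Spark does for you):

{code:scala}
import org.apache.spark.sql.catalyst.expressions.{Expression, Literal, SubqueryExpression}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan

// Apply `rule` to the outer plan and, for every subquery expression encountered,
// recurse into its nested plan as well.
def transformEverywhere(plan: LogicalPlan)(rule: PartialFunction[Expression, Expression]): LogicalPlan = {
  val withSubqueries: PartialFunction[Expression, Expression] = rule.orElse {
    case s: SubqueryExpression => s.withNewPlan(transformEverywhere(s.plan)(rule))
  }
  plan.transformAllExpressions(withSubqueries)
}

// e.g. with the `plan` from the description above:
val rewritten = transformEverywhere(plan) { case l @ Literal(1, _) => l.copy(value = 2) }
{code}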

> plan.transformAllExpressions() doesn't transform expressions in nested 
> SubqueryExpression plans
> ---
>
> Key: SPARK-25037
> URL: https://issues.apache.org/jira/browse/SPARK-25037
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Chris O'Hara
>Priority: Minor
>
> Given the following LogicalPlan:
> {code:java}
> scala> val plan = spark.sql("SELECT 1 bar FROM (SELECT 1 foo) WHERE foo IN 
> (SELECT 1 foo)").queryExecution.logical
> plan: org.apache.spark.sql.catalyst.plans.logical.LogicalPlan =
> 'Project [1 AS bar#29]
> +- 'Filter 'foo IN (list#31 [])
>    :  +- Project [1 AS foo#30]
>    :     +- OneRowRelation
>    +- SubqueryAlias __auto_generated_subquery_name
>       +- Project [1 AS foo#28]
>          +- OneRowRelation
> {code}
> the following transformation should replace all instances of lit(1) with 
> lit(2):
> {code:java}
> scala> plan.transformAllExpressions { case l @ Literal(1, _) => l.copy(value 
> = 2) }
> res0: plan.type =
> 'Project [2 AS bar#29]
> +- 'Filter 'foo IN (list#31 [])
>    :  +- Project [1 AS foo#30]
>    :     +- OneRowRelation
>    +- SubqueryAlias __auto_generated_subquery_name
>       +- Project [2 AS foo#28]
>          +- OneRowRelation
> {code}
> Instead, the nested SubqueryExpression plan is not transformed.
> The expected output is: 
> {code:java}
> 'Project [2 AS bar#29]
> +- 'Filter 'foo IN (list#31 [])
>    :  +- Project [2 AS foo#30]
>    :     +- OneRowRelation
>    +- SubqueryAlias __auto_generated_subquery_name
>       +- Project [2 AS foo#28]
>          +- OneRowRelation
> {code}
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24924) Add mapping for built-in Avro data source

2018-08-07 Thread Thomas Graves (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571852#comment-16571852
 ] 

Thomas Graves commented on SPARK-24924:
---

Thanks, I missed it in the output for Spark as I was just looking at table 
properties.

So what you are saying is that without this change to map Databricks Avro to 
our internal Avro, the only way to update Hive tables to use the internal Avro 
version is to have them manually set the table properties?

Do you know offhand whether you are able to write to a Hive table with datasource 
"com.databricks.spark.avro" using the internal Avro version, or does it error?

> Add mapping for built-in Avro data source
> -
>
> Key: SPARK-24924
> URL: https://issues.apache.org/jira/browse/SPARK-24924
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
> Fix For: 2.4.0
>
>
> This issue aims to the followings.
>  # Like `com.databricks.spark.csv` mapping, we had better map 
> `com.databricks.spark.avro` to built-in Avro data source.
>  # Remove incorrect error message, `Please find an Avro package at ...`.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25046) Alter View can execute SQL like "ALTER VIEW ... AS INSERT INTO"

2018-08-07 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25046:


Assignee: Apache Spark

> Alter View can execute SQL like "ALTER VIEW ... AS INSERT INTO" 
> -
>
> Key: SPARK-25046
> URL: https://issues.apache.org/jira/browse/SPARK-25046
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: SongXun
>Assignee: Apache Spark
>Priority: Minor
>  Labels: pull-request-available
>
> Alter View can execute SQL like "ALTER VIEW ... AS INSERT INTO". We should 
> throw 
> ParseException(s"Operation not allowed: $message", ctx) as Create View does.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25046) Alter View can execute SQL like "ALTER VIEW ... AS INSERT INTO"

2018-08-07 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25046:


Assignee: (was: Apache Spark)

> Alter View can execute SQL like "ALTER VIEW ... AS INSERT INTO" 
> -
>
> Key: SPARK-25046
> URL: https://issues.apache.org/jira/browse/SPARK-25046
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: SongXun
>Priority: Minor
>  Labels: pull-request-available
>
> Alter View can execute SQL like "ALTER VIEW ... AS INSERT INTO". We should 
> throw 
> ParseException(s"Operation not allowed: $message", ctx) as Create View does.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25046) Alter View can execute SQL like "ALTER VIEW ... AS INSERT INTO"

2018-08-07 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571842#comment-16571842
 ] 

Apache Spark commented on SPARK-25046:
--

User 'sddyljsx' has created a pull request for this issue:
https://github.com/apache/spark/pull/22028

> Alter View can execute SQL like "ALTER VIEW ... AS INSERT INTO" 
> -
>
> Key: SPARK-25046
> URL: https://issues.apache.org/jira/browse/SPARK-25046
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: SongXun
>Priority: Minor
>  Labels: pull-request-available
>
> Alter View can execute SQL like "ALTER VIEW ... AS INSERT INTO". We should 
> throw 
> ParseException(s"Operation not allowed: $message", ctx) as Create View does.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25010) Rand/Randn should produce different values for each execution in streaming query

2018-08-07 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571834#comment-16571834
 ] 

Apache Spark commented on SPARK-25010:
--

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/22027

> Rand/Randn should produce different values for each execution in streaming 
> query
> 
>
> Key: SPARK-25010
> URL: https://issues.apache.org/jira/browse/SPARK-25010
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Structured Streaming
>Affects Versions: 2.4.0
>Reporter: Liang-Chi Hsieh
>Assignee: Liang-Chi Hsieh
>Priority: Major
> Fix For: 2.4.0
>
>
> Like Uuid in SPARK-24896, Rand and Randn expressions now produce the same 
> results for each execution in streaming query. It doesn't make too much sense 
> for streaming queries. We should make them produce different results as Uuid.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-25046) Alter View can execute SQL like "ALTER VIEW ... AS INSERT INTO"

2018-08-07 Thread SongXun (JIRA)
SongXun created SPARK-25046:
---

 Summary: Alter View can execute SQL like "ALTER VIEW ... AS 
INSERT INTO" 
 Key: SPARK-25046
 URL: https://issues.apache.org/jira/browse/SPARK-25046
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.4.0
Reporter: SongXun


Alter View can execute SQL like "ALTER VIEW ... AS INSERT INTO". We should 
throw 

ParseException(s"Operation not allowed: $message", ctx) as Create View does.
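
To make the intended behaviour concrete, here is a sketch of what should happen after the fix (it assumes an active {{spark}} session; today the last statement is accepted instead of being rejected at parse time):

{code:scala}
spark.sql("CREATE TABLE t(i INT) USING parquet")
spark.sql("CREATE VIEW v AS SELECT i FROM t")

// Already rejected today with "Operation not allowed: ..." at parse time:
// spark.sql("CREATE VIEW v2 AS INSERT INTO t VALUES (1)")

// Should be rejected the same way once this is fixed:
spark.sql("ALTER VIEW v AS INSERT INTO t VALUES (1)")  // expected: ParseException
{code}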



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25045) Make `RDDBarrier.mapParititions` similar to `RDD.mapPartitions`

2018-08-07 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25045:


Assignee: (was: Apache Spark)

> Make `RDDBarrier.mapParititions` similar to `RDD.mapPartitions`
> ---
>
> Key: SPARK-25045
> URL: https://issues.apache.org/jira/browse/SPARK-25045
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: Jiang Xingbo
>Priority: Major
>
> Signature of the function passed to `RDDBarrier.mapPartitions()` is different 
> from that of `RDD.mapPartitions`. The latter doesn’t take a TaskContext. We 
> shall make the function signature the same to avoid confusion and misusage.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25045) Make `RDDBarrier.mapParititions` similar to `RDD.mapPartitions`

2018-08-07 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25045:


Assignee: Apache Spark

> Make `RDDBarrier.mapParititions` similar to `RDD.mapPartitions`
> ---
>
> Key: SPARK-25045
> URL: https://issues.apache.org/jira/browse/SPARK-25045
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: Jiang Xingbo
>Assignee: Apache Spark
>Priority: Major
>
> Signature of the function passed to `RDDBarrier.mapPartitions()` is different 
> from that of `RDD.mapPartitions`. The latter doesn’t take a TaskContext. We 
> shall make the function signature the same to avoid confusion and misusage.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25045) Make `RDDBarrier.mapParititions` similar to `RDD.mapPartitions`

2018-08-07 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571804#comment-16571804
 ] 

Apache Spark commented on SPARK-25045:
--

User 'jiangxb1987' has created a pull request for this issue:
https://github.com/apache/spark/pull/22026

> Make `RDDBarrier.mapParititions` similar to `RDD.mapPartitions`
> ---
>
> Key: SPARK-25045
> URL: https://issues.apache.org/jira/browse/SPARK-25045
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: Jiang Xingbo
>Priority: Major
>
> Signature of the function passed to `RDDBarrier.mapPartitions()` is different 
> from that of `RDD.mapPartitions`. The latter doesn’t take a TaskContext. We 
> shall make the function signature the same to avoid confusion and misusage.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24918) Executor Plugin API

2018-08-07 Thread Imran Rashid (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571793#comment-16571793
 ] 

Imran Rashid commented on SPARK-24918:
--

[~lucacanali] you could certainly sample stack traces, but the current proposal 
doesn't cover communication with the driver at all.  IMO that is too much 
complexity for v1.  Did you have a design in mind for that?

You could use the executor plugin to build your own communication between the 
driver and executors, but depending on what you want, might be tricky.

Do you think you could set up the configuration you need statically, when the 
application starts?  E.g. I had run a test to take stack traces anytime a task 
was running over some configurable time; then I just needed task start & end 
events in my plugin.
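
To make that concrete, here is roughly what such a statically configured plugin could look like. The trait and its task hooks are hypothetical (the exact interface is what this SPIP is deciding), and the dump just goes to stdout for simplicity.

{code:scala}
import java.lang.management.ManagementFactory
import scala.collection.concurrent.TrieMap

// Hypothetical plugin interface with task start/end hooks; not a committed Spark API.
trait ExecutorPluginSketch {
  def init(): Unit = {}
  def onTaskStart(taskId: Long): Unit = {}
  def onTaskEnd(taskId: Long): Unit = {}
  def shutdown(): Unit = {}
}

// Dump all executor threads when a task ran longer than a statically configured threshold.
class SlowTaskDumper(thresholdMs: Long) extends ExecutorPluginSketch {
  private val started = TrieMap.empty[Long, Long]

  override def onTaskStart(taskId: Long): Unit =
    started.put(taskId, System.currentTimeMillis())

  override def onTaskEnd(taskId: Long): Unit =
    started.remove(taskId).foreach { startMs =>
      if (System.currentTimeMillis() - startMs > thresholdMs) {
        ManagementFactory.getThreadMXBean.dumpAllThreads(false, false).foreach(println)
      }
    }
}
{code}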

> Executor Plugin API
> ---
>
> Key: SPARK-24918
> URL: https://issues.apache.org/jira/browse/SPARK-24918
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: Imran Rashid
>Priority: Major
>  Labels: SPIP, memory-analysis
>
> It would be nice if we could specify an arbitrary class to run within each 
> executor for debugging and instrumentation.  It's hard to do this currently 
> because:
> a) you have no idea when executors will come and go with DynamicAllocation, 
> so don't have a chance to run custom code before the first task
> b) even with static allocation, you'd have to change the code of your spark 
> app itself to run a special task to "install" the plugin, which is often 
> tough in production cases when those maintaining regularly running 
> applications might not even know how to make changes to the application.
> For example, https://github.com/squito/spark-memory could be used in a 
> debugging context to understand memory use, just by re-running an application 
> with extra command line arguments (as opposed to rebuilding spark).
> I think one tricky part here is just deciding the API, and how it's versioned. 
>  Does it just get created when the executor starts, and that's it?  Or does it 
> get more specific events, like task start, task end, etc?  Would we ever add 
> more events?  It should definitely be a {{DeveloperApi}}, so breaking 
> compatibility would be allowed ... but still should be avoided.  We could 
> create a base class that has no-op implementations, or explicitly version 
> everything.
> Note that this is not needed in the driver as we already have SparkListeners 
> (even if you don't care about the SparkListenerEvents and just want to 
> inspect objects in the JVM, its still good enough).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-25045) Make `RDDBarrier.mapParititions` similar to `RDD.mapPartitions`

2018-08-07 Thread Jiang Xingbo (JIRA)
Jiang Xingbo created SPARK-25045:


 Summary: Make `RDDBarrier.mapParititions` similar to 
`RDD.mapPartitions`
 Key: SPARK-25045
 URL: https://issues.apache.org/jira/browse/SPARK-25045
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 2.4.0
Reporter: Jiang Xingbo


Signature of the function passed to `RDDBarrier.mapPartitions()` is different 
from that of `RDD.mapPartitions`. The latter doesn’t take a TaskContext. We 
shall make the function signature the same to avoid confusion and misusage.
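
In user code the proposed change would look roughly like this (a sketch; it assumes the BarrierTaskContext.get() accessor from the barrier execution mode, and `sc` is an active SparkContext):

{code:scala}
import org.apache.spark.BarrierTaskContext

val doubled = sc.parallelize(1 to 100, 4)
  .barrier()
  .mapPartitions { iter =>                    // same shape as RDD.mapPartitions
    val context = BarrierTaskContext.get()    // context fetched explicitly, not passed in
    context.barrier()                         // global sync across all tasks in the stage
    iter.map(_ * 2)
  }
  .collect()
{code}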




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25044) Address translation of LMF closure primitive args to Object in Scala 2.12

2018-08-07 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-25044:
--
Description: 
A few SQL-related tests fail in Scala 2.12, such as UDFSuite's "SPARK-24891 Fix 
HandleNullInputsForUDF rule". (Details in a sec when I can copy-paste them.)

It seems that the closure that is fed in as a UDF changes behavior, in a way 
that primitive-type arguments are handled differently. For example an Int 
argument, when fed 'null', acts like 0.

I'm sure it's a difference in the LMF closure and how its types are understood, 
but not exactly sure of the cause yet.
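
A small repro of the symptom, for reference (a sketch, assuming an active {{spark}} session):

{code:scala}
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.{col, udf}
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

val plusOne = udf((x: Int) => x + 1)
val df = spark.createDataFrame(
  spark.sparkContext.parallelize(Seq(Row(1), Row(null), Row(3))),
  StructType(StructField("a", IntegerType, nullable = true) :: Nil))

df.select(plusOne(col("a"))).show()
// Scala 2.11: the analyzer sees the primitive Int parameter and adds a null check,
// so the null row stays null.
// Scala 2.12 (per this issue): the LMF closure exposes the argument as Object,
// the null is unboxed to 0, and the row comes back as 1.
{code}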

> Address translation of LMF closure primitive args to Object in Scala 2.12
> -
>
> Key: SPARK-25044
> URL: https://issues.apache.org/jira/browse/SPARK-25044
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL
>Affects Versions: 2.4.0
>Reporter: Sean Owen
>Priority: Major
>
> A few SQL-related tests fail in Scala 2.12, such as UDFSuite's "SPARK-24891 
> Fix HandleNullInputsForUDF rule". (Details in a sec when I can copy-paste 
> them.)
> It seems that the closure that is fed in as a UDF changes behavior, in a way 
> that primitive-type arguments are handled differently. For example an Int 
> argument, when fed 'null', acts like 0.
> I'm sure it's a difference in the LMF closure and how its types are 
> understood, but not exactly sure of the cause yet.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-25044) Address translation of LMF closure primitive args to Object in Scala 2.12

2018-08-07 Thread Sean Owen (JIRA)
Sean Owen created SPARK-25044:
-

 Summary: Address translation of LMF closure primitive args to 
Object in Scala 2.12
 Key: SPARK-25044
 URL: https://issues.apache.org/jira/browse/SPARK-25044
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core, SQL
Affects Versions: 2.4.0
Reporter: Sean Owen






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-14220) Build and test Spark against Scala 2.12

2018-08-07 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-14220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen reopened SPARK-14220:
---
  Assignee: Sean Owen

OK, bad news, I think we still have several non-trivial issues with Scala 2.12 
support: at least the Janino compiler issue and a new one about how lambda 
metafactory closures seem to implement primitive args as reference types, which 
makes some SQL operations change semantics. I'm going to organize some open 
JIRAs here accordingly.

> Build and test Spark against Scala 2.12
> ---
>
> Key: SPARK-14220
> URL: https://issues.apache.org/jira/browse/SPARK-14220
> Project: Spark
>  Issue Type: Umbrella
>  Components: Build, Project Infra
>Affects Versions: 2.1.0
>Reporter: Josh Rosen
>Assignee: Sean Owen
>Priority: Blocker
>  Labels: release-notes
>
> This umbrella JIRA tracks the requirements for building and testing Spark 
> against the current Scala 2.12 milestone.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-14220) Build and test Spark against Scala 2.12

2018-08-07 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-14220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-14220:
--
Target Version/s: 2.4.0
   Fix Version/s: (was: 2.4.0)

> Build and test Spark against Scala 2.12
> ---
>
> Key: SPARK-14220
> URL: https://issues.apache.org/jira/browse/SPARK-14220
> Project: Spark
>  Issue Type: Umbrella
>  Components: Build, Project Infra
>Affects Versions: 2.1.0
>Reporter: Josh Rosen
>Assignee: Sean Owen
>Priority: Blocker
>  Labels: release-notes
>
> This umbrella JIRA tracks the requirements for building and testing Spark 
> against the current Scala 2.12 milestone.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24918) Executor Plugin API

2018-08-07 Thread Luca Canali (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571748#comment-16571748
 ] 

Luca Canali commented on SPARK-24918:
-

I have a use case where I would like to sample stack traces of the Spark 
executors across the cluster and later aggregate the data into a Flame Graph. I 
may want to do data collection only for a short duration (due to the overhead) 
and possibly be able to start and stop data collection at will from the driver. 
Similar use cases would be to deploy "probes" using tools for dynamic tracing 
to measure specific details of the workload.
I think the executor plugin would be useful for this. In addition, it would 
be nice to have a mechanism to send and receive commands/data between the Spark 
driver and the plugin process.
 Would this proposal make sense in the context of this SPIP or would it add too 
much complexity to the original proposal?
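
For what it's worth, the sampling part itself can stay entirely inside the executor (a rough sketch, illustrative only; starting and stopping it from the driver is exactly the communication channel that is still the open question here):

{code:scala}
import java.util.concurrent.{Executors, TimeUnit}
import scala.collection.JavaConverters._

// Sample all thread stacks every 100 ms and print them in a collapsed,
// flame-graph-friendly form; aggregation would happen offline.
val sampler = Executors.newSingleThreadScheduledExecutor()
sampler.scheduleAtFixedRate(new Runnable {
  override def run(): Unit =
    Thread.getAllStackTraces.asScala.foreach { case (thread, frames) =>
      val collapsed = frames.reverse.map(f => s"${f.getClassName}.${f.getMethodName}").mkString(";")
      println(s"${thread.getName};$collapsed 1")
    }
}, 0, 100, TimeUnit.MILLISECONDS)

// later, e.g. from a plugin shutdown hook: sampler.shutdownNow()
{code}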

> Executor Plugin API
> ---
>
> Key: SPARK-24918
> URL: https://issues.apache.org/jira/browse/SPARK-24918
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: Imran Rashid
>Priority: Major
>  Labels: SPIP, memory-analysis
>
> It would be nice if we could specify an arbitrary class to run within each 
> executor for debugging and instrumentation.  It's hard to do this currently 
> because:
> a) you have no idea when executors will come and go with DynamicAllocation, 
> so don't have a chance to run custom code before the first task
> b) even with static allocation, you'd have to change the code of your spark 
> app itself to run a special task to "install" the plugin, which is often 
> tough in production cases when those maintaining regularly running 
> applications might not even know how to make changes to the application.
> For example, https://github.com/squito/spark-memory could be used in a 
> debugging context to understand memory use, just by re-running an application 
> with extra command line arguments (as opposed to rebuilding spark).
> I think one tricky part here is just deciding the API, and how it's versioned. 
>  Does it just get created when the executor starts, and that's it?  Or does it 
> get more specific events, like task start, task end, etc?  Would we ever add 
> more events?  It should definitely be a {{DeveloperApi}}, so breaking 
> compatibility would be allowed ... but still should be avoided.  We could 
> create a base class that has no-op implementations, or explicitly version 
> everything.
> Note that this is not needed in the driver as we already have SparkListeners 
> (even if you don't care about the SparkListenerEvents and just want to 
> inspect objects in the JVM, its still good enough).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25043) spark-sql should print the appId and master on startup

2018-08-07 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25043:


Assignee: Apache Spark

> spark-sql should print the appId and master on startup
> --
>
> Key: SPARK-25043
> URL: https://issues.apache.org/jira/browse/SPARK-25043
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Alessandro Bellina
>Assignee: Apache Spark
>Priority: Trivial
>
> In spark-sql, if logging is turned down all the way, it's not possible to 
> find out what appId is running at the moment. This small change adds a print to 
> stdout containing the master type and the appId so that they appear on screen.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25043) spark-sql should print the appId and master on startup

2018-08-07 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25043:


Assignee: (was: Apache Spark)

> spark-sql should print the appId and master on startup
> --
>
> Key: SPARK-25043
> URL: https://issues.apache.org/jira/browse/SPARK-25043
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Alessandro Bellina
>Priority: Trivial
>
> In spark-sql, if logging is turned down all the way, it's not possible to 
> find out what appId is running at the moment. This small change adds a print to 
> stdout containing the master type and the appId so that they appear on screen.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25043) spark-sql should print the appId and master on startup

2018-08-07 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571736#comment-16571736
 ] 

Apache Spark commented on SPARK-25043:
--

User 'abellina' has created a pull request for this issue:
https://github.com/apache/spark/pull/22025

> spark-sql should print the appId and master on startup
> --
>
> Key: SPARK-25043
> URL: https://issues.apache.org/jira/browse/SPARK-25043
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Alessandro Bellina
>Priority: Trivial
>
> In spark-sql, if logging is turned down all the way, it's not possible to 
> find out what appId is running at the moment. This small change adds a print to 
> stdout containing the master type and the appId so that they appear on screen.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-25043) spark-sql should print the appId and master on startup

2018-08-07 Thread Alessandro Bellina (JIRA)
Alessandro Bellina created SPARK-25043:
--

 Summary: spark-sql should print the appId and master on startup
 Key: SPARK-25043
 URL: https://issues.apache.org/jira/browse/SPARK-25043
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.3.1
Reporter: Alessandro Bellina


In spark-sql, if logging is turned down all the way, it's not possible to find 
out what appId is running at the moment. This small change adds a print to stdout 
containing the master type and the appId so that they appear on screen.
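
Roughly the kind of one-liner being proposed (a sketch; the exact wording and where it is printed from are up to the PR, and {{sparkContext}} here stands for whatever context spark-sql already has at hand):

{code:scala}
// Printed once at spark-sql startup, regardless of the log level;
// master and applicationId are the existing SparkContext accessors.
println(s"Spark master: ${sparkContext.master}, Application Id: ${sparkContext.applicationId}")
{code}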



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-19602) Unable to query using the fully qualified column name of the form ( ..)

2018-08-07 Thread Wenchen Fan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-19602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-19602.
-
   Resolution: Fixed
Fix Version/s: 2.4.0

Issue resolved by pull request 17185
[https://github.com/apache/spark/pull/17185]

> Unable to query using the fully qualified column name of the form ( 
> ..)
> --
>
> Key: SPARK-19602
> URL: https://issues.apache.org/jira/browse/SPARK-19602
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Sunitha Kambhampati
>Assignee: Sunitha Kambhampati
>Priority: Major
> Fix For: 2.4.0
>
> Attachments: Design_ColResolution_JIRA19602.pdf
>
>
> 1) Spark SQL fails to analyze this query:  select db1.t1.i1 from db1.t1, 
> db2.t1
> Most of the other database systems support this ( e.g DB2, Oracle, MySQL).
> Note: In DB2, Oracle, the notion is of ..
> 2) Another scenario where this fully qualified name is useful is as follows:
>   // current database is db1. 
>   select t1.i1 from t1, db2.t1   
> If the i1 column exists in both tables: db1.t1 and db2.t1, this will throw an 
> error during column resolution in the analyzer, as it is ambiguous. 
> Let's say the user intended to retrieve i1 from db1.t1 but in the example 
> db2.t1 only has i1 column. The query would still succeed instead of throwing 
> an error.  
> One way to avoid confusion would be to explicitly specify using the fully 
> qualified name db1.t1.i1 
> For e.g:  select db1.t1.i1 from t1, db2.t1  
> Workarounds:
> There is a workaround for these issues, which is to use an alias. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25003) Pyspark Does not use Spark Sql Extensions

2018-08-07 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571630#comment-16571630
 ] 

Apache Spark commented on SPARK-25003:
--

User 'RussellSpitzer' has created a pull request for this issue:
https://github.com/apache/spark/pull/21988

> Pyspark Does not use Spark Sql Extensions
> -
>
> Key: SPARK-25003
> URL: https://issues.apache.org/jira/browse/SPARK-25003
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.2.2, 2.3.1
>Reporter: Russell Spitzer
>Priority: Major
>
> When creating a SparkSession here
> [https://github.com/apache/spark/blob/v2.2.2/python/pyspark/sql/session.py#L216]
> {code:python}
> if jsparkSession is None:
>   jsparkSession = self._jvm.SparkSession(self._jsc.sc())
> self._jsparkSession = jsparkSession
> {code}
> I believe it ends up calling the constructor here
> https://github.com/apache/spark/blob/v2.2.2/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala#L85-L87
> {code:scala}
>   private[sql] def this(sc: SparkContext) {
> this(sc, None, None, new SparkSessionExtensions)
>   }
> {code}
> This creates a new SparkSessionExtensions object and does not pick up new 
> extensions that could have been set in the config like the companion 
> getOrCreate does.
> https://github.com/apache/spark/blob/v2.2.2/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala#L928-L944
> {code:scala}
> //in getOrCreate
> // Initialize extensions if the user has defined a configurator class.
> val extensionConfOption = 
> sparkContext.conf.get(StaticSQLConf.SPARK_SESSION_EXTENSIONS)
> if (extensionConfOption.isDefined) {
>   val extensionConfClassName = extensionConfOption.get
>   try {
> val extensionConfClass = 
> Utils.classForName(extensionConfClassName)
> val extensionConf = extensionConfClass.newInstance()
>   .asInstanceOf[SparkSessionExtensions => Unit]
> extensionConf(extensions)
>   } catch {
> // Ignore the error if we cannot find the class or when the class 
> has the wrong type.
> case e @ (_: ClassCastException |
>   _: ClassNotFoundException |
>   _: NoClassDefFoundError) =>
>   logWarning(s"Cannot use $extensionConfClassName to configure 
> session extensions.", e)
>   }
> }
> {code}
> I think a quick fix would be to use the getOrCreate method from the companion 
> object instead of calling the constructor from the SparkContext. Or we could 
> fix this by ensuring that all constructors attempt to pick up custom 
> extensions if they are set.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25003) Pyspark Does not use Spark Sql Extensions

2018-08-07 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571631#comment-16571631
 ] 

Apache Spark commented on SPARK-25003:
--

User 'RussellSpitzer' has created a pull request for this issue:
https://github.com/apache/spark/pull/21989

> Pyspark Does not use Spark Sql Extensions
> -
>
> Key: SPARK-25003
> URL: https://issues.apache.org/jira/browse/SPARK-25003
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.2.2, 2.3.1
>Reporter: Russell Spitzer
>Priority: Major
>
> When creating a SparkSession here
> [https://github.com/apache/spark/blob/v2.2.2/python/pyspark/sql/session.py#L216]
> {code:python}
> if jsparkSession is None:
>   jsparkSession = self._jvm.SparkSession(self._jsc.sc())
> self._jsparkSession = jsparkSession
> {code}
> I believe it ends up calling the constructor here
> https://github.com/apache/spark/blob/v2.2.2/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala#L85-L87
> {code:scala}
>   private[sql] def this(sc: SparkContext) {
> this(sc, None, None, new SparkSessionExtensions)
>   }
> {code}
> This creates a new SparkSessionExtensions object and does not pick up new 
> extensions that could have been set in the config like the companion 
> getOrCreate does.
> https://github.com/apache/spark/blob/v2.2.2/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala#L928-L944
> {code:scala}
> //in getOrCreate
> // Initialize extensions if the user has defined a configurator class.
> val extensionConfOption = 
> sparkContext.conf.get(StaticSQLConf.SPARK_SESSION_EXTENSIONS)
> if (extensionConfOption.isDefined) {
>   val extensionConfClassName = extensionConfOption.get
>   try {
> val extensionConfClass = 
> Utils.classForName(extensionConfClassName)
> val extensionConf = extensionConfClass.newInstance()
>   .asInstanceOf[SparkSessionExtensions => Unit]
> extensionConf(extensions)
>   } catch {
> // Ignore the error if we cannot find the class or when the class 
> has the wrong type.
> case e @ (_: ClassCastException |
>   _: ClassNotFoundException |
>   _: NoClassDefFoundError) =>
>   logWarning(s"Cannot use $extensionConfClassName to configure 
> session extensions.", e)
>   }
> }
> {code}
> I think a quick fix would be to use the getOrCreate method from the companion 
> object instead of calling the constructor from the SparkContext. Or we could 
> fix this by ensuring that all constructors attempt to pick up custom 
> extensions if they are set.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25034) possible triple memory consumption in fetchBlockSync()

2018-08-07 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25034:


Assignee: Apache Spark

> possible triple memory consumption in fetchBlockSync()
> --
>
> Key: SPARK-25034
> URL: https://issues.apache.org/jira/browse/SPARK-25034
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.2.2, 2.3.0, 2.4.0
>Reporter: Vincent
>Assignee: Apache Spark
>Priority: Major
>
> Hello
> in the code of  _fetchBlockSync_() in _blockTransferService_, we have:
>  
> {code:java}
> val ret = ByteBuffer.allocate(data.size.toInt)
> ret.put(data.nioByteBuffer())
> ret.flip()
> result.success(new NioManagedBuffer(ret)) 
> {code}
> In some cases, the _data_ variable is a _NettyManagedBuffer_, whose 
> underlying netty representation is a _CompositeByteBuffer_.
> Going through the code above in this configuration, assuming that the 
> variable _data_ holds N bytes:
> 1) we allocate a full buffer of N bytes in _ret_
> 2) calling _data.nioByteBuffer()_ on a  _CompositeByteBuffer_ will trigger a 
> full merge of all the composite buffers, which will allocate  *again* a full 
> buffer of N bytes
> 3) we copy to _ret_ the data byte by byte
> This means that at some point the N bytes of data are located 3 times in 
> memory.
> Is this really necessary?
> It seems unclear to me why we have to process at all the data, given that we 
> receive a _ManagedBuffer_ and we want to return a _ManagedBuffer_ 
> Is there something I'm missing here? It seems this whole operation could be 
> done with 0 copies. 
> The only upside here is that the new buffer will have merged all the 
> composite buffer's arrays, but it is really not clear if this is intended. In 
> any case this could be done with peak memory of 2N and not 3N
> Cheers!
>  
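
One possible shape of the improvement being discussed, as a sketch rather than the actual patch: drop the explicit allocate-and-copy, so the only copy left is whatever nioByteBuffer() itself does when it consolidates a CompositeByteBuffer (the 2N peak mentioned above instead of 3N).

{code:scala}
import org.apache.spark.network.buffer.{ManagedBuffer, NioManagedBuffer}

def toNioManagedBuffer(data: ManagedBuffer): NioManagedBuffer = {
  data.retain()                                  // keep the underlying netty memory alive
  new NioManagedBuffer(data.nioByteBuffer())     // no extra ByteBuffer.allocate + put
}
{code}

Whether the retain/release lifecycle works out for all callers is exactly the kind of thing a PR would need to settle.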



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25034) possible triple memory consumption in fetchBlockSync()

2018-08-07 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571626#comment-16571626
 ] 

Apache Spark commented on SPARK-25034:
--

User 'vincent-grosbois' has created a pull request for this issue:
https://github.com/apache/spark/pull/22024

> possible triple memory consumption in fetchBlockSync()
> --
>
> Key: SPARK-25034
> URL: https://issues.apache.org/jira/browse/SPARK-25034
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.2.2, 2.3.0, 2.4.0
>Reporter: Vincent
>Priority: Major
>
> Hello
> in the code of  _fetchBlockSync_() in _blockTransferService_, we have:
>  
> {code:java}
> val ret = ByteBuffer.allocate(data.size.toInt)
> ret.put(data.nioByteBuffer())
> ret.flip()
> result.success(new NioManagedBuffer(ret)) 
> {code}
> In some cases, the _data_ variable is a _NettyManagedBuffer_, whose 
> underlying netty representation is a _CompositeByteBuffer_.
> Going through the code above in this configuration, assuming that the 
> variable _data_ holds N bytes:
> 1) we allocate a full buffer of N bytes in _ret_
> 2) calling _data.nioByteBuffer()_ on a  _CompositeByteBuffer_ will trigger a 
> full merge of all the composite buffers, which will allocate  *again* a full 
> buffer of N bytes
> 3) we copy to _ret_ the data byte by byte
> This means that at some point the N bytes of data are located 3 times in 
> memory.
> Is this really necessary?
> It seems unclear to me why we have to process at all the data, given that we 
> receive a _ManagedBuffer_ and we want to return a _ManagedBuffer_ 
> Is there something I'm missing here? It seems this whole operation could be 
> done with 0 copies. 
> The only upside here is that the new buffer will have merged all the 
> composite buffer's arrays, but it is really not clear if this is intended. In 
> any case this could be done with peak memory of 2N and not 3N
> Cheers!
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25034) possible triple memory consumption in fetchBlockSync()

2018-08-07 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25034:


Assignee: (was: Apache Spark)

> possible triple memory consumption in fetchBlockSync()
> --
>
> Key: SPARK-25034
> URL: https://issues.apache.org/jira/browse/SPARK-25034
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.2.2, 2.3.0, 2.4.0
>Reporter: Vincent
>Priority: Major
>
> Hello
> in the code of  _fetchBlockSync_() in _blockTransferService_, we have:
>  
> {code:java}
> val ret = ByteBuffer.allocate(data.size.toInt)
> ret.put(data.nioByteBuffer())
> ret.flip()
> result.success(new NioManagedBuffer(ret)) 
> {code}
> In some cases, the _data_ variable is a _NettyManagedBuffer_, whose 
> underlying netty representation is a _CompositeByteBuffer_.
> Going through the code above in this configuration, assuming that the 
> variable _data_ holds N bytes:
> 1) we allocate a full buffer of N bytes in _ret_
> 2) calling _data.nioByteBuffer()_ on a  _CompositeByteBuffer_ will trigger a 
> full merge of all the composite buffers, which will allocate  *again* a full 
> buffer of N bytes
> 3) we copy to _ret_ the data byte by byte
> This means that at some point the N bytes of data are located 3 times in 
> memory.
> Is this really necessary?
> It seems unclear to me why we have to process at all the data, given that we 
> receive a _ManagedBuffer_ and we want to return a _ManagedBuffer_ 
> Is there something I'm missing here? It seems this whole operation could be 
> done with 0 copies. 
> The only upside here is that the new buffer will have merged all the 
> composite buffer's arrays, but it is really not clear if this is intended. In 
> any case this could be done with peak memory of 2N and not 3N
> Cheers!
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23928) High-order function: shuffle(x) → array

2018-08-07 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-23928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571421#comment-16571421
 ] 

Apache Spark commented on SPARK-23928:
--

User 'mgaido91' has created a pull request for this issue:
https://github.com/apache/spark/pull/22023

> High-order function: shuffle(x) → array
> ---
>
> Key: SPARK-23928
> URL: https://issues.apache.org/jira/browse/SPARK-23928
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Xiao Li
>Assignee: H Lu
>Priority: Major
> Fix For: 2.4.0
>
>
> Ref: https://prestodb.io/docs/current/functions/array.html
> Generate a random permutation of the given array x.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-25042) Flaky test: org.apache.spark.streaming.kafka010.KafkaRDDSuite.compacted topic

2018-08-07 Thread Marco Gaido (JIRA)
Marco Gaido created SPARK-25042:
---

 Summary: Flaky test: 
org.apache.spark.streaming.kafka010.KafkaRDDSuite.compacted topic
 Key: SPARK-25042
 URL: https://issues.apache.org/jira/browse/SPARK-25042
 Project: Spark
  Issue Type: Bug
  Components: Tests
Affects Versions: 2.4.0
Reporter: Marco Gaido


The test {{compacted topic}} in 
{{org.apache.spark.streaming.kafka010.KafkaRDDSuite}} is flaky: it failed in an 
unrelated PR: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94293/testReport/.
 And it passes locally on the same branch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-24772) support reading AVRO logical types - Decimal

2018-08-07 Thread Wenchen Fan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-24772:
---

Assignee: Gengliang Wang

> support reading AVRO logical types - Decimal
> 
>
> Key: SPARK-24772
> URL: https://issues.apache.org/jira/browse/SPARK-24772
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
> Fix For: 2.4.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-24772) support reading AVRO logical types - Decimal

2018-08-07 Thread Wenchen Fan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-24772.
-
   Resolution: Fixed
Fix Version/s: 2.4.0

Issue resolved by pull request 21984
[https://github.com/apache/spark/pull/21984]

> support reading AVRO logical types - Decimal
> 
>
> Key: SPARK-24772
> URL: https://issues.apache.org/jira/browse/SPARK-24772
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
> Fix For: 2.4.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-24005) Remove usage of Scala’s parallel collection

2018-08-07 Thread Wenchen Fan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-24005:
---

Assignee: Maxim Gekk

> Remove usage of Scala’s parallel collection
> ---
>
> Key: SPARK-24005
> URL: https://issues.apache.org/jira/browse/SPARK-24005
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 2.3.0
>Reporter: Xiao Li
>Assignee: Maxim Gekk
>Priority: Major
>  Labels: starter
> Fix For: 2.4.0
>
>
> {noformat}
> val par = (1 to 100).par.flatMap { i =>
>   Thread.sleep(1000)
>   1 to 1000
> }.toSeq
> {noformat}
> We are unable to interrupt the execution of parallel collections. We need to 
> create a common utility function to do it, instead of using Scala parallel 
> collections
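
For illustration, the same work expressed with an explicit pool that the caller can shut down (a sketch of the general direction, not Spark's actual helper):

{code:scala}
import java.util.concurrent.Executors
import scala.concurrent.duration._
import scala.concurrent.{Await, ExecutionContext, Future}

val pool = Executors.newFixedThreadPool(8)
implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
try {
  val futures = (1 to 100).map { i => Future { Thread.sleep(1000); 1 to 1000 } }
  val par = Await.result(Future.sequence(futures), 10.minutes).flatten
} finally {
  pool.shutdownNow()   // interruption is possible here, unlike with the shared .par pool
}
{code}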



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-24005) Remove usage of Scala’s parallel collection

2018-08-07 Thread Wenchen Fan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-24005.
-
   Resolution: Fixed
Fix Version/s: 2.4.0

Issue resolved by pull request 21913
[https://github.com/apache/spark/pull/21913]

> Remove usage of Scala’s parallel collection
> ---
>
> Key: SPARK-24005
> URL: https://issues.apache.org/jira/browse/SPARK-24005
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 2.3.0
>Reporter: Xiao Li
>Assignee: Maxim Gekk
>Priority: Major
>  Labels: starter
> Fix For: 2.4.0
>
>
> {noformat}
> val par = (1 to 100).par.flatMap { i =>
>   Thread.sleep(1000)
>   1 to 1000
> }.toSeq
> {noformat}
> We are unable to interrupt the execution of parallel collections. We need to 
> create a common utility function to do it, instead of using Scala parallel 
> collections



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24948) SHS filters wrongly some applications due to permission check

2018-08-07 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571294#comment-16571294
 ] 

Apache Spark commented on SPARK-24948:
--

User 'mgaido91' has created a pull request for this issue:
https://github.com/apache/spark/pull/22022

> SHS filters wrongly some applications due to permission check
> -
>
> Key: SPARK-24948
> URL: https://issues.apache.org/jira/browse/SPARK-24948
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.3.1
>Reporter: Marco Gaido
>Priority: Blocker
> Fix For: 2.4.0
>
>
> SHS filters the event logs it doesn't have permissions to read. 
> Unfortunately, this check is quite naive, as it takes into account only the 
> base permissions (ie. user, group, other permissions). For instance, if ACL 
> are enabled, they are ignored in this check; moreover, each filesystem may 
> have different policies (eg. they can consider spark as a superuser who can 
> access everything).
> This results in some applications not being displayed in the SHS, even though 
> the Spark user (or whatever user the SHS is started with) can actually read 
> their event logs.
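
A sketch of a less naive check (illustrative; the actual fix may take a different route): ask the filesystem itself whether the SHS user can read the log instead of re-deriving it from the basic user/group/other bits, so ACLs and filesystem-specific policies are honoured.

{code:scala}
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.fs.permission.FsAction
import org.apache.hadoop.security.AccessControlException

def canRead(fs: FileSystem, path: Path): Boolean =
  try {
    fs.access(path, FsAction.READ)   // delegates the decision to the filesystem (Hadoop 2.6+)
    true
  } catch {
    case _: AccessControlException => false
  }
{code}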



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24948) SHS filters wrongly some applications due to permission check

2018-08-07 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571260#comment-16571260
 ] 

Apache Spark commented on SPARK-24948:
--

User 'mgaido91' has created a pull request for this issue:
https://github.com/apache/spark/pull/22021

> SHS filters wrongly some applications due to permission check
> -
>
> Key: SPARK-24948
> URL: https://issues.apache.org/jira/browse/SPARK-24948
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.3.1
>Reporter: Marco Gaido
>Priority: Blocker
> Fix For: 2.4.0
>
>
> SHS filters the event logs it doesn't have permissions to read. 
> Unfortunately, this check is quite naive, as it takes into account only the 
> base permissions (ie. user, group, other permissions). For instance, if ACL 
> are enabled, they are ignored in this check; moreover, each filesystem may 
> have different policies (eg. they can consider spark as a superuser who can 
> access everything).
> This results in some applications not being displayed in the SHS, even though 
> the Spark user (or whatever user the SHS is started with) can actually read 
> their event logs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25041) genjavadoc-plugin_0.10 is not found with sbt in scala-2.12

2018-08-07 Thread Kazuaki Ishizaki (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kazuaki Ishizaki updated SPARK-25041:
-
Summary: genjavadoc-plugin_0.10 is not found with sbt in scala-2.12  (was: 
genjavadoc-plugin_2.12.6 is not found with sbt in scala-2.12)

> genjavadoc-plugin_0.10 is not found with sbt in scala-2.12
> --
>
> Key: SPARK-25041
> URL: https://issues.apache.org/jira/browse/SPARK-25041
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.4.0
>Reporter: Kazuaki Ishizaki
>Priority: Major
>
> When the master is built with sbt in scala-2.12, the following error occurs:
> {code}
> [warn]module not found: 
> com.typesafe.genjavadoc#genjavadoc-plugin_2.12.6;0.10
> [warn]  public: tried
> [warn]   
> https://repo1.maven.org/maven2/com/typesafe/genjavadoc/genjavadoc-plugin_2.12.6/0.10/genjavadoc-plugin_2.12.6-0.10.pom
> [warn]  Maven2 Local: tried
> [warn]   
> file:/gsa/jpngsa/home/i/s/ishizaki/.m2/repository/com/typesafe/genjavadoc/genjavadoc-plugin_2.12.6/0.10/genjavadoc-plugin_2.12.6-0.10.pom
> [warn]  local: tried
> [warn]   
> /gsa/jpngsa/home/i/s/ishizaki/.ivy2/local/com.typesafe.genjavadoc/genjavadoc-plugin_2.12.6/0.10/ivys/ivy.xml
> [info] Resolving jline#jline;2.14.3 ...
> [warn]::
> [warn]::  UNRESOLVED DEPENDENCIES ::
> [warn]::
> [warn]:: com.typesafe.genjavadoc#genjavadoc-plugin_2.12.6;0.10: not 
> found
> [warn]::
> [warn] 
> [warn]Note: Unresolved dependencies path:
> [warn]com.typesafe.genjavadoc:genjavadoc-plugin_2.12.6:0.10 
> (/home/ishizaki/Spark/PR/scala212/spark/project/SparkBuild.scala#L118)
> [warn]  +- org.apache.spark:spark-tags_2.12:2.4.0-SNAPSHOT
> sbt.ResolveException: unresolved dependency: 
> com.typesafe.genjavadoc#genjavadoc-plugin_2.12.6;0.10: not found
>   at sbt.IvyActions$.sbt$IvyActions$$resolve(IvyActions.scala:320)
>   at sbt.IvyActions$$anonfun$updateEither$1.apply(IvyActions.scala:191)
>   at sbt.IvyActions$$anonfun$updateEither$1.apply(IvyActions.scala:168)
>   at sbt.IvySbt$Module$$anonfun$withModule$1.apply(Ivy.scala:156)
>   at sbt.IvySbt$Module$$anonfun$withModule$1.apply(Ivy.scala:156)
>   at sbt.IvySbt$$anonfun$withIvy$1.apply(Ivy.scala:133)
>   at sbt.IvySbt.sbt$IvySbt$$action$1(Ivy.scala:57)
>   at sbt.IvySbt$$anon$4.call(Ivy.scala:65)
>   at xsbt.boot.Locks$GlobalLock.withChannel$1(Locks.scala:93)
>   at 
> xsbt.boot.Locks$GlobalLock.xsbt$boot$Locks$GlobalLock$$withChannelRetries$1(Locks.scala:78)
>   at 
> xsbt.boot.Locks$GlobalLock$$anonfun$withFileLock$1.apply(Locks.scala:97)
>   at xsbt.boot.Using$.withResource(Using.scala:10)
>   at xsbt.boot.Using$.apply(Using.scala:9)
>   at xsbt.boot.Locks$GlobalLock.ignoringDeadlockAvoided(Locks.scala:58)
>   at xsbt.boot.Locks$GlobalLock.withLock(Locks.scala:48)
>   at xsbt.boot.Locks$.apply0(Locks.scala:31)
>   at xsbt.boot.Locks$.apply(Locks.scala:28)
>   at sbt.IvySbt.withDefaultLogger(Ivy.scala:65)
>   at sbt.IvySbt.withIvy(Ivy.scala:128)
>   at sbt.IvySbt.withIvy(Ivy.scala:125)
>   at sbt.IvySbt$Module.withModule(Ivy.scala:156)
>   at sbt.IvyActions$.updateEither(IvyActions.scala:168)
>   at 
> sbt.Classpaths$$anonfun$sbt$Classpaths$$work$1$1.apply(Defaults.scala:1555)
>   at 
> sbt.Classpaths$$anonfun$sbt$Classpaths$$work$1$1.apply(Defaults.scala:1551)
>   at 
> sbt.Classpaths$$anonfun$doWork$1$1$$anonfun$122.apply(Defaults.scala:1586)
>   at 
> sbt.Classpaths$$anonfun$doWork$1$1$$anonfun$122.apply(Defaults.scala:1584)
>   at sbt.Tracked$$anonfun$lastOutput$1.apply(Tracked.scala:37)
>   at sbt.Classpaths$$anonfun$doWork$1$1.apply(Defaults.scala:1589)
>   at sbt.Classpaths$$anonfun$doWork$1$1.apply(Defaults.scala:1583)
>   at sbt.Tracked$$anonfun$inputChanged$1.apply(Tracked.scala:60)
>   at sbt.Classpaths$.cachedUpdate(Defaults.scala:1606)
>   at sbt.Classpaths$$anonfun$updateTask$1.apply(Defaults.scala:1533)
>   at sbt.Classpaths$$anonfun$updateTask$1.apply(Defaults.scala:1485)
>   at scala.Function1$$anonfun$compose$1.apply(Function1.scala:47)
>   at sbt.$tilde$greater$$anonfun$$u2219$1.apply(TypeFunctions.scala:40)
>   at sbt.std.Transform$$anon$4.work(System.scala:63)
>   at 
> sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:228)
>   at 
> sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:228)
>   at sbt.ErrorHandling$.wideConvert(ErrorHandling.scala:17)
>   at sbt.Execute.work(Execute.scala:237)
>   

[jira] [Assigned] (SPARK-25041) genjavadoc-plugin_2.12.6 is not found with sbt in scala-2.12

2018-08-07 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25041:


Assignee: (was: Apache Spark)

> genjavadoc-plugin_2.12.6 is not found with sbt in scala-2.12
> 
>
> Key: SPARK-25041
> URL: https://issues.apache.org/jira/browse/SPARK-25041
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.4.0
>Reporter: Kazuaki Ishizaki
>Priority: Major
>
> When the master branch is built with sbt in scala-2.12, the following error occurs:
> {code}
> [warn]module not found: 
> com.typesafe.genjavadoc#genjavadoc-plugin_2.12.6;0.10
> [warn]  public: tried
> [warn]   
> https://repo1.maven.org/maven2/com/typesafe/genjavadoc/genjavadoc-plugin_2.12.6/0.10/genjavadoc-plugin_2.12.6-0.10.pom
> [warn]  Maven2 Local: tried
> [warn]   
> file:/gsa/jpngsa/home/i/s/ishizaki/.m2/repository/com/typesafe/genjavadoc/genjavadoc-plugin_2.12.6/0.10/genjavadoc-plugin_2.12.6-0.10.pom
> [warn]  local: tried
> [warn]   
> /gsa/jpngsa/home/i/s/ishizaki/.ivy2/local/com.typesafe.genjavadoc/genjavadoc-plugin_2.12.6/0.10/ivys/ivy.xml
> [info] Resolving jline#jline;2.14.3 ...
> [warn]::
> [warn]::  UNRESOLVED DEPENDENCIES ::
> [warn]::
> [warn]:: com.typesafe.genjavadoc#genjavadoc-plugin_2.12.6;0.10: not 
> found
> [warn]::
> [warn] 
> [warn]Note: Unresolved dependencies path:
> [warn]com.typesafe.genjavadoc:genjavadoc-plugin_2.12.6:0.10 
> (/home/ishizaki/Spark/PR/scala212/spark/project/SparkBuild.scala#L118)
> [warn]  +- org.apache.spark:spark-tags_2.12:2.4.0-SNAPSHOT
> sbt.ResolveException: unresolved dependency: 
> com.typesafe.genjavadoc#genjavadoc-plugin_2.12.6;0.10: not found
>   at sbt.IvyActions$.sbt$IvyActions$$resolve(IvyActions.scala:320)
>   at sbt.IvyActions$$anonfun$updateEither$1.apply(IvyActions.scala:191)
>   at sbt.IvyActions$$anonfun$updateEither$1.apply(IvyActions.scala:168)
>   at sbt.IvySbt$Module$$anonfun$withModule$1.apply(Ivy.scala:156)
>   at sbt.IvySbt$Module$$anonfun$withModule$1.apply(Ivy.scala:156)
>   at sbt.IvySbt$$anonfun$withIvy$1.apply(Ivy.scala:133)
>   at sbt.IvySbt.sbt$IvySbt$$action$1(Ivy.scala:57)
>   at sbt.IvySbt$$anon$4.call(Ivy.scala:65)
>   at xsbt.boot.Locks$GlobalLock.withChannel$1(Locks.scala:93)
>   at 
> xsbt.boot.Locks$GlobalLock.xsbt$boot$Locks$GlobalLock$$withChannelRetries$1(Locks.scala:78)
>   at 
> xsbt.boot.Locks$GlobalLock$$anonfun$withFileLock$1.apply(Locks.scala:97)
>   at xsbt.boot.Using$.withResource(Using.scala:10)
>   at xsbt.boot.Using$.apply(Using.scala:9)
>   at xsbt.boot.Locks$GlobalLock.ignoringDeadlockAvoided(Locks.scala:58)
>   at xsbt.boot.Locks$GlobalLock.withLock(Locks.scala:48)
>   at xsbt.boot.Locks$.apply0(Locks.scala:31)
>   at xsbt.boot.Locks$.apply(Locks.scala:28)
>   at sbt.IvySbt.withDefaultLogger(Ivy.scala:65)
>   at sbt.IvySbt.withIvy(Ivy.scala:128)
>   at sbt.IvySbt.withIvy(Ivy.scala:125)
>   at sbt.IvySbt$Module.withModule(Ivy.scala:156)
>   at sbt.IvyActions$.updateEither(IvyActions.scala:168)
>   at 
> sbt.Classpaths$$anonfun$sbt$Classpaths$$work$1$1.apply(Defaults.scala:1555)
>   at 
> sbt.Classpaths$$anonfun$sbt$Classpaths$$work$1$1.apply(Defaults.scala:1551)
>   at 
> sbt.Classpaths$$anonfun$doWork$1$1$$anonfun$122.apply(Defaults.scala:1586)
>   at 
> sbt.Classpaths$$anonfun$doWork$1$1$$anonfun$122.apply(Defaults.scala:1584)
>   at sbt.Tracked$$anonfun$lastOutput$1.apply(Tracked.scala:37)
>   at sbt.Classpaths$$anonfun$doWork$1$1.apply(Defaults.scala:1589)
>   at sbt.Classpaths$$anonfun$doWork$1$1.apply(Defaults.scala:1583)
>   at sbt.Tracked$$anonfun$inputChanged$1.apply(Tracked.scala:60)
>   at sbt.Classpaths$.cachedUpdate(Defaults.scala:1606)
>   at sbt.Classpaths$$anonfun$updateTask$1.apply(Defaults.scala:1533)
>   at sbt.Classpaths$$anonfun$updateTask$1.apply(Defaults.scala:1485)
>   at scala.Function1$$anonfun$compose$1.apply(Function1.scala:47)
>   at sbt.$tilde$greater$$anonfun$$u2219$1.apply(TypeFunctions.scala:40)
>   at sbt.std.Transform$$anon$4.work(System.scala:63)
>   at 
> sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:228)
>   at 
> sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:228)
>   at sbt.ErrorHandling$.wideConvert(ErrorHandling.scala:17)
>   at sbt.Execute.work(Execute.scala:237)
>   at sbt.Execute$$anonfun$submit$1.apply(Execute.scala:228)
>   at 

[jira] [Assigned] (SPARK-25041) genjavadoc-plugin_2.12.6 is not found with sbt in scala-2.12

2018-08-07 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25041:


Assignee: Apache Spark

> genjavadoc-plugin_2.12.6 is not found with sbt in scala-2.12
> 
>
> Key: SPARK-25041
> URL: https://issues.apache.org/jira/browse/SPARK-25041
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.4.0
>Reporter: Kazuaki Ishizaki
>Assignee: Apache Spark
>Priority: Major
>
> When the master branch is built with sbt in scala-2.12, the following error occurs:
> {code}
> [warn]module not found: 
> com.typesafe.genjavadoc#genjavadoc-plugin_2.12.6;0.10
> [warn]  public: tried
> [warn]   
> https://repo1.maven.org/maven2/com/typesafe/genjavadoc/genjavadoc-plugin_2.12.6/0.10/genjavadoc-plugin_2.12.6-0.10.pom
> [warn]  Maven2 Local: tried
> [warn]   
> file:/gsa/jpngsa/home/i/s/ishizaki/.m2/repository/com/typesafe/genjavadoc/genjavadoc-plugin_2.12.6/0.10/genjavadoc-plugin_2.12.6-0.10.pom
> [warn]  local: tried
> [warn]   
> /gsa/jpngsa/home/i/s/ishizaki/.ivy2/local/com.typesafe.genjavadoc/genjavadoc-plugin_2.12.6/0.10/ivys/ivy.xml
> [info] Resolving jline#jline;2.14.3 ...
> [warn]::
> [warn]::  UNRESOLVED DEPENDENCIES ::
> [warn]::
> [warn]:: com.typesafe.genjavadoc#genjavadoc-plugin_2.12.6;0.10: not 
> found
> [warn]::
> [warn] 
> [warn]Note: Unresolved dependencies path:
> [warn]com.typesafe.genjavadoc:genjavadoc-plugin_2.12.6:0.10 
> (/home/ishizaki/Spark/PR/scala212/spark/project/SparkBuild.scala#L118)
> [warn]  +- org.apache.spark:spark-tags_2.12:2.4.0-SNAPSHOT
> sbt.ResolveException: unresolved dependency: 
> com.typesafe.genjavadoc#genjavadoc-plugin_2.12.6;0.10: not found
>   at sbt.IvyActions$.sbt$IvyActions$$resolve(IvyActions.scala:320)
>   at sbt.IvyActions$$anonfun$updateEither$1.apply(IvyActions.scala:191)
>   at sbt.IvyActions$$anonfun$updateEither$1.apply(IvyActions.scala:168)
>   at sbt.IvySbt$Module$$anonfun$withModule$1.apply(Ivy.scala:156)
>   at sbt.IvySbt$Module$$anonfun$withModule$1.apply(Ivy.scala:156)
>   at sbt.IvySbt$$anonfun$withIvy$1.apply(Ivy.scala:133)
>   at sbt.IvySbt.sbt$IvySbt$$action$1(Ivy.scala:57)
>   at sbt.IvySbt$$anon$4.call(Ivy.scala:65)
>   at xsbt.boot.Locks$GlobalLock.withChannel$1(Locks.scala:93)
>   at 
> xsbt.boot.Locks$GlobalLock.xsbt$boot$Locks$GlobalLock$$withChannelRetries$1(Locks.scala:78)
>   at 
> xsbt.boot.Locks$GlobalLock$$anonfun$withFileLock$1.apply(Locks.scala:97)
>   at xsbt.boot.Using$.withResource(Using.scala:10)
>   at xsbt.boot.Using$.apply(Using.scala:9)
>   at xsbt.boot.Locks$GlobalLock.ignoringDeadlockAvoided(Locks.scala:58)
>   at xsbt.boot.Locks$GlobalLock.withLock(Locks.scala:48)
>   at xsbt.boot.Locks$.apply0(Locks.scala:31)
>   at xsbt.boot.Locks$.apply(Locks.scala:28)
>   at sbt.IvySbt.withDefaultLogger(Ivy.scala:65)
>   at sbt.IvySbt.withIvy(Ivy.scala:128)
>   at sbt.IvySbt.withIvy(Ivy.scala:125)
>   at sbt.IvySbt$Module.withModule(Ivy.scala:156)
>   at sbt.IvyActions$.updateEither(IvyActions.scala:168)
>   at 
> sbt.Classpaths$$anonfun$sbt$Classpaths$$work$1$1.apply(Defaults.scala:1555)
>   at 
> sbt.Classpaths$$anonfun$sbt$Classpaths$$work$1$1.apply(Defaults.scala:1551)
>   at 
> sbt.Classpaths$$anonfun$doWork$1$1$$anonfun$122.apply(Defaults.scala:1586)
>   at 
> sbt.Classpaths$$anonfun$doWork$1$1$$anonfun$122.apply(Defaults.scala:1584)
>   at sbt.Tracked$$anonfun$lastOutput$1.apply(Tracked.scala:37)
>   at sbt.Classpaths$$anonfun$doWork$1$1.apply(Defaults.scala:1589)
>   at sbt.Classpaths$$anonfun$doWork$1$1.apply(Defaults.scala:1583)
>   at sbt.Tracked$$anonfun$inputChanged$1.apply(Tracked.scala:60)
>   at sbt.Classpaths$.cachedUpdate(Defaults.scala:1606)
>   at sbt.Classpaths$$anonfun$updateTask$1.apply(Defaults.scala:1533)
>   at sbt.Classpaths$$anonfun$updateTask$1.apply(Defaults.scala:1485)
>   at scala.Function1$$anonfun$compose$1.apply(Function1.scala:47)
>   at sbt.$tilde$greater$$anonfun$$u2219$1.apply(TypeFunctions.scala:40)
>   at sbt.std.Transform$$anon$4.work(System.scala:63)
>   at 
> sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:228)
>   at 
> sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:228)
>   at sbt.ErrorHandling$.wideConvert(ErrorHandling.scala:17)
>   at sbt.Execute.work(Execute.scala:237)
>   at sbt.Execute$$anonfun$submit$1.apply(Execute.scala:228)
>   at 

[jira] [Commented] (SPARK-25041) genjavadoc-plugin_2.12.6 is not found with sbt in scala-2.12

2018-08-07 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571240#comment-16571240
 ] 

Apache Spark commented on SPARK-25041:
--

User 'kiszk' has created a pull request for this issue:
https://github.com/apache/spark/pull/22020
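
genjavadoc is resolved as a Scala compiler plugin, so an artifact has to exist 
for the exact compiler version in use; a minimal sbt sketch of such a 
declaration (the version number is illustrative, not necessarily the one chosen 
in the pull request above) is:

{code}
// Minimal sbt sketch: genjavadoc is a Scala *compiler* plugin, so its
// artifact is published per full Scala version (e.g. _2.12.6) rather than
// per binary version (_2.12). Declaring it with CrossVersion.full and a
// release that exists for the newer compiler avoids the resolution failure.
// The version below is illustrative only.
libraryDependencies += compilerPlugin(
  "com.typesafe.genjavadoc" % "genjavadoc-plugin" % "0.11" cross CrossVersion.full)
{code}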

> genjavadoc-plugin_2.12.6 is not found with sbt in scala-2.12
> 
>
> Key: SPARK-25041
> URL: https://issues.apache.org/jira/browse/SPARK-25041
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.4.0
>Reporter: Kazuaki Ishizaki
>Priority: Major
>
> When the master branch is built with sbt in scala-2.12, the following error occurs:
> {code}
> [warn]module not found: 
> com.typesafe.genjavadoc#genjavadoc-plugin_2.12.6;0.10
> [warn]  public: tried
> [warn]   
> https://repo1.maven.org/maven2/com/typesafe/genjavadoc/genjavadoc-plugin_2.12.6/0.10/genjavadoc-plugin_2.12.6-0.10.pom
> [warn]  Maven2 Local: tried
> [warn]   
> file:/gsa/jpngsa/home/i/s/ishizaki/.m2/repository/com/typesafe/genjavadoc/genjavadoc-plugin_2.12.6/0.10/genjavadoc-plugin_2.12.6-0.10.pom
> [warn]  local: tried
> [warn]   
> /gsa/jpngsa/home/i/s/ishizaki/.ivy2/local/com.typesafe.genjavadoc/genjavadoc-plugin_2.12.6/0.10/ivys/ivy.xml
> [info] Resolving jline#jline;2.14.3 ...
> [warn]::
> [warn]::  UNRESOLVED DEPENDENCIES ::
> [warn]::
> [warn]:: com.typesafe.genjavadoc#genjavadoc-plugin_2.12.6;0.10: not 
> found
> [warn]::
> [warn] 
> [warn]Note: Unresolved dependencies path:
> [warn]com.typesafe.genjavadoc:genjavadoc-plugin_2.12.6:0.10 
> (/home/ishizaki/Spark/PR/scala212/spark/project/SparkBuild.scala#L118)
> [warn]  +- org.apache.spark:spark-tags_2.12:2.4.0-SNAPSHOT
> sbt.ResolveException: unresolved dependency: 
> com.typesafe.genjavadoc#genjavadoc-plugin_2.12.6;0.10: not found
>   at sbt.IvyActions$.sbt$IvyActions$$resolve(IvyActions.scala:320)
>   at sbt.IvyActions$$anonfun$updateEither$1.apply(IvyActions.scala:191)
>   at sbt.IvyActions$$anonfun$updateEither$1.apply(IvyActions.scala:168)
>   at sbt.IvySbt$Module$$anonfun$withModule$1.apply(Ivy.scala:156)
>   at sbt.IvySbt$Module$$anonfun$withModule$1.apply(Ivy.scala:156)
>   at sbt.IvySbt$$anonfun$withIvy$1.apply(Ivy.scala:133)
>   at sbt.IvySbt.sbt$IvySbt$$action$1(Ivy.scala:57)
>   at sbt.IvySbt$$anon$4.call(Ivy.scala:65)
>   at xsbt.boot.Locks$GlobalLock.withChannel$1(Locks.scala:93)
>   at 
> xsbt.boot.Locks$GlobalLock.xsbt$boot$Locks$GlobalLock$$withChannelRetries$1(Locks.scala:78)
>   at 
> xsbt.boot.Locks$GlobalLock$$anonfun$withFileLock$1.apply(Locks.scala:97)
>   at xsbt.boot.Using$.withResource(Using.scala:10)
>   at xsbt.boot.Using$.apply(Using.scala:9)
>   at xsbt.boot.Locks$GlobalLock.ignoringDeadlockAvoided(Locks.scala:58)
>   at xsbt.boot.Locks$GlobalLock.withLock(Locks.scala:48)
>   at xsbt.boot.Locks$.apply0(Locks.scala:31)
>   at xsbt.boot.Locks$.apply(Locks.scala:28)
>   at sbt.IvySbt.withDefaultLogger(Ivy.scala:65)
>   at sbt.IvySbt.withIvy(Ivy.scala:128)
>   at sbt.IvySbt.withIvy(Ivy.scala:125)
>   at sbt.IvySbt$Module.withModule(Ivy.scala:156)
>   at sbt.IvyActions$.updateEither(IvyActions.scala:168)
>   at 
> sbt.Classpaths$$anonfun$sbt$Classpaths$$work$1$1.apply(Defaults.scala:1555)
>   at 
> sbt.Classpaths$$anonfun$sbt$Classpaths$$work$1$1.apply(Defaults.scala:1551)
>   at 
> sbt.Classpaths$$anonfun$doWork$1$1$$anonfun$122.apply(Defaults.scala:1586)
>   at 
> sbt.Classpaths$$anonfun$doWork$1$1$$anonfun$122.apply(Defaults.scala:1584)
>   at sbt.Tracked$$anonfun$lastOutput$1.apply(Tracked.scala:37)
>   at sbt.Classpaths$$anonfun$doWork$1$1.apply(Defaults.scala:1589)
>   at sbt.Classpaths$$anonfun$doWork$1$1.apply(Defaults.scala:1583)
>   at sbt.Tracked$$anonfun$inputChanged$1.apply(Tracked.scala:60)
>   at sbt.Classpaths$.cachedUpdate(Defaults.scala:1606)
>   at sbt.Classpaths$$anonfun$updateTask$1.apply(Defaults.scala:1533)
>   at sbt.Classpaths$$anonfun$updateTask$1.apply(Defaults.scala:1485)
>   at scala.Function1$$anonfun$compose$1.apply(Function1.scala:47)
>   at sbt.$tilde$greater$$anonfun$$u2219$1.apply(TypeFunctions.scala:40)
>   at sbt.std.Transform$$anon$4.work(System.scala:63)
>   at 
> sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:228)
>   at 
> sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:228)
>   at sbt.ErrorHandling$.wideConvert(ErrorHandling.scala:17)
>   at sbt.Execute.work(Execute.scala:237)
>   at 

[jira] [Resolved] (SPARK-24341) Codegen compile error from predicate subquery

2018-08-07 Thread Wenchen Fan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-24341.
-
   Resolution: Fixed
Fix Version/s: 2.4.0

Issue resolved by pull request 21403
[https://github.com/apache/spark/pull/21403]

> Codegen compile error from predicate subquery
> -
>
> Key: SPARK-24341
> URL: https://issues.apache.org/jira/browse/SPARK-24341
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Juliusz Sompolski
>Assignee: Marco Gaido
>Priority: Minor
> Fix For: 2.4.0
>
>
> Ran on master:
> {code}
> drop table if exists juleka;
> drop table if exists julekb;
> create table juleka (a integer, b integer);
> create table julekb (na integer, nb integer);
> insert into juleka values (1,1);
> insert into julekb values (1,1);
> select * from juleka where (a, b) not in (select (na, nb) from julekb);
> {code}
> Results in:
> {code}
> java.util.concurrent.ExecutionException: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 27, Column 29: failed to compile: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 27, Column 29: Cannot compare types "int" and 
> "org.apache.spark.sql.catalyst.InternalRow"
>   at 
> com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299)
>   at 
> com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286)
>   at 
> com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
>   at 
> com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:135)
>   at 
> com.google.common.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2344)
>   at 
> com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2316)
>   at 
> com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2278)
>   at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2193)
>   at com.google.common.cache.LocalCache.get(LocalCache.java:3932)
>   at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3936)
>   at 
> com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4806)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.compile(CodeGenerator.scala:1415)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.GeneratePredicate$.create(GeneratePredicate.scala:92)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.GeneratePredicate$.generate(GeneratePredicate.scala:46)
>   at 
> org.apache.spark.sql.execution.SparkPlan.newPredicate(SparkPlan.scala:380)
>   at 
> org.apache.spark.sql.execution.joins.BroadcastNestedLoopJoinExec.org$apache$spark$sql$execution$joins$BroadcastNestedLoopJoinExec$$boundCondition$lzycompute(BroadcastNestedLoopJoinExec.scala:99)
>   at 
> org.apache.spark.sql.execution.joins.BroadcastNestedLoopJoinExec.org$apache$spark$sql$execution$joins$BroadcastNestedLoopJoinExec$$boundCondition(BroadcastNestedLoopJoinExec.scala:97)
>   at 
> org.apache.spark.sql.execution.joins.BroadcastNestedLoopJoinExec$$anonfun$4$$anonfun$apply$2$$anonfun$apply$3.apply(BroadcastNestedLoopJoinExec.scala:203)
>   at 
> org.apache.spark.sql.execution.joins.BroadcastNestedLoopJoinExec$$anonfun$4$$anonfun$apply$2$$anonfun$apply$3.apply(BroadcastNestedLoopJoinExec.scala:203)
>   at 
> scala.collection.IndexedSeqOptimized$class.prefixLengthImpl(IndexedSeqOptimized.scala:38)
>   at 
> scala.collection.IndexedSeqOptimized$class.exists(IndexedSeqOptimized.scala:46)
>   at scala.collection.mutable.ArrayOps$ofRef.exists(ArrayOps.scala:186)
>   at 
> org.apache.spark.sql.execution.joins.BroadcastNestedLoopJoinExec$$anonfun$4$$anonfun$apply$2.apply(BroadcastNestedLoopJoinExec.scala:203)
>   at 
> org.apache.spark.sql.execution.joins.BroadcastNestedLoopJoinExec$$anonfun$4$$anonfun$apply$2.apply(BroadcastNestedLoopJoinExec.scala:202)
>   at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:463)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
>   at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:389)
>   at 
> org.apache.spark.sql.execution.collect.UnsafeRowBatchUtils$.encodeUnsafeRows(UnsafeRowBatchUtils.scala:49)
>   at 
> org.apache.spark.sql.execution.collect.Collector$$anonfun$2.apply(Collector.scala:126)
>   at 
> org.apache.spark.sql.execution.collect.Collector$$anonfun$2.apply(Collector.scala:125)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>   at org.apache.spark.scheduler.Task.run(Task.scala:111)
>   at 
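
For reference, the predicate can also be written column-wise, so the subquery 
returns two integer columns instead of a single struct column; this sketch 
mirrors the repro above and is a possible workaround on affected versions, not 
the fix applied in the pull request.

{code}
-- Sketch: same intended predicate, with the subquery returning two integer
-- columns rather than one struct column (illustrative workaround only):
select * from juleka where (a, b) not in (select na, nb from julekb);
{code}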
