[jira] [Updated] (SPARK-22357) SparkContext.binaryFiles ignore minPartitions parameter

2018-09-02 Thread Xiao Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-22357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-22357:

Labels: behavior-changes  (was: )

> SparkContext.binaryFiles ignore minPartitions parameter
> ---
>
> Key: SPARK-22357
> URL: https://issues.apache.org/jira/browse/SPARK-22357
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.2, 2.2.0
>Reporter: Weichen Xu
>Assignee: Bo Meng
>Priority: Major
>  Labels: behavior-changes
> Fix For: 2.4.0
>
>
> This is a bug in binaryFiles: even though we pass it the number of partitions, 
> binaryFiles ignores it.
> The bug was introduced in Spark 2.1 (it is not present in Spark 2.0): in 
> PortableDataStream.scala the argument "minPartitions" is no longer used (with 
> the push to master on 11/7/6):
> {code}
> /**
>  * Allow minPartitions set by end-user in order to keep compatibility with old
>  * Hadoop API, which is set through setMaxSplitSize.
>  */
> def setMinPartitions(sc: SparkContext, context: JobContext, minPartitions: Int) {
>   val defaultMaxSplitBytes = sc.getConf.get(config.FILES_MAX_PARTITION_BYTES)
>   val openCostInBytes = sc.getConf.get(config.FILES_OPEN_COST_IN_BYTES)
>   val defaultParallelism = sc.defaultParallelism
>   val files = listStatus(context).asScala
>   val totalBytes = files.filterNot(_.isDirectory).map(_.getLen + openCostInBytes).sum
>   val bytesPerCore = totalBytes / defaultParallelism
>   val maxSplitSize = Math.min(defaultMaxSplitBytes, Math.max(openCostInBytes, bytesPerCore))
>   super.setMaxSplitSize(maxSplitSize)
> }
> {code}
> The code previously, in version 2.0, was:
> {code}
> def setMinPartitions(context: JobContext, minPartitions: Int) {
>   val totalLen = listStatus(context).asScala.filterNot(_.isDirectory).map(_.getLen).sum
>   val maxSplitSize = math.ceil(totalLen / math.max(minPartitions, 1.0)).toLong
>   super.setMaxSplitSize(maxSplitSize)
> }
> {code}
> The new code is clever, but it ignores what the user passes in and sizes the 
> splits from the data alone, which is a breaking change in some sense.
> In our specific case this was a problem: we initially read in just the file 
> names, and the dataframe only becomes very large later, when the images 
> themselves are read in – in that situation the new code does not handle the 
> partitioning well.
> I'm not sure whether it can be fixed easily, because I don't understand the 
> full context of the change in Spark (but at the very least the unused 
> parameter should be removed to avoid confusion).
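> Until this is fixed, a workaround is to repartition explicitly right after the 
> read, since the minPartitions argument is effectively ignored. A minimal sketch 
> (the path and partition count below are illustrative only):
> {code:scala}
> // binaryFiles ignores minPartitions on 2.1/2.2, so force the parallelism afterwards
> val images = sc.binaryFiles("/data/images", minPartitions = 128)
>   .repartition(128)  // restores the intended number of partitions before the heavy read
> {code}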



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25312) Add description for the conf spark.network.crypto.keyFactoryIterations

2018-09-02 Thread Xiao Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-25312:

Labels: starter  (was: )

> Add description for the conf spark.network.crypto.keyFactoryIterations
> --
>
> Key: SPARK-25312
> URL: https://issues.apache.org/jira/browse/SPARK-25312
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, Spark Core
>Affects Versions: 2.3.2
>Reporter: Xiao Li
>Priority: Major
>  Labels: starter
>
> https://github.com/apache/spark/pull/22195 fixed a typo in the undocumented 
> conf `spark.network.crypto.keyFactoryIterations`. We should document it the 
> same way we did the other spark.network.crypto.* confs in 
> https://spark.apache.org/docs/latest/configuration.html
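> For reference, a sketch of how the conf is set alongside the already documented 
> options (the value 1024 is illustrative, not necessarily the default):
> {code:scala}
> // spark.network.crypto.keyFactoryIterations is read like the other crypto confs
> val conf = new org.apache.spark.SparkConf()
>   .set("spark.network.crypto.enabled", "true")
>   .set("spark.network.crypto.keyFactoryIterations", "1024")
> {code}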



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-25312) Add description for the conf spark.network.crypto.keyFactoryIterations

2018-09-02 Thread Xiao Li (JIRA)
Xiao Li created SPARK-25312:
---

 Summary: Add description for the conf 
spark.network.crypto.keyFactoryIterations
 Key: SPARK-25312
 URL: https://issues.apache.org/jira/browse/SPARK-25312
 Project: Spark
  Issue Type: Documentation
  Components: Documentation, Spark Core
Affects Versions: 2.3.2
Reporter: Xiao Li


https://github.com/apache/spark/pull/22195 fixed a typo in the undocumented 
conf `spark.network.crypto.keyFactoryIterations`. We should document it the 
same way we did the other spark.network.crypto.* confs in 
https://spark.apache.org/docs/latest/configuration.html



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25150) Joining DataFrames derived from the same source yields confusing/incorrect results

2018-09-02 Thread Peter Toth (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16601786#comment-16601786
 ] 

Peter Toth commented on SPARK-25150:


[~EeveeB], sorry, I have just noticed that you might have started working on a 
patch. I think I came to the same conclusion as you and submitted a PR, but I'm 
quite new to Spark so any comments are welcome.

> Joining DataFrames derived from the same source yields confusing/incorrect 
> results
> --
>
> Key: SPARK-25150
> URL: https://issues.apache.org/jira/browse/SPARK-25150
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Nicholas Chammas
>Priority: Major
> Attachments: output-with-implicit-cross-join.txt, 
> output-without-implicit-cross-join.txt, persons.csv, states.csv, 
> zombie-analysis.py
>
>
> I have two DataFrames, A and B. From B, I have derived two additional 
> DataFrames, B1 and B2. When joining A to B1 and B2, I'm getting a very 
> confusing error:
> {code:java}
> Join condition is missing or trivial.
> Either: use the CROSS JOIN syntax to allow cartesian products between these
> relations, or: enable implicit cartesian products by setting the configuration
> variable spark.sql.crossJoin.enabled=true;
> {code}
> Then, when I configure "spark.sql.crossJoin.enabled=true" as instructed, 
> Spark appears to give me incorrect answers.
> I am not sure if I am missing something obvious, or if there is some kind of 
> bug here. The "join condition is missing" error is confusing and doesn't make 
> sense to me, and the seemingly incorrect output is concerning.
> I've attached a reproduction, along with the output I'm seeing with and 
> without the implicit cross join enabled.
> I realize the join I've written is not correct in the sense that it should be 
> left outer join instead of an inner join (since some of the aggregates are 
> not available for all states), but that doesn't explain Spark's behavior.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25044) Address translation of LMF closure primitive args to Object in Scala 2.12

2018-09-02 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16601777#comment-16601777
 ] 

Apache Spark commented on SPARK-25044:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/22319

> Address translation of LMF closure primitive args to Object in Scala 2.12
> -
>
> Key: SPARK-25044
> URL: https://issues.apache.org/jira/browse/SPARK-25044
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL
>Affects Versions: 2.4.0
>Reporter: Sean Owen
>Assignee: Sean Owen
>Priority: Major
> Fix For: 2.4.0
>
>
> A few SQL-related tests fail in Scala 2.12, such as UDFSuite's "SPARK-24891 
> Fix HandleNullInputsForUDF rule":
> {code:java}
> - SPARK-24891 Fix HandleNullInputsForUDF rule *** FAILED ***
> Results do not match for query:
> ...
> == Results ==
> == Results ==
> !== Correct Answer - 3 == == Spark Answer - 3 ==
> !struct<> struct
> ![0,10,null] [0,10,0]
> ![1,12,null] [1,12,1]
> ![2,14,null] [2,14,2] (QueryTest.scala:163){code}
> You can kind of get what's going on reading the test:
> {code:java}
> test("SPARK-24891 Fix HandleNullInputsForUDF rule") {
> // assume(!ClosureCleanerSuite2.supportsLMFs)
> // This test won't test what it intends to in 2.12, as lambda metafactory 
> closures
> // have arg types that are not primitive, but Object
> val udf1 = udf({(x: Int, y: Int) => x + y})
> val df = spark.range(0, 3).toDF("a")
> .withColumn("b", udf1($"a", udf1($"a", lit(10
> .withColumn("c", udf1($"a", lit(null)))
> val plan = spark.sessionState.executePlan(df.logicalPlan).analyzed
> comparePlans(df.logicalPlan, plan)
> checkAnswer(
> df,
> Seq(
> Row(0, 10, null),
> Row(1, 12, null),
> Row(2, 14, null)))
> }{code}
>  
> It seems that the closure that is fed in as a UDF changes behavior, in a way 
> that primitive-type arguments are handled differently. For example an Int 
> argument, when fed 'null', acts like 0.
> I'm sure it's a difference in the LMF closure and how its types are 
> understood, but not exactly sure of the cause yet.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25301) When a view uses an UDF from a non default database, Spark analyser throws AnalysisException

2018-09-02 Thread Vinod KC (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod KC updated SPARK-25301:
-
Description: 
When a Hive view uses a UDF from a non-default database, the Spark analyser throws 
an AnalysisException.

Steps to simulate this issue
 -
 In Hive
 
 1) CREATE DATABASE d100;
 2) create function d100.udf100 as 
'org.apache.hadoop.hive.ql.udf.generic.GenericUDFUpper'; // Note: udf100 is 
created in d100
 3) create view d100.v100 as select *d100.udf100*(name) from default.emp; // 
Note: table default.emp has two columns 'name' and 'address'
 4) select * from d100.v100; // query on view d100.v100 gives the correct result

In Spark
 -
 1) spark.sql("select * from d100.v100").show
 throws 
 ```
 org.apache.spark.sql.AnalysisException: Undefined function: '*d100.udf100*'. 
This function is neither a registered temporary function nor a permanent 
function registered in the database '*default*'
 ```

This is because, while parsing the view's SQL statement 'select 
`d100.udf100`(`emp`.`name`) from `default`.`emp`', the Spark parser fails to split 
the database name from the UDF name, and hence the Spark function registry tries 
to load the UDF 'd100.udf100' from the 'default' database.

  was:
When a hive view uses an UDF from a non default database, Spark analyser throws 
AnalysisException

Steps to simulate this issue
 -
 In Hive
 
 1) CREATE DATABASE d100;
 2) ADD JAR /usr/udf/masking.jar // masking.jar has a custom udf class 
'com.uzx.udf.Masking'
 3) create function d100.udf100 as "com.uzx.udf.Masking"; // Note: udf100 is 
created in d100
 4) create view d100.v100 as select *d100.udf100*(name)  from default.emp; // 
Note : table default.emp has two columns 'name', 'address', 
 5) select * from d100.v100; // query on view d100.v100 gives correct result

In Spark
 -
 1) spark.sql("select * from d100.v100").show
 throws 
 ```
 org.apache.spark.sql.AnalysisException: Undefined function: '*d100.udf100*'. 
This function is neither a registered temporary function nor a permanent 
function registered in the database '*default*'
 ```

This is because, while parsing the SQL statement of the View 'select 
`d100.udf100`(`emp`.`name`) from `default`.`emp`' , spark parser fails to split 
database name and udf name and hence Spark function registry tries to load the 
UDF 'd100.udf100' from 'default' database.


> When a view uses an UDF from a non default database, Spark analyser throws 
> AnalysisException
> 
>
> Key: SPARK-25301
> URL: https://issues.apache.org/jira/browse/SPARK-25301
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Vinod KC
>Priority: Minor
>
> When a Hive view uses a UDF from a non-default database, the Spark analyser 
> throws an AnalysisException.
> Steps to simulate this issue
>  -
>  In Hive
>  
>  1) CREATE DATABASE d100;
>  2) create function d100.udf100 as 
> 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFUpper'; // Note: udf100 is 
> created in d100
>  3) create view d100.v100 as select *d100.udf100*(name) from default.emp; // 
> Note: table default.emp has two columns 'name' and 'address'
>  4) select * from d100.v100; // query on view d100.v100 gives the correct result
> In Spark
>  -
>  1) spark.sql("select * from d100.v100").show
>  throws 
>  ```
>  org.apache.spark.sql.AnalysisException: Undefined function: '*d100.udf100*'. 
> This function is neither a registered temporary function nor a permanent 
> function registered in the database '*default*'
>  ```
> This is because, while parsing the view's SQL statement 'select 
> `d100.udf100`(`emp`.`name`) from `default`.`emp`', the Spark parser fails to 
> split the database name from the UDF name, and hence the Spark function 
> registry tries to load the UDF 'd100.udf100' from the 'default' database.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25311) `SPARK_LOCAL_HOSTNAME` unsupport IPV6 when do host checking

2018-09-02 Thread Xiaochen Ouyang (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16601751#comment-16601751
 ] 

Xiaochen Ouyang commented on SPARK-25311:
-

An IPv4/IPv6 regular expression could be used to solve this problem, but 
`checkHost` is invoked quite frequently, so there would be some loss in 
performance. Does anyone have a recommended solution?

> `SPARK_LOCAL_HOSTNAME` unsupport IPV6 when do host checking
> ---
>
> Key: SPARK-25311
> URL: https://issues.apache.org/jira/browse/SPARK-25311
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.1, 2.2.2
>Reporter: Xiaochen Ouyang
>Priority: Major
>
> An IPv4 address can pass the following check
> {code:java}
>   def checkHost(host: String, message: String = "") {
> assert(host.indexOf(':') == -1, message)
>   }
> {code}
> But an IPv6 address fails it.
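> One possible direction, sketched only (this is not Spark's actual code): keep 
> rejecting host:port forms but tolerate bracketed IPv6 literals.
> {code:scala}
> // hypothetical variant of checkHost that accepts IPv6 literals written as [::1]
> def checkHost(host: String, message: String = ""): Unit = {
>   val isBracketedIpv6 = host.startsWith("[") && host.endsWith("]")
>   assert(isBracketedIpv6 || host.indexOf(':') == -1, message)
> }
> {code}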



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-25311) `SPARK_LOCAL_HOSTNAME` unsupport IPV6 when do host checking

2018-09-02 Thread Xiaochen Ouyang (JIRA)
Xiaochen Ouyang created SPARK-25311:
---

 Summary: `SPARK_LOCAL_HOSTNAME` unsupport IPV6 when do host 
checking
 Key: SPARK-25311
 URL: https://issues.apache.org/jira/browse/SPARK-25311
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.2.2, 2.2.1
Reporter: Xiaochen Ouyang


An IPv4 address can pass the following check
{code:java}
  def checkHost(host: String, message: String = "") {
assert(host.indexOf(':') == -1, message)
  }
{code}
But an IPv6 address fails it.




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25304) enable HiveSparkSubmitSuite SPARK-8489 test for Scala 2.12

2018-09-02 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen reassigned SPARK-25304:
-

Assignee: Darcy Shen

> enable HiveSparkSubmitSuite SPARK-8489 test for Scala 2.12
> --
>
> Key: SPARK-25304
> URL: https://issues.apache.org/jira/browse/SPARK-25304
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Darcy Shen
>Assignee: Darcy Shen
>Priority: Minor
> Fix For: 2.4.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-25304) enable HiveSparkSubmitSuite SPARK-8489 test for Scala 2.12

2018-09-02 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-25304.
---
   Resolution: Fixed
Fix Version/s: 2.4.0

Issue resolved by pull request 22308
[https://github.com/apache/spark/pull/22308]

> enable HiveSparkSubmitSuite SPARK-8489 test for Scala 2.12
> --
>
> Key: SPARK-25304
> URL: https://issues.apache.org/jira/browse/SPARK-25304
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Darcy Shen
>Assignee: Darcy Shen
>Priority: Minor
> Fix For: 2.4.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25304) enable HiveSparkSubmitSuite SPARK-8489 test for Scala 2.12

2018-09-02 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-25304:
--
Priority: Minor  (was: Major)

> enable HiveSparkSubmitSuite SPARK-8489 test for Scala 2.12
> --
>
> Key: SPARK-25304
> URL: https://issues.apache.org/jira/browse/SPARK-25304
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Darcy Shen
>Priority: Minor
> Fix For: 2.4.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23253) Only write shuffle temporary index file when there is not an existing one

2018-09-02 Thread Wenchen Fan (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-23253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16601747#comment-16601747
 ] 

Wenchen Fan commented on SPARK-23253:
-

Hi [~irashid] thanks for providing these references, and sorry for the false 
alarm! I was too hasty when searching the commit history and mistakenly landed 
on this ticket. You are right, https://github.com/apache/spark/pull/9610 is the 
one that needs to be (partially) reverted to make my test pass.

According to the discussion in https://github.com/apache/spark/pull/9214 , it 
seems we already knew about the problem of non-deterministic output, but decided 
to leave it and stick with "first write wins", as it's too hard to fix. I think 
https://github.com/apache/spark/pull/6648 is the right fix.

Since it's not possible to finish https://github.com/apache/spark/pull/6648 
before Spark 2.4, I'll reference it in a code comment and just fail the job if 
non-deterministic shuffle writing is detected. In the next release, I can help 
with https://github.com/apache/spark/pull/6648 to really fix the repartition 
bug. Thanks!

> Only write shuffle temporary index file when there is not an existing one
> -
>
> Key: SPARK-23253
> URL: https://issues.apache.org/jira/browse/SPARK-23253
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle, Spark Core
>Affects Versions: 2.2.1
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
> Fix For: 2.4.0
>
>
> The temporary shuffle index file is used to create the shuffle index file 
> atomically. It is not needed when the index file already exists because 
> another attempt of the same task has already written it.
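> A sketch of the atomic-create pattern being described (illustrative only, not 
> Spark's exact implementation):
> {code:scala}
> import java.io.File
> import java.nio.file.{Files, StandardCopyOption}
>
> // write to a temporary file first, then move it into place only if no other
> // attempt of the same task has already committed an index file
> def commitIndex(tmp: File, index: File): Unit = {
>   if (index.exists()) {
>     tmp.delete()  // another attempt already committed; discard our temporary file
>   } else {
>     // atomic move so readers never observe a partially written index
>     Files.move(tmp.toPath, index.toPath, StandardCopyOption.ATOMIC_MOVE)
>   }
> }
> {code}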



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25150) Joining DataFrames derived from the same source yields confusing/incorrect results

2018-09-02 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25150:


Assignee: (was: Apache Spark)

> Joining DataFrames derived from the same source yields confusing/incorrect 
> results
> --
>
> Key: SPARK-25150
> URL: https://issues.apache.org/jira/browse/SPARK-25150
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Nicholas Chammas
>Priority: Major
> Attachments: output-with-implicit-cross-join.txt, 
> output-without-implicit-cross-join.txt, persons.csv, states.csv, 
> zombie-analysis.py
>
>
> I have two DataFrames, A and B. From B, I have derived two additional 
> DataFrames, B1 and B2. When joining A to B1 and B2, I'm getting a very 
> confusing error:
> {code:java}
> Join condition is missing or trivial.
> Either: use the CROSS JOIN syntax to allow cartesian products between these
> relations, or: enable implicit cartesian products by setting the configuration
> variable spark.sql.crossJoin.enabled=true;
> {code}
> Then, when I configure "spark.sql.crossJoin.enabled=true" as instructed, 
> Spark appears to give me incorrect answers.
> I am not sure if I am missing something obvious, or if there is some kind of 
> bug here. The "join condition is missing" error is confusing and doesn't make 
> sense to me, and the seemingly incorrect output is concerning.
> I've attached a reproduction, along with the output I'm seeing with and 
> without the implicit cross join enabled.
> I realize the join I've written is not correct in the sense that it should be 
> left outer join instead of an inner join (since some of the aggregates are 
> not available for all states), but that doesn't explain Spark's behavior.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25150) Joining DataFrames derived from the same source yields confusing/incorrect results

2018-09-02 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16601734#comment-16601734
 ] 

Apache Spark commented on SPARK-25150:
--

User 'peter-toth' has created a pull request for this issue:
https://github.com/apache/spark/pull/22318

> Joining DataFrames derived from the same source yields confusing/incorrect 
> results
> --
>
> Key: SPARK-25150
> URL: https://issues.apache.org/jira/browse/SPARK-25150
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Nicholas Chammas
>Priority: Major
> Attachments: output-with-implicit-cross-join.txt, 
> output-without-implicit-cross-join.txt, persons.csv, states.csv, 
> zombie-analysis.py
>
>
> I have two DataFrames, A and B. From B, I have derived two additional 
> DataFrames, B1 and B2. When joining A to B1 and B2, I'm getting a very 
> confusing error:
> {code:java}
> Join condition is missing or trivial.
> Either: use the CROSS JOIN syntax to allow cartesian products between these
> relations, or: enable implicit cartesian products by setting the configuration
> variable spark.sql.crossJoin.enabled=true;
> {code}
> Then, when I configure "spark.sql.crossJoin.enabled=true" as instructed, 
> Spark appears to give me incorrect answers.
> I am not sure if I am missing something obvious, or if there is some kind of 
> bug here. The "join condition is missing" error is confusing and doesn't make 
> sense to me, and the seemingly incorrect output is concerning.
> I've attached a reproduction, along with the output I'm seeing with and 
> without the implicit cross join enabled.
> I realize the join I've written is not correct in the sense that it should be 
> left outer join instead of an inner join (since some of the aggregates are 
> not available for all states), but that doesn't explain Spark's behavior.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25150) Joining DataFrames derived from the same source yields confusing/incorrect results

2018-09-02 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25150:


Assignee: Apache Spark

> Joining DataFrames derived from the same source yields confusing/incorrect 
> results
> --
>
> Key: SPARK-25150
> URL: https://issues.apache.org/jira/browse/SPARK-25150
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Nicholas Chammas
>Assignee: Apache Spark
>Priority: Major
> Attachments: output-with-implicit-cross-join.txt, 
> output-without-implicit-cross-join.txt, persons.csv, states.csv, 
> zombie-analysis.py
>
>
> I have two DataFrames, A and B. From B, I have derived two additional 
> DataFrames, B1 and B2. When joining A to B1 and B2, I'm getting a very 
> confusing error:
> {code:java}
> Join condition is missing or trivial.
> Either: use the CROSS JOIN syntax to allow cartesian products between these
> relations, or: enable implicit cartesian products by setting the configuration
> variable spark.sql.crossJoin.enabled=true;
> {code}
> Then, when I configure "spark.sql.crossJoin.enabled=true" as instructed, 
> Spark appears to give me incorrect answers.
> I am not sure if I am missing something obvious, or if there is some kind of 
> bug here. The "join condition is missing" error is confusing and doesn't make 
> sense to me, and the seemingly incorrect output is concerning.
> I've attached a reproduction, along with the output I'm seeing with and 
> without the implicit cross join enabled.
> I realize the join I've written is not correct in the sense that it should be 
> left outer join instead of an inner join (since some of the aggregates are 
> not available for all states), but that doesn't explain Spark's behavior.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25135) insert datasource table may all null when select from view on parquet

2018-09-02 Thread Yuming Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16601728#comment-16601728
 ] 

Yuming Wang commented on SPARK-25135:
-

[https://github.com/apache/spark/pull/22311]

[https://github.com/apache/spark/pull/22287]

We are trying to fix it.

 

> insert datasource table may all null when select from view on parquet
> -
>
> Key: SPARK-25135
> URL: https://issues.apache.org/jira/browse/SPARK-25135
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0, 2.3.1
>Reporter: Yuming Wang
>Priority: Blocker
>  Labels: Parquet, correctness
>
> This happens on parquet.
> How to reproduce in parquet.
> {code:scala}
> val path = "/tmp/spark/parquet"
> val cnt = 30
> spark.range(cnt).selectExpr("cast(id as bigint) as col1", "cast(id as bigint) 
> as col2").write.mode("overwrite").parquet(path)
> spark.sql(s"CREATE TABLE table1(col1 bigint, col2 bigint) using parquet 
> location '$path'")
> spark.sql("create view view1 as select col1, col2 from table1 where col1 > 
> -20")
> spark.sql("create table table2 (COL1 BIGINT, COL2 BIGINT) using parquet")
> spark.sql("insert overwrite table table2 select COL1, COL2 from view1")
> spark.table("table2").show
> {code}
> FYI, the following is orc.
> {code}
> scala> val path = "/tmp/spark/orc"
> scala> val cnt = 30
> scala> spark.range(cnt).selectExpr("cast(id as bigint) as col1", "cast(id as 
> bigint) as col2").write.mode("overwrite").orc(path)
> scala> spark.sql(s"CREATE TABLE table1(col1 bigint, col2 bigint) using orc 
> location '$path'")
> scala> spark.sql("create view view1 as select col1, col2 from table1 where 
> col1 > -20")
> scala> spark.sql("create table table2 (COL1 BIGINT, COL2 BIGINT) using orc")
> scala> spark.sql("insert overwrite table table2 select COL1, COL2 from view1")
> scala> spark.table("table2").show
> +----+----+
> |COL1|COL2|
> +----+----+
> |  15|  15|
> |  16|  16|
> |  17|  17|
> ...
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25265) Fix memory leak in Barrier Execution Mode

2018-09-02 Thread Kousuke Saruta (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated SPARK-25265:
---
Summary: Fix memory leak in Barrier Execution Mode  (was: Fix memory leak 
vulnerability in Barrier Execution Mode)

> Fix memory leak in Barrier Execution Mode
> -
>
> Key: SPARK-25265
> URL: https://issues.apache.org/jira/browse/SPARK-25265
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler, Spark Core
>Affects Versions: 2.4.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Critical
>
> BarrierCoordinator$ uses Timer and TimerTask. `TimerTask#cancel()` is invoked 
> in ContextBarrierState#cancelTimerTask but `Timer#purge()` is never invoked.
> Once a TimerTask is scheduled, the reference to it is not released until 
> `Timer#purge()` is invoked even though `TimerTask#cancel()` is invoked.
>  
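> A sketch of the underlying java.util.Timer behaviour being described here 
> (illustrative only):
> {code:scala}
> import java.util.{Timer, TimerTask}
>
> val timer = new Timer("barrier-epoch-timer", true)
> val task = new TimerTask { def run(): Unit = () }
> timer.schedule(task, 60000L)  // scheduled far in the future
> task.cancel()   // marks the task cancelled, but the Timer's queue still references it
> timer.purge()   // removes cancelled tasks from the queue, releasing the reference
> {code}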



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-25265) Fix memory leak vulnerability in Barrier Execution Mode

2018-09-02 Thread Kousuke Saruta (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta resolved SPARK-25265.

Resolution: Duplicate

Thanks for the notification. It looks like this was accidentally duplicated, so 
I'll close this one.

> Fix memory leak vulnerability in Barrier Execution Mode
> ---
>
> Key: SPARK-25265
> URL: https://issues.apache.org/jira/browse/SPARK-25265
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler, Spark Core
>Affects Versions: 2.4.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Critical
>
> BarrierCoordinator$ uses Timer and TimerTask. `TimerTask#cancel()` is invoked 
> in ContextBarrierState#cancelTimerTask but `Timer#purge()` is never invoked.
> Once a TimerTask is scheduled, the reference to it is not released until 
> `Timer#purge()` is invoked even though `TimerTask#cancel()` is invoked.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25176) Kryo fails to serialize a parametrised type hierarchy

2018-09-02 Thread Dongjoon Hyun (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16601643#comment-16601643
 ] 

Dongjoon Hyun commented on SPARK-25176:
---

Sorry, [~m.pryahin]. I overlooked the example in [4]. I deleted my previous 
comment.

> Kryo fails to serialize a parametrised type hierarchy
> -
>
> Key: SPARK-25176
> URL: https://issues.apache.org/jira/browse/SPARK-25176
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.2, 2.3.1
>Reporter: Mikhail Pryakhin
>Priority: Major
>
> I'm using the latest spark version spark-core_2.11:2.3.1 which 
> transitively depends on com.esotericsoftware:kryo-shaded:3.0.3 via the
> com.twitter:chill_2.11:0.8.0 dependency. This exact version of kryo 
> serializer contains an issue [1,2] which results in ClassCastExceptions being 
> thrown when serialising a parameterised type hierarchy.
> This issue has been fixed in kryo version 4.0.0 [3]. It would be great to 
> have this update in Spark as well. Could you please upgrade the version of 
> com.twitter:chill_2.11 dependency from 0.8.0 up to 0.9.2?
> You can find a simple test to reproduce the issue [4].
> [1] https://github.com/EsotericSoftware/kryo/issues/384
> [2] https://github.com/EsotericSoftware/kryo/issues/377
> [3] https://github.com/EsotericSoftware/kryo/releases/tag/kryo-parent-4.0.0
> [4] https://github.com/mpryahin/kryo-parametrized-type-inheritance
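> A possible interim workaround, sketched under the assumption that a user-side 
> override is viable (untested): pin a newer chill/kryo in the application's own 
> build until Spark itself upgrades.
> {code:scala}
> // build.sbt (sbt 1.x): hypothetical override; version numbers are illustrative
> dependencyOverrides ++= Seq(
>   "com.twitter" %% "chill" % "0.9.2",
>   "com.esotericsoftware" % "kryo-shaded" % "4.0.2"
> )
> {code}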



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-25176) Kryo fails to serialize a parametrised type hierarchy

2018-09-02 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-25176:
--
Comment: was deleted

(was: [~m.pryahin]. There is not much information for this. Since this is 
general suggestion for upgrade, let's close this as duplicate of SPARK-23131.
SPARK-23131 has a PR for you.)

> Kryo fails to serialize a parametrised type hierarchy
> -
>
> Key: SPARK-25176
> URL: https://issues.apache.org/jira/browse/SPARK-25176
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.2, 2.3.1
>Reporter: Mikhail Pryakhin
>Priority: Major
>
> I'm using the latest spark version spark-core_2.11:2.3.1 which 
> transitively depends on com.esotericsoftware:kryo-shaded:3.0.3 via the
> com.twitter:chill_2.11:0.8.0 dependency. This exact version of kryo 
> serializer contains an issue [1,2] which results in ClassCastExceptions being 
> thrown when serialising a parameterised type hierarchy.
> This issue has been fixed in kryo version 4.0.0 [3]. It would be great to 
> have this update in Spark as well. Could you please upgrade the version of 
> com.twitter:chill_2.11 dependency from 0.8.0 up to 0.9.2?
> You can find a simple test to reproduce the issue [4].
> [1] https://github.com/EsotericSoftware/kryo/issues/384
> [2] https://github.com/EsotericSoftware/kryo/issues/377
> [3] https://github.com/EsotericSoftware/kryo/releases/tag/kryo-parent-4.0.0
> [4] https://github.com/mpryahin/kryo-parametrized-type-inheritance



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25176) Kryo fails to serialize a parametrised type hierarchy

2018-09-02 Thread Dongjoon Hyun (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16601642#comment-16601642
 ] 

Dongjoon Hyun commented on SPARK-25176:
---

[~m.pryahin]. There is not much information here. Since this is a general 
suggestion to upgrade, let's close this as a duplicate of SPARK-23131.
SPARK-23131 has a PR for you.

> Kryo fails to serialize a parametrised type hierarchy
> -
>
> Key: SPARK-25176
> URL: https://issues.apache.org/jira/browse/SPARK-25176
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.2, 2.3.1
>Reporter: Mikhail Pryakhin
>Priority: Major
>
> I'm using the latest spark version spark-core_2.11:2.3.1 which 
> transitively depends on com.esotericsoftware:kryo-shaded:3.0.3 via the
> com.twitter:chill_2.11:0.8.0 dependency. This exact version of kryo 
> serializer contains an issue [1,2] which results in ClassCastExceptions being 
> thrown when serialising a parameterised type hierarchy.
> This issue has been fixed in kryo version 4.0.0 [3]. It would be great to 
> have this update in Spark as well. Could you please upgrade the version of 
> com.twitter:chill_2.11 dependency from 0.8.0 up to 0.9.2?
> You can find a simple test to reproduce the issue [4].
> [1] https://github.com/EsotericSoftware/kryo/issues/384
> [2] https://github.com/EsotericSoftware/kryo/issues/377
> [3] https://github.com/EsotericSoftware/kryo/releases/tag/kryo-parent-4.0.0
> [4] https://github.com/mpryahin/kryo-parametrized-type-inheritance



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20389) Upgrade kryo to fix NegativeArraySizeException

2018-09-02 Thread Dongjoon Hyun (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-20389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16601640#comment-16601640
 ] 

Dongjoon Hyun commented on SPARK-20389:
---

Hi, [~georg.kf.hei...@gmail.com] and [~tashoyan].
SPARK-23131 is trying to resolve this issue via 
https://github.com/apache/spark/pull/22179 .
Could you test the patch in your environment in order to resolve this issue 
together?

> Upgrade kryo to fix NegativeArraySizeException
> --
>
> Key: SPARK-20389
> URL: https://issues.apache.org/jira/browse/SPARK-20389
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Spark Submit
>Affects Versions: 2.1.0, 2.2.1
> Environment: Linux, Centos7, jdk8
>Reporter: Georg Heiler
>Priority: Major
>
> I am experiencing an issue with Kryo when writing parquet files. Similar to 
> https://github.com/broadinstitute/gatk/issues/1524 a 
> NegativeArraySizeException occurs. Apparently this is fixed in a current Kryo 
> version. Spark is still using the very old 3.3 Kryo. 
> Could you please upgrade to a fixed Kryo version?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25310) ArraysOverlap may throw a CompileException

2018-09-02 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16601260#comment-16601260
 ] 

Apache Spark commented on SPARK-25310:
--

User 'kiszk' has created a pull request for this issue:
https://github.com/apache/spark/pull/22317

> ArraysOverlap may throw a CompileException
> --
>
> Key: SPARK-25310
> URL: https://issues.apache.org/jira/browse/SPARK-25310
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Kazuaki Ishizaki
>Priority: Major
>
> Invoking the {{ArraysOverlap}} function with a non-nullable array type throws 
> the following error in the code generation phase.
> {code:java}
> Code generation of arrays_overlap([1,2,3], [4,5,3]) failed:
> java.util.concurrent.ExecutionException: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 56, Column 11: failed to compile: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 56, Column 11: Expression "isNull_0" is not an rvalue
> java.util.concurrent.ExecutionException: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 56, Column 11: failed to compile: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 56, Column 11: Expression "isNull_0" is not an rvalue
>   at 
> com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:306)
>   at 
> com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:293)
>   at 
> com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
>   at 
> com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:135)
>   at 
> com.google.common.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2410)
>   at 
> com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2380)
>   at 
> com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
>   at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2257)
>   at com.google.common.cache.LocalCache.get(LocalCache.java:4000)
>   at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:4004)
>   at 
> com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.compile(CodeGenerator.scala:1305)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.GenerateMutableProjection$.create(GenerateMutableProjection.scala:143)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.GenerateMutableProjection$.create(GenerateMutableProjection.scala:48)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.GenerateMutableProjection$.create(GenerateMutableProjection.scala:32)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:1260)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25310) ArraysOverlap may throw a CompileException

2018-09-02 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25310:


Assignee: Apache Spark

> ArraysOverlap may throw a CompileException
> --
>
> Key: SPARK-25310
> URL: https://issues.apache.org/jira/browse/SPARK-25310
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Kazuaki Ishizaki
>Assignee: Apache Spark
>Priority: Major
>
> Invoking the {{ArraysOverlap}} function with a non-nullable array type throws 
> the following error in the code generation phase.
> {code:java}
> Code generation of arrays_overlap([1,2,3], [4,5,3]) failed:
> java.util.concurrent.ExecutionException: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 56, Column 11: failed to compile: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 56, Column 11: Expression "isNull_0" is not an rvalue
> java.util.concurrent.ExecutionException: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 56, Column 11: failed to compile: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 56, Column 11: Expression "isNull_0" is not an rvalue
>   at 
> com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:306)
>   at 
> com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:293)
>   at 
> com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
>   at 
> com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:135)
>   at 
> com.google.common.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2410)
>   at 
> com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2380)
>   at 
> com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
>   at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2257)
>   at com.google.common.cache.LocalCache.get(LocalCache.java:4000)
>   at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:4004)
>   at 
> com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.compile(CodeGenerator.scala:1305)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.GenerateMutableProjection$.create(GenerateMutableProjection.scala:143)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.GenerateMutableProjection$.create(GenerateMutableProjection.scala:48)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.GenerateMutableProjection$.create(GenerateMutableProjection.scala:32)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:1260)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25310) ArraysOverlap may throw a CompileException

2018-09-02 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25310:


Assignee: (was: Apache Spark)

> ArraysOverlap may throw a CompileException
> --
>
> Key: SPARK-25310
> URL: https://issues.apache.org/jira/browse/SPARK-25310
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Kazuaki Ishizaki
>Priority: Major
>
> Invoking the {{ArraysOverlap}} function with a non-nullable array type throws 
> the following error in the code generation phase.
> {code:java}
> Code generation of arrays_overlap([1,2,3], [4,5,3]) failed:
> java.util.concurrent.ExecutionException: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 56, Column 11: failed to compile: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 56, Column 11: Expression "isNull_0" is not an rvalue
> java.util.concurrent.ExecutionException: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 56, Column 11: failed to compile: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 56, Column 11: Expression "isNull_0" is not an rvalue
>   at 
> com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:306)
>   at 
> com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:293)
>   at 
> com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
>   at 
> com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:135)
>   at 
> com.google.common.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2410)
>   at 
> com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2380)
>   at 
> com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
>   at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2257)
>   at com.google.common.cache.LocalCache.get(LocalCache.java:4000)
>   at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:4004)
>   at 
> com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.compile(CodeGenerator.scala:1305)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.GenerateMutableProjection$.create(GenerateMutableProjection.scala:143)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.GenerateMutableProjection$.create(GenerateMutableProjection.scala:48)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.GenerateMutableProjection$.create(GenerateMutableProjection.scala:32)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:1260)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25310) ArraysOverlap may throw a CompileException

2018-09-02 Thread Kazuaki Ishizaki (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kazuaki Ishizaki updated SPARK-25310:
-
Description: 
Invoking the {{ArraysOverlap}} function with a non-nullable array type throws 
the following error in the code generation phase.

{code:java}
Code generation of arrays_overlap([1,2,3], [4,5,3]) failed:
java.util.concurrent.ExecutionException: 
org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 56, 
Column 11: failed to compile: org.codehaus.commons.compiler.CompileException: 
File 'generated.java', Line 56, Column 11: Expression "isNull_0" is not an 
rvalue
java.util.concurrent.ExecutionException: 
org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 56, 
Column 11: failed to compile: org.codehaus.commons.compiler.CompileException: 
File 'generated.java', Line 56, Column 11: Expression "isNull_0" is not an 
rvalue
at 
com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:306)
at 
com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:293)
at 
com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
at 
com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:135)
at 
com.google.common.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2410)
at 
com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2380)
at 
com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2257)
at com.google.common.cache.LocalCache.get(LocalCache.java:4000)
at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:4004)
at 
com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
at 
org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.compile(CodeGenerator.scala:1305)
at 
org.apache.spark.sql.catalyst.expressions.codegen.GenerateMutableProjection$.create(GenerateMutableProjection.scala:143)
at 
org.apache.spark.sql.catalyst.expressions.codegen.GenerateMutableProjection$.create(GenerateMutableProjection.scala:48)
at 
org.apache.spark.sql.catalyst.expressions.codegen.GenerateMutableProjection$.create(GenerateMutableProjection.scala:32)
at 
org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:1260)
{code}
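
For reference, a minimal way one might reproduce it (illustrative; it matches the 
literals in the reported failure, where both array arguments are non-nullable):

{code:scala}
spark.sql("SELECT arrays_overlap(array(1, 2, 3), array(4, 5, 3))").show()
{code}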

> ArraysOverlap may throw a CompileException
> --
>
> Key: SPARK-25310
> URL: https://issues.apache.org/jira/browse/SPARK-25310
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Kazuaki Ishizaki
>Priority: Major
>
> Invoking the {{ArraysOverlap}} function with a non-nullable array type throws 
> the following error in the code generation phase.
> {code:java}
> Code generation of arrays_overlap([1,2,3], [4,5,3]) failed:
> java.util.concurrent.ExecutionException: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 56, Column 11: failed to compile: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 56, Column 11: Expression "isNull_0" is not an rvalue
> java.util.concurrent.ExecutionException: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 56, Column 11: failed to compile: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 56, Column 11: Expression "isNull_0" is not an rvalue
>   at 
> com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:306)
>   at 
> com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:293)
>   at 
> com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
>   at 
> com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:135)
>   at 
> com.google.common.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2410)
>   at 
> com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2380)
>   at 
> com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
>   at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2257)
>   at com.google.common.cache.LocalCache.get(LocalCache.java:4000)
>   at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:4004)
>   at 
> com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.compile(CodeGenerator.scala:1305)
>   at 
> 

[jira] [Updated] (SPARK-25310) ArraysOverlap may throw a CompileException

2018-09-02 Thread Kazuaki Ishizaki (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kazuaki Ishizaki updated SPARK-25310:
-
Summary: ArraysOverlap may throw a CompileException  (was: ArraysOverlap 
throws an Exception)

> ArraysOverlap may throw a CompileException
> --
>
> Key: SPARK-25310
> URL: https://issues.apache.org/jira/browse/SPARK-25310
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Kazuaki Ishizaki
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-25310) ArraysOverlap throws an Exception

2018-09-02 Thread Kazuaki Ishizaki (JIRA)
Kazuaki Ishizaki created SPARK-25310:


 Summary: ArraysOverlap throws an Exception
 Key: SPARK-25310
 URL: https://issues.apache.org/jira/browse/SPARK-25310
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.4.0
Reporter: Kazuaki Ishizaki






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-25309) Sci-kit Learn like Auto Pipeline Parallelization in Spark

2018-09-02 Thread Ravi (JIRA)
Ravi created SPARK-25309:


 Summary: Sci-kit Learn like Auto Pipeline Parallelization in Spark 
 Key: SPARK-25309
 URL: https://issues.apache.org/jira/browse/SPARK-25309
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 2.3.1
Reporter: Ravi


SPARK-19357 and SPARK-21911 have helped parallelize Pipelines in Spark. 
However, instead of setting the parallelism parameter on the CrossValidator, it 
would be good to have something like n_jobs=-1 (as in scikit-learn), where the 
Pipeline DAG could be automatically parallelized and scheduled based on the 
resources allocated to the Spark session, instead of having the user pick an 
integer value for this parameter.
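
For context, a sketch of how the parallelism is chosen today (the pipeline, 
paramGrid and evaluator objects are assumed to be defined elsewhere):

{code:scala}
import org.apache.spark.ml.tuning.CrossValidator

// today the user picks a fixed integer; the request is to infer this value
// automatically from the resources available to the Spark session
val cv = new CrossValidator()
  .setEstimator(pipeline)
  .setEstimatorParamMaps(paramGrid)
  .setEvaluator(evaluator)
  .setParallelism(4)  // manual choice, analogous to scikit-learn's n_jobs
{code}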



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25048) Pivoting by multiple columns in Scala/Java

2018-09-02 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16601210#comment-16601210
 ] 

Apache Spark commented on SPARK-25048:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/22316

> Pivoting by multiple columns in Scala/Java
> --
>
> Key: SPARK-25048
> URL: https://issues.apache.org/jira/browse/SPARK-25048
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Maxim Gekk
>Priority: Minor
>
> Need to change or extend existing API to make pivoting by multiple columns 
> possible. Users should be able to use many columns and values like in the 
> example:
> {code:scala}
> trainingSales
>   .groupBy($"sales.year")
>   .pivot(struct(lower($"sales.course"), $"training"), Seq(
> struct(lit("dotnet"), lit("Experts")),
> struct(lit("java"), lit("Dummies")))
>   ).agg(sum($"sales.earnings"))
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-25007) Add array_intersect / array_except /array_union / array_shuffle to SparkR

2018-09-02 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung resolved SPARK-25007.
--
   Resolution: Fixed
Fix Version/s: 2.4.0

> Add array_intersect / array_except /array_union / array_shuffle to SparkR
> -
>
> Key: SPARK-25007
> URL: https://issues.apache.org/jira/browse/SPARK-25007
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Affects Versions: 2.4.0
>Reporter: Huaxin Gao
>Assignee: Huaxin Gao
>Priority: Major
> Fix For: 2.4.0
>
>
> Add R version of 
>  * array_intersect -SPARK-23913-
>  * array_except -SPARK-23915- 
>  * array_union -SPARK-23914- 
>  * array_shuffle -SPARK-23928-



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25007) Add array_intersect / array_except /array_union / array_shuffle to SparkR

2018-09-02 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung reassigned SPARK-25007:


Assignee: Huaxin Gao

> Add array_intersect / array_except /array_union / array_shuffle to SparkR
> -
>
> Key: SPARK-25007
> URL: https://issues.apache.org/jira/browse/SPARK-25007
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Affects Versions: 2.4.0
>Reporter: Huaxin Gao
>Assignee: Huaxin Gao
>Priority: Major
>
> Add R version of 
>  * array_intersect -SPARK-23913-
>  * array_except -SPARK-23915- 
>  * array_union -SPARK-23914- 
>  * array_shuffle -SPARK-23928-



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org