[jira] [Commented] (SPARK-32377) CaseInsensitiveMap should be deterministic for addition

2020-07-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161723#comment-17161723
 ] 

Apache Spark commented on SPARK-32377:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/29175

> CaseInsensitiveMap should be deterministic for addition
> ---
>
> Key: SPARK-32377
> URL: https://issues.apache.org/jira/browse/SPARK-32377
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.3, 2.2.3, 2.3.4, 2.4.6, 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> {code}
> import org.apache.spark.sql.catalyst.util.CaseInsensitiveMap
> var m = CaseInsensitiveMap(Map.empty[String, String])
> Seq(("paTh", "1"), ("PATH", "2"), ("Path", "3"), ("patH", "4"), ("path", 
> "5")).foreach { kv =>
>   m = (m + kv).asInstanceOf[CaseInsensitiveMap[String]]
>   println(m.get("path"))
> }
> Some(1)
> Some(2)
> Some(3)
> Some(4)
> Some(1)
> {code}
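
For context, a minimal standalone sketch (not the actual Spark patch) of the
deterministic behavior the report asks for: drop any case-insensitive match
for the key before inserting, so the most recent value always wins.

{code:scala}
// Hypothetical sketch; the real CaseInsensitiveMap lives in
// org.apache.spark.sql.catalyst.util and differs in implementation details.
class DeterministicCIMap(base: Map[String, String]) {
  def +(kv: (String, String)): DeterministicCIMap =
    new DeterministicCIMap(
      base.filterNot { case (k, _) => k.equalsIgnoreCase(kv._1) } + kv)
  def get(key: String): Option[String] =
    base.collectFirst { case (k, v) if k.equalsIgnoreCase(key) => v }
}

var m = new DeterministicCIMap(Map.empty)
Seq(("paTh", "1"), ("PATH", "2"), ("Path", "3"), ("patH", "4"), ("path", "5"))
  .foreach { kv =>
    m = m + kv
    println(m.get("path")) // Some(1), Some(2), Some(3), Some(4), Some(5)
  }
{code}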



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32362) AdaptiveQueryExecSuite misses verifying AE results

2020-07-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-32362:


Assignee: Lantao Jin

> AdaptiveQueryExecSuite misses verifying AE results
> --
>
> Key: SPARK-32362
> URL: https://issues.apache.org/jira/browse/SPARK-32362
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Lantao Jin
>Assignee: Lantao Jin
>Priority: Major
>
> {code}
> QueryTest.sameRows(result.toSeq, df.collect().toSeq)
> {code}
> Even when the results are different, the test does not fail.
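
A sketch of the kind of assertion a fix could add, assuming
QueryTest.sameRows returns an Option[String] carrying an error message when
the row sets differ:

{code:scala}
// Hypothetical: fail the test whenever sameRows reports a mismatch,
// instead of discarding its return value.
QueryTest.sameRows(result.toSeq, df.collect().toSeq).foreach { errMsg =>
  fail(s"The result set mismatched: $errMsg")
}
{code}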



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-32362) AdaptiveQueryExecSuite misses verifying AE results

2020-07-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-32362.
--
Fix Version/s: 3.1.0
   3.0.1
   Resolution: Fixed

Issue resolved by pull request 29158
[https://github.com/apache/spark/pull/29158]

> AdaptiveQueryExecSuite misses verifying AE results
> --
>
> Key: SPARK-32362
> URL: https://issues.apache.org/jira/browse/SPARK-32362
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Lantao Jin
>Assignee: Lantao Jin
>Priority: Major
> Fix For: 3.0.1, 3.1.0
>
>
> {code}
> QueryTest.sameRows(result.toSeq, df.collect().toSeq)
> {code}
> Even when the results are different, the test does not fail.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-32365) Fix java.lang.IndexOutOfBoundsException: No group -1

2020-07-20 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-32365.
---
Fix Version/s: 3.1.0
   3.0.1
   Resolution: Fixed

Issue resolved by pull request 29161
[https://github.com/apache/spark/pull/29161]

> Fix java.lang.IndexOutOfBoundsException: No group -1
> 
>
> Key: SPARK-32365
> URL: https://issues.apache.org/jira/browse/SPARK-32365
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1, 3.1.0
>Reporter: jiaan.geng
>Assignee: jiaan.geng
>Priority: Major
> Fix For: 3.0.1, 3.1.0
>
>
> The current implementation of regexp_extract throws an unhandled exception, 
> as shown below:
> SELECT regexp_extract('1a 2b 14m', '\\d+', -1)
> {code:java}
> java.lang.IndexOutOfBoundsException: No group -1
> java.util.regex.Matcher.group(Matcher.java:538)
> org.apache.spark.sql.catalyst.expressions.RegExpExtract.nullSafeEval(regexpExpressions.scala:455)
> org.apache.spark.sql.catalyst.expressions.TernaryExpression.eval(Expression.scala:704)
> org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$1$$anonfun$applyOrElse$1.applyOrElse(expressions.scala:52)
> org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$1$$anonfun$applyOrElse$1.applyOrElse(expressions.scala:45)
> {code}
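
A minimal sketch of the kind of bounds check a fix could perform before
calling Matcher.group, raising a clear IllegalArgumentException instead of
letting IndexOutOfBoundsException escape (the helper name is illustrative):

{code:scala}
// Hypothetical validation helper: groupCount would come from
// matcher.groupCount() on the compiled pattern.
def checkGroupIndex(groupCount: Int, groupIndex: Int): Unit = {
  if (groupIndex < 0) {
    throw new IllegalArgumentException(
      "The specified group index cannot be less than zero")
  } else if (groupIndex > groupCount) {
    throw new IllegalArgumentException(
      s"Regex group count is $groupCount, " +
        s"but the specified group index is $groupIndex")
  }
}
{code}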



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32365) Fix java.lang.IndexOutOfBoundsException: No group -1

2020-07-20 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-32365:
-

Assignee: jiaan.geng

> Fix java.lang.IndexOutOfBoundsException: No group -1
> 
>
> Key: SPARK-32365
> URL: https://issues.apache.org/jira/browse/SPARK-32365
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1, 3.1.0
>Reporter: jiaan.geng
>Assignee: jiaan.geng
>Priority: Major
>
> The current implementation of regexp_extract throws an unhandled exception, 
> as shown below:
> SELECT regexp_extract('1a 2b 14m', '\\d+', -1)
> {code:java}
> java.lang.IndexOutOfBoundsException: No group -1
> java.util.regex.Matcher.group(Matcher.java:538)
> org.apache.spark.sql.catalyst.expressions.RegExpExtract.nullSafeEval(regexpExpressions.scala:455)
> org.apache.spark.sql.catalyst.expressions.TernaryExpression.eval(Expression.scala:704)
> org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$1$$anonfun$applyOrElse$1.applyOrElse(expressions.scala:52)
> org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$1$$anonfun$applyOrElse$1.applyOrElse(expressions.scala:45)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-32360) Add MaxMinBy to support eliminate sorts

2020-07-20 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun closed SPARK-32360.
-

> Add MaxMinBy to support eliminate sorts
> ---
>
> Key: SPARK-32360
> URL: https://issues.apache.org/jira/browse/SPARK-32360
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: ulysses you
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-32360) Add MaxMinBy to support eliminate sorts

2020-07-20 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-32360.
---
Resolution: Invalid

Please see the PR discussion. Since the operation is order-sensitive, we should 
not eliminate the sort during optimization.

> Add MaxMinBy to support eliminate sorts
> ---
>
> Key: SPARK-32360
> URL: https://issues.apache.org/jira/browse/SPARK-32360
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: ulysses you
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32361) Remove project if output is subset of child

2020-07-20 Thread ulysses you (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ulysses you updated SPARK-32361:

Summary: Remove project if output is subset of child  (was: Support 
collapse project with case Aggregate(Project))

> Remove project if output is subset of child
> ---
>
> Key: SPARK-32361
> URL: https://issues.apache.org/jira/browse/SPARK-32361
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: ulysses you
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32377) CaseInsensitiveMap should be deterministic for addition

2020-07-20 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-32377:
--
Affects Version/s: 2.1.3

> CaseInsensitiveMap should be deterministic for addition
> ---
>
> Key: SPARK-32377
> URL: https://issues.apache.org/jira/browse/SPARK-32377
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.3, 2.2.3, 2.3.4, 2.4.6, 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> {code}
> import org.apache.spark.sql.catalyst.util.CaseInsensitiveMap
> var m = CaseInsensitiveMap(Map.empty[String, String])
> Seq(("paTh", "1"), ("PATH", "2"), ("Path", "3"), ("patH", "4"), ("path", 
> "5")).foreach { kv =>
>   m = (m + kv).asInstanceOf[CaseInsensitiveMap[String]]
>   println(m.get("path"))
> }
> Some(1)
> Some(2)
> Some(3)
> Some(4)
> Some(1)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32377) CaseInsensitiveMap should be deterministic for addition

2020-07-20 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-32377:
--
Affects Version/s: 2.2.3

> CaseInsensitiveMap should be deterministic for addition
> ---
>
> Key: SPARK-32377
> URL: https://issues.apache.org/jira/browse/SPARK-32377
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.3, 2.3.4, 2.4.6, 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> {code}
> import org.apache.spark.sql.catalyst.util.CaseInsensitiveMap
> var m = CaseInsensitiveMap(Map.empty[String, String])
> Seq(("paTh", "1"), ("PATH", "2"), ("Path", "3"), ("patH", "4"), ("path", 
> "5")).foreach { kv =>
>   m = (m + kv).asInstanceOf[CaseInsensitiveMap[String]]
>   println(m.get("path"))
> }
> Some(1)
> Some(2)
> Some(3)
> Some(4)
> Some(1)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32377) CaseInsensitiveMap should be deterministic for addition

2020-07-20 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-32377:
--
Description: 
{code}
import org.apache.spark.sql.catalyst.util.CaseInsensitiveMap
var m = CaseInsensitiveMap(Map.empty[String, String])
Seq(("paTh", "1"), ("PATH", "2"), ("Path", "3"), ("patH", "4"), ("path", 
"5")).foreach { kv =>
  m = (m + kv).asInstanceOf[CaseInsensitiveMap[String]]
  println(m.get("path"))
}

Some(1)
Some(2)
Some(3)
Some(4)
Some(1)
{code}

  was:
{code}
import org.apache.spark.sql.catalyst.util.CaseInsensitiveMap
var m = CaseInsensitiveMap(Map.empty[String, String])
Seq(("paTh", "1"), ("PATH", "2"), ("Path", "3"), ("patH", "4"), ("path", 
"5")).foreach { kv =>
  m = (m + kv).asInstanceOf[CaseInsensitiveMap[String]]
  println(m.get("path"))
}

// Exiting paste mode, now interpreting.

Some(1)
Some(2)
Some(3)
Some(4)
Some(1)
{code}


> CaseInsensitiveMap should be deterministic for addition
> ---
>
> Key: SPARK-32377
> URL: https://issues.apache.org/jira/browse/SPARK-32377
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.6, 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> {code}
> import org.apache.spark.sql.catalyst.util.CaseInsensitiveMap
> var m = CaseInsensitiveMap(Map.empty[String, String])
> Seq(("paTh", "1"), ("PATH", "2"), ("Path", "3"), ("patH", "4"), ("path", 
> "5")).foreach { kv =>
>   m = (m + kv).asInstanceOf[CaseInsensitiveMap[String]]
>   println(m.get("path"))
> }
> Some(1)
> Some(2)
> Some(3)
> Some(4)
> Some(1)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32377) CaseInsensitiveMap should be deterministic for addition

2020-07-20 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-32377:
--
Affects Version/s: 2.3.4

> CaseInsensitiveMap should be deterministic for addition
> ---
>
> Key: SPARK-32377
> URL: https://issues.apache.org/jira/browse/SPARK-32377
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.4, 2.4.6, 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> {code}
> import org.apache.spark.sql.catalyst.util.CaseInsensitiveMap
> var m = CaseInsensitiveMap(Map.empty[String, String])
> Seq(("paTh", "1"), ("PATH", "2"), ("Path", "3"), ("patH", "4"), ("path", 
> "5")).foreach { kv =>
>   m = (m + kv).asInstanceOf[CaseInsensitiveMap[String]]
>   println(m.get("path"))
> }
> Some(1)
> Some(2)
> Some(3)
> Some(4)
> Some(1)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32377) CaseInsensitiveMap should be deterministic for addition

2020-07-20 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-32377:
--
Description: 
{code}
import org.apache.spark.sql.catalyst.util.CaseInsensitiveMap
var m = CaseInsensitiveMap(Map.empty[String, String])
Seq(("paTh", "1"), ("PATH", "2"), ("Path", "3"), ("patH", "4"), ("path", 
"5")).foreach { kv =>
  m = (m + kv).asInstanceOf[CaseInsensitiveMap[String]]
  println(m.get("path"))
}

// Exiting paste mode, now interpreting.

Some(1)
Some(2)
Some(3)
Some(4)
Some(1)
{code}

  was:
{code}
  test("CaseInsensitiveMap should be deterministic") {
var m = CaseInsensitiveMap(Map.empty[String, String])
Seq(("paTh", "1"), ("PATH", "2"), ("Path", "3"), ("patH", "4"), ("path", 
"5")).foreach { kv =>
  m = (m + kv).asInstanceOf[CaseInsensitiveMap[String]]
  assert(m.get("path") == Some(kv._2))
}
  }
{code}


> CaseInsensitiveMap should be deterministic for addition
> ---
>
> Key: SPARK-32377
> URL: https://issues.apache.org/jira/browse/SPARK-32377
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.6, 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> {code}
> import org.apache.spark.sql.catalyst.util.CaseInsensitiveMap
> var m = CaseInsensitiveMap(Map.empty[String, String])
> Seq(("paTh", "1"), ("PATH", "2"), ("Path", "3"), ("patH", "4"), ("path", 
> "5")).foreach { kv =>
>   m = (m + kv).asInstanceOf[CaseInsensitiveMap[String]]
>   println(m.get("path"))
> }
> // Exiting paste mode, now interpreting.
> Some(1)
> Some(2)
> Some(3)
> Some(4)
> Some(1)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-32338) Add overload for slice that accepts Columns or Int

2020-07-20 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-32338.
---
Fix Version/s: 3.1.0
 Assignee: Nikolas Vanderhoof
   Resolution: Fixed

Issue resolved by pull request 29138
https://github.com/apache/spark/pull/29138

> Add overload for slice that accepts Columns or Int
> --
>
> Key: SPARK-32338
> URL: https://issues.apache.org/jira/browse/SPARK-32338
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Nikolas Vanderhoof
>Assignee: Nikolas Vanderhoof
>Priority: Trivial
> Fix For: 3.1.0
>
>
> Add an overload for org.apache.spark.sql.functions.slice with the following 
> signature:
> {code:scala}
> def slice(x: Column, start: Any, length: Any): Column = ???
> {code}
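
A usage sketch of such an overload, assuming Column-typed start/length
arguments are accepted alongside the existing Int variant (data and column
names are illustrative):

{code:scala}
// Assumes an active SparkSession named `spark`.
import spark.implicits._
import org.apache.spark.sql.functions._

val df = Seq((Seq(1, 2, 3, 4), 2, 2)).toDF("arr", "start", "len")

// Existing overload: literal Int arguments only.
df.select(slice($"arr", 2, 2))

// Proposed overload: per-row start/length taken from columns.
df.select(slice($"arr", $"start", $"len"))
{code}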



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32377) CaseInsensitiveMap should be deterministic for addition

2020-07-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161630#comment-17161630
 ] 

Apache Spark commented on SPARK-32377:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/29172

> CaseInsensitiveMap should be deterministic for addition
> ---
>
> Key: SPARK-32377
> URL: https://issues.apache.org/jira/browse/SPARK-32377
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.6, 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> {code}
>   test("CaseInsensitiveMap should be deterministic") {
> var m = CaseInsensitiveMap(Map.empty[String, String])
> Seq(("paTh", "1"), ("PATH", "2"), ("Path", "3"), ("patH", "4"), ("path", 
> "5")).foreach { kv =>
>   m = (m + kv).asInstanceOf[CaseInsensitiveMap[String]]
>   assert(m.get("path") == Some(kv._2))
> }
>   }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32377) CaseInsensitiveMap should be deterministic for addition

2020-07-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32377:


Assignee: Apache Spark

> CaseInsensitiveMap should be deterministic for addition
> ---
>
> Key: SPARK-32377
> URL: https://issues.apache.org/jira/browse/SPARK-32377
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.6, 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Major
>
> {code}
>   test("CaseInsensitiveMap should be deterministic") {
> var m = CaseInsensitiveMap(Map.empty[String, String])
> Seq(("paTh", "1"), ("PATH", "2"), ("Path", "3"), ("patH", "4"), ("path", 
> "5")).foreach { kv =>
>   m = (m + kv).asInstanceOf[CaseInsensitiveMap[String]]
>   assert(m.get("path") == Some(kv._2))
> }
>   }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32377) CaseInsensitiveMap should be deterministic for addition

2020-07-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32377:


Assignee: (was: Apache Spark)

> CaseInsensitiveMap should be deterministic for addition
> ---
>
> Key: SPARK-32377
> URL: https://issues.apache.org/jira/browse/SPARK-32377
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.6, 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> {code}
>   test("CaseInsensitiveMap should be deterministic") {
> var m = CaseInsensitiveMap(Map.empty[String, String])
> Seq(("paTh", "1"), ("PATH", "2"), ("Path", "3"), ("patH", "4"), ("path", 
> "5")).foreach { kv =>
>   m = (m + kv).asInstanceOf[CaseInsensitiveMap[String]]
>   assert(m.get("path") == Some(kv._2))
> }
>   }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-32377) CaseInsensitiveMap should be deterministic for addition

2020-07-20 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-32377:
-

 Summary: CaseInsensitiveMap should be deterministic for addition
 Key: SPARK-32377
 URL: https://issues.apache.org/jira/browse/SPARK-32377
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.0, 2.4.6
Reporter: Dongjoon Hyun


{code}
  test("CaseInsensitiveMap should be deterministic") {
var m = CaseInsensitiveMap(Map.empty[String, String])
Seq(("paTh", "1"), ("PATH", "2"), ("Path", "3"), ("patH", "4"), ("path", 
"5")).foreach { kv =>
  m = (m + kv).asInstanceOf[CaseInsensitiveMap[String]]
  assert(m.get("path") == Some(kv._2))
}
  }
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-30876) Optimizer cannot infer from inferred constraints with join

2020-07-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-30876:


Assignee: (was: Apache Spark)

> Optimizer cannot infer from inferred constraints with join
> --
>
> Key: SPARK-30876
> URL: https://issues.apache.org/jira/browse/SPARK-30876
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Yuming Wang
>Priority: Major
>
> How to reproduce this issue:
> {code:sql}
> create table t1(a int, b int, c int);
> create table t2(a int, b int, c int);
> create table t3(a int, b int, c int);
> select count(*) from t1 join t2 join t3 on (t1.a = t2.b and t2.b = t3.c and 
> t3.c = 1);
> {code}
> Spark 2.3+:
> {noformat}
> == Physical Plan ==
> *(4) HashAggregate(keys=[], functions=[count(1)])
> +- Exchange SinglePartition, true, [id=#102]
>+- *(3) HashAggregate(keys=[], functions=[partial_count(1)])
>   +- *(3) Project
>  +- *(3) BroadcastHashJoin [b#10], [c#14], Inner, BuildRight
> :- *(3) Project [b#10]
> :  +- *(3) BroadcastHashJoin [a#6], [b#10], Inner, BuildRight
> : :- *(3) Project [a#6]
> : :  +- *(3) Filter isnotnull(a#6)
> : : +- *(3) ColumnarToRow
> : :+- FileScan parquet default.t1[a#6] Batched: true, 
> DataFilters: [isnotnull(a#6)], Format: Parquet, Location: 
> InMemoryFileIndex[file:/Users/yumwang/Downloads/spark-3.0.0-preview2-bin-hadoop2.7/spark-warehous...,
>  PartitionFilters: [], PushedFilters: [IsNotNull(a)], ReadSchema: 
> struct
> : +- BroadcastExchange 
> HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint))), 
> [id=#87]
> :+- *(1) Project [b#10]
> :   +- *(1) Filter (isnotnull(b#10) AND (b#10 = 1))
> :  +- *(1) ColumnarToRow
> : +- FileScan parquet default.t2[b#10] Batched: 
> true, DataFilters: [isnotnull(b#10), (b#10 = 1)], Format: Parquet, Location: 
> InMemoryFileIndex[file:/Users/yumwang/Downloads/spark-3.0.0-preview2-bin-hadoop2.7/spark-warehous...,
>  PartitionFilters: [], PushedFilters: [IsNotNull(b), EqualTo(b,1)], 
> ReadSchema: struct
> +- BroadcastExchange 
> HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint))), 
> [id=#96]
>+- *(2) Project [c#14]
>   +- *(2) Filter (isnotnull(c#14) AND (c#14 = 1))
>  +- *(2) ColumnarToRow
> +- FileScan parquet default.t3[c#14] Batched: true, 
> DataFilters: [isnotnull(c#14), (c#14 = 1)], Format: Parquet, Location: 
> InMemoryFileIndex[file:/Users/yumwang/Downloads/spark-3.0.0-preview2-bin-hadoop2.7/spark-warehous...,
>  PartitionFilters: [], PushedFilters: [IsNotNull(c), EqualTo(c,1)], 
> ReadSchema: struct
> Time taken: 3.785 seconds, Fetched 1 row(s)
> {noformat}
> Spark 2.2.x:
> {noformat}
> == Physical Plan ==
> *HashAggregate(keys=[], functions=[count(1)])
> +- Exchange SinglePartition
>+- *HashAggregate(keys=[], functions=[partial_count(1)])
>   +- *Project
>  +- *SortMergeJoin [b#19], [c#23], Inner
> :- *Project [b#19]
> :  +- *SortMergeJoin [a#15], [b#19], Inner
> : :- *Sort [a#15 ASC NULLS FIRST], false, 0
> : :  +- Exchange hashpartitioning(a#15, 200)
> : : +- *Filter (isnotnull(a#15) && (a#15 = 1))
> : :+- HiveTableScan [a#15], HiveTableRelation 
> `default`.`t1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [a#15, 
> b#16, c#17]
> : +- *Sort [b#19 ASC NULLS FIRST], false, 0
> :+- Exchange hashpartitioning(b#19, 200)
> :   +- *Filter (isnotnull(b#19) && (b#19 = 1))
> :  +- HiveTableScan [b#19], HiveTableRelation 
> `default`.`t2`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [a#18, 
> b#19, c#20]
> +- *Sort [c#23 ASC NULLS FIRST], false, 0
>+- Exchange hashpartitioning(c#23, 200)
>   +- *Filter (isnotnull(c#23) && (c#23 = 1))
>  +- HiveTableScan [c#23], HiveTableRelation 
> `default`.`t3`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [a#21, 
> b#22, c#23]
> Time taken: 0.728 seconds, Fetched 1 row(s)
> {noformat}
> Spark 2.2 can infer {{(a#15 = 1)}}, but Spark 2.3+ can't.
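
A small standalone sketch of the transitive inference the report expects:
from t1.a = t2.b, t2.b = t3.c, and t3.c = 1, the literal constraint should
propagate to t1.a through the equality chain. This is only an illustration,
not the optimizer's actual data structures.

{code:scala}
// Group columns into equivalence classes, then spread literal constraints.
val equalities = Seq(("t1.a", "t2.b"), ("t2.b", "t3.c"))
val literals = Map("t3.c" -> 1)

val classes = equalities.foldLeft(Seq.empty[Set[String]]) { (acc, eq) =>
  val (touched, rest) = acc.partition(c => c(eq._1) || c(eq._2))
  touched.fold(Set(eq._1, eq._2))(_ ++ _) +: rest
}
val inferred = for {
  cls <- classes
  (col, v) <- literals if cls(col)
  other <- cls
} yield other -> v
// inferred contains (t1.a,1), (t2.b,1), (t3.c,1): Spark 2.2 derived the
// t1.a filter from this closure, while Spark 2.3+ misses it.
{code}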



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-30876) Optimizer cannot infer from inferred constraints with join

2020-07-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-30876:


Assignee: Apache Spark

> Optimizer cannot infer from inferred constraints with join
> --
>
> Key: SPARK-30876
> URL: https://issues.apache.org/jira/browse/SPARK-30876
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Yuming Wang
>Assignee: Apache Spark
>Priority: Major
>
> How to reproduce this issue:
> {code:sql}
> create table t1(a int, b int, c int);
> create table t2(a int, b int, c int);
> create table t3(a int, b int, c int);
> select count(*) from t1 join t2 join t3 on (t1.a = t2.b and t2.b = t3.c and 
> t3.c = 1);
> {code}
> Spark 2.3+:
> {noformat}
> == Physical Plan ==
> *(4) HashAggregate(keys=[], functions=[count(1)])
> +- Exchange SinglePartition, true, [id=#102]
>+- *(3) HashAggregate(keys=[], functions=[partial_count(1)])
>   +- *(3) Project
>  +- *(3) BroadcastHashJoin [b#10], [c#14], Inner, BuildRight
> :- *(3) Project [b#10]
> :  +- *(3) BroadcastHashJoin [a#6], [b#10], Inner, BuildRight
> : :- *(3) Project [a#6]
> : :  +- *(3) Filter isnotnull(a#6)
> : : +- *(3) ColumnarToRow
> : :+- FileScan parquet default.t1[a#6] Batched: true, 
> DataFilters: [isnotnull(a#6)], Format: Parquet, Location: 
> InMemoryFileIndex[file:/Users/yumwang/Downloads/spark-3.0.0-preview2-bin-hadoop2.7/spark-warehous...,
>  PartitionFilters: [], PushedFilters: [IsNotNull(a)], ReadSchema: 
> struct
> : +- BroadcastExchange 
> HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint))), 
> [id=#87]
> :+- *(1) Project [b#10]
> :   +- *(1) Filter (isnotnull(b#10) AND (b#10 = 1))
> :  +- *(1) ColumnarToRow
> : +- FileScan parquet default.t2[b#10] Batched: 
> true, DataFilters: [isnotnull(b#10), (b#10 = 1)], Format: Parquet, Location: 
> InMemoryFileIndex[file:/Users/yumwang/Downloads/spark-3.0.0-preview2-bin-hadoop2.7/spark-warehous...,
>  PartitionFilters: [], PushedFilters: [IsNotNull(b), EqualTo(b,1)], 
> ReadSchema: struct
> +- BroadcastExchange 
> HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint))), 
> [id=#96]
>+- *(2) Project [c#14]
>   +- *(2) Filter (isnotnull(c#14) AND (c#14 = 1))
>  +- *(2) ColumnarToRow
> +- FileScan parquet default.t3[c#14] Batched: true, 
> DataFilters: [isnotnull(c#14), (c#14 = 1)], Format: Parquet, Location: 
> InMemoryFileIndex[file:/Users/yumwang/Downloads/spark-3.0.0-preview2-bin-hadoop2.7/spark-warehous...,
>  PartitionFilters: [], PushedFilters: [IsNotNull(c), EqualTo(c,1)], 
> ReadSchema: struct
> Time taken: 3.785 seconds, Fetched 1 row(s)
> {noformat}
> Spark 2.2.x:
> {noformat}
> == Physical Plan ==
> *HashAggregate(keys=[], functions=[count(1)])
> +- Exchange SinglePartition
>+- *HashAggregate(keys=[], functions=[partial_count(1)])
>   +- *Project
>  +- *SortMergeJoin [b#19], [c#23], Inner
> :- *Project [b#19]
> :  +- *SortMergeJoin [a#15], [b#19], Inner
> : :- *Sort [a#15 ASC NULLS FIRST], false, 0
> : :  +- Exchange hashpartitioning(a#15, 200)
> : : +- *Filter (isnotnull(a#15) && (a#15 = 1))
> : :+- HiveTableScan [a#15], HiveTableRelation 
> `default`.`t1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [a#15, 
> b#16, c#17]
> : +- *Sort [b#19 ASC NULLS FIRST], false, 0
> :+- Exchange hashpartitioning(b#19, 200)
> :   +- *Filter (isnotnull(b#19) && (b#19 = 1))
> :  +- HiveTableScan [b#19], HiveTableRelation 
> `default`.`t2`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [a#18, 
> b#19, c#20]
> +- *Sort [c#23 ASC NULLS FIRST], false, 0
>+- Exchange hashpartitioning(c#23, 200)
>   +- *Filter (isnotnull(c#23) && (c#23 = 1))
>  +- HiveTableScan [c#23], HiveTableRelation 
> `default`.`t3`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [a#21, 
> b#22, c#23]
> Time taken: 0.728 seconds, Fetched 1 row(s)
> {noformat}
> Spark 2.2 can infer {{(a#15 = 1)}}, but Spark 2.3+ can't.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-30876) Optimizer cannot infer from inferred constraints with join

2020-07-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161616#comment-17161616
 ] 

Apache Spark commented on SPARK-30876:
--

User 'navinvishy' has created a pull request for this issue:
https://github.com/apache/spark/pull/29170

> Optimizer cannot infer from inferred constraints with join
> --
>
> Key: SPARK-30876
> URL: https://issues.apache.org/jira/browse/SPARK-30876
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Yuming Wang
>Priority: Major
>
> How to reproduce this issue:
> {code:sql}
> create table t1(a int, b int, c int);
> create table t2(a int, b int, c int);
> create table t3(a int, b int, c int);
> select count(*) from t1 join t2 join t3 on (t1.a = t2.b and t2.b = t3.c and 
> t3.c = 1);
> {code}
> Spark 2.3+:
> {noformat}
> == Physical Plan ==
> *(4) HashAggregate(keys=[], functions=[count(1)])
> +- Exchange SinglePartition, true, [id=#102]
>+- *(3) HashAggregate(keys=[], functions=[partial_count(1)])
>   +- *(3) Project
>  +- *(3) BroadcastHashJoin [b#10], [c#14], Inner, BuildRight
> :- *(3) Project [b#10]
> :  +- *(3) BroadcastHashJoin [a#6], [b#10], Inner, BuildRight
> : :- *(3) Project [a#6]
> : :  +- *(3) Filter isnotnull(a#6)
> : : +- *(3) ColumnarToRow
> : :+- FileScan parquet default.t1[a#6] Batched: true, 
> DataFilters: [isnotnull(a#6)], Format: Parquet, Location: 
> InMemoryFileIndex[file:/Users/yumwang/Downloads/spark-3.0.0-preview2-bin-hadoop2.7/spark-warehous...,
>  PartitionFilters: [], PushedFilters: [IsNotNull(a)], ReadSchema: 
> struct
> : +- BroadcastExchange 
> HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint))), 
> [id=#87]
> :+- *(1) Project [b#10]
> :   +- *(1) Filter (isnotnull(b#10) AND (b#10 = 1))
> :  +- *(1) ColumnarToRow
> : +- FileScan parquet default.t2[b#10] Batched: 
> true, DataFilters: [isnotnull(b#10), (b#10 = 1)], Format: Parquet, Location: 
> InMemoryFileIndex[file:/Users/yumwang/Downloads/spark-3.0.0-preview2-bin-hadoop2.7/spark-warehous...,
>  PartitionFilters: [], PushedFilters: [IsNotNull(b), EqualTo(b,1)], 
> ReadSchema: struct
> +- BroadcastExchange 
> HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint))), 
> [id=#96]
>+- *(2) Project [c#14]
>   +- *(2) Filter (isnotnull(c#14) AND (c#14 = 1))
>  +- *(2) ColumnarToRow
> +- FileScan parquet default.t3[c#14] Batched: true, 
> DataFilters: [isnotnull(c#14), (c#14 = 1)], Format: Parquet, Location: 
> InMemoryFileIndex[file:/Users/yumwang/Downloads/spark-3.0.0-preview2-bin-hadoop2.7/spark-warehous...,
>  PartitionFilters: [], PushedFilters: [IsNotNull(c), EqualTo(c,1)], 
> ReadSchema: struct
> Time taken: 3.785 seconds, Fetched 1 row(s)
> {noformat}
> Spark 2.2.x:
> {noformat}
> == Physical Plan ==
> *HashAggregate(keys=[], functions=[count(1)])
> +- Exchange SinglePartition
>+- *HashAggregate(keys=[], functions=[partial_count(1)])
>   +- *Project
>  +- *SortMergeJoin [b#19], [c#23], Inner
> :- *Project [b#19]
> :  +- *SortMergeJoin [a#15], [b#19], Inner
> : :- *Sort [a#15 ASC NULLS FIRST], false, 0
> : :  +- Exchange hashpartitioning(a#15, 200)
> : : +- *Filter (isnotnull(a#15) && (a#15 = 1))
> : :+- HiveTableScan [a#15], HiveTableRelation 
> `default`.`t1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [a#15, 
> b#16, c#17]
> : +- *Sort [b#19 ASC NULLS FIRST], false, 0
> :+- Exchange hashpartitioning(b#19, 200)
> :   +- *Filter (isnotnull(b#19) && (b#19 = 1))
> :  +- HiveTableScan [b#19], HiveTableRelation 
> `default`.`t2`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [a#18, 
> b#19, c#20]
> +- *Sort [c#23 ASC NULLS FIRST], false, 0
>+- Exchange hashpartitioning(c#23, 200)
>   +- *Filter (isnotnull(c#23) && (c#23 = 1))
>  +- HiveTableScan [c#23], HiveTableRelation 
> `default`.`t3`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [a#21, 
> b#22, c#23]
> Time taken: 0.728 seconds, Fetched 1 row(s)
> {noformat}
> Spark 2.2 can infer {{(a#15 = 1)}}, but Spark 2.3+ can't.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Updated] (SPARK-32364) `path` argument of DataFrame.load/save should override the existing options

2020-07-20 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-32364:
--
Description: 
Although we introduced CaseInsensitiveMap and CaseInsensitiveStringMap (in 
DSv2), when a user has multiple options like `path`, `paTH`, and `PATH` for 
the same key `path`, `option()/options()` is non-deterministic because 
`extraOptions` is a `HashMap`. This issue aims to make load/save always 
respect its direct path argument and ignore the existing options, because the 
load/save function is independent of users' typos like `paTH` and is designed 
to be invoked as the last operation. So, load/save should always work 
consistently and correctly.

Please note that this doesn't aim to enforce case-insensitivity on 
`option()/options()` or the `extraOptions` variable because that might be 
considered a behavior change.

{code}
spark.read
  .option("paTh", "1")
  .option("PATH", "2")
  .option("Path", "3")
  .option("patH", "4")
  .load("5")
...
org.apache.spark.sql.AnalysisException:
Path does not exist: file:/.../1;
{code}


Since Apache Spark calls `extraOptions.toMap`, a `LinkedHashMap[String, 
String]` would have the same issue.
{code}
val extraOptions = new scala.collection.mutable.LinkedHashMap[String, String]
extraOptions += ("paTh" -> "1")
extraOptions += ("PATH" -> "2")
extraOptions += ("Path" -> "3")
extraOptions += ("patH" -> "4")
extraOptions += ("path" -> "5")
extraOptions.toMap

// Exiting paste mode, now interpreting.

extraOptions: scala.collection.mutable.LinkedHashMap[String,String] = Map(paTh 
-> 1, PATH -> 2, Path -> 3, patH -> 4, path -> 5)
res0: scala.collection.immutable.Map[String,String] = Map(PATH -> 2, path -> 5, 
patH -> 4, Path -> 3, paTh -> 1)
{code}

  was:
Although we introduced CaseInsensitiveMap and CaseInsensitiveStringMap (in 
DSv2), when a user has multiple options like `path`, `paTH`, and `PATH` for 
the same key `path`, `option()/options()` is non-deterministic because 
`extraOptions` is a `HashMap`. This issue aims to make load/save always 
respect its direct path argument and ignore the existing options, because the 
load/save function is independent of users' typos like `paTH` and is designed 
to be invoked as the last operation. So, load/save should always work 
consistently and correctly.

Please note that this doesn't aim to enforce case-insensitivity on 
`option()/options()` or the `extraOptions` variable because that might be 
considered a behavior change.

{code}
spark.read
  .option("paTh", "1")
  .option("PATH", "2")
  .option("Path", "3")
  .option("patH", "4")
  .load("5")
...
org.apache.spark.sql.AnalysisException:
Path does not exist: file:/.../1;
{code}


> `path` argument of DataFrame.load/save should override the existing options
> ---
>
> Key: SPARK-32364
> URL: https://issues.apache.org/jira/browse/SPARK-32364
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.6, 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> Although we introduced CaseInsensitiveMap and CaseInsensitiveStringMap (in 
> DSv2), when a user has multiple options like `path`, `paTH`, and `PATH` for 
> the same key `path`, `option()/options()` is non-deterministic because 
> `extraOptions` is a `HashMap`. This issue aims to make load/save always 
> respect its direct path argument and ignore the existing options, because 
> the load/save function is independent of users' typos like `paTH` and is 
> designed to be invoked as the last operation. So, load/save should always 
> work consistently and correctly.
> Please note that this doesn't aim to enforce case-insensitivity on 
> `option()/options()` or the `extraOptions` variable because that might be 
> considered a behavior change.
> {code}
> spark.read
>   .option("paTh", "1")
>   .option("PATH", "2")
>   .option("Path", "3")
>   .option("patH", "4")
>   .load("5")
> ...
> org.apache.spark.sql.AnalysisException:
> Path does not exist: file:/.../1;
> {code}
> Since Apache Spark calls `extraOptions.toMap`, a `LinkedHashMap[String, 
> String]` would have the same issue.
> {code}
> val extraOptions = new scala.collection.mutable.LinkedHashMap[String, String]
> extraOptions += ("paTh" -> "1")
> extraOptions += ("PATH" -> "2")
> extraOptions += ("Path" -> "3")
> extraOptions += ("patH" -> "4")
> extraOptions += ("path" -> "5")
> extraOptions.toMap
> // Exiting paste mode, now interpreting.
> extraOptions: scala.collection.mutable.LinkedHashMap[String,String] = 
> Map(paTh -> 1, PATH -> 2, Path -> 3, patH -> 4, path -> 5)
> res0: scala.collection.immutable.Map[String,String] = Map(PATH -> 2, path -> 
> 5, patH -> 4, Path -> 3, paTh -> 1)
> {code}
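
A sketch of the precedence the issue proposes, assuming a helper that lets
the direct path argument replace any case-variant "path" entries accumulated
via option() (the helper name is hypothetical):

{code:scala}
// Hypothetical helper illustrating "the load/save path wins over options".
def effectiveOptions(
    extraOptions: Map[String, String],
    directPath: Option[String]): Map[String, String] =
  directPath match {
    case Some(p) =>
      extraOptions.filterNot { case (k, _) => k.equalsIgnoreCase("path") } +
        ("path" -> p)
    case None => extraOptions
  }

effectiveOptions(
  Map("paTh" -> "1", "PATH" -> "2", "Path" -> "3", "patH" -> "4"),
  Some("5"))
// Map(path -> 5): deterministic regardless of the HashMap's iteration order.
{code}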



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (SPARK-32376) Make unionByName null-filling behavior work with struct columns

2020-07-20 Thread Mukul Murthy (Jira)
Mukul Murthy created SPARK-32376:


 Summary: Make unionByName null-filling behavior work with struct 
columns
 Key: SPARK-32376
 URL: https://issues.apache.org/jira/browse/SPARK-32376
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.1.0
Reporter: Mukul Murthy


https://issues.apache.org/jira/browse/SPARK-29358 added support for unionByName 
to work when the two datasets do not necessarily have the same schema, but it 
does not work with nested columns like structs.
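
A sketch of the nested case the report describes (schemas are illustrative):
top-level null-filling works after SPARK-29358, but the differing struct
fields are not reconciled recursively.

{code:scala}
// Assumes an active SparkSession named `spark`.
val df1 = spark.range(1).selectExpr("named_struct('a', 1, 'b', 2) AS s")
val df2 = spark.range(1).selectExpr("named_struct('a', 1, 'c', 3) AS s")

// Works when the missing columns are top-level; as of this report, this
// fails because the struct types differ, instead of null-filling the
// missing fields `b` and `c`.
df1.unionByName(df2, allowMissingColumns = true)
{code}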



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-32368) Options in PartitioningAwareFileIndex should respect case insensitivity

2020-07-20 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-32368.
---
Fix Version/s: 3.1.0
   3.0.1
   Resolution: Fixed

Issue resolved by pull request 29165
[https://github.com/apache/spark/pull/29165]

> Options in PartitioningAwareFileIndex should respect case insensitivity
> ---
>
> Key: SPARK-32368
> URL: https://issues.apache.org/jira/browse/SPARK-32368
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Minor
> Fix For: 3.0.1, 3.1.0
>
>
> The datasource options such as {{recursiveFileLookup}} or {{pathGlobFilter}} 
> currently don't respect case insensitivity.
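
A sketch of the reported inconsistency, assuming spark.sql.caseSensitive is
false (its default): the two reads below should behave identically, but the
file-index options were matched case-sensitively.

{code:scala}
// Assumes an active SparkSession named `spark`.
val path = "/tmp/data" // illustrative directory

spark.read.option("recursiveFileLookup", "true").text(path)
spark.read.option("recursiveFileLOOKUP", "true").text(path) // option ignored
{code}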



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32368) Options in PartitioningAwareFileIndex should respect case insensitivity

2020-07-20 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-32368:
-

Assignee: Hyukjin Kwon

> Options in PartitioningAwareFileIndex should respect case insensitivity
> ---
>
> Key: SPARK-32368
> URL: https://issues.apache.org/jira/browse/SPARK-32368
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Minor
>
> The datasource options such as {{recursiveFileLookup}} or {{pathGlobFilter}} 
> currently don't respect case insensitivity.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32367) Fix typo of parameter in KubernetesTestComponents

2020-07-20 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-32367:
--
Affects Version/s: 2.4.6

> Fix typo of parameter in KubernetesTestComponents
> -
>
> Key: SPARK-32367
> URL: https://issues.apache.org/jira/browse/SPARK-32367
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.4.6, 3.0.0
>Reporter: merrily01
>Assignee: merrily01
>Priority: Trivial
> Fix For: 2.4.7, 3.0.1, 3.1.0
>
>
> Correct the spelling of parameter 'spark.executor.instances' in 
> KubernetesTestComponents



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32367) Fix typo of parameter in KubernetesTestComponents

2020-07-20 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-32367:
--
Summary: Fix typo of parameter in KubernetesTestComponents  (was: Correct 
the spelling of parameter in KubernetesTestComponents)

> Fix typo of parameter in KubernetesTestComponents
> -
>
> Key: SPARK-32367
> URL: https://issues.apache.org/jira/browse/SPARK-32367
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.0.0
>Reporter: merrily01
>Assignee: merrily01
>Priority: Trivial
> Fix For: 2.4.7, 3.0.1, 3.1.0
>
>
> Correct the spelling of parameter 'spark.executor.instances' in 
> KubernetesTestComponents



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32367) Fix typo of parameter in KubernetesTestComponents

2020-07-20 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-32367:
--
Issue Type: Bug  (was: Improvement)

> Fix typo of parameter in KubernetesTestComponents
> -
>
> Key: SPARK-32367
> URL: https://issues.apache.org/jira/browse/SPARK-32367
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.0.0
>Reporter: merrily01
>Assignee: merrily01
>Priority: Trivial
> Fix For: 2.4.7, 3.0.1, 3.1.0
>
>
> Correct the spelling of parameter 'spark.executor.instances' in 
> KubernetesTestComponents



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32367) Correct the spelling of parameter in KubernetesTestComponents

2020-07-20 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-32367:
-

Assignee: merrily01

> Correct the spelling of parameter in KubernetesTestComponents
> -
>
> Key: SPARK-32367
> URL: https://issues.apache.org/jira/browse/SPARK-32367
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.0.0
>Reporter: merrily01
>Assignee: merrily01
>Priority: Trivial
> Fix For: 3.1.0
>
>
> Correct the spelling of parameter 'spark.executor.instances' in 
> KubernetesTestComponents



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-32367) Correct the spelling of parameter in KubernetesTestComponents

2020-07-20 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-32367.
---
Fix Version/s: 2.4.7
   3.0.1
   Resolution: Fixed

Issue resolved by pull request 29164
[https://github.com/apache/spark/pull/29164]

> Correct the spelling of parameter in KubernetesTestComponents
> -
>
> Key: SPARK-32367
> URL: https://issues.apache.org/jira/browse/SPARK-32367
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.0.0
>Reporter: merrily01
>Assignee: merrily01
>Priority: Trivial
> Fix For: 3.0.1, 2.4.7, 3.1.0
>
>
> Correct the spelling of parameter 'spark.executor.instances' in 
> KubernetesTestComponents



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32357) Investigate test result reporter integration

2020-07-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32357:


Assignee: (was: Apache Spark)

> Investigate test result reporter integration
> 
>
> Key: SPARK-32357
> URL: https://issues.apache.org/jira/browse/SPARK-32357
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> Currently, the readability of the logs is not really good. For example, see 
> https://pipelines.actions.githubusercontent.com/gik0C3if0ep5i8iNpgFlcJRQk9UyifmoD6XvJANMVttkEP5xje/_apis/pipelines/1/runs/564/signedlogcontent/4?urlExpires=2020-07-09T14%3A05%3A52.5110439Z=HMACV1=gMGczJ8vtNPeQFE0GpjMxSS1BGq14RJLXUfjsLnaX7s%3D
> Maybe we should have a way to report the results in an easy-to-read form, 
> for example, a Jenkins-like test report feature.
> We should maybe also take a look at 
> https://github.com/check-run-reporter/action.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32357) Investigate test result reporter integration

2020-07-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32357:


Assignee: Apache Spark

> Investigate test result reporter integration
> 
>
> Key: SPARK-32357
> URL: https://issues.apache.org/jira/browse/SPARK-32357
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
>Priority: Major
>
> Currently, the readability of the logs is not really good. For example, see 
> https://pipelines.actions.githubusercontent.com/gik0C3if0ep5i8iNpgFlcJRQk9UyifmoD6XvJANMVttkEP5xje/_apis/pipelines/1/runs/564/signedlogcontent/4?urlExpires=2020-07-09T14%3A05%3A52.5110439Z=HMACV1=gMGczJ8vtNPeQFE0GpjMxSS1BGq14RJLXUfjsLnaX7s%3D
> Maybe we should have a way to report the results in an easy-to-read form, 
> for example, a Jenkins-like test report feature.
> We should maybe also take a look at 
> https://github.com/check-run-reporter/action.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32357) Investigate test result reporter integration

2020-07-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161505#comment-17161505
 ] 

Apache Spark commented on SPARK-32357:
--

User 'viirya' has created a pull request for this issue:
https://github.com/apache/spark/pull/29169

> Investigate test result reporter integration
> 
>
> Key: SPARK-32357
> URL: https://issues.apache.org/jira/browse/SPARK-32357
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> Currently, the readability of the logs is not really good. For example, see 
> https://pipelines.actions.githubusercontent.com/gik0C3if0ep5i8iNpgFlcJRQk9UyifmoD6XvJANMVttkEP5xje/_apis/pipelines/1/runs/564/signedlogcontent/4?urlExpires=2020-07-09T14%3A05%3A52.5110439Z=HMACV1=gMGczJ8vtNPeQFE0GpjMxSS1BGq14RJLXUfjsLnaX7s%3D
> Maybe we should have a way to report the results in an easy-to-read form, 
> for example, a Jenkins-like test report feature.
> We should maybe also take a look at 
> https://github.com/check-run-reporter/action.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32375) Implement TableCatalog for JDBC

2020-07-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32375:


Assignee: (was: Apache Spark)

> Implement TableCatalog for JDBC
> ---
>
> Key: SPARK-32375
> URL: https://issues.apache.org/jira/browse/SPARK-32375
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Maxim Gekk
>Priority: Major
>
> Implement the TableCatalog interface, in particular (a sketch follows the 
> list):
> - list tables
> - table exists
> - drop table
> - rename table
> - Optionally, alter table
> - Optionally, load table
> - Optionally, create table
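
A skeleton covering the methods listed above; the signatures follow the DSv2
org.apache.spark.sql.connector.catalog.TableCatalog interface, but the class
itself is only a hypothetical outline, not the eventual implementation.

{code:scala}
import java.util
import org.apache.spark.sql.connector.catalog._
import org.apache.spark.sql.connector.expressions.Transform
import org.apache.spark.sql.types.StructType
import org.apache.spark.sql.util.CaseInsensitiveStringMap

class JDBCTableCatalog extends TableCatalog {
  private var catalogName: String = _

  // From CatalogPlugin: e.g. read the JDBC URL and credentials from options.
  override def initialize(name: String, options: CaseInsensitiveStringMap): Unit = {
    catalogName = name
  }
  override def name(): String = catalogName

  override def listTables(namespace: Array[String]): Array[Identifier] = ???
  override def tableExists(ident: Identifier): Boolean = ???
  override def dropTable(ident: Identifier): Boolean = ???
  override def renameTable(oldIdent: Identifier, newIdent: Identifier): Unit = ???
  // Optional pieces from the list above:
  override def alterTable(ident: Identifier, changes: TableChange*): Table = ???
  override def loadTable(ident: Identifier): Table = ???
  override def createTable(
      ident: Identifier,
      schema: StructType,
      partitions: Array[Transform],
      properties: util.Map[String, String]): Table = ???
}
{code}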



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32375) Implement TableCatalog for JDBC

2020-07-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161500#comment-17161500
 ] 

Apache Spark commented on SPARK-32375:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/29168

> Implement TableCatalog for JDBC
> ---
>
> Key: SPARK-32375
> URL: https://issues.apache.org/jira/browse/SPARK-32375
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Maxim Gekk
>Priority: Major
>
> Implement the TableCatalog interface, in particular:
> - list tables
> - table exists
> - drop table
> - rename table
> - Optionally, alter table
> - Optionally, load table
> - Optionally, create table



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32375) Implement TableCatalog for JDBC

2020-07-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32375:


Assignee: Apache Spark

> Implement TableCatalog for JDBC
> ---
>
> Key: SPARK-32375
> URL: https://issues.apache.org/jira/browse/SPARK-32375
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Maxim Gekk
>Assignee: Apache Spark
>Priority: Major
>
> Implement the TableCatalog interface, in particular:
> - list tables
> - table exists
> - drop table
> - rename table
> - Optionally, alter table
> - Optionally, load table
> - Optionally, create table



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-32375) Implement TableCatalog for JDBC

2020-07-20 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-32375:
--

 Summary: Implement TableCatalog for JDBC
 Key: SPARK-32375
 URL: https://issues.apache.org/jira/browse/SPARK-32375
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.1.0
Reporter: Maxim Gekk


Implement the TableCatalog interface, in particular:
- list tables
- table exists
- drop table
- rename table
- Optionally, alter table
- Optionally, load table
- Optionally, create table



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32374) Disallow setting properties when creating temporary views

2020-07-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161478#comment-17161478
 ] 

Apache Spark commented on SPARK-32374:
--

User 'imback82' has created a pull request for this issue:
https://github.com/apache/spark/pull/29167

> Disallow setting properties when creating temporary views
> -
>
> Key: SPARK-32374
> URL: https://issues.apache.org/jira/browse/SPARK-32374
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Terry Kim
>Priority: Major
>
> Currently, you can specify properties when creating a temporary view. 
> However, they are not used and SHOW TBLPROPERTIES always returns an empty 
> result on temporary views.






[jira] [Assigned] (SPARK-32374) Disallow setting properties when creating temporary views

2020-07-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32374:


Assignee: Apache Spark

> Disallow setting properties when creating temporary views
> -
>
> Key: SPARK-32374
> URL: https://issues.apache.org/jira/browse/SPARK-32374
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Terry Kim
>Assignee: Apache Spark
>Priority: Major
>
> Currently, you can specify properties when creating a temporary view. 
> However, they are not used and SHOW TBLPROPERTIES always returns an empty 
> result on temporary views.






[jira] [Assigned] (SPARK-32374) Disallow setting properties when creating temporary views

2020-07-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32374:


Assignee: (was: Apache Spark)

> Disallow setting properties when creating temporary views
> -
>
> Key: SPARK-32374
> URL: https://issues.apache.org/jira/browse/SPARK-32374
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Terry Kim
>Priority: Major
>
> Currently, you can specify properties when creating a temporary view. 
> However, they are not used and SHOW TBLPROPERTIES always returns an empty 
> result on temporary views.






[jira] [Commented] (SPARK-32374) Disallow setting properties when creating temporary views

2020-07-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161479#comment-17161479
 ] 

Apache Spark commented on SPARK-32374:
--

User 'imback82' has created a pull request for this issue:
https://github.com/apache/spark/pull/29167

> Disallow setting properties when creating temporary views
> -
>
> Key: SPARK-32374
> URL: https://issues.apache.org/jira/browse/SPARK-32374
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Terry Kim
>Priority: Major
>
> Currently, you can specify properties when creating a temporary view. 
> However, they are not used and SHOW TBLPROPERTIES always returns an empty 
> result on temporary views.






[jira] [Assigned] (SPARK-32374) Disallow setting properties when creating temporary views

2020-07-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32374:


Assignee: Apache Spark

> Disallow setting properties when creating temporary views
> -
>
> Key: SPARK-32374
> URL: https://issues.apache.org/jira/browse/SPARK-32374
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Terry Kim
>Assignee: Apache Spark
>Priority: Major
>
> Currently, you can specify properties when creating a temporary view. 
> However, they are not used and SHOW TBLPROPERTIES always returns an empty 
> result on temporary views.






[jira] [Created] (SPARK-32374) Disallow setting properties when creating temporary views

2020-07-20 Thread Terry Kim (Jira)
Terry Kim created SPARK-32374:
-

 Summary: Disallow setting properties when creating temporary views
 Key: SPARK-32374
 URL: https://issues.apache.org/jira/browse/SPARK-32374
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.1.0
Reporter: Terry Kim


Currently, you can specify properties when creating a temporary view. However, 
they are not used and SHOW TBLPROPERTIES always returns an empty result on 
temporary views.
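A minimal reproduction of the described behavior, assuming a spark-shell session (the view name tv and the property p1 are arbitrary):

{code:java}
// TBLPROPERTIES is currently accepted by the parser for a temporary view,
// but the properties are never stored, so the SHOW below comes back empty.
spark.sql("CREATE TEMPORARY VIEW tv TBLPROPERTIES ('p1' = 'v1') AS SELECT 1 AS id")
spark.sql("SHOW TBLPROPERTIES tv").show() // empty result; 'p1' was silently dropped
{code}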






[jira] [Created] (SPARK-32373) Spark Standalone - RetryingBlockFetcher tries to get block from worker even 10mins after it was de-registered from spark cluster

2020-07-20 Thread t oo (Jira)
t oo created SPARK-32373:


 Summary: Spark Standalone - RetryingBlockFetcher tries to get 
block from worker even 10mins after it was de-registered from spark cluster
 Key: SPARK-32373
 URL: https://issues.apache.org/jira/browse/SPARK-32373
 Project: Spark
  Issue Type: Bug
  Components: Block Manager, Scheduler, Shuffle, Spark Core
Affects Versions: 2.4.6
Reporter: t oo


Using Spark standalone 2.4.6 with spot EC2 instances, the .242 IP instance was 
terminated at 12:00:11 pm. Before that it had appeared in the Spark UI as ALIVE 
for a few hours; it then showed as DEAD until 12:16 pm, after which it 
disappeared from the Spark UI completely. An app that started at 11:24 am hit 
the error below. As the app log below (from another worker) shows, it was still 
trying to fetch a shuffle block from the .242 IP at 12:10 pm, ten minutes after 
the worker was removed from the Spark cluster. I would expect that within two 
minutes of the worker being removed from the cluster it would stop retrying.

 
{code:java}

2020-07-20 12:10:02,702 [Block Fetch Retry-9-3] ERROR 
org.apache.spark.network.shuffle.RetryingBlockFetcher - Exception while 
beginning fetch of 1 outstanding blocks (after 3 retries)
java.io.IOException: Connecting to /redact.242:7337 timed out (120000 ms)
at 
org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:243)
at 
org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:187)
at 
org.apache.spark.network.shuffle.ExternalShuffleClient.lambda$fetchBlocks$0(ExternalShuffleClient.java:100)
at 
org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:141)
at 
org.apache.spark.network.shuffle.RetryingBlockFetcher.lambda$initiateRetry$0(RetryingBlockFetcher.java:169)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at 
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
2020-07-20 12:07:57,700 [Block Fetch Retry-9-2] ERROR 
org.apache.spark.network.shuffle.RetryingBlockFetcher - Exception while 
beginning fetch of 1 outstanding blocks (after 2 retries)
java.io.IOException: Connecting to /redact.242:7337 timed out (120000 ms)
at 
org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:243)
at 
org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:187)
at 
org.apache.spark.network.shuffle.ExternalShuffleClient.lambda$fetchBlocks$0(ExternalShuffleClient.java:100)
at 
org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:141)
at 
org.apache.spark.network.shuffle.RetryingBlockFetcher.lambda$initiateRetry$0(RetryingBlockFetcher.java:169)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at 
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
2020-07-20 12:05:52,697 [Block Fetch Retry-9-1] ERROR 
org.apache.spark.network.shuffle.RetryingBlockFetcher - Exception while 
beginning fetch of 1 outstanding blocks (after 1 retries)
java.io.IOException: Connecting to /redact.242:7337 timed out (120000 ms)
at 
org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:243)
at 
org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:187)
at 
org.apache.spark.network.shuffle.ExternalShuffleClient.lambda$fetchBlocks$0(ExternalShuffleClient.java:100)
at 
org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:141)
at 
org.apache.spark.network.shuffle.RetryingBlockFetcher.lambda$initiateRetry$0(RetryingBlockFetcher.java:169)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at ...
{code}
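The roughly two-minute cadence between the attempts above is the connection timeout plus the fetcher's retry wait. As a hedged illustration only, these existing configuration keys control that behavior; the values shown are the defaults, not a fix for the underlying issue:

{code:java}
// Each attempt waits out the network timeout (120000 ms above), then
// RetryingBlockFetcher sleeps retryWait before retrying, up to maxRetries times.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .config("spark.shuffle.io.maxRetries", "3")  // retries per fetch
  .config("spark.shuffle.io.retryWait", "5s")  // pause between retries
  .config("spark.network.timeout", "120s")     // default connection timeout
  .getOrCreate()
{code}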

[jira] [Commented] (SPARK-32372) "Resolved attribute(s) XXX missing" after dedup conflict references

2020-07-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161334#comment-17161334
 ] 

Apache Spark commented on SPARK-32372:
--

User 'Ngone51' has created a pull request for this issue:
https://github.com/apache/spark/pull/29166

> "Resolved attribute(s) XXX missing" after dudup conflict references
> ---
>
> Key: SPARK-32372
> URL: https://issues.apache.org/jira/browse/SPARK-32372
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.2, 2.3.4, 2.4.6, 3.0.0
>Reporter: wuyi
>Priority: Blocker
>
> {code:java}
> // case class Person(id: Int, name: String, age: Int)
> sql("SELECT name, avg(age) as avg_age FROM person GROUP BY 
> name").createOrReplaceTempView("person_a")
> sql("SELECT p1.name, p2.avg_age FROM person p1 JOIN person_a p2 ON p1.name = 
> p2.name").createOrReplaceTempView("person_b")
> sql("SELECT * FROM person_a UNION SELECT * FROM person_b")   
> .createOrReplaceTempView("person_c")
> sql("SELECT p1.name, p2.avg_age FROM person_c p1 JOIN person_c p2 ON p1.name 
> = p2.name").show
> {code}
> error:
> {code:java}
> [info]   Failed to analyze query: org.apache.spark.sql.AnalysisException: 
> Resolved attribute(s) avg_age#235 missing from name#233,avg_age#231 in 
> operator !Project [name#233, avg_age#235]. Attribute(s) with the same name 
> appear in the operation: avg_age. Please check if the right attribute(s) are 
> used.;;
> ...{code}






[jira] [Assigned] (SPARK-32372) "Resolved attribute(s) XXX missing" after dedup conflict references

2020-07-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32372:


Assignee: (was: Apache Spark)

> "Resolved attribute(s) XXX missing" after dudup conflict references
> ---
>
> Key: SPARK-32372
> URL: https://issues.apache.org/jira/browse/SPARK-32372
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.2, 2.3.4, 2.4.6, 3.0.0
>Reporter: wuyi
>Priority: Blocker
>
> {code:java}
> // case class Person(id: Int, name: String, age: Int)
> sql("SELECT name, avg(age) as avg_age FROM person GROUP BY 
> name").createOrReplaceTempView("person_a")
> sql("SELECT p1.name, p2.avg_age FROM person p1 JOIN person_a p2 ON p1.name = 
> p2.name").createOrReplaceTempView("person_b")
> sql("SELECT * FROM person_a UNION SELECT * FROM person_b")   
> .createOrReplaceTempView("person_c")
> sql("SELECT p1.name, p2.avg_age FROM person_c p1 JOIN person_c p2 ON p1.name 
> = p2.name").show
> {code}
> error:
> {code:java}
> [info]   Failed to analyze query: org.apache.spark.sql.AnalysisException: 
> Resolved attribute(s) avg_age#235 missing from name#233,avg_age#231 in 
> operator !Project [name#233, avg_age#235]. Attribute(s) with the same name 
> appear in the operation: avg_age. Please check if the right attribute(s) are 
> used.;;
> ...{code}






[jira] [Assigned] (SPARK-32372) "Resolved attribute(s) XXX missing" after dedup conflict references

2020-07-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32372:


Assignee: Apache Spark

> "Resolved attribute(s) XXX missing" after dudup conflict references
> ---
>
> Key: SPARK-32372
> URL: https://issues.apache.org/jira/browse/SPARK-32372
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.2, 2.3.4, 2.4.6, 3.0.0
>Reporter: wuyi
>Assignee: Apache Spark
>Priority: Blocker
>
> {code:java}
> // case class Person(id: Int, name: String, age: Int)
> sql("SELECT name, avg(age) as avg_age FROM person GROUP BY 
> name").createOrReplaceTempView("person_a")
> sql("SELECT p1.name, p2.avg_age FROM person p1 JOIN person_a p2 ON p1.name = 
> p2.name").createOrReplaceTempView("person_b")
> sql("SELECT * FROM person_a UNION SELECT * FROM person_b")   
> .createOrReplaceTempView("person_c")
> sql("SELECT p1.name, p2.avg_age FROM person_c p1 JOIN person_c p2 ON p1.name 
> = p2.name").show
> {code}
> error:
> {code:java}
> [info]   Failed to analyze query: org.apache.spark.sql.AnalysisException: 
> Resolved attribute(s) avg_age#235 missing from name#233,avg_age#231 in 
> operator !Project [name#233, avg_age#235]. Attribute(s) with the same name 
> appear in the operation: avg_age. Please check if the right attribute(s) are 
> used.;;
> ...{code}






[jira] [Commented] (SPARK-32372) "Resolved attribute(s) XXX missing" after dedup conflict references

2020-07-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161333#comment-17161333
 ] 

Apache Spark commented on SPARK-32372:
--

User 'Ngone51' has created a pull request for this issue:
https://github.com/apache/spark/pull/29166

> "Resolved attribute(s) XXX missing" after dudup conflict references
> ---
>
> Key: SPARK-32372
> URL: https://issues.apache.org/jira/browse/SPARK-32372
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.2, 2.3.4, 2.4.6, 3.0.0
>Reporter: wuyi
>Priority: Blocker
>
> {code:java}
> // case class Person(id: Int, name: String, age: Int)
> sql("SELECT name, avg(age) as avg_age FROM person GROUP BY 
> name").createOrReplaceTempView("person_a")
> sql("SELECT p1.name, p2.avg_age FROM person p1 JOIN person_a p2 ON p1.name = 
> p2.name").createOrReplaceTempView("person_b")
> sql("SELECT * FROM person_a UNION SELECT * FROM person_b")   
> .createOrReplaceTempView("person_c")
> sql("SELECT p1.name, p2.avg_age FROM person_c p1 JOIN person_c p2 ON p1.name 
> = p2.name").show
> {code}
> error:
> {code:java}
> [info]   Failed to analyze query: org.apache.spark.sql.AnalysisException: 
> Resolved attribute(s) avg_age#235 missing from name#233,avg_age#231 in 
> operator !Project [name#233, avg_age#235]. Attribute(s) with the same name 
> appear in the operation: avg_age. Please check if the right attribute(s) are 
> used.;;
> ...{code}






[jira] [Created] (SPARK-32372) "Resolved attribute(s) XXX missing" after dedup conflict references

2020-07-20 Thread wuyi (Jira)
wuyi created SPARK-32372:


 Summary: "Resolved attribute(s) XXX missing" after dedup conflict 
references
 Key: SPARK-32372
 URL: https://issues.apache.org/jira/browse/SPARK-32372
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.0, 2.4.6, 2.3.4, 2.2.2
Reporter: wuyi


{code:java}
// case class Person(id: Int, name: String, age: Int)

sql("SELECT name, avg(age) as avg_age FROM person GROUP BY 
name").createOrReplaceTempView("person_a")
sql("SELECT p1.name, p2.avg_age FROM person p1 JOIN person_a p2 ON p1.name = 
p2.name").createOrReplaceTempView("person_b")
sql("SELECT * FROM person_a UNION SELECT * FROM person_b")   
.createOrReplaceTempView("person_c")
sql("SELECT p1.name, p2.avg_age FROM person_c p1 JOIN person_c p2 ON p1.name = 
p2.name").show
{code}
error:
{code:java}
[info]   Failed to analyze query: org.apache.spark.sql.AnalysisException: 
Resolved attribute(s) avg_age#235 missing from name#233,avg_age#231 in operator 
!Project [name#233, avg_age#235]. Attribute(s) with the same name appear in the 
operation: avg_age. Please check if the right attribute(s) are used.;;
...{code}
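Until a fix lands, a hedged workaround sketch (not from this ticket, and not guaranteed for every affected version): give one side of the self-join fresh attribute ids by renaming its columns before joining.

{code:java}
// Assumes the person_c view defined above and `import spark.implicits._`.
// Aliasing the columns re-creates them under new expression ids, so the two
// sides of the self-join no longer share conflicting references.
val c1 = spark.table("person_c")
val c2 = spark.table("person_c").select($"name".as("name2"), $"avg_age".as("avg_age2"))
c1.join(c2, $"name" === $"name2").select($"name", $"avg_age2").show()
{code}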






[jira] [Assigned] (SPARK-32330) Preserve shuffled hash join build side partitioning

2020-07-20 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-32330:
---

Assignee: Cheng Su

> Preserve shuffled hash join build side partitioning
> ---
>
> Key: SPARK-32330
> URL: https://issues.apache.org/jira/browse/SPARK-32330
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Cheng Su
>Assignee: Cheng Su
>Priority: Trivial
> Fix For: 3.1.0
>
>
> Currently `ShuffledHashJoin.outputPartitioning` inherits from 
> `HashJoin.outputPartitioning`, which only preserves stream side partitioning:
> `HashJoin.scala`
> {code:java}
> override def outputPartitioning: Partitioning = 
> streamedPlan.outputPartitioning
> {code}
> This loses build side partitioning information, and causes extra shuffle if 
> there's another join / group-by after this join.
> Example:
>  
> {code:java}
> // code placeholder
> withSQLConf(
> SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> "50",
> SQLConf.SHUFFLE_PARTITIONS.key -> "2",
> SQLConf.PREFER_SORTMERGEJOIN.key -> "false") {
>   val df1 = spark.range(10).select($"id".as("k1"))
>   val df2 = spark.range(30).select($"id".as("k2"))
>   Seq("inner", "cross").foreach(joinType => {
> val plan = df1.join(df2, $"k1" === $"k2", joinType).groupBy($"k1").count()
>   .queryExecution.executedPlan
> assert(plan.collect { case _: ShuffledHashJoinExec => true }.size === 1)
> // No extra shuffle before aggregate
> assert(plan.collect { case _: ShuffleExchangeExec => true }.size === 2)
>   })
> }{code}
>  
> Current physical plan (having an extra shuffle on `k1` before aggregate)
>  
> {code:java}
> *(4) HashAggregate(keys=[k1#220L], functions=[count(1)], output=[k1#220L, 
> count#235L])
> +- Exchange hashpartitioning(k1#220L, 2), true, [id=#117]
>+- *(3) HashAggregate(keys=[k1#220L], functions=[partial_count(1)], 
> output=[k1#220L, count#239L])
>   +- *(3) Project [k1#220L]
>  +- ShuffledHashJoin [k1#220L], [k2#224L], Inner, BuildLeft
> :- Exchange hashpartitioning(k1#220L, 2), true, [id=#109]
> :  +- *(1) Project [id#218L AS k1#220L]
> : +- *(1) Range (0, 10, step=1, splits=2)
> +- Exchange hashpartitioning(k2#224L, 2), true, [id=#111]
>+- *(2) Project [id#222L AS k2#224L]
>   +- *(2) Range (0, 30, step=1, splits=2){code}
>  
> Ideal physical plan (no shuffle on `k1` before aggregate)
> {code:java}
>  *(3) HashAggregate(keys=[k1#220L], functions=[count(1)], output=[k1#220L, 
> count#235L])
> +- *(3) HashAggregate(keys=[k1#220L], functions=[partial_count(1)], 
> output=[k1#220L, count#239L])
>+- *(3) Project [k1#220L]
>   +- ShuffledHashJoin [k1#220L], [k2#224L], Inner, BuildLeft
>  :- Exchange hashpartitioning(k1#220L, 2), true, [id=#107]
>  :  +- *(1) Project [id#218L AS k1#220L]
>  : +- *(1) Range (0, 10, step=1, splits=2)
>  +- Exchange hashpartitioning(k2#224L, 2), true, [id=#109]
> +- *(2) Project [id#222L AS k2#224L]
>+- *(2) Range (0, 30, step=1, splits=2){code}
>  
> This can be fixed by overriding `outputPartitioning` method in 
> `ShuffledHashJoinExec`, similar to `SortMergeJoinExec`.
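A hedged sketch of that fix direction, mirroring what SortMergeJoinExec advertises; this is illustrative code against Catalyst's types, not the exact patch:

{code:java}
import org.apache.spark.sql.catalyst.plans.{InnerLike, JoinType, LeftOuter, RightOuter}
import org.apache.spark.sql.catalyst.plans.physical.{Partitioning, PartitioningCollection, UnknownPartitioning}

// For inner-like joins both inputs are co-partitioned on the join keys, so the
// operator can advertise both partitionings instead of only the streamed side's.
def shuffledHashJoinOutputPartitioning(
    joinType: JoinType,
    left: Partitioning,
    right: Partitioning): Partitioning = joinType match {
  case _: InnerLike => PartitioningCollection(Seq(left, right))
  case LeftOuter    => left   // right-side rows may be null-padded
  case RightOuter   => right  // left-side rows may be null-padded
  case _            => UnknownPartitioning(0)
}
{code}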






[jira] [Resolved] (SPARK-32330) Preserve shuffled hash join build side partitioning

2020-07-20 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-32330.
-
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 29130
[https://github.com/apache/spark/pull/29130]

> Preserve shuffled hash join build side partitioning
> ---
>
> Key: SPARK-32330
> URL: https://issues.apache.org/jira/browse/SPARK-32330
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Cheng Su
>Priority: Trivial
> Fix For: 3.1.0
>
>
> Currently `ShuffledHashJoin.outputPartitioning` inherits from 
> `HashJoin.outputPartitioning`, which only preserves stream side partitioning:
> `HashJoin.scala`
> {code:java}
> override def outputPartitioning: Partitioning = 
> streamedPlan.outputPartitioning
> {code}
> This loses build side partitioning information, and causes extra shuffle if 
> there's another join / group-by after this join.
> Example:
>  
> {code:java}
> // code placeholder
> withSQLConf(
> SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> "50",
> SQLConf.SHUFFLE_PARTITIONS.key -> "2",
> SQLConf.PREFER_SORTMERGEJOIN.key -> "false") {
>   val df1 = spark.range(10).select($"id".as("k1"))
>   val df2 = spark.range(30).select($"id".as("k2"))
>   Seq("inner", "cross").foreach(joinType => {
> val plan = df1.join(df2, $"k1" === $"k2", joinType).groupBy($"k1").count()
>   .queryExecution.executedPlan
> assert(plan.collect { case _: ShuffledHashJoinExec => true }.size === 1)
> // No extra shuffle before aggregate
> assert(plan.collect { case _: ShuffleExchangeExec => true }.size === 2)
>   })
> }{code}
>  
> Current physical plan (having an extra shuffle on `k1` before aggregate)
>  
> {code:java}
> *(4) HashAggregate(keys=[k1#220L], functions=[count(1)], output=[k1#220L, 
> count#235L])
> +- Exchange hashpartitioning(k1#220L, 2), true, [id=#117]
>+- *(3) HashAggregate(keys=[k1#220L], functions=[partial_count(1)], 
> output=[k1#220L, count#239L])
>   +- *(3) Project [k1#220L]
>  +- ShuffledHashJoin [k1#220L], [k2#224L], Inner, BuildLeft
> :- Exchange hashpartitioning(k1#220L, 2), true, [id=#109]
> :  +- *(1) Project [id#218L AS k1#220L]
> : +- *(1) Range (0, 10, step=1, splits=2)
> +- Exchange hashpartitioning(k2#224L, 2), true, [id=#111]
>+- *(2) Project [id#222L AS k2#224L]
>   +- *(2) Range (0, 30, step=1, splits=2){code}
>  
> Ideal physical plan (no shuffle on `k1` before aggregate)
> {code:java}
>  *(3) HashAggregate(keys=[k1#220L], functions=[count(1)], output=[k1#220L, 
> count#235L])
> +- *(3) HashAggregate(keys=[k1#220L], functions=[partial_count(1)], 
> output=[k1#220L, count#239L])
>+- *(3) Project [k1#220L]
>   +- ShuffledHashJoin [k1#220L], [k2#224L], Inner, BuildLeft
>  :- Exchange hashpartitioning(k1#220L, 2), true, [id=#107]
>  :  +- *(1) Project [id#218L AS k1#220L]
>  : +- *(1) Range (0, 10, step=1, splits=2)
>  +- Exchange hashpartitioning(k2#224L, 2), true, [id=#109]
> +- *(2) Project [id#222L AS k2#224L]
>+- *(2) Range (0, 30, step=1, splits=2){code}
>  
> This can be fixed by overriding `outputPartitioning` method in 
> `ShuffledHashJoinExec`, similar to `SortMergeJoinExec`.






[jira] [Resolved] (SPARK-31869) BroadcastHashJoinExec's outputPartitioning can utilize the build side

2020-07-20 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-31869.
-
Fix Version/s: 3.1.0
 Assignee: Terry Kim
   Resolution: Fixed

> BroadcastHashJoinExec's outputPartitioning can utilize the build side
> 
>
> Key: SPARK-31869
> URL: https://issues.apache.org/jira/browse/SPARK-31869
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Terry Kim
>Assignee: Terry Kim
>Priority: Minor
> Fix For: 3.1.0
>
>
> Currently, BroadcastHashJoinExec's outputPartitioning only uses the streamed 
> side's outputPartitioning. Thus, if the join key comes from the build side of 
> a join where one side is a BroadcastHashJoinExec:
> {code:java}
> spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "500")
> val t1 = (0 until 100).map(i => (i % 5, i % 13)).toDF("i1", "j1")
> val t2 = (0 until 100).map(i => (i % 5, i % 13)).toDF("i2", "j2")
> val t3 = (0 until 20).map(i => (i % 7, i % 11)).toDF("i3", "j3")
> val t4 = (0 until 100).map(i => (i % 5, i % 13)).toDF("i4", "j4")
> // join1 is a sort merge join.
> val join1 = t1.join(t2, t1("i1") === t2("i2"))
> // join2 is a broadcast join where t3 is broadcasted.
> val join2 = join1.join(t3, join1("i1") === t3("i3"))
> // Join on the column from the broadcasted side (i3).
> val join3 = join2.join(t4, join2("i3") === t4("i4"))
> join3.explain
> {code}
> it produces an extra Exchange hashpartitioning(i3#29, 200):
> {code:java}
> == Physical Plan ==
> *(6) SortMergeJoin [i3#29], [i4#40], Inner
> :- *(4) Sort [i3#29 ASC NULLS FIRST], false, 0
> :  +- Exchange hashpartitioning(i3#29, 200), true, [id=#55]
> : +- *(3) BroadcastHashJoin [i1#7], [i3#29], Inner, BuildRight
> ::- *(3) SortMergeJoin [i1#7], [i2#18], Inner
> ::  :- *(1) Sort [i1#7 ASC NULLS FIRST], false, 0
> ::  :  +- Exchange hashpartitioning(i1#7, 200), true, [id=#28]
> ::  : +- LocalTableScan [i1#7, j1#8]
> ::  +- *(2) Sort [i2#18 ASC NULLS FIRST], false, 0
> :: +- Exchange hashpartitioning(i2#18, 200), true, [id=#29]
> ::+- LocalTableScan [i2#18, j2#19]
> :+- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, 
> int, false] as bigint))), [id=#34]
> :   +- LocalTableScan [i3#29, j3#30]
> +- *(5) Sort [i4#40 ASC NULLS FIRST], false, 0
>+- Exchange hashpartitioning(i4#40, 200), true, [id=#39]
>   +- LocalTableScan [i4#40, j4#41]
> {code}
>  But, since BroadcastHashJoinExec is only for equi-join, if the streamed side 
> has HashPartitioning, BroadcastHashJoinExec can utilize the info to eliminate 
> the exchange.
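A hedged sketch of that idea over plain strings; the real change manipulates Catalyst HashPartitioning expressions, and this only shows the key-equivalence expansion:

{code:java}
// For each streamed-side partitioning key, the equi-join condition makes the
// matching build-side key an equally valid partitioning key, so the operator
// can advertise every combination.
def expandPartitioningKeys(
    streamedKeys: Seq[String],
    buildSideTwin: Map[String, String]): Seq[Seq[String]] =
  streamedKeys.foldLeft(Seq(Seq.empty[String])) { (acc, key) =>
    val alternatives = key +: buildSideTwin.get(key).toSeq
    for (prefix <- acc; alt <- alternatives) yield prefix :+ alt
  }

// With i1 = i3 from the broadcast join above:
// expandPartitioningKeys(Seq("i1"), Map("i1" -> "i3")) == Seq(Seq("i1"), Seq("i3"))
{code}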






[jira] [Resolved] (SPARK-32302) Partially push down disjunctive predicates through Join/Partitions

2020-07-20 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-32302.
-
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 29101
[https://github.com/apache/spark/pull/29101]

> Partially push down disjunctive predicates through Join/Partitions
> --
>
> Key: SPARK-32302
> URL: https://issues.apache.org/jira/browse/SPARK-32302
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
> Fix For: 3.1.0
>
>
> In https://github.com/apache/spark/pull/28733, CNF conversion is used to push 
> down disjunctive predicates through join.
> It's a good improvement; however:
> 1. Converting all the predicates to CNF can lead to a very long result, even 
> with grouping functions over expressions.
> 2. The non-recursive implementation is not easy to understand.
> Essentially, we just need to traverse the predicate and extract the 
> convertible sub-predicates, like what we did in 
> https://github.com/apache/spark/pull/24598. There is no need to maintain the 
> CNF result set. 
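A hedged sketch of that traversal over a toy expression type; the real rule operates on Catalyst expressions, and all names here are illustrative:

{code:java}
sealed trait Expr
case class Pred(name: String, side: String) extends Expr // side: "left" or "right"
case class And(l: Expr, r: Expr) extends Expr
case class Or(l: Expr, r: Expr) extends Expr

// Extract a predicate implied by `e` that references only `side`, if any,
// without ever materializing the full CNF of `e`.
def extractForSide(e: Expr, side: String): Option[Expr] = e match {
  case Pred(_, s) => if (s == side) Some(e) else None
  case And(l, r) => // each conjunct alone is implied by the conjunction
    (extractForSide(l, side), extractForSide(r, side)) match {
      case (Some(a), Some(b)) => Some(And(a, b))
      case (a, b)             => a.orElse(b)
    }
  case Or(l, r) => // a disjunction is implied only if BOTH branches yield one
    for { a <- extractForSide(l, side); b <- extractForSide(r, side) } yield Or(a, b)
}

// extractForSide(Or(And(Pred("a","left"), Pred("b","right")), Pred("c","left")), "left")
//   == Some(Or(Pred("a","left"), Pred("c","left")))
{code}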






[jira] [Resolved] (SPARK-32324) Fix error messages during using PIVOT and lateral view

2020-07-20 Thread philipse (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

philipse resolved SPARK-32324.
--
Resolution: Not A Problem

> Fix error messages during using PIVOT and lateral view
> --
>
> Key: SPARK-32324
> URL: https://issues.apache.org/jira/browse/SPARK-32324
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: philipse
>Priority: Minor
>
> Currently, when we use `lateral view` and `pivot` together in a FROM clause: 
> if `lateral view` comes before `pivot`, the error message is "LATERAL cannot 
> be used together with PIVOT in FROM clause"; if `lateral view` comes after 
> `pivot`, the query runs normally. So the error message "LATERAL cannot be 
> used together with PIVOT in FROM clause" is not accurate, and we may improve it.
>  
> Steps to reproduce:
> {code:java}
> CREATE TABLE person (id INT, name STRING, age INT, class int, address STRING);
>  INSERT INTO person VALUES
>  (100, 'John', 30, 1, 'Street 1'),
>  (200, 'Mary', NULL, 1, 'Street 2'),
>  (300, 'Mike', 80, 3, 'Street 3'),
>  (400, 'Dan', 50, 4, 'Street 4');
> {code}
>  
> Query1:
>  
> {code:java}
> SELECT * FROM person
>  lateral view outer explode(array(30,60)) tabelName as c_age
>  lateral view explode(array(40,80)) as d_age
>  PIVOT (
>  count(distinct age) as a
>  for name in ('Mary','John')
>  )
> {code}
> Result 1:
>  
> {code:java}
> Error: org.apache.spark.sql.catalyst.parser.ParseException: 
>  LATERAL cannot be used together with PIVOT in FROM clause(line 1, pos 9)
> == SQL ==
>  SELECT * FROM person
>  -^^^
>  lateral view outer explode(array(30,60)) tabelName as c_age
>  lateral view explode(array(40,80)) as d_age
>  PIVOT (
>  count(distinct age) as a
>  for name in ('Mary','John')
>  ) (state=,code=0)
> {code}
>  
>  
> Query2:
>  
> {code:java}
> SELECT * FROM person
>  PIVOT (
>  count(distinct age) as a
>  for name in ('Mary','John')
>  )
>  lateral view outer explode(array(30,60)) tabelName as c_age
>  lateral view explode(array(40,80)) as d_age
> {code}
>  
> Result 2:
> +---+----+----+-----+-----+
> |id |Mary|John|c_age|d_age|
> +---+----+----+-----+-----+
> |300|NULL|NULL|30   |40   |
> |300|NULL|NULL|30   |80   |
> |300|NULL|NULL|60   |40   |
> |300|NULL|NULL|60   |80   |
> |100|0   |NULL|30   |40   |
> |100|0   |NULL|30   |80   |
> |100|0   |NULL|60   |40   |
> |100|0   |NULL|60   |80   |
> |400|NULL|NULL|30   |40   |
> |400|NULL|NULL|30   |80   |
> |400|NULL|NULL|60   |40   |
> |400|NULL|NULL|60   |80   |
> |200|NULL|1   |30   |40   |
> |200|NULL|1   |30   |80   |
> |200|NULL|1   |60   |40   |
> |200|NULL|1   |60   |80   |
> +---+----+----+-----+-----+
>  






[jira] [Created] (SPARK-32371) Autodetect persistently failing executor pods and fail the application logging the cause.

2020-07-20 Thread Prashant Sharma (Jira)
Prashant Sharma created SPARK-32371:
---

 Summary: Autodetect persistently failing executor pods and fail 
the application logging the cause.
 Key: SPARK-32371
 URL: https://issues.apache.org/jira/browse/SPARK-32371
 Project: Spark
  Issue Type: Improvement
  Components: Kubernetes
Affects Versions: 3.1.0
Reporter: Prashant Sharma


{code:java}
[root@kyok-test-1 ~]# kubectl get po -w
NAME                                   READY   STATUS              RESTARTS   AGE
spark-shell-a3962a736bf9e775-exec-36   1/1     Running             0          5s
spark-shell-a3962a736bf9e775-exec-37   1/1     Running             0          3s
spark-shell-a3962a736bf9e775-exec-36   0/1     Error               0          5s
spark-shell-a3962a736bf9e775-exec-38   0/1     Pending             0          1s
spark-shell-a3962a736bf9e775-exec-38   0/1     Pending             0          1s
spark-shell-a3962a736bf9e775-exec-38   0/1     ContainerCreating   0          1s
spark-shell-a3962a736bf9e775-exec-36   0/1     Terminating         0          6s
spark-shell-a3962a736bf9e775-exec-36   0/1     Terminating         0          6s
spark-shell-a3962a736bf9e775-exec-37   0/1     Error               0          5s
spark-shell-a3962a736bf9e775-exec-38   1/1     Running             0          2s
spark-shell-a3962a736bf9e775-exec-39   0/1     Pending             0          0s
spark-shell-a3962a736bf9e775-exec-39   0/1     Pending             0          0s
spark-shell-a3962a736bf9e775-exec-39   0/1     ContainerCreating   0          0s
spark-shell-a3962a736bf9e775-exec-37   0/1     Terminating         0          6s
spark-shell-a3962a736bf9e775-exec-37   0/1     Terminating         0          6s
spark-shell-a3962a736bf9e775-exec-38   0/1     Error               0          4s
spark-shell-a3962a736bf9e775-exec-39   1/1     Running             0          1s
spark-shell-a3962a736bf9e775-exec-40   0/1     Pending             0          0s
spark-shell-a3962a736bf9e775-exec-40   0/1     Pending             0          0s
spark-shell-a3962a736bf9e775-exec-40   0/1     ContainerCreating   0          0s
spark-shell-a3962a736bf9e775-exec-38   0/1     Terminating         0          5s
spark-shell-a3962a736bf9e775-exec-38   0/1     Terminating         0          5s
spark-shell-a3962a736bf9e775-exec-39   0/1     Error               0          3s
spark-shell-a3962a736bf9e775-exec-40   1/1     Running             0          1s
spark-shell-a3962a736bf9e775-exec-41   0/1     Pending             0          0s
spark-shell-a3962a736bf9e775-exec-41   0/1     Pending             0          0s
spark-shell-a3962a736bf9e775-exec-41   0/1     ContainerCreating   0          0s
spark-shell-a3962a736bf9e775-exec-39   0/1     Terminating         0          4s
spark-shell-a3962a736bf9e775-exec-39   0/1     Terminating         0          4s
spark-shell-a3962a736bf9e775-exec-41   1/1     Running             0          2s
spark-shell-a3962a736bf9e775-exec-40   0/1     Error               0          4s
spark-shell-a3962a736bf9e775-exec-42   0/1     Pending             0          0s
spark-shell-a3962a736bf9e775-exec-42   0/1     Pending             0          0s
spark-shell-a3962a736bf9e775-exec-42   0/1     ContainerCreating   0          0s
spark-shell-a3962a736bf9e775-exec-40   0/1     Terminating         0          4s
spark-shell-a3962a736bf9e775-exec-40   0/1     Terminating         0          4s
{code}
A cascade of pods being created and terminated within 3-4 seconds ensues, and it 
is difficult to see the logs of these constantly created and terminated pods. 
Thankfully, there is an option
{code:java}
spark.kubernetes.executor.deleteOnTermination false
{code}
to turn off the auto deletion of executor pods, which gives us an opportunity to 
diagnose the problem. However, it is not turned on by default, and sometimes one 
may need to guess what caused the problem in the previous run, work out steps to 
reproduce it, and then re-run the application with the exact same setup.
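For reference, a hedged example of enabling that escape hatch from code; the configuration key exists today, and this only illustrates the current manual workflow:

{code:java}
// Keep terminated executor pods around for post-mortem kubectl logs/describe.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .config("spark.kubernetes.executor.deleteOnTermination", "false")
  .getOrCreate()
{code}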

So it might be good if we could somehow detect this situation of pods failing as 
soon as they start, or failing on a particular task, capture the error that 
caused the pod to terminate, and relay it back to the driver and log it.

Alternatively, if we could auto-detect this situation, we could also stop 
creating more executor pods and fail with an appropriate error, retaining the 
last failed pod for the user's further investigation.

So far it has not been evaluated how this can be achieved, but this feature 
might be useful as Kubernetes grows into a preferred choice for deploying Spark. 
Logging this issue for further investigation and work.







[jira] [Updated] (SPARK-32370) pyspark foreach/foreachPartition send http request failed

2020-07-20 Thread Tao Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Liu updated SPARK-32370:

Environment: 
spark 3.0

python3.7

macos 10.15

> pyspark foreach/foreachPartition send http request failed
> -
>
> Key: SPARK-32370
> URL: https://issues.apache.org/jira/browse/SPARK-32370
> Project: Spark
>  Issue Type: Question
>  Components: PySpark
>Affects Versions: 3.0.0
> Environment: spark 3.0
> python3.7
> macos 10.15
>Reporter: Tao Liu
>Priority: Major
>
> I use urllib.request to send an HTTP request in foreach/foreachPartition. 
> PySpark throws the following error:
> {color:#de350b}_objc[74094]: +[__NSPlaceholderDate initialize] may have been 
> in progress in another thread when fork() was called. We cannot safely call 
> it or ignore it in the fork() child process. Crashing instead. Set a 
> breakpoint on objc_initializeAfterForkError to debug.20/07/20 19:05:58 ERROR 
> Executor: Exception in task 7.0 in stage 0.0 (TID 
> 7)org.apache.spark.SparkException: Python worker exited unexpectedly 
> (crashed)        at 
> org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:536)
>          at 
> org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:525)
>          at 
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:38) 
>         at 
> org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:643) 
>         at 
> org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:621) 
>         at 
> org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:456)
>          at 
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
>          at scala.collection.Iterator.foreach(Iterator.scala:941)         at 
> scala.collection.Iterator.foreach$(Iterator.scala:941)         at 
> org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
>          at 
> scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)         at 
> scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)         
> at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)  
>        at 
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49)      
>    at scala.collection.TraversableOnce.to(TraversableOnce.scala:315)         
> at scala.collection.TraversableOnce.to$(TraversableOnce.scala:313)         at 
> org.apache.spark.InterruptibleIterator.to(InterruptibleIterator.scala:28)     
>     at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:307)   
>       at 
> scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:307)         
> at 
> org.apache.spark.InterruptibleIterator.toBuffer(InterruptibleIterator.scala:28)
>          at 
> scala.collection.TraversableOnce.toArray(TraversableOnce.scala:294)         
> at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:288)       
>   at 
> org.apache.spark.InterruptibleIterator.toArray(InterruptibleIterator.scala:28)
>          at org.apache.spark.rdd.RDD.$anonfun$collect$2(RDD.scala:1004)       
>   at 
> org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2133)_{color}
> When I call rdd.foreach(send_http), with 
> rdd = sc.parallelize(["http://192.168.1.1:5000/index.html"]), send_http is 
> defined as follows:
> def send_http(url):
>     req = urllib.request.Request(url)
>     resp = urllib.request.urlopen(req)
> Can anyone tell me the problem? Thanks.






[jira] [Created] (SPARK-32370) pyspark foreach/foreachPartition send http request failed

2020-07-20 Thread Tao Liu (Jira)
Tao Liu created SPARK-32370:
---

 Summary: pyspark foreach/foreachPartition send http request failed
 Key: SPARK-32370
 URL: https://issues.apache.org/jira/browse/SPARK-32370
 Project: Spark
  Issue Type: Question
  Components: PySpark
Affects Versions: 3.0.0
Reporter: Tao Liu


I use urllib.request to send an HTTP request in foreach/foreachPartition. 
PySpark throws the following error:

{color:#de350b}_objc[74094]: +[__NSPlaceholderDate initialize] may have been in 
progress in another thread when fork() was called. We cannot safely call it or 
ignore it in the fork() child process. Crashing instead. Set a breakpoint on 
objc_initializeAfterForkError to debug.20/07/20 19:05:58 ERROR Executor: 
Exception in task 7.0 in stage 0.0 (TID 7)org.apache.spark.SparkException: 
Python worker exited unexpectedly (crashed)        at 
org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:536)
         at 
org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:525)
         at 
scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:38)   
      at 
org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:643)   
      at 
org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:621)   
      at 
org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:456)
         at 
org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)  
       at scala.collection.Iterator.foreach(Iterator.scala:941)         at 
scala.collection.Iterator.foreach$(Iterator.scala:941)         at 
org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)  
       at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)    
     at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)     
    at 
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)       
  at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49)   
      at scala.collection.TraversableOnce.to(TraversableOnce.scala:315)         
at scala.collection.TraversableOnce.to$(TraversableOnce.scala:313)         at 
org.apache.spark.InterruptibleIterator.to(InterruptibleIterator.scala:28)       
  at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:307)       
  at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:307)      
   at 
org.apache.spark.InterruptibleIterator.toBuffer(InterruptibleIterator.scala:28) 
        at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:294)  
       at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:288)  
       at 
org.apache.spark.InterruptibleIterator.toArray(InterruptibleIterator.scala:28)  
       at org.apache.spark.rdd.RDD.$anonfun$collect$2(RDD.scala:1004)         
at 
org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2133)_{color}
When I call rdd.foreach(send_http), with 
rdd = sc.parallelize(["http://192.168.1.1:5000/index.html"]), send_http is 
defined as follows:

def send_http(url):
    req = urllib.request.Request(url)
    resp = urllib.request.urlopen(req)

Can anyone tell me the problem? Thanks.






[jira] [Created] (SPARK-32369) pyspark foreach/foreachPartition send http request failed

2020-07-20 Thread Tao Liu (Jira)
Tao Liu created SPARK-32369:
---

 Summary: pyspark foreach/foreachPartition send http request failed
 Key: SPARK-32369
 URL: https://issues.apache.org/jira/browse/SPARK-32369
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 3.0.0
Reporter: Tao Liu


I use urllib.request to send an HTTP request in foreach/foreachPartition. 
PySpark throws the following error:

_objc[74094]: +[__NSPlaceholderDate initialize] may have been in progress in 
another thread when fork() was called. We cannot safely call it or ignore it in 
the fork() child process. Crashing instead. Set a breakpoint on 
objc_initializeAfterForkError to debug.20/07/20 19:05:58 ERROR Executor: 
Exception in task 7.0 in stage 0.0 (TID 7)org.apache.spark.SparkException: 
Python worker exited unexpectedly (crashed)        at 
org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:536)
         at 
org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:525)
         at 
scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:38)   
      at 
org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:643)   
      at 
org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:621)   
      at 
org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:456)
         at 
org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)  
       at scala.collection.Iterator.foreach(Iterator.scala:941)         at 
scala.collection.Iterator.foreach$(Iterator.scala:941)         at 
org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)  
       at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)    
     at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)     
    at 
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)       
  at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49)   
      at scala.collection.TraversableOnce.to(TraversableOnce.scala:315)         
at scala.collection.TraversableOnce.to$(TraversableOnce.scala:313)         at 
org.apache.spark.InterruptibleIterator.to(InterruptibleIterator.scala:28)       
  at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:307)       
  at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:307)      
   at 
org.apache.spark.InterruptibleIterator.toBuffer(InterruptibleIterator.scala:28) 
        at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:294)  
       at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:288)  
       at 
org.apache.spark.InterruptibleIterator.toArray(InterruptibleIterator.scala:28)  
       at org.apache.spark.rdd.RDD.$anonfun$collect$2(RDD.scala:1004)         
at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2133)_


When I call rdd.foreach(send_http), with 
rdd = sc.parallelize(["http://192.168.1.1:5000/index.html"]), send_http is 
defined as follows:

def send_http(url):
    req = urllib.request.Request(url)
    resp = urllib.request.urlopen(req)

Can anyone tell me the problem? Thanks.






[jira] [Resolved] (SPARK-32366) Fix doc link of datetime pattern in 3.0 migration guide

2020-07-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-32366.
--
Fix Version/s: 3.1.0
   3.0.1
   Resolution: Fixed

Issue resolved by pull request 29162
[https://github.com/apache/spark/pull/29162]

> Fix doc link of datetime pattern in 3.0 migration guide
> ---
>
> Key: SPARK-32366
> URL: https://issues.apache.org/jira/browse/SPARK-32366
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 3.0.1, 3.1.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
> Fix For: 3.0.1, 3.1.0
>
>
> In http://spark.apache.org/docs/latest/sql-migration-guide.html#query-engine, 
> there is an invalid link for the datetime pattern reference 
> "sql-ref-datetime-pattern.md". We should fix it.






[jira] [Assigned] (SPARK-32368) Options in PartitioningAwareFileIndex should respect case insensitivity

2020-07-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32368:


Assignee: Apache Spark

> Options in PartitioningAwareFileIndex should respect case insensitivity
> ---
>
> Key: SPARK-32368
> URL: https://issues.apache.org/jira/browse/SPARK-32368
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
>Priority: Minor
>
> The datasource options such as {{recursiveFileLookup}} or {{pathGlobFilter}} 
> currently don't respect case insensitivity.






[jira] [Assigned] (SPARK-32368) Options in PartitioningAwareFileIndex should respect case insensitivity

2020-07-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32368:


Assignee: (was: Apache Spark)

> Options in PartitioningAwareFileIndex should respect case insensitivity
> ---
>
> Key: SPARK-32368
> URL: https://issues.apache.org/jira/browse/SPARK-32368
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1
>Reporter: Hyukjin Kwon
>Priority: Minor
>
> The datasource options such as {{recursiveFileLookup}} or {{pathGlobFilter}} 
> currently don't respect case insensitivity.






[jira] [Commented] (SPARK-32368) Options in PartitioningAwareFileIndex should respect case insensitivity

2020-07-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161173#comment-17161173
 ] 

Apache Spark commented on SPARK-32368:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/29165

> Options in PartitioningAwareFileIndex should respect case insensitivity
> ---
>
> Key: SPARK-32368
> URL: https://issues.apache.org/jira/browse/SPARK-32368
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1
>Reporter: Hyukjin Kwon
>Priority: Minor
>
> The datasource options such as {{recursiveFileLookup}} or {{pathGlobFilter}} 
> currently don't respect case insensitivity.






[jira] [Created] (SPARK-32368) Options in PartitioningAwareFileIndex should respect case insensitivity

2020-07-20 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-32368:


 Summary: Options in PartitioningAwareFileIndex should respect case 
insensitivity
 Key: SPARK-32368
 URL: https://issues.apache.org/jira/browse/SPARK-32368
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.1
Reporter: Hyukjin Kwon


The datasource options such as {{recursiveFileLookup}} or {{pathGlobFilter}} 
currently don't respect case insensitivity.
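To make the expectation concrete, a hedged example (format and path are placeholders): both reads below should behave identically once the option lookup is case-insensitive, whereas today only the exact spelling takes effect.

{code:java}
val a = spark.read.option("recursiveFileLookup", "true").text("/tmp/data")
val b = spark.read.option("recursivefilelookup", "true").text("/tmp/data")
{code}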






[jira] [Commented] (SPARK-32240) UDF parse_url with a URL that contains pipe(|) will give incorrect result

2020-07-20 Thread Stanislav (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1716#comment-1716
 ] 

Stanislav commented on SPARK-32240:
---

I think the problem is that the pipe is not a valid URL character.

{code:javascript}
encodeURI("https://a.b.c/index.php?params1=a|b=x")
"https://a.b.c/index.php?params1=a%7Cb=x"
{code}

It does work in Hive's {{parse_url}} UDF, to my knowledge, so I would 
re-classify this bug as a compatibility issue
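Following that reasoning, a hedged workaround sketch (not from the ticket): percent-encode the pipe before handing the URL to parse_url, which sidesteps the URI parse failure that yields null.

{code:java}
// Assumes a SparkSession with `import spark.implicits._` in scope.
import org.apache.spark.sql.functions.{callUDF, lit, regexp_replace}

val fixed = Seq("https://a.b.c/index.php?params1=a|b=x").toDF("url")
  .withColumn("host",
    callUDF("parse_url", regexp_replace($"url", "\\|", "%7C"), lit("HOST")))
fixed.show(false) // host is a.b.c instead of null
{code}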

> UDF parse_url with a URL that contains pipe(|) will give incorrect result
> -
>
> Key: SPARK-32240
> URL: https://issues.apache.org/jira/browse/SPARK-32240
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.4, 3.0.0
>Reporter: Kritsada Limpawatkul
>Priority: Major
>
> I try to get the host from the URL with the code below.
> {code:java}
> Seq(
>   "https://a.b.c/index.php?params1=a|b=x",
>   "https://a.b.c/index.php?params1=a;
> )
>   .toDF("url")
>   .withColumn("host", callUDF("parse_url", $"url", lit("HOST")))
>   .show(false){code}
> The result of the code is as follows.
> {code:java}
> +-------------------------------------+-----+
> |url                                  |host |
> +-------------------------------------+-----+
> |https://a.b.c/index.php?params1=a|b=x|null |
> |https://a.b.c/index.php?params1=a    |a.b.c|
> +-------------------------------------+-----+
> {code}
> It seems like the host becomes null when the URL contains any pipe(|) 
> character.






[jira] [Commented] (SPARK-32367) Correct the spelling of parameter in KubernetesTestComponents

2020-07-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161057#comment-17161057
 ] 

Apache Spark commented on SPARK-32367:
--

User 'merrily01' has created a pull request for this issue:
https://github.com/apache/spark/pull/29164

> Correct the spelling of parameter in KubernetesTestComponents
> -
>
> Key: SPARK-32367
> URL: https://issues.apache.org/jira/browse/SPARK-32367
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.0.0
>Reporter: merrily01
>Priority: Trivial
> Fix For: 3.1.0
>
>
> Correct the spelling of parameter 'spark.executor.instances' in 
> KubernetesTestComponents






[jira] [Commented] (SPARK-32367) Correct the spelling of parameter in KubernetesTestComponents

2020-07-20 Thread merrily01 (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161056#comment-17161056
 ] 

merrily01 commented on SPARK-32367:
---

[https://github.com/apache/spark/pull/29164/files]

> Correct the spelling of parameter in KubernetesTestComponents
> -
>
> Key: SPARK-32367
> URL: https://issues.apache.org/jira/browse/SPARK-32367
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.0.0
>Reporter: merrily01
>Priority: Trivial
> Fix For: 3.1.0
>
>
> Correct the spelling of parameter 'spark.executor.instances' in 
> KubernetesTestComponents



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32367) Correct the spelling of parameter in KubernetesTestComponents

2020-07-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161059#comment-17161059
 ] 

Apache Spark commented on SPARK-32367:
--

User 'merrily01' has created a pull request for this issue:
https://github.com/apache/spark/pull/29164

> Correct the spelling of parameter in KubernetesTestComponents
> -
>
> Key: SPARK-32367
> URL: https://issues.apache.org/jira/browse/SPARK-32367
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.0.0
>Reporter: merrily01
>Priority: Trivial
> Fix For: 3.1.0
>
>
> Correct the spelling of parameter 'spark.executor.instances' in 
> KubernetesTestComponents



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-32358) temp view not working after upgrading from 2.3.3 to 2.4.5

2020-07-20 Thread philipse (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

philipse resolved SPARK-32358.
--
Resolution: Won't Fix

> temp view not working after upgrading from 2.3.3 to 2.4.5
> -
>
> Key: SPARK-32358
> URL: https://issues.apache.org/jira/browse/SPARK-32358
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.4.5
>Reporter: philipse
>Priority: Major
>
> After upgrading from 2.3.3 to Spark 2.4.5, the temp view seems not to be 
> working. Please correct me if I missed something. Thanks!
> Steps to reproduce:
> ```
> from pyspark.sql import SparkSession
> from pyspark.sql import Row
>
> spark = SparkSession \
>     .builder \
>     .appName('scenary_address_1') \
>     .enableHiveSupport() \
>     .getOrCreate()
>
> address_tok_result_df = spark.createDataFrame([Row(a=1, b='难', c=80), Row(a=2, b='v', c=81)])
> print("create dataframe finished")
> address_tok_result_df.createOrReplaceTempView("scenery_address_test1")
> print(spark.read.table('scenery_address_test1').dtypes)
> spark.sql("select * from scenery_address_test1").show()
> ```
>  
> In Spark 2.3.3 I can easily get the following result:
> ```
> create dataframe finished
> [('a', 'bigint'), ('b', 'string'), ('c', 'bigint')]
> +-+-+--+
> |a|b| c|
> +-+-+--+
> |1|难|80|
> |2|v|81|
> +-+-+--+
> ```
>  
> But in 2.4.5 I only get the following, with no result rows shown:
> create dataframe finished
> [('a', 'bigint'), ('b', 'string'), ('c', 'bigint')]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-32358) temp view not working after upgrading from 2.3.3 to 2.4.5

2020-07-20 Thread philipse (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

philipse resolved SPARK-32358.
--
Resolution: Fixed

> temp view not working after upgrading from 2.3.3 to 2.4.5
> -
>
> Key: SPARK-32358
> URL: https://issues.apache.org/jira/browse/SPARK-32358
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.4.5
>Reporter: philipse
>Priority: Major
>
> After upgrading from 2.3.3 to Spark 2.4.5, the temp view seems not to be 
> working. Please correct me if I missed something. Thanks!
> Steps to reproduce:
> ```
> from pyspark.sql import SparkSession
> from pyspark.sql import Row
>
> spark = SparkSession \
>     .builder \
>     .appName('scenary_address_1') \
>     .enableHiveSupport() \
>     .getOrCreate()
>
> address_tok_result_df = spark.createDataFrame([Row(a=1, b='难', c=80), Row(a=2, b='v', c=81)])
> print("create dataframe finished")
> address_tok_result_df.createOrReplaceTempView("scenery_address_test1")
> print(spark.read.table('scenery_address_test1').dtypes)
> spark.sql("select * from scenery_address_test1").show()
> ```
>  
> In Spark 2.3.3 I can easily get the following result:
> ```
> create dataframe finished
> [('a', 'bigint'), ('b', 'string'), ('c', 'bigint')]
> +-+-+--+
> |a|b| c|
> +-+-+--+
> |1|难|80|
> |2|v|81|
> +-+-+--+
> ```
>  
> But in 2.4.5 I only get the following, with no result rows shown:
> create dataframe finished
> [('a', 'bigint'), ('b', 'string'), ('c', 'bigint')]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-32358) temp view not working after upgrading from 2.3.3 to 2.4.5

2020-07-20 Thread philipse (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

philipse reopened SPARK-32358:
--

> temp view not working after upgrading from 2.3.3 to 2.4.5
> -
>
> Key: SPARK-32358
> URL: https://issues.apache.org/jira/browse/SPARK-32358
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.4.5
>Reporter: philipse
>Priority: Major
>
> After upgrading from 2.3.3 to Spark 2.4.5, the temp view seems not to be 
> working. Please correct me if I missed something. Thanks!
> Steps to reproduce:
> ```
> from pyspark.sql import SparkSession
> from pyspark.sql import Row
>
> spark = SparkSession \
>     .builder \
>     .appName('scenary_address_1') \
>     .enableHiveSupport() \
>     .getOrCreate()
>
> address_tok_result_df = spark.createDataFrame([Row(a=1, b='难', c=80), Row(a=2, b='v', c=81)])
> print("create dataframe finished")
> address_tok_result_df.createOrReplaceTempView("scenery_address_test1")
> print(spark.read.table('scenery_address_test1').dtypes)
> spark.sql("select * from scenery_address_test1").show()
> ```
>  
> In Spark 2.3.3 I can easily get the following result:
> ```
> create dataframe finished
> [('a', 'bigint'), ('b', 'string'), ('c', 'bigint')]
> +-+-+--+
> |a|b| c|
> +-+-+--+
> |1|难|80|
> |2|v|81|
> +-+-+--+
> ```
>  
> But in 2.4.5 I only get the following, with no result rows shown:
> create dataframe finished
> [('a', 'bigint'), ('b', 'string'), ('c', 'bigint')]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32367) Correct the spelling of parameter in KubernetesTestComponents

2020-07-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32367:


Assignee: Apache Spark

> Correct the spelling of parameter in KubernetesTestComponents
> -
>
> Key: SPARK-32367
> URL: https://issues.apache.org/jira/browse/SPARK-32367
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.0.0
>Reporter: merrily01
>Assignee: Apache Spark
>Priority: Trivial
> Fix For: 3.1.0
>
>
> Correct the spelling of parameter 'spark.executor.instances' in 
> KubernetesTestComponents



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32367) Correct the spelling of parameter in KubernetesTestComponents

2020-07-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32367:


Assignee: (was: Apache Spark)

> Correct the spelling of parameter in KubernetesTestComponents
> -
>
> Key: SPARK-32367
> URL: https://issues.apache.org/jira/browse/SPARK-32367
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.0.0
>Reporter: merrily01
>Priority: Trivial
> Fix For: 3.1.0
>
>
> Correct the spelling of parameter 'spark.executor.instances' in 
> KubernetesTestComponents



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-32367) Correct the spelling of parameter in KubernetesTestComponents

2020-07-20 Thread merrily01 (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

merrily01 reopened SPARK-32367:
---

> Correct the spelling of parameter in KubernetesTestComponents
> -
>
> Key: SPARK-32367
> URL: https://issues.apache.org/jira/browse/SPARK-32367
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.0.0
>Reporter: merrily01
>Priority: Trivial
> Fix For: 3.1.0
>
>
> Correct the spelling of parameter 'spark.executor.instances' in 
> KubernetesTestComponents



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32367) Correct the spelling of parameter in KubernetesTestComponents

2020-07-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161038#comment-17161038
 ] 

Apache Spark commented on SPARK-32367:
--

User 'merrily01' has created a pull request for this issue:
https://github.com/apache/spark/pull/29163

> Correct the spelling of parameter in KubernetesTestComponents
> -
>
> Key: SPARK-32367
> URL: https://issues.apache.org/jira/browse/SPARK-32367
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.0.0
>Reporter: merrily01
>Priority: Trivial
> Fix For: 3.1.0
>
>
> Correct the spelling of parameter 'spark.executor.instances' in 
> KubernetesTestComponents



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32367) Correct the spelling of parameter in KubernetesTestComponents

2020-07-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161037#comment-17161037
 ] 

Apache Spark commented on SPARK-32367:
--

User 'merrily01' has created a pull request for this issue:
https://github.com/apache/spark/pull/29163

> Correct the spelling of parameter in KubernetesTestComponents
> -
>
> Key: SPARK-32367
> URL: https://issues.apache.org/jira/browse/SPARK-32367
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.0.0
>Reporter: merrily01
>Priority: Trivial
> Fix For: 3.1.0
>
>
> Correct the spelling of parameter 'spark.executor.instances' in 
> KubernetesTestComponents



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-32367) Correct the spelling of parameter in KubernetesTestComponents

2020-07-20 Thread merrily01 (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

merrily01 resolved SPARK-32367.
---
Resolution: Works for Me

> Correct the spelling of parameter in KubernetesTestComponents
> -
>
> Key: SPARK-32367
> URL: https://issues.apache.org/jira/browse/SPARK-32367
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.0.0
>Reporter: merrily01
>Priority: Trivial
> Fix For: 3.1.0
>
>
> Correct the spelling of parameter 'spark.executor.instances' in 
> KubernetesTestComponents



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-32367) Correct the spelling of parameter in KubernetesTestComponents

2020-07-20 Thread merrily01 (Jira)
merrily01 created SPARK-32367:
-

 Summary: Correct the spelling of parameter in 
KubernetesTestComponents
 Key: SPARK-32367
 URL: https://issues.apache.org/jira/browse/SPARK-32367
 Project: Spark
  Issue Type: Improvement
  Components: Kubernetes
Affects Versions: 3.0.0
Reporter: merrily01
 Fix For: 3.1.0


Correct the spelling of parameter 'spark.executor.instances' in 
KubernetesTestComponents



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32366) Fix doc link of datetime pattern in 3.0 migration guide

2020-07-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161029#comment-17161029
 ] 

Apache Spark commented on SPARK-32366:
--

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/29162

> Fix doc link of datetime pattern in 3.0 migration guide
> ---
>
> Key: SPARK-32366
> URL: https://issues.apache.org/jira/browse/SPARK-32366
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 3.0.1, 3.1.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> In http://spark.apache.org/docs/latest/sql-migration-guide.html#query-engine, 
> there is an invalid link to the datetime pattern reference 
> "sql-ref-datetime-pattern.md". We should fix it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32366) Fix doc link of datetime pattern in 3.0 migration guide

2020-07-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32366:


Assignee: Apache Spark  (was: Gengliang Wang)

> Fix doc link of datetime pattern in 3.0 migration guide
> ---
>
> Key: SPARK-32366
> URL: https://issues.apache.org/jira/browse/SPARK-32366
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 3.0.1, 3.1.0
>Reporter: Gengliang Wang
>Assignee: Apache Spark
>Priority: Major
>
> In http://spark.apache.org/docs/latest/sql-migration-guide.html#query-engine, 
> there is an invalid link to the datetime pattern reference 
> "sql-ref-datetime-pattern.md". We should fix it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32366) Fix doc link of datetime pattern in 3.0 migration guide

2020-07-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32366:


Assignee: Gengliang Wang  (was: Apache Spark)

> Fix doc link of datetime pattern in 3.0 migration guide
> ---
>
> Key: SPARK-32366
> URL: https://issues.apache.org/jira/browse/SPARK-32366
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 3.0.1, 3.1.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> In http://spark.apache.org/docs/latest/sql-migration-guide.html#query-engine, 
> there is an invalid link to the datetime pattern reference 
> "sql-ref-datetime-pattern.md". We should fix it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32366) Fix doc link of datetime pattern in 3.0 migration guide

2020-07-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161028#comment-17161028
 ] 

Apache Spark commented on SPARK-32366:
--

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/29162

> Fix doc link of datetime pattern in 3.0 migration guide
> ---
>
> Key: SPARK-32366
> URL: https://issues.apache.org/jira/browse/SPARK-32366
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 3.0.1, 3.1.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> In http://spark.apache.org/docs/latest/sql-migration-guide.html#query-engine, 
> there is an invalid link to the datetime pattern reference 
> "sql-ref-datetime-pattern.md". We should fix it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-32366) Fix doc link of datetime pattern in 3.0 migration guide

2020-07-20 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-32366:
--

 Summary: Fix doc link of datetime pattern in 3.0 migration guide
 Key: SPARK-32366
 URL: https://issues.apache.org/jira/browse/SPARK-32366
 Project: Spark
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 3.0.1, 3.1.0
Reporter: Gengliang Wang
Assignee: Gengliang Wang


In http://spark.apache.org/docs/latest/sql-migration-guide.html#query-engine, 
there is an invalid link to the datetime pattern reference 
"sql-ref-datetime-pattern.md". We should fix it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32364) `path` argument of DataFrame.load/save should override the existing options

2020-07-20 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-32364:
--
Description: 
Although we introduced CaseInsensitiveMap and CaseInsensitiveStringMap (in 
DSv2), when a user has multiple options like `path`, `paTH`, and `PATH` for 
the same key `path`, `option()/options()` is non-deterministic because 
`extraOptions` is a `HashMap`. This issue aims to make load/save always respect 
its direct path argument and ignore the existing options. This is because the 
load/save function is independent of users' typos like `paTH` and is designed 
to be invoked as the last operation, so load/save should always work 
consistently and correctly.

Please note that this doesn't aim to enforce case-insensitivity on 
`option()/options()` or the `extraOptions` variable, because that might be 
considered a behavior change.
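
As a side note, a minimal sketch (illustration only, separate from the repro 
below) of why a plain `HashMap` makes this non-deterministic: its iteration 
order is hash-based, not insertion-based, so a case-insensitive scan over the 
options can surface any of the variants.

{code}
import scala.collection.immutable.HashMap

// Which case-variant of "path" wins depends on hash order, not on the
// order in which the options were set.
val opts = HashMap("paTh" -> "1", "PATH" -> "2", "Path" -> "3", "patH" -> "4")
println(opts.collectFirst { case (k, v) if k.equalsIgnoreCase("path") => v })
{code}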

{code}
spark.read
  .option("paTh", "1")
  .option("PATH", "2")
  .option("Path", "3")
  .option("patH", "4")
  .load("5")
...
org.apache.spark.sql.AnalysisException:
Path does not exist: file:/.../1;
{code}

  was:
{code}
spark.read
  .option("paTh", "1")
  .option("PATH", "2")
  .option("Path", "3")
  .option("patH", "4")
  .load("5")
...
org.apache.spark.sql.AnalysisException:
Path does not exist: file:/.../1;
{code}

{code}
spark.read
  .option("PATH", "2")
  .option("Path", "3")
  .option("patH", "4")
  .load("5")
...
org.apache.spark.sql.AnalysisException:
Path does not exist: file:/.../4;
{code}


> `path` argument of DataFrame.load/save should override the existing options
> ---
>
> Key: SPARK-32364
> URL: https://issues.apache.org/jira/browse/SPARK-32364
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.6, 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> Although we introduced CaseInsensitiveMap and CaseInsensitiveStringMap (in 
> DSv2), when a user has multiple options like `path`, `paTH`, and `PATH` for 
> the same key `path`, `option()/options()` is non-deterministic because 
> `extraOptions` is a `HashMap`. This issue aims to make load/save always 
> respect its direct path argument and ignore the existing options. This is 
> because the load/save function is independent of users' typos like `paTH` and 
> is designed to be invoked as the last operation, so load/save should always 
> work consistently and correctly.
> Please note that this doesn't aim to enforce case-insensitivity on 
> `option()/options()` or the `extraOptions` variable, because that might be 
> considered a behavior change.
> {code}
> spark.read
>   .option("paTh", "1")
>   .option("PATH", "2")
>   .option("Path", "3")
>   .option("patH", "4")
>   .load("5")
> ...
> org.apache.spark.sql.AnalysisException:
> Path does not exist: file:/.../1;
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32364) `path` argument of DataFrame.load/save should override the existing options

2020-07-20 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-32364:
--
Description: 
{code}
spark.read
  .option("paTh", "1")
  .option("PATH", "2")
  .option("Path", "3")
  .option("patH", "4")
  .load("5")
...
org.apache.spark.sql.AnalysisException:
Path does not exist: file:/.../1;
{code}

{code}
spark.read
  .option("PATH", "2")
  .option("Path", "3")
  .option("patH", "4")
  .load("5")
...
org.apache.spark.sql.AnalysisException:
Path does not exist: file:/.../4;
{code}

  was:
{code}
spark.read
  .option("paTh", "1")
  .option("PATH", "2")
  .option("Path", "3")
  .option("patH", "4")
  .parquet("5")
...
org.apache.spark.sql.AnalysisException:
Path does not exist: file:/.../1;
{code}

{code}
spark.read
  .option("PATH", "2")
  .option("Path", "3")
  .option("patH", "4")
  .parquet("5")
...
org.apache.spark.sql.AnalysisException:
Path does not exist: file:/.../4;
{code}


> `path` argument of DataFrame.load/save should override the existing options
> ---
>
> Key: SPARK-32364
> URL: https://issues.apache.org/jira/browse/SPARK-32364
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.6, 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> {code}
> spark.read
>   .option("paTh", "1")
>   .option("PATH", "2")
>   .option("Path", "3")
>   .option("patH", "4")
>   .load("5")
> ...
> org.apache.spark.sql.AnalysisException:
> Path does not exist: file:/.../1;
> {code}
> {code}
> spark.read
>   .option("PATH", "2")
>   .option("Path", "3")
>   .option("patH", "4")
>   .load("5")
> ...
> org.apache.spark.sql.AnalysisException:
> Path does not exist: file:/.../4;
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32344) Unevaluable expr is set to FIRST/LAST ignoreNullsExpr in distinct aggregates

2020-07-20 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-32344:
--
Fix Version/s: 2.4.7

> Unevaluable expr is set to FIRST/LAST ignoreNullsExpr in distinct aggregates
> 
>
> Key: SPARK-32344
> URL: https://issues.apache.org/jira/browse/SPARK-32344
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.3, 2.3.4, 2.4.6, 3.0.0
>Reporter: Takeshi Yamamuro
>Assignee: Takeshi Yamamuro
>Priority: Minor
> Fix For: 2.4.7, 3.0.1, 3.1.0
>
>
> {code}
> scala> sql("SELECT FIRST(DISTINCT v) FROM VALUES 1, 2, 3 t(v)").show()
> ...
> Caused by: java.lang.UnsupportedOperationException: Cannot evaluate 
> expression: false#37
>   at 
> org.apache.spark.sql.catalyst.expressions.Unevaluable$class.eval(Expression.scala:258)
>   at 
> org.apache.spark.sql.catalyst.expressions.AttributeReference.eval(namedExpressions.scala:226)
>   at 
> org.apache.spark.sql.catalyst.expressions.aggregate.First.ignoreNulls(First.scala:68)
>   at 
> org.apache.spark.sql.catalyst.expressions.aggregate.First.updateExpressions$lzycompute(First.scala:82)
>   at 
> org.apache.spark.sql.catalyst.expressions.aggregate.First.updateExpressions(First.scala:81)
>   at 
> org.apache.spark.sql.execution.aggregate.HashAggregateExec$$anonfun$15.apply(HashAggregateExec.scala:268)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32365) Fix java.lang.IndexOutOfBoundsException: No group -1

2020-07-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32365:


Assignee: (was: Apache Spark)

> Fix java.lang.IndexOutOfBoundsException: No group -1
> 
>
> Key: SPARK-32365
> URL: https://issues.apache.org/jira/browse/SPARK-32365
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1, 3.1.0
>Reporter: jiaan.geng
>Priority: Major
>
> The current implementation of regexp_extract throws an unprocessed exception, 
> shown below:
> SELECT regexp_extract('1a 2b 14m', 'd+', -1)
> {code:java}
> java.lang.IndexOutOfBoundsException: No group -1
> java.util.regex.Matcher.group(Matcher.java:538)
> org.apache.spark.sql.catalyst.expressions.RegExpExtract.nullSafeEval(regexpExpressions.scala:455)
> org.apache.spark.sql.catalyst.expressions.TernaryExpression.eval(Expression.scala:704)
> org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$1$$anonfun$applyOrElse$1.applyOrElse(expressions.scala:52)
> org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$1$$anonfun$applyOrElse$1.applyOrElse(expressions.scala:45)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32365) Fix java.lang.IndexOutOfBoundsException: No group -1

2020-07-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17160942#comment-17160942
 ] 

Apache Spark commented on SPARK-32365:
--

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/29161

> Fix java.lang.IndexOutOfBoundsException: No group -1
> 
>
> Key: SPARK-32365
> URL: https://issues.apache.org/jira/browse/SPARK-32365
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1, 3.1.0
>Reporter: jiaan.geng
>Priority: Major
>
> The current implementation of regexp_extract throws an unprocessed exception, 
> shown below:
> SELECT regexp_extract('1a 2b 14m', 'd+', -1)
> {code:java}
> java.lang.IndexOutOfBoundsException: No group -1
> java.util.regex.Matcher.group(Matcher.java:538)
> org.apache.spark.sql.catalyst.expressions.RegExpExtract.nullSafeEval(regexpExpressions.scala:455)
> org.apache.spark.sql.catalyst.expressions.TernaryExpression.eval(Expression.scala:704)
> org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$1$$anonfun$applyOrElse$1.applyOrElse(expressions.scala:52)
> org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$1$$anonfun$applyOrElse$1.applyOrElse(expressions.scala:45)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32365) Fix java.lang.IndexOutOfBoundsException: No group -1

2020-07-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32365:


Assignee: Apache Spark

> Fix java.lang.IndexOutOfBoundsException: No group -1
> 
>
> Key: SPARK-32365
> URL: https://issues.apache.org/jira/browse/SPARK-32365
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1, 3.1.0
>Reporter: jiaan.geng
>Assignee: Apache Spark
>Priority: Major
>
> The current implementation of regexp_extract throws an unprocessed exception, 
> shown below:
> SELECT regexp_extract('1a 2b 14m', 'd+', -1)
> {code:java}
> java.lang.IndexOutOfBoundsException: No group -1
> java.util.regex.Matcher.group(Matcher.java:538)
> org.apache.spark.sql.catalyst.expressions.RegExpExtract.nullSafeEval(regexpExpressions.scala:455)
> org.apache.spark.sql.catalyst.expressions.TernaryExpression.eval(Expression.scala:704)
> org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$1$$anonfun$applyOrElse$1.applyOrElse(expressions.scala:52)
> org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$1$$anonfun$applyOrElse$1.applyOrElse(expressions.scala:45)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32365) Fix java.lang.IndexOutOfBoundsException: No group -1

2020-07-20 Thread jiaan.geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaan.geng updated SPARK-32365:
---
Description: 
The current implementation of regexp_extract throws an unprocessed exception, 
shown below:

SELECT regexp_extract('1a 2b 14m', 'd+', -1)


{code:java}
java.lang.IndexOutOfBoundsException: No group -1
java.util.regex.Matcher.group(Matcher.java:538)
org.apache.spark.sql.catalyst.expressions.RegExpExtract.nullSafeEval(regexpExpressions.scala:455)
org.apache.spark.sql.catalyst.expressions.TernaryExpression.eval(Expression.scala:704)
org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$1$$anonfun$applyOrElse$1.applyOrElse(expressions.scala:52)
org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$1$$anonfun$applyOrElse$1.applyOrElse(expressions.scala:45)
{code}
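
For reference, a quick sketch (illustration only, not the fix) of the 
underlying java.util.regex behavior; I assume the pattern in the description 
lost a backslash (i.e. it was '\d+') in the mail archive:

{code}
import java.util.regex.Pattern

// Hypothetical standalone repro of the underlying error.
val m = Pattern.compile("\\d+").matcher("1a 2b 14m")
if (m.find()) {
  m.group(-1)  // java.lang.IndexOutOfBoundsException: No group -1
}
{code}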


  was:
The current implementation of regexp_extract throws an unprocessed exception, 
shown below:

SELECT regexp_extract('1a 2b 14m', 'd+', -1)

java.util.regex.Matcher.group(Matcher.java:538)


> Fix java.lang.IndexOutOfBoundsException: No group -1
> 
>
> Key: SPARK-32365
> URL: https://issues.apache.org/jira/browse/SPARK-32365
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1, 3.1.0
>Reporter: jiaan.geng
>Priority: Major
>
> The current implementation of regexp_extract throws an unprocessed exception, 
> shown below:
> SELECT regexp_extract('1a 2b 14m', 'd+', -1)
> {code:java}
> java.lang.IndexOutOfBoundsException: No group -1
> java.util.regex.Matcher.group(Matcher.java:538)
> org.apache.spark.sql.catalyst.expressions.RegExpExtract.nullSafeEval(regexpExpressions.scala:455)
> org.apache.spark.sql.catalyst.expressions.TernaryExpression.eval(Expression.scala:704)
> org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$1$$anonfun$applyOrElse$1.applyOrElse(expressions.scala:52)
> org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$1$$anonfun$applyOrElse$1.applyOrElse(expressions.scala:45)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32365) Fix java.lang.IndexOutOfBoundsException: No group -1

2020-07-20 Thread jiaan.geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaan.geng updated SPARK-32365:
---
Description: 
The current implementation of regexp_extract throws an unprocessed exception, 
shown below:

SELECT regexp_extract('1a 2b 14m', 'd+', -1)

java.util.regex.Matcher.group(Matcher.java:538)

> Fix java.lang.IndexOutOfBoundsException: No group -1
> 
>
> Key: SPARK-32365
> URL: https://issues.apache.org/jira/browse/SPARK-32365
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1, 3.1.0
>Reporter: jiaan.geng
>Priority: Major
>
> The current implementation of regexp_extract throws an unprocessed exception, 
> shown below:
> SELECT regexp_extract('1a 2b 14m', 'd+', -1)
> java.util.regex.Matcher.group(Matcher.java:538)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-32365) Fix java.lang.IndexOutOfBoundsException: No group -1

2020-07-20 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-32365:
--

 Summary: Fix java.lang.IndexOutOfBoundsException: No group -1
 Key: SPARK-32365
 URL: https://issues.apache.org/jira/browse/SPARK-32365
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.1, 3.1.0
Reporter: jiaan.geng






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32344) Unevaluable expr is set to FIRST/LAST ignoreNullsExpr in distinct aggregates

2020-07-20 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-32344:
--
Affects Version/s: 2.2.3

> Unevaluable expr is set to FIRST/LAST ignoreNullsExpr in distinct aggregates
> 
>
> Key: SPARK-32344
> URL: https://issues.apache.org/jira/browse/SPARK-32344
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.3, 2.3.4, 2.4.6, 3.0.0
>Reporter: Takeshi Yamamuro
>Assignee: Takeshi Yamamuro
>Priority: Minor
> Fix For: 3.0.1, 3.1.0
>
>
> {code}
> scala> sql("SELECT FIRST(DISTINCT v) FROM VALUES 1, 2, 3 t(v)").show()
> ...
> Caused by: java.lang.UnsupportedOperationException: Cannot evaluate 
> expression: false#37
>   at 
> org.apache.spark.sql.catalyst.expressions.Unevaluable$class.eval(Expression.scala:258)
>   at 
> org.apache.spark.sql.catalyst.expressions.AttributeReference.eval(namedExpressions.scala:226)
>   at 
> org.apache.spark.sql.catalyst.expressions.aggregate.First.ignoreNulls(First.scala:68)
>   at 
> org.apache.spark.sql.catalyst.expressions.aggregate.First.updateExpressions$lzycompute(First.scala:82)
>   at 
> org.apache.spark.sql.catalyst.expressions.aggregate.First.updateExpressions(First.scala:81)
>   at 
> org.apache.spark.sql.execution.aggregate.HashAggregateExec$$anonfun$15.apply(HashAggregateExec.scala:268)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32344) Unevaluable expr is set to FIRST/LAST ignoreNullsExpr in distinct aggregates

2020-07-20 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-32344:
--
Affects Version/s: 2.3.4

> Unevaluable expr is set to FIRST/LAST ignoreNullsExpr in distinct aggregates
> 
>
> Key: SPARK-32344
> URL: https://issues.apache.org/jira/browse/SPARK-32344
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.4, 2.4.6, 3.0.0
>Reporter: Takeshi Yamamuro
>Assignee: Takeshi Yamamuro
>Priority: Minor
> Fix For: 3.0.1, 3.1.0
>
>
> {code}
> scala> sql("SELECT FIRST(DISTINCT v) FROM VALUES 1, 2, 3 t(v)").show()
> ...
> Caused by: java.lang.UnsupportedOperationException: Cannot evaluate 
> expression: false#37
>   at 
> org.apache.spark.sql.catalyst.expressions.Unevaluable$class.eval(Expression.scala:258)
>   at 
> org.apache.spark.sql.catalyst.expressions.AttributeReference.eval(namedExpressions.scala:226)
>   at 
> org.apache.spark.sql.catalyst.expressions.aggregate.First.ignoreNulls(First.scala:68)
>   at 
> org.apache.spark.sql.catalyst.expressions.aggregate.First.updateExpressions$lzycompute(First.scala:82)
>   at 
> org.apache.spark.sql.catalyst.expressions.aggregate.First.updateExpressions(First.scala:81)
>   at 
> org.apache.spark.sql.execution.aggregate.HashAggregateExec$$anonfun$15.apply(HashAggregateExec.scala:268)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32364) `path` argument of DataFrame.load/save should override the existing options

2020-07-20 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-32364:
--
Description: 
{code}
spark.read
  .option("paTh", "1")
  .option("PATH", "2")
  .option("Path", "3")
  .option("patH", "4")
  .parquet("5")
...
org.apache.spark.sql.AnalysisException:
Path does not exist: file:/.../1;
{code}

{code}
spark.read
  .option("PATH", "2")
  .option("Path", "3")
  .option("patH", "4")
  .parquet("5")
...
org.apache.spark.sql.AnalysisException:
Path does not exist: file:/.../4;
{code}

  was:
{code}
spark.read
  .option("paTh", "1")
  .option("PATH", "2")
  .option("Path", "3")
  .option("patH", "4")
  .parquet("5")
{code}

{code}
org.apache.spark.sql.AnalysisException: Path does not exist: 
file:/Users/dongjoon/APACHE/spark-release/spark-3.0.0-bin-hadoop3.2/1;
{code}


> `path` argument of DataFrame.load/save should override the existing options
> ---
>
> Key: SPARK-32364
> URL: https://issues.apache.org/jira/browse/SPARK-32364
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.6, 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> {code}
> spark.read
>   .option("paTh", "1")
>   .option("PATH", "2")
>   .option("Path", "3")
>   .option("patH", "4")
>   .parquet("5")
> ...
> org.apache.spark.sql.AnalysisException:
> Path does not exist: file:/.../1;
> {code}
> {code}
> spark.read
>   .option("PATH", "2")
>   .option("Path", "3")
>   .option("patH", "4")
>   .parquet("5")
> ...
> org.apache.spark.sql.AnalysisException:
> Path does not exist: file:/.../4;
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32364) `path` argument of DataFrame.load/save should override the existing options

2020-07-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32364:


Assignee: Apache Spark

> `path` argument of DataFrame.load/save should override the existing options
> ---
>
> Key: SPARK-32364
> URL: https://issues.apache.org/jira/browse/SPARK-32364
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.6, 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Major
>
> {code}
> spark.read
>   .option("paTh", "1")
>   .option("PATH", "2")
>   .option("Path", "3")
>   .option("patH", "4")
>   .parquet("5")
> {code}
> {code}
> org.apache.spark.sql.AnalysisException: Path does not exist: 
> file:/Users/dongjoon/APACHE/spark-release/spark-3.0.0-bin-hadoop3.2/1;
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32364) `path` argument of DataFrame.load/save should override the existing options

2020-07-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17160924#comment-17160924
 ] 

Apache Spark commented on SPARK-32364:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/29160

> `path` argument of DataFrame.load/save should override the existing options
> ---
>
> Key: SPARK-32364
> URL: https://issues.apache.org/jira/browse/SPARK-32364
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.6, 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> {code}
> spark.read
>   .option("paTh", "1")
>   .option("PATH", "2")
>   .option("Path", "3")
>   .option("patH", "4")
>   .parquet("5")
> {code}
> {code}
> org.apache.spark.sql.AnalysisException: Path does not exist: 
> file:/Users/dongjoon/APACHE/spark-release/spark-3.0.0-bin-hadoop3.2/1;
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32364) `path` argument of DataFrame.load/save should override the existing options

2020-07-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32364:


Assignee: (was: Apache Spark)

> `path` argument of DataFrame.load/save should override the existing options
> ---
>
> Key: SPARK-32364
> URL: https://issues.apache.org/jira/browse/SPARK-32364
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.6, 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> {code}
> spark.read
>   .option("paTh", "1")
>   .option("PATH", "2")
>   .option("Path", "3")
>   .option("patH", "4")
>   .parquet("5")
> {code}
> {code}
> org.apache.spark.sql.AnalysisException: Path does not exist: 
> file:/Users/dongjoon/APACHE/spark-release/spark-3.0.0-bin-hadoop3.2/1;
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org