[jira] [Commented] (SPARK-32377) CaseInsensitiveMap should be deterministic for addition
[ https://issues.apache.org/jira/browse/SPARK-32377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161723#comment-17161723 ] Apache Spark commented on SPARK-32377: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/29175 > CaseInsensitiveMap should be deterministic for addition > --- > > Key: SPARK-32377 > URL: https://issues.apache.org/jira/browse/SPARK-32377 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.3, 2.2.3, 2.3.4, 2.4.6, 3.0.0 >Reporter: Dongjoon Hyun >Priority: Major > > {code} > import org.apache.spark.sql.catalyst.util.CaseInsensitiveMap > var m = CaseInsensitiveMap(Map.empty[String, String]) > Seq(("paTh", "1"), ("PATH", "2"), ("Path", "3"), ("patH", "4"), ("path", > "5")).foreach { kv => > m = (m + kv).asInstanceOf[CaseInsensitiveMap[String]] > println(m.get("path")) > } > Some(1) > Some(2) > Some(3) > Some(4) > Some(1) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
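The non-determinism reported above comes from keeping several case-variant keys in the underlying map, so a lookup depends on iteration order. The behavior of the fix can be sketched as follows (illustrative Scala only; `CIMap` is a hypothetical stand-in, not Spark's actual `CaseInsensitiveMap`): before inserting a binding, drop any existing key that collides case-insensitively, so the most recent addition always wins.

```scala
// Minimal sketch (hypothetical CIMap, not Spark's CaseInsensitiveMap) of a
// deterministic case-insensitive addition: remove any case-insensitively
// colliding key before inserting, so get() never depends on iteration order.
final case class CIMap[T](original: Map[String, T]) {
  private val lowerCased: Map[String, T] =
    original.map { case (k, v) => k.toLowerCase -> v }

  def get(key: String): Option[T] = lowerCased.get(key.toLowerCase)

  def +(kv: (String, T)): CIMap[T] =
    CIMap(original.filter { case (k, _) => !k.equalsIgnoreCase(kv._1) } + kv)
}
```

With this sketch, replaying the sequence from the issue (`paTh` -> 1 through `path` -> 5) yields `Some("5")` at every step's latest value, instead of reverting to `Some("1")`.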
[jira] [Commented] (SPARK-32377) CaseInsensitiveMap should be deterministic for addition
[ https://issues.apache.org/jira/browse/SPARK-32377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161724#comment-17161724 ] Apache Spark commented on SPARK-32377: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/29175 > CaseInsensitiveMap should be deterministic for addition > --- > > Key: SPARK-32377 > URL: https://issues.apache.org/jira/browse/SPARK-32377 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.3, 2.2.3, 2.3.4, 2.4.6, 3.0.0 >Reporter: Dongjoon Hyun >Priority: Major > > {code} > import org.apache.spark.sql.catalyst.util.CaseInsensitiveMap > var m = CaseInsensitiveMap(Map.empty[String, String]) > Seq(("paTh", "1"), ("PATH", "2"), ("Path", "3"), ("patH", "4"), ("path", > "5")).foreach { kv => > m = (m + kv).asInstanceOf[CaseInsensitiveMap[String]] > println(m.get("path")) > } > Some(1) > Some(2) > Some(3) > Some(4) > Some(1) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-32362) AdaptiveQueryExecSuite misses verifying AE results
[ https://issues.apache.org/jira/browse/SPARK-32362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-32362: Assignee: Lantao Jin > AdaptiveQueryExecSuite misses verifying AE results > -- > > Key: SPARK-32362 > URL: https://issues.apache.org/jira/browse/SPARK-32362 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0, 3.1.0 >Reporter: Lantao Jin >Assignee: Lantao Jin >Priority: Major > > {code} > QueryTest.sameRows(result.toSeq, df.collect().toSeq) > {code} > Even if the results are different, the test does not fail. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-32362) AdaptiveQueryExecSuite misses verifying AE results
[ https://issues.apache.org/jira/browse/SPARK-32362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-32362. -- Fix Version/s: 3.1.0 3.0.1 Resolution: Fixed Issue resolved by pull request 29158 [https://github.com/apache/spark/pull/29158] > AdaptiveQueryExecSuite misses verifying AE results > -- > > Key: SPARK-32362 > URL: https://issues.apache.org/jira/browse/SPARK-32362 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0, 3.1.0 >Reporter: Lantao Jin >Assignee: Lantao Jin >Priority: Major > Fix For: 3.0.1, 3.1.0 > > > {code} > QueryTest.sameRows(result.toSeq, df.collect().toSeq) > {code} > Even if the results are different, the test does not fail. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
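The gap described in the issue is that the comparison result is computed but never asserted on. A sketch of the missing check, assuming `QueryTest.sameRows` returns `Some(errorMessage)` on mismatch rather than throwing (this signature is an assumption based on the issue description, not a confirmed API):

```scala
// Hedged sketch: if sameRows reports a mismatch as Some(errorMessage),
// the suite must fail explicitly instead of discarding the result.
QueryTest.sameRows(result.toSeq, df.collect().toSeq).foreach { errorMessage =>
  fail(s"Adaptive and non-adaptive execution results differ:\n$errorMessage")
}
```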
[jira] [Resolved] (SPARK-32365) Fix java.lang.IndexOutOfBoundsException: No group -1
[ https://issues.apache.org/jira/browse/SPARK-32365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-32365. --- Fix Version/s: 3.1.0 3.0.1 Resolution: Fixed Issue resolved by pull request 29161 [https://github.com/apache/spark/pull/29161] > Fix java.lang.IndexOutOfBoundsException: No group -1 > > > Key: SPARK-32365 > URL: https://issues.apache.org/jira/browse/SPARK-32365 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.1, 3.1.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > Fix For: 3.0.1, 3.1.0 > > > The current implementation of regexp_extract throws an unhandled exception, > shown below: > SELECT regexp_extract('1a 2b 14m', '\d+', -1) > {code:java} > java.lang.IndexOutOfBoundsException: No group -1 > java.util.regex.Matcher.group(Matcher.java:538) > org.apache.spark.sql.catalyst.expressions.RegExpExtract.nullSafeEval(regexpExpressions.scala:455) > org.apache.spark.sql.catalyst.expressions.TernaryExpression.eval(Expression.scala:704) > org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$1$$anonfun$applyOrElse$1.applyOrElse(expressions.scala:52) > org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$1$$anonfun$applyOrElse$1.applyOrElse(expressions.scala:45) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-32365) Fix java.lang.IndexOutOfBoundsException: No group -1
[ https://issues.apache.org/jira/browse/SPARK-32365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-32365: - Assignee: jiaan.geng > Fix java.lang.IndexOutOfBoundsException: No group -1 > > > Key: SPARK-32365 > URL: https://issues.apache.org/jira/browse/SPARK-32365 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.1, 3.1.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > > The current implementation of regexp_extract throws an unhandled exception, > shown below: > SELECT regexp_extract('1a 2b 14m', '\d+', -1) > {code:java} > java.lang.IndexOutOfBoundsException: No group -1 > java.util.regex.Matcher.group(Matcher.java:538) > org.apache.spark.sql.catalyst.expressions.RegExpExtract.nullSafeEval(regexpExpressions.scala:455) > org.apache.spark.sql.catalyst.expressions.TernaryExpression.eval(Expression.scala:704) > org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$1$$anonfun$applyOrElse$1.applyOrElse(expressions.scala:52) > org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$1$$anonfun$applyOrElse$1.applyOrElse(expressions.scala:45) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
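The stack trace above shows the raw `java.util.regex.Matcher.group` exception leaking out. The kind of guard that turns it into a meaningful error can be sketched like this (illustrative Scala, not Spark's actual `RegExpExtract` code; `extractGroup` is a hypothetical helper): validate the requested group index against the compiled pattern's group count before calling `Matcher.group`.

```scala
import java.util.regex.Pattern

// Sketch of a guarded group extraction: reject out-of-range group indices
// with a descriptive error instead of an IndexOutOfBoundsException.
def extractGroup(input: String, regex: String, groupIdx: Int): Option[String] = {
  val pattern = Pattern.compile(regex)
  val groupCount = pattern.matcher("").groupCount()
  require(groupIdx >= 0 && groupIdx <= groupCount,
    s"The specified group index $groupIdx exceeds the group count $groupCount of the regex")
  val m = pattern.matcher(input)
  if (m.find()) Option(m.group(groupIdx)) else None
}
```

For example, `extractGroup("1a 2b 14m", "\\d+", -1)` now fails with a clear message instead of `No group -1`.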
[jira] [Closed] (SPARK-32360) Add MaxMinBy to support eliminate sorts
[ https://issues.apache.org/jira/browse/SPARK-32360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun closed SPARK-32360. - > Add MaxMinBy to support eliminate sorts > --- > > Key: SPARK-32360 > URL: https://issues.apache.org/jira/browse/SPARK-32360 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: ulysses you >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-32360) Add MaxMinBy to support eliminate sorts
[ https://issues.apache.org/jira/browse/SPARK-32360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-32360. --- Resolution: Invalid Please see the PR discussion. Since the operation is order-sensitive, we should not eliminate it during the optimization. > Add MaxMinBy to support eliminate sorts > --- > > Key: SPARK-32360 > URL: https://issues.apache.org/jira/browse/SPARK-32360 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: ulysses you >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-32361) Remove project if output is subset of child
[ https://issues.apache.org/jira/browse/SPARK-32361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ulysses you updated SPARK-32361: Summary: Remove project if output is subset of child (was: Support collapse project with case Aggregate(Project)) > Remove project if output is subset of child > --- > > Key: SPARK-32361 > URL: https://issues.apache.org/jira/browse/SPARK-32361 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: ulysses you >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-32377) CaseInsensitiveMap should be deterministic for addition
[ https://issues.apache.org/jira/browse/SPARK-32377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-32377: -- Affects Version/s: 2.1.3 > CaseInsensitiveMap should be deterministic for addition > --- > > Key: SPARK-32377 > URL: https://issues.apache.org/jira/browse/SPARK-32377 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.3, 2.2.3, 2.3.4, 2.4.6, 3.0.0 >Reporter: Dongjoon Hyun >Priority: Major > > {code} > import org.apache.spark.sql.catalyst.util.CaseInsensitiveMap > var m = CaseInsensitiveMap(Map.empty[String, String]) > Seq(("paTh", "1"), ("PATH", "2"), ("Path", "3"), ("patH", "4"), ("path", > "5")).foreach { kv => > m = (m + kv).asInstanceOf[CaseInsensitiveMap[String]] > println(m.get("path")) > } > Some(1) > Some(2) > Some(3) > Some(4) > Some(1) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-32377) CaseInsensitiveMap should be deterministic for addition
[ https://issues.apache.org/jira/browse/SPARK-32377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-32377: -- Affects Version/s: 2.2.3 > CaseInsensitiveMap should be deterministic for addition > --- > > Key: SPARK-32377 > URL: https://issues.apache.org/jira/browse/SPARK-32377 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.3, 2.3.4, 2.4.6, 3.0.0 >Reporter: Dongjoon Hyun >Priority: Major > > {code} > import org.apache.spark.sql.catalyst.util.CaseInsensitiveMap > var m = CaseInsensitiveMap(Map.empty[String, String]) > Seq(("paTh", "1"), ("PATH", "2"), ("Path", "3"), ("patH", "4"), ("path", > "5")).foreach { kv => > m = (m + kv).asInstanceOf[CaseInsensitiveMap[String]] > println(m.get("path")) > } > Some(1) > Some(2) > Some(3) > Some(4) > Some(1) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-32377) CaseInsensitiveMap should be deterministic for addition
[ https://issues.apache.org/jira/browse/SPARK-32377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-32377: -- Description: {code} import org.apache.spark.sql.catalyst.util.CaseInsensitiveMap var m = CaseInsensitiveMap(Map.empty[String, String]) Seq(("paTh", "1"), ("PATH", "2"), ("Path", "3"), ("patH", "4"), ("path", "5")).foreach { kv => m = (m + kv).asInstanceOf[CaseInsensitiveMap[String]] println(m.get("path")) } Some(1) Some(2) Some(3) Some(4) Some(1) {code} was: {code} import org.apache.spark.sql.catalyst.util.CaseInsensitiveMap var m = CaseInsensitiveMap(Map.empty[String, String]) Seq(("paTh", "1"), ("PATH", "2"), ("Path", "3"), ("patH", "4"), ("path", "5")).foreach { kv => m = (m + kv).asInstanceOf[CaseInsensitiveMap[String]] println(m.get("path")) } // Exiting paste mode, now interpreting. Some(1) Some(2) Some(3) Some(4) Some(1) {code} > CaseInsensitiveMap should be deterministic for addition > --- > > Key: SPARK-32377 > URL: https://issues.apache.org/jira/browse/SPARK-32377 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.6, 3.0.0 >Reporter: Dongjoon Hyun >Priority: Major > > {code} > import org.apache.spark.sql.catalyst.util.CaseInsensitiveMap > var m = CaseInsensitiveMap(Map.empty[String, String]) > Seq(("paTh", "1"), ("PATH", "2"), ("Path", "3"), ("patH", "4"), ("path", > "5")).foreach { kv => > m = (m + kv).asInstanceOf[CaseInsensitiveMap[String]] > println(m.get("path")) > } > Some(1) > Some(2) > Some(3) > Some(4) > Some(1) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-32377) CaseInsensitiveMap should be deterministic for addition
[ https://issues.apache.org/jira/browse/SPARK-32377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-32377: -- Affects Version/s: 2.3.4 > CaseInsensitiveMap should be deterministic for addition > --- > > Key: SPARK-32377 > URL: https://issues.apache.org/jira/browse/SPARK-32377 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.4, 2.4.6, 3.0.0 >Reporter: Dongjoon Hyun >Priority: Major > > {code} > import org.apache.spark.sql.catalyst.util.CaseInsensitiveMap > var m = CaseInsensitiveMap(Map.empty[String, String]) > Seq(("paTh", "1"), ("PATH", "2"), ("Path", "3"), ("patH", "4"), ("path", > "5")).foreach { kv => > m = (m + kv).asInstanceOf[CaseInsensitiveMap[String]] > println(m.get("path")) > } > Some(1) > Some(2) > Some(3) > Some(4) > Some(1) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-32377) CaseInsensitiveMap should be deterministic for addition
[ https://issues.apache.org/jira/browse/SPARK-32377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-32377: -- Description: {code} import org.apache.spark.sql.catalyst.util.CaseInsensitiveMap var m = CaseInsensitiveMap(Map.empty[String, String]) Seq(("paTh", "1"), ("PATH", "2"), ("Path", "3"), ("patH", "4"), ("path", "5")).foreach { kv => m = (m + kv).asInstanceOf[CaseInsensitiveMap[String]] println(m.get("path")) } // Exiting paste mode, now interpreting. Some(1) Some(2) Some(3) Some(4) Some(1) {code} was: {code} test("CaseInsensitiveMap should be deterministic") { var m = CaseInsensitiveMap(Map.empty[String, String]) Seq(("paTh", "1"), ("PATH", "2"), ("Path", "3"), ("patH", "4"), ("path", "5")).foreach { kv => m = (m + kv).asInstanceOf[CaseInsensitiveMap[String]] assert(m.get("path") == Some(kv._2)) } } {code} > CaseInsensitiveMap should be deterministic for addition > --- > > Key: SPARK-32377 > URL: https://issues.apache.org/jira/browse/SPARK-32377 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.6, 3.0.0 >Reporter: Dongjoon Hyun >Priority: Major > > {code} > import org.apache.spark.sql.catalyst.util.CaseInsensitiveMap > var m = CaseInsensitiveMap(Map.empty[String, String]) > Seq(("paTh", "1"), ("PATH", "2"), ("Path", "3"), ("patH", "4"), ("path", > "5")).foreach { kv => > m = (m + kv).asInstanceOf[CaseInsensitiveMap[String]] > println(m.get("path")) > } > // Exiting paste mode, now interpreting. > Some(1) > Some(2) > Some(3) > Some(4) > Some(1) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-32338) Add overload for slice that accepts Columns or Int
[ https://issues.apache.org/jira/browse/SPARK-32338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin resolved SPARK-32338. --- Fix Version/s: 3.1.0 Assignee: Nikolas Vanderhoof Resolution: Fixed Issue resolved by pull request 29138 https://github.com/apache/spark/pull/29138 > Add overload for slice that accepts Columns or Int > -- > > Key: SPARK-32338 > URL: https://issues.apache.org/jira/browse/SPARK-32338 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.1.0 >Reporter: Nikolas Vanderhoof >Assignee: Nikolas Vanderhoof >Priority: Trivial > Fix For: 3.1.0 > > > Add an overload for org.apache.spark.sql.functions.slice with the following > signature: > {code:scala} > def slice(x: Column, start: Any, length: Any): Column = ??? > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
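Since the new signature accepts `Any` for `start` and `length`, callers may pass either literal Ints or per-row `Column` expressions. A hedged usage sketch (assumes a DataFrame with an array column `xs` and int columns `from` and `len`; these names are illustrative, not from the issue):

```scala
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{col, slice}

// Literal bounds (the pre-existing Int-based behavior):
val staticSlice: Column = slice(col("xs"), 2, 3)

// Per-row bounds, enabled by the new overload accepting Columns:
val dynamicSlice: Column = slice(col("xs"), col("from"), col("len"))
```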
[jira] [Commented] (SPARK-32377) CaseInsensitiveMap should be deterministic for addition
[ https://issues.apache.org/jira/browse/SPARK-32377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161630#comment-17161630 ] Apache Spark commented on SPARK-32377: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/29172 > CaseInsensitiveMap should be deterministic for addition > --- > > Key: SPARK-32377 > URL: https://issues.apache.org/jira/browse/SPARK-32377 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.6, 3.0.0 >Reporter: Dongjoon Hyun >Priority: Major > > {code} > test("CaseInsensitiveMap should be deterministic") { > var m = CaseInsensitiveMap(Map.empty[String, String]) > Seq(("paTh", "1"), ("PATH", "2"), ("Path", "3"), ("patH", "4"), ("path", > "5")).foreach { kv => > m = (m + kv).asInstanceOf[CaseInsensitiveMap[String]] > assert(m.get("path") == Some(kv._2)) > } > } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-32377) CaseInsensitiveMap should be deterministic for addition
[ https://issues.apache.org/jira/browse/SPARK-32377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32377: Assignee: Apache Spark > CaseInsensitiveMap should be deterministic for addition > --- > > Key: SPARK-32377 > URL: https://issues.apache.org/jira/browse/SPARK-32377 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.6, 3.0.0 >Reporter: Dongjoon Hyun >Assignee: Apache Spark >Priority: Major > > {code} > test("CaseInsensitiveMap should be deterministic") { > var m = CaseInsensitiveMap(Map.empty[String, String]) > Seq(("paTh", "1"), ("PATH", "2"), ("Path", "3"), ("patH", "4"), ("path", > "5")).foreach { kv => > m = (m + kv).asInstanceOf[CaseInsensitiveMap[String]] > assert(m.get("path") == Some(kv._2)) > } > } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-32377) CaseInsensitiveMap should be deterministic for addition
[ https://issues.apache.org/jira/browse/SPARK-32377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32377: Assignee: (was: Apache Spark) > CaseInsensitiveMap should be deterministic for addition > --- > > Key: SPARK-32377 > URL: https://issues.apache.org/jira/browse/SPARK-32377 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.6, 3.0.0 >Reporter: Dongjoon Hyun >Priority: Major > > {code} > test("CaseInsensitiveMap should be deterministic") { > var m = CaseInsensitiveMap(Map.empty[String, String]) > Seq(("paTh", "1"), ("PATH", "2"), ("Path", "3"), ("patH", "4"), ("path", > "5")).foreach { kv => > m = (m + kv).asInstanceOf[CaseInsensitiveMap[String]] > assert(m.get("path") == Some(kv._2)) > } > } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-32377) CaseInsensitiveMap should be deterministic for addition
Dongjoon Hyun created SPARK-32377: - Summary: CaseInsensitiveMap should be deterministic for addition Key: SPARK-32377 URL: https://issues.apache.org/jira/browse/SPARK-32377 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.0, 2.4.6 Reporter: Dongjoon Hyun {code} test("CaseInsensitiveMap should be deterministic") { var m = CaseInsensitiveMap(Map.empty[String, String]) Seq(("paTh", "1"), ("PATH", "2"), ("Path", "3"), ("patH", "4"), ("path", "5")).foreach { kv => m = (m + kv).asInstanceOf[CaseInsensitiveMap[String]] assert(m.get("path") == Some(kv._2)) } } {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-30876) Optimizer cannot infer from inferred constraints with join
[ https://issues.apache.org/jira/browse/SPARK-30876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-30876: Assignee: (was: Apache Spark) > Optimizer cannot infer from inferred constraints with join > -- > > Key: SPARK-30876 > URL: https://issues.apache.org/jira/browse/SPARK-30876 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Yuming Wang >Priority: Major > > How to reproduce this issue: > {code:sql} > create table t1(a int, b int, c int); > create table t2(a int, b int, c int); > create table t3(a int, b int, c int); > select count(*) from t1 join t2 join t3 on (t1.a = t2.b and t2.b = t3.c and > t3.c = 1); > {code} > Spark 2.3+: > {noformat} > == Physical Plan == > *(4) HashAggregate(keys=[], functions=[count(1)]) > +- Exchange SinglePartition, true, [id=#102] >+- *(3) HashAggregate(keys=[], functions=[partial_count(1)]) > +- *(3) Project > +- *(3) BroadcastHashJoin [b#10], [c#14], Inner, BuildRight > :- *(3) Project [b#10] > : +- *(3) BroadcastHashJoin [a#6], [b#10], Inner, BuildRight > : :- *(3) Project [a#6] > : : +- *(3) Filter isnotnull(a#6) > : : +- *(3) ColumnarToRow > : :+- FileScan parquet default.t1[a#6] Batched: true, > DataFilters: [isnotnull(a#6)], Format: Parquet, Location: > InMemoryFileIndex[file:/Users/yumwang/Downloads/spark-3.0.0-preview2-bin-hadoop2.7/spark-warehous..., > PartitionFilters: [], PushedFilters: [IsNotNull(a)], ReadSchema: > struct > : +- BroadcastExchange > HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint))), > [id=#87] > :+- *(1) Project [b#10] > : +- *(1) Filter (isnotnull(b#10) AND (b#10 = 1)) > : +- *(1) ColumnarToRow > : +- FileScan parquet default.t2[b#10] Batched: > true, DataFilters: [isnotnull(b#10), (b#10 = 1)], Format: Parquet, Location: > InMemoryFileIndex[file:/Users/yumwang/Downloads/spark-3.0.0-preview2-bin-hadoop2.7/spark-warehous..., > PartitionFilters: [], PushedFilters: [IsNotNull(b), 
EqualTo(b,1)], > ReadSchema: struct > +- BroadcastExchange > HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint))), > [id=#96] >+- *(2) Project [c#14] > +- *(2) Filter (isnotnull(c#14) AND (c#14 = 1)) > +- *(2) ColumnarToRow > +- FileScan parquet default.t3[c#14] Batched: true, > DataFilters: [isnotnull(c#14), (c#14 = 1)], Format: Parquet, Location: > InMemoryFileIndex[file:/Users/yumwang/Downloads/spark-3.0.0-preview2-bin-hadoop2.7/spark-warehous..., > PartitionFilters: [], PushedFilters: [IsNotNull(c), EqualTo(c,1)], > ReadSchema: struct > Time taken: 3.785 seconds, Fetched 1 row(s) > {noformat} > Spark 2.2.x: > {noformat} > == Physical Plan == > *HashAggregate(keys=[], functions=[count(1)]) > +- Exchange SinglePartition >+- *HashAggregate(keys=[], functions=[partial_count(1)]) > +- *Project > +- *SortMergeJoin [b#19], [c#23], Inner > :- *Project [b#19] > : +- *SortMergeJoin [a#15], [b#19], Inner > : :- *Sort [a#15 ASC NULLS FIRST], false, 0 > : : +- Exchange hashpartitioning(a#15, 200) > : : +- *Filter (isnotnull(a#15) && (a#15 = 1)) > : :+- HiveTableScan [a#15], HiveTableRelation > `default`.`t1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [a#15, > b#16, c#17] > : +- *Sort [b#19 ASC NULLS FIRST], false, 0 > :+- Exchange hashpartitioning(b#19, 200) > : +- *Filter (isnotnull(b#19) && (b#19 = 1)) > : +- HiveTableScan [b#19], HiveTableRelation > `default`.`t2`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [a#18, > b#19, c#20] > +- *Sort [c#23 ASC NULLS FIRST], false, 0 >+- Exchange hashpartitioning(c#23, 200) > +- *Filter (isnotnull(c#23) && (c#23 = 1)) > +- HiveTableScan [c#23], HiveTableRelation > `default`.`t3`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [a#21, > b#22, c#23] > Time taken: 0.728 seconds, Fetched 1 row(s) > {noformat} > Spark 2.2 can infer {{(a#15 = 1)}}, but Spark 2.3+ can't. 
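The missing inference in the plans above (propagating `t3.c = 1` through `t2.b = t3.c` and `t1.a = t2.b` to get `t1.a = 1`) amounts to pushing a literal across an equivalence class of attributes. A toy sketch of that idea (illustrative only; not Spark's actual constraint-propagation code, and `inferLiterals` is a hypothetical helper):

```scala
// Toy transitive inference: given attribute equalities and literal bindings,
// propagate each literal to every attribute in the same equivalence class,
// using a small union-find over attribute names.
def inferLiterals(
    equalities: Seq[(String, String)],   // e.g. t1.a = t2.b, t2.b = t3.c
    literals: Map[String, Int]           // e.g. t3.c -> 1
): Map[String, Int] = {
  val parent = scala.collection.mutable.Map[String, String]()
  def find(x: String): String = {
    val p = parent.getOrElse(x, x)
    if (p == x) x else { val r = find(p); parent(x) = r; r }  // path compression
  }
  equalities.foreach { case (a, b) => parent(find(a)) = find(b) }  // union
  val attrs = (equalities.flatMap(e => Seq(e._1, e._2)) ++ literals.keys).distinct
  val rootLiterals = literals.map { case (k, v) => find(k) -> v }
  attrs.flatMap(a => rootLiterals.get(find(a)).map(a -> _)).toMap
}
```

On the query from the issue, `inferLiterals(Seq("t1.a" -> "t2.b", "t2.b" -> "t3.c"), Map("t3.c" -> 1))` infers the literal `1` for all three attributes, which is the extra filter Spark 2.2 produced.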
[jira] [Assigned] (SPARK-30876) Optimizer cannot infer from inferred constraints with join
[ https://issues.apache.org/jira/browse/SPARK-30876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-30876: Assignee: Apache Spark > Optimizer cannot infer from inferred constraints with join > -- > > Key: SPARK-30876 > URL: https://issues.apache.org/jira/browse/SPARK-30876 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Yuming Wang >Assignee: Apache Spark >Priority: Major > > How to reproduce this issue: > {code:sql} > create table t1(a int, b int, c int); > create table t2(a int, b int, c int); > create table t3(a int, b int, c int); > select count(*) from t1 join t2 join t3 on (t1.a = t2.b and t2.b = t3.c and > t3.c = 1); > {code} > Spark 2.3+: > {noformat} > == Physical Plan == > *(4) HashAggregate(keys=[], functions=[count(1)]) > +- Exchange SinglePartition, true, [id=#102] >+- *(3) HashAggregate(keys=[], functions=[partial_count(1)]) > +- *(3) Project > +- *(3) BroadcastHashJoin [b#10], [c#14], Inner, BuildRight > :- *(3) Project [b#10] > : +- *(3) BroadcastHashJoin [a#6], [b#10], Inner, BuildRight > : :- *(3) Project [a#6] > : : +- *(3) Filter isnotnull(a#6) > : : +- *(3) ColumnarToRow > : :+- FileScan parquet default.t1[a#6] Batched: true, > DataFilters: [isnotnull(a#6)], Format: Parquet, Location: > InMemoryFileIndex[file:/Users/yumwang/Downloads/spark-3.0.0-preview2-bin-hadoop2.7/spark-warehous..., > PartitionFilters: [], PushedFilters: [IsNotNull(a)], ReadSchema: > struct > : +- BroadcastExchange > HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint))), > [id=#87] > :+- *(1) Project [b#10] > : +- *(1) Filter (isnotnull(b#10) AND (b#10 = 1)) > : +- *(1) ColumnarToRow > : +- FileScan parquet default.t2[b#10] Batched: > true, DataFilters: [isnotnull(b#10), (b#10 = 1)], Format: Parquet, Location: > InMemoryFileIndex[file:/Users/yumwang/Downloads/spark-3.0.0-preview2-bin-hadoop2.7/spark-warehous..., > PartitionFilters: [], PushedFilters: 
[IsNotNull(b), EqualTo(b,1)], > ReadSchema: struct > +- BroadcastExchange > HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint))), > [id=#96] >+- *(2) Project [c#14] > +- *(2) Filter (isnotnull(c#14) AND (c#14 = 1)) > +- *(2) ColumnarToRow > +- FileScan parquet default.t3[c#14] Batched: true, > DataFilters: [isnotnull(c#14), (c#14 = 1)], Format: Parquet, Location: > InMemoryFileIndex[file:/Users/yumwang/Downloads/spark-3.0.0-preview2-bin-hadoop2.7/spark-warehous..., > PartitionFilters: [], PushedFilters: [IsNotNull(c), EqualTo(c,1)], > ReadSchema: struct > Time taken: 3.785 seconds, Fetched 1 row(s) > {noformat} > Spark 2.2.x: > {noformat} > == Physical Plan == > *HashAggregate(keys=[], functions=[count(1)]) > +- Exchange SinglePartition >+- *HashAggregate(keys=[], functions=[partial_count(1)]) > +- *Project > +- *SortMergeJoin [b#19], [c#23], Inner > :- *Project [b#19] > : +- *SortMergeJoin [a#15], [b#19], Inner > : :- *Sort [a#15 ASC NULLS FIRST], false, 0 > : : +- Exchange hashpartitioning(a#15, 200) > : : +- *Filter (isnotnull(a#15) && (a#15 = 1)) > : :+- HiveTableScan [a#15], HiveTableRelation > `default`.`t1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [a#15, > b#16, c#17] > : +- *Sort [b#19 ASC NULLS FIRST], false, 0 > :+- Exchange hashpartitioning(b#19, 200) > : +- *Filter (isnotnull(b#19) && (b#19 = 1)) > : +- HiveTableScan [b#19], HiveTableRelation > `default`.`t2`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [a#18, > b#19, c#20] > +- *Sort [c#23 ASC NULLS FIRST], false, 0 >+- Exchange hashpartitioning(c#23, 200) > +- *Filter (isnotnull(c#23) && (c#23 = 1)) > +- HiveTableScan [c#23], HiveTableRelation > `default`.`t3`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [a#21, > b#22, c#23] > Time taken: 0.728 seconds, Fetched 1 row(s) > {noformat} > Spark 2.2 can infer {{(a#15 = 1)}}, but Spark 2.3+ can't. 
[jira] [Commented] (SPARK-30876) Optimizer cannot infer from inferred constraints with join
[ https://issues.apache.org/jira/browse/SPARK-30876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161616#comment-17161616 ] Apache Spark commented on SPARK-30876: -- User 'navinvishy' has created a pull request for this issue: https://github.com/apache/spark/pull/29170 > Optimizer cannot infer from inferred constraints with join > -- > > Key: SPARK-30876 > URL: https://issues.apache.org/jira/browse/SPARK-30876 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Yuming Wang >Priority: Major > > How to reproduce this issue: > {code:sql} > create table t1(a int, b int, c int); > create table t2(a int, b int, c int); > create table t3(a int, b int, c int); > select count(*) from t1 join t2 join t3 on (t1.a = t2.b and t2.b = t3.c and > t3.c = 1); > {code} > Spark 2.3+: > {noformat} > == Physical Plan == > *(4) HashAggregate(keys=[], functions=[count(1)]) > +- Exchange SinglePartition, true, [id=#102] >+- *(3) HashAggregate(keys=[], functions=[partial_count(1)]) > +- *(3) Project > +- *(3) BroadcastHashJoin [b#10], [c#14], Inner, BuildRight > :- *(3) Project [b#10] > : +- *(3) BroadcastHashJoin [a#6], [b#10], Inner, BuildRight > : :- *(3) Project [a#6] > : : +- *(3) Filter isnotnull(a#6) > : : +- *(3) ColumnarToRow > : :+- FileScan parquet default.t1[a#6] Batched: true, > DataFilters: [isnotnull(a#6)], Format: Parquet, Location: > InMemoryFileIndex[file:/Users/yumwang/Downloads/spark-3.0.0-preview2-bin-hadoop2.7/spark-warehous..., > PartitionFilters: [], PushedFilters: [IsNotNull(a)], ReadSchema: > struct > : +- BroadcastExchange > HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint))), > [id=#87] > :+- *(1) Project [b#10] > : +- *(1) Filter (isnotnull(b#10) AND (b#10 = 1)) > : +- *(1) ColumnarToRow > : +- FileScan parquet default.t2[b#10] Batched: > true, DataFilters: [isnotnull(b#10), (b#10 = 1)], Format: Parquet, Location: > 
InMemoryFileIndex[file:/Users/yumwang/Downloads/spark-3.0.0-preview2-bin-hadoop2.7/spark-warehous..., > PartitionFilters: [], PushedFilters: [IsNotNull(b), EqualTo(b,1)], > ReadSchema: struct > +- BroadcastExchange > HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint))), > [id=#96] >+- *(2) Project [c#14] > +- *(2) Filter (isnotnull(c#14) AND (c#14 = 1)) > +- *(2) ColumnarToRow > +- FileScan parquet default.t3[c#14] Batched: true, > DataFilters: [isnotnull(c#14), (c#14 = 1)], Format: Parquet, Location: > InMemoryFileIndex[file:/Users/yumwang/Downloads/spark-3.0.0-preview2-bin-hadoop2.7/spark-warehous..., > PartitionFilters: [], PushedFilters: [IsNotNull(c), EqualTo(c,1)], > ReadSchema: struct > Time taken: 3.785 seconds, Fetched 1 row(s) > {noformat} > Spark 2.2.x: > {noformat} > == Physical Plan == > *HashAggregate(keys=[], functions=[count(1)]) > +- Exchange SinglePartition >+- *HashAggregate(keys=[], functions=[partial_count(1)]) > +- *Project > +- *SortMergeJoin [b#19], [c#23], Inner > :- *Project [b#19] > : +- *SortMergeJoin [a#15], [b#19], Inner > : :- *Sort [a#15 ASC NULLS FIRST], false, 0 > : : +- Exchange hashpartitioning(a#15, 200) > : : +- *Filter (isnotnull(a#15) && (a#15 = 1)) > : :+- HiveTableScan [a#15], HiveTableRelation > `default`.`t1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [a#15, > b#16, c#17] > : +- *Sort [b#19 ASC NULLS FIRST], false, 0 > :+- Exchange hashpartitioning(b#19, 200) > : +- *Filter (isnotnull(b#19) && (b#19 = 1)) > : +- HiveTableScan [b#19], HiveTableRelation > `default`.`t2`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [a#18, > b#19, c#20] > +- *Sort [c#23 ASC NULLS FIRST], false, 0 >+- Exchange hashpartitioning(c#23, 200) > +- *Filter (isnotnull(c#23) && (c#23 = 1)) > +- HiveTableScan [c#23], HiveTableRelation > `default`.`t3`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [a#21, > b#22, c#23] > Time taken: 0.728 seconds, Fetched 1 row(s) > {noformat} > Spark 2.2 can infer {{(a#15 = 
1)}}, but Spark 2.3+ can't. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
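The inference Spark 2.2 performs here is a transitive closure over the join's equality constraints: from {{t1.a = t2.b}}, {{t2.b = t3.c}} and {{t3.c = 1}} it can derive {{t2.b = 1}} and {{t1.a = 1}}, so every scan gets a constant filter pushed down. A minimal sketch of that fixpoint, written in Python for brevity (illustrative only; this is not Spark's {{InferFiltersFromConstraints}} rule, and the helper name is made up):

```python
def infer_constant_filters(equalities, constants):
    """Propagate constant bindings across equality edges to a fixpoint.

    equalities: (col_a, col_b) pairs, e.g. ("t1.a", "t2.b")
    constants:  {col: literal} seeds, e.g. {"t3.c": 1}
    """
    inferred = dict(constants)
    changed = True
    while changed:
        changed = False
        for a, b in equalities:
            # An equality propagates a known constant in either direction.
            for src, dst in ((a, b), (b, a)):
                if src in inferred and dst not in inferred:
                    inferred[dst] = inferred[src]
                    changed = True
    return inferred

# t1.a = t2.b, t2.b = t3.c, t3.c = 1  =>  t2.b = 1 and t1.a = 1
filters = infer_constant_filters([("t1.a", "t2.b"), ("t2.b", "t3.c")],
                                 {"t3.c": 1})
```

The reported regression is that Spark 2.3+ stops short of the final {{t1.a = 1}} step that the 2.2 plan above shows.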
[jira] [Updated] (SPARK-32364) `path` argument of DataFrame.load/save should override the existing options
[ https://issues.apache.org/jira/browse/SPARK-32364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-32364: -- Description: Although we introduced CaseInsensitiveMap and CaseInsensitiveStringMap (in DSv2), when a user has multiple options like `path`, `paTH`, and `PATH` for the same key `path`, `option()/options()` are non-deterministic because `extraOptions` is a `HashMap`. This issue aims to make load/save always respect its direct path argument and ignore the existing options. This is because the load/save function is independent of users' typos like `paTH` and is designed to be invoked as the last operation. So, load/save should always work consistently and correctly. Please note that this doesn't aim to enforce case-insensitivity on `option()/options()` or the `extraOptions` variable because that might be considered a behavior change. {code} spark.read .option("paTh", "1") .option("PATH", "2") .option("Path", "3") .option("patH", "4") .load("5") ... org.apache.spark.sql.AnalysisException: Path does not exist: file:/.../1; {code} Since Apache Spark uses `extraOptions.toMap`, `LinkedHashMap[String, String]` has the same issue. {code} val extraOptions = new scala.collection.mutable.LinkedHashMap[String, String] extraOptions += ("paTh" -> "1") extraOptions += ("PATH" -> "2") extraOptions += ("Path" -> "3") extraOptions += ("patH" -> "4") extraOptions += ("path" -> "5") extraOptions.toMap // Exiting paste mode, now interpreting. 
extraOptions: scala.collection.mutable.LinkedHashMap[String,String] = Map(paTh -> 1, PATH -> 2, Path -> 3, patH -> 4, path -> 5) res0: scala.collection.immutable.Map[String,String] = Map(PATH -> 2, path -> 5, patH -> 4, Path -> 3, paTh -> 1) {code} was: Although we introduced CaseInsensitiveMap and CaseInsensitiveStringMap (in DSv2), when a user has multiple options like `path`, `paTH`, and `PATH` for the same key `path`, `option()/options()` are non-deterministic because `extraOptions` is a `HashMap`. This issue aims to make load/save always respect its direct path argument and ignore the existing options. This is because the load/save function is independent of users' typos like `paTH` and is designed to be invoked as the last operation. So, load/save should always work consistently and correctly. Please note that this doesn't aim to enforce case-insensitivity on `option()/options()` or the `extraOptions` variable because that might be considered a behavior change. {code} spark.read .option("paTh", "1") .option("PATH", "2") .option("Path", "3") .option("patH", "4") .load("5") ... org.apache.spark.sql.AnalysisException: Path does not exist: file:/.../1; {code} > `path` argument of DataFrame.load/save should override the existing options > --- > > Key: SPARK-32364 > URL: https://issues.apache.org/jira/browse/SPARK-32364 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.6, 3.0.0 >Reporter: Dongjoon Hyun >Priority: Major > > Although we introduced CaseInsensitiveMap and CaseInsensitiveStringMap (in > DSv2), when a user has multiple options like `path`, `paTH`, and `PATH` for > the same key `path`, `option()/options()` are non-deterministic because > `extraOptions` is a `HashMap`. This issue aims to make load/save always respect its > direct path argument and ignore the existing options. 
This is because > the load/save function is independent of users' typos like `paTH` and is > designed to be invoked as the last operation. So, load/save should always work > consistently and correctly. > Please note that this doesn't aim to enforce case-insensitivity on > `option()/options()` or the `extraOptions` variable because that might be > considered a behavior change. > {code} > spark.read > .option("paTh", "1") > .option("PATH", "2") > .option("Path", "3") > .option("patH", "4") > .load("5") > ... > org.apache.spark.sql.AnalysisException: > Path does not exist: file:/.../1; > {code} > Since Apache Spark uses `extraOptions.toMap`, `LinkedHashMap[String, String]` > has the same issue. > {code} > val extraOptions = new scala.collection.mutable.LinkedHashMap[String, String] > extraOptions += ("paTh" -> "1") > extraOptions += ("PATH" -> "2") > extraOptions += ("Path" -> "3") > extraOptions += ("patH" -> "4") > extraOptions += ("path" -> "5") > extraOptions.toMap > // Exiting paste mode, now interpreting. > extraOptions: scala.collection.mutable.LinkedHashMap[String,String] = > Map(paTh -> 1, PATH -> 2, Path -> 3, patH -> 4, path -> 5) > res0: scala.collection.immutable.Map[String,String] = Map(PATH -> 2, path -> > 5, patH -> 4, Path -> 3, paTh -> 1) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
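The non-determinism above comes from iterating a hash map whose keys differ only by case; canonicalizing the key makes the outcome depend only on insertion order, and the proposed fix additionally lets the direct {{load(path)}} argument win. A small Python sketch of those two semantics (hypothetical helpers for illustration, not Spark's {{CaseInsensitiveMap}} or {{DataFrameReader}}):

```python
def set_option(options, key, value):
    # Canonicalize to a lower-cased key so the *last* call wins
    # deterministically, whatever capitalization the caller used.
    options[key.lower()] = value

def load(options, path):
    # Sketch of the proposed fix: the direct load(path) argument
    # always overrides any previously set `path` option.
    return dict(options, path=path)["path"]

opts = {}
for key, value in [("paTh", "1"), ("PATH", "2"),
                   ("Path", "3"), ("patH", "4")]:
    set_option(opts, key, value)
direct = load(opts, "5")  # "5" wins regardless of earlier options
```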
[jira] [Created] (SPARK-32376) Make unionByName null-filling behavior work with struct columns
Mukul Murthy created SPARK-32376: Summary: Make unionByName null-filling behavior work with struct columns Key: SPARK-32376 URL: https://issues.apache.org/jira/browse/SPARK-32376 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.1.0 Reporter: Mukul Murthy https://issues.apache.org/jira/browse/SPARK-29358 added support for unionByName when the two datasets do not necessarily have the same schema, but it does not work with nested columns such as structs. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
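A null-filling union over structs amounts to recursively merging the two schemas and projecting each row onto the merged shape. A Python sketch of that recursion under a toy schema encoding, where a field maps to {{None}} for a leaf column or to a nested dict for a struct (illustrative only; this is not Spark's resolver):

```python
def merge_schemas(s1, s2):
    """Recursively union two schemas; struct fields are merged field-wise."""
    merged = dict(s1)
    for name, sub in s2.items():
        if name in merged and isinstance(merged[name], dict) and isinstance(sub, dict):
            merged[name] = merge_schemas(merged[name], sub)  # nested struct
        elif name not in merged:
            merged[name] = sub                               # new column
    return merged

def conform(row, schema):
    """Project a row onto the merged schema, null-filling missing fields
    and recursing into struct fields."""
    out = {}
    for name, sub in schema.items():
        value = row.get(name)
        out[name] = conform(value or {}, sub) if isinstance(sub, dict) else value
    return out

left_schema = {"id": None, "s": {"a": None}}
right_schema = {"id": None, "s": {"b": None}}
union_schema = merge_schemas(left_schema, right_schema)
left_row = {"id": 1, "s": {"a": 10}}   # gains a null s.b under the union
```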
[jira] [Resolved] (SPARK-32368) Options in PartitioningAwareFileIndex should respect case insensitivity
[ https://issues.apache.org/jira/browse/SPARK-32368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-32368. --- Fix Version/s: 3.1.0 3.0.1 Resolution: Fixed Issue resolved by pull request 29165 [https://github.com/apache/spark/pull/29165] > Options in PartitioningAwareFileIndex should respect case insensitivity > --- > > Key: SPARK-32368 > URL: https://issues.apache.org/jira/browse/SPARK-32368 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.1 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Minor > Fix For: 3.0.1, 3.1.0 > > > The datasource options such as {{recursiveFileLookup}} or {{pathGlobFilter}} > currently don't respect case insensitivity. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
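Respecting case insensitivity here is a lookup-side concern: whatever capitalization the user typed should resolve to the canonical option's value. A hedged Python sketch of that lookup (hypothetical helper, not {{PartitioningAwareFileIndex}} itself):

```python
def get_option(options, name, default=None):
    # Case-insensitive lookup: compare lower-cased keys so
    # "pathglobfilter" and "pathGlobFilter" resolve identically.
    lowered = {key.lower(): value for key, value in options.items()}
    return lowered.get(name.lower(), default)

opts = {"recursiveFileLookup": "true", "pathGlobFilter": "*.parquet"}
```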
[jira] [Assigned] (SPARK-32368) Options in PartitioningAwareFileIndex should respect case insensitivity
[ https://issues.apache.org/jira/browse/SPARK-32368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-32368: - Assignee: Hyukjin Kwon > Options in PartitioningAwareFileIndex should respect case insensitivity > --- > > Key: SPARK-32368 > URL: https://issues.apache.org/jira/browse/SPARK-32368 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.1 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Minor > > The datasource options such as {{recursiveFileLookup}} or {{pathGlobFilter}} > currently don't respect case insensitivity. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-32367) Fix typo of parameter in KubernetesTestComponents
[ https://issues.apache.org/jira/browse/SPARK-32367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-32367: -- Affects Version/s: 2.4.6 > Fix typo of parameter in KubernetesTestComponents > - > > Key: SPARK-32367 > URL: https://issues.apache.org/jira/browse/SPARK-32367 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 2.4.6, 3.0.0 >Reporter: merrily01 >Assignee: merrily01 >Priority: Trivial > Fix For: 2.4.7, 3.0.1, 3.1.0 > > > Correct the spelling of parameter 'spark.executor.instances' in > KubernetesTestComponents -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-32367) Fix typo of parameter in KubernetesTestComponents
[ https://issues.apache.org/jira/browse/SPARK-32367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-32367: -- Summary: Fix typo of parameter in KubernetesTestComponents (was: Correct the spelling of parameter in KubernetesTestComponents) > Fix typo of parameter in KubernetesTestComponents > - > > Key: SPARK-32367 > URL: https://issues.apache.org/jira/browse/SPARK-32367 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.0.0 >Reporter: merrily01 >Assignee: merrily01 >Priority: Trivial > Fix For: 2.4.7, 3.0.1, 3.1.0 > > > Correct the spelling of parameter 'spark.executor.instances' in > KubernetesTestComponents -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-32367) Fix typo of parameter in KubernetesTestComponents
[ https://issues.apache.org/jira/browse/SPARK-32367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-32367: -- Issue Type: Bug (was: Improvement) > Fix typo of parameter in KubernetesTestComponents > - > > Key: SPARK-32367 > URL: https://issues.apache.org/jira/browse/SPARK-32367 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.0.0 >Reporter: merrily01 >Assignee: merrily01 >Priority: Trivial > Fix For: 2.4.7, 3.0.1, 3.1.0 > > > Correct the spelling of parameter 'spark.executor.instances' in > KubernetesTestComponents -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-32367) Correct the spelling of parameter in KubernetesTestComponents
[ https://issues.apache.org/jira/browse/SPARK-32367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-32367: - Assignee: merrily01 > Correct the spelling of parameter in KubernetesTestComponents > - > > Key: SPARK-32367 > URL: https://issues.apache.org/jira/browse/SPARK-32367 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.0.0 >Reporter: merrily01 >Assignee: merrily01 >Priority: Trivial > Fix For: 3.1.0 > > > Correct the spelling of parameter 'spark.executor.instances' in > KubernetesTestComponents -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-32367) Correct the spelling of parameter in KubernetesTestComponents
[ https://issues.apache.org/jira/browse/SPARK-32367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-32367. --- Fix Version/s: 2.4.7 3.0.1 Resolution: Fixed Issue resolved by pull request 29164 [https://github.com/apache/spark/pull/29164] > Correct the spelling of parameter in KubernetesTestComponents > - > > Key: SPARK-32367 > URL: https://issues.apache.org/jira/browse/SPARK-32367 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.0.0 >Reporter: merrily01 >Assignee: merrily01 >Priority: Trivial > Fix For: 3.0.1, 2.4.7, 3.1.0 > > > Correct the spelling of parameter 'spark.executor.instances' in > KubernetesTestComponents -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-32357) Investigate test result reporter integration
[ https://issues.apache.org/jira/browse/SPARK-32357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32357: Assignee: (was: Apache Spark) > Investigate test result reporter integration > > > Key: SPARK-32357 > URL: https://issues.apache.org/jira/browse/SPARK-32357 > Project: Spark > Issue Type: Sub-task > Components: Project Infra >Affects Versions: 3.1.0 >Reporter: Hyukjin Kwon >Priority: Major > > Currently, the readability of the logs is not really good. For example, see > https://pipelines.actions.githubusercontent.com/gik0C3if0ep5i8iNpgFlcJRQk9UyifmoD6XvJANMVttkEP5xje/_apis/pipelines/1/runs/564/signedlogcontent/4?urlExpires=2020-07-09T14%3A05%3A52.5110439Z=HMACV1=gMGczJ8vtNPeQFE0GpjMxSS1BGq14RJLXUfjsLnaX7s%3D > Maybe we should have a way to report the results in a form that is easy to read, for > example a Jenkins-style test report. > We should maybe also take a look at > https://github.com/check-run-reporter/action. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-32357) Investigate test result reporter integration
[ https://issues.apache.org/jira/browse/SPARK-32357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32357: Assignee: Apache Spark > Investigate test result reporter integration > > > Key: SPARK-32357 > URL: https://issues.apache.org/jira/browse/SPARK-32357 > Project: Spark > Issue Type: Sub-task > Components: Project Infra >Affects Versions: 3.1.0 >Reporter: Hyukjin Kwon >Assignee: Apache Spark >Priority: Major > > Currently, the readability of the logs is not really good. For example, see > https://pipelines.actions.githubusercontent.com/gik0C3if0ep5i8iNpgFlcJRQk9UyifmoD6XvJANMVttkEP5xje/_apis/pipelines/1/runs/564/signedlogcontent/4?urlExpires=2020-07-09T14%3A05%3A52.5110439Z=HMACV1=gMGczJ8vtNPeQFE0GpjMxSS1BGq14RJLXUfjsLnaX7s%3D > Maybe we should have a way to report the results in a form that is easy to read, for > example a Jenkins-style test report. > We should maybe also take a look at > https://github.com/check-run-reporter/action. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-32357) Investigate test result reporter integration
[ https://issues.apache.org/jira/browse/SPARK-32357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161505#comment-17161505 ] Apache Spark commented on SPARK-32357: -- User 'viirya' has created a pull request for this issue: https://github.com/apache/spark/pull/29169 > Investigate test result reporter integration > > > Key: SPARK-32357 > URL: https://issues.apache.org/jira/browse/SPARK-32357 > Project: Spark > Issue Type: Sub-task > Components: Project Infra >Affects Versions: 3.1.0 >Reporter: Hyukjin Kwon >Priority: Major > > Currently, the readability of the logs is not really good. For example, see > https://pipelines.actions.githubusercontent.com/gik0C3if0ep5i8iNpgFlcJRQk9UyifmoD6XvJANMVttkEP5xje/_apis/pipelines/1/runs/564/signedlogcontent/4?urlExpires=2020-07-09T14%3A05%3A52.5110439Z=HMACV1=gMGczJ8vtNPeQFE0GpjMxSS1BGq14RJLXUfjsLnaX7s%3D > Maybe we should have a way to report the results in a form that is easy to read, for > example a Jenkins-style test report. > We should maybe also take a look at > https://github.com/check-run-reporter/action. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-32375) Implement TableCatalog for JDBC
[ https://issues.apache.org/jira/browse/SPARK-32375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32375: Assignee: (was: Apache Spark) > Implement TableCatalog for JDBC > --- > > Key: SPARK-32375 > URL: https://issues.apache.org/jira/browse/SPARK-32375 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Priority: Major > > Implement the TableCatalog interface, in particular: > - list tables > - table exists > - drop table > - rename table > - Optionally, alter table > - Optionally, load table > - Optionally, create table -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-32375) Implement TableCatalog for JDBC
[ https://issues.apache.org/jira/browse/SPARK-32375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161500#comment-17161500 ] Apache Spark commented on SPARK-32375: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/29168 > Implement TableCatalog for JDBC > --- > > Key: SPARK-32375 > URL: https://issues.apache.org/jira/browse/SPARK-32375 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Priority: Major > > Implement the TableCatalog interface, in particular: > - list tables > - table exists > - drop table > - rename table > - Optionally, alter table > - Optionally, load table > - Optionally, create table -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-32375) Implement TableCatalog for JDBC
[ https://issues.apache.org/jira/browse/SPARK-32375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32375: Assignee: Apache Spark > Implement TableCatalog for JDBC > --- > > Key: SPARK-32375 > URL: https://issues.apache.org/jira/browse/SPARK-32375 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Assignee: Apache Spark >Priority: Major > > Implement the TableCatalog interface, in particular: > - list tables > - table exists > - drop table > - rename table > - Optionally, alter table > - Optionally, load table > - Optionally, create table -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-32375) Implement TableCatalog for JDBC
Maxim Gekk created SPARK-32375: -- Summary: Implement TableCatalog for JDBC Key: SPARK-32375 URL: https://issues.apache.org/jira/browse/SPARK-32375 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.1.0 Reporter: Maxim Gekk Implement the TableCatalog interface, in particular: - list tables - table exists - drop table - rename table - Optionally, alter table - Optionally, load table - Optionally, create table -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
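For a feel of the contract behind the list above, here is a toy in-memory catalog covering those operations, written in Python (illustrative only; the actual work targets Spark's DSv2 {{TableCatalog}} interface backed by JDBC, and these method names only loosely mirror it):

```python
class InMemoryTableCatalog:
    """Toy in-memory stand-in for the catalog operations listed above."""

    def __init__(self):
        self._tables = {}  # table name -> schema (a plain dict here)

    def create_table(self, name, schema):
        if name in self._tables:
            raise ValueError(f"table {name} already exists")
        self._tables[name] = schema

    def list_tables(self):
        return sorted(self._tables)

    def table_exists(self, name):
        return name in self._tables

    def load_table(self, name):
        return self._tables[name]

    def drop_table(self, name):
        # Return whether anything was actually dropped.
        return self._tables.pop(name, None) is not None

    def rename_table(self, old, new):
        self._tables[new] = self._tables.pop(old)

catalog = InMemoryTableCatalog()
catalog.create_table("people", {"id": "int", "name": "string"})
catalog.rename_table("people", "persons")
```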
[jira] [Commented] (SPARK-32374) Disallow setting properties when creating temporary views
[ https://issues.apache.org/jira/browse/SPARK-32374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161478#comment-17161478 ] Apache Spark commented on SPARK-32374: -- User 'imback82' has created a pull request for this issue: https://github.com/apache/spark/pull/29167 > Disallow setting properties when creating temporary views > - > > Key: SPARK-32374 > URL: https://issues.apache.org/jira/browse/SPARK-32374 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Terry Kim >Priority: Major > > Currently, you can specify properties when creating a temporary view. > However, they are not used and SHOW TBLPROPERTIES always returns an empty > result on temporary views. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-32374) Disallow setting properties when creating temporary views
[ https://issues.apache.org/jira/browse/SPARK-32374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32374: Assignee: Apache Spark > Disallow setting properties when creating temporary views > - > > Key: SPARK-32374 > URL: https://issues.apache.org/jira/browse/SPARK-32374 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Terry Kim >Assignee: Apache Spark >Priority: Major > > Currently, you can specify properties when creating a temporary view. > However, they are not used and SHOW TBLPROPERTIES always returns an empty > result on temporary views. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-32374) Disallow setting properties when creating temporary views
[ https://issues.apache.org/jira/browse/SPARK-32374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32374: Assignee: (was: Apache Spark) > Disallow setting properties when creating temporary views > - > > Key: SPARK-32374 > URL: https://issues.apache.org/jira/browse/SPARK-32374 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Terry Kim >Priority: Major > > Currently, you can specify properties when creating a temporary view. > However, they are not used and SHOW TBLPROPERTIES always returns an empty > result on temporary views. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-32374) Disallow setting properties when creating temporary views
[ https://issues.apache.org/jira/browse/SPARK-32374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161479#comment-17161479 ] Apache Spark commented on SPARK-32374: -- User 'imback82' has created a pull request for this issue: https://github.com/apache/spark/pull/29167 > Disallow setting properties when creating temporary views > - > > Key: SPARK-32374 > URL: https://issues.apache.org/jira/browse/SPARK-32374 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Terry Kim >Priority: Major > > Currently, you can specify properties when creating a temporary view. > However, they are not used and SHOW TBLPROPERTIES always returns an empty > result on temporary views. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-32374) Disallow setting properties when creating temporary views
Terry Kim created SPARK-32374: - Summary: Disallow setting properties when creating temporary views Key: SPARK-32374 URL: https://issues.apache.org/jira/browse/SPARK-32374 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.1.0 Reporter: Terry Kim Currently, you can specify properties when creating a temporary view. However, they are not used and SHOW TBLPROPERTIES always returns an empty result on temporary views. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
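The proposed tightening is a simple validation at view-creation time: reject properties on temporary views instead of silently dropping them. A hedged Python sketch of that check (hypothetical function, not Spark's parser or analyzer):

```python
def create_view(name, query, properties=None, temporary=False):
    # Sketch of the proposed behavior: fail loudly instead of silently
    # ignoring TBLPROPERTIES on a temporary view.
    if temporary and properties:
        raise ValueError("CREATE TEMPORARY VIEW does not support TBLPROPERTIES")
    return {"name": name, "query": query, "properties": dict(properties or {})}

permanent = create_view("v1", "SELECT 1", {"owner": "terry"})
try:
    create_view("v2", "SELECT 1", {"owner": "terry"}, temporary=True)
    rejected = False
except ValueError:
    rejected = True  # properties on a temp view are refused up front
```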
[jira] [Created] (SPARK-32373) Spark Standalone - RetryingBlockFetcher tries to get block from worker even 10mins after it was de-registered from spark cluster
t oo created SPARK-32373: Summary: Spark Standalone - RetryingBlockFetcher tries to get block from worker even 10mins after it was de-registered from spark cluster Key: SPARK-32373 URL: https://issues.apache.org/jira/browse/SPARK-32373 Project: Spark Issue Type: Bug Components: Block Manager, Scheduler, Shuffle, Spark Core Affects Versions: 2.4.6 Reporter: t oo Using Spark standalone 2.4.6 with spot EC2 instances, the .242 IP instance was terminated at 12:00:11pm. Before then it had appeared in the Spark UI as ALIVE for a few hours; it then appeared in the Spark UI as DEAD until 12:16pm, and then disappeared from the Spark UI completely. An app that started at 11:24am hit the error below. As you can see in the app log below, taken from another worker, it is still trying to get a shuffle block from the .242 IP at 12:10pm (10 minutes after the worker was removed from the Spark cluster). I would expect it to stop retrying within 2 minutes of the worker being removed from the cluster. {code:java} 2020-07-20 12:10:02,702 [Block Fetch Retry-9-3] ERROR org.apache.spark.network.shuffle.RetryingBlockFetcher - Exception while beginning fetch of 1 outstanding blocks (after 3 retries) java.io.IOException: Connecting to /redact.242:7337 timed out (12 ms) at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:243) at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:187) at org.apache.spark.network.shuffle.ExternalShuffleClient.lambda$fetchBlocks$0(ExternalShuffleClient.java:100) at org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:141) at org.apache.spark.network.shuffle.RetryingBlockFetcher.lambda$initiateRetry$0(RetryingBlockFetcher.java:169) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) at java.lang.Thread.run(Thread.java:748) 2020-07-20 12:07:57,700 [Block Fetch Retry-9-2] ERROR org.apache.spark.network.shuffle.RetryingBlockFetcher - Exception while beginning fetch of 1 outstanding blocks (after 2 retries) java.io.IOException: Connecting to /redact.242:7337 timed out (12 ms) at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:243) at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:187) at org.apache.spark.network.shuffle.ExternalShuffleClient.lambda$fetchBlocks$0(ExternalShuffleClient.java:100) at org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:141) at org.apache.spark.network.shuffle.RetryingBlockFetcher.lambda$initiateRetry$0(RetryingBlockFetcher.java:169) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) at java.lang.Thread.run(Thread.java:748) 2020-07-20 12:05:52,697 [Block Fetch Retry-9-1] ERROR org.apache.spark.network.shuffle.RetryingBlockFetcher - Exception while beginning fetch of 1 outstanding blocks (after 1 retries) java.io.IOException: Connecting to /redact.242:7337 timed out (12 ms) at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:243) at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:187) at 
org.apache.spark.network.shuffle.ExternalShuffleClient.lambda$fetchBlocks$0(ExternalShuffleClient.java:100) at org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:141) at org.apache.spark.network.shuffle.RetryingBlockFetcher.lambda$initiateRetry$0(RetryingBlockFetcher.java:169) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) at java.lang.Thread.run(Thread.java:748) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
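The behavior the reporter expects amounts to a liveness check inside the retry loop: abort as soon as the master has deregistered the worker, rather than exhausting the full retry budget against a terminated instance. A Python sketch of such a loop (the {{is_alive}} and {{fetch}} callbacks are hypothetical; this is not {{RetryingBlockFetcher}} itself):

```python
def fetch_with_retry(block_id, host, max_retries, is_alive, fetch):
    """Retry a block fetch, but stop as soon as the worker is reported dead."""
    for attempt in range(1, max_retries + 1):
        if not is_alive(host):
            raise RuntimeError(f"worker {host} deregistered; aborting fetch")
        try:
            return fetch(block_id, host)
        except IOError:
            continue  # e.g. connect timeout: retry only while the worker lives
    raise RuntimeError(f"gave up on {block_id} after {max_retries} retries")

attempts = []

def failing_fetch(block_id, host):
    # Simulated fetch that always times out, like the log above.
    attempts.append(host)
    raise IOError("connection timed out")

def try_fetch(host, alive):
    try:
        fetch_with_retry("shuffle_0_1_2", host, 3, lambda h: alive, failing_fetch)
    except RuntimeError as error:
        return str(error)

dead_result = try_fetch("10.0.0.242", alive=False)  # aborts before any attempt
live_result = try_fetch("10.0.0.7", alive=True)     # 3 attempts, then gives up
```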
[jira] [Commented] (SPARK-32372) "Resolved attribute(s) XXX missing" after dedup conflict references
[ https://issues.apache.org/jira/browse/SPARK-32372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161334#comment-17161334 ] Apache Spark commented on SPARK-32372: -- User 'Ngone51' has created a pull request for this issue: https://github.com/apache/spark/pull/29166 > "Resolved attribute(s) XXX missing" after dedup conflict references > --- > > Key: SPARK-32372 > URL: https://issues.apache.org/jira/browse/SPARK-32372 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.2, 2.3.4, 2.4.6, 3.0.0 >Reporter: wuyi >Priority: Blocker > > {code:java} > // case class Person(id: Int, name: String, age: Int) > sql("SELECT name, avg(age) as avg_age FROM person GROUP BY > name").createOrReplaceTempView("person_a") > sql("SELECT p1.name, p2.avg_age FROM person p1 JOIN person_a p2 ON p1.name = > p2.name").createOrReplaceTempView("person_b") > sql("SELECT * FROM person_a UNION SELECT * FROM person_b") > .createOrReplaceTempView("person_c") > sql("SELECT p1.name, p2.avg_age FROM person_c p1 JOIN person_c p2 ON p1.name > = p2.name").show > {code} > error: > {code:java} > [info] Failed to analyze query: org.apache.spark.sql.AnalysisException: > Resolved attribute(s) avg_age#235 missing from name#233,avg_age#231 in > operator !Project [name#233, avg_age#235]. Attribute(s) with the same name > appear in the operation: avg_age. Please check if the right attribute(s) are > used.;; > ...{code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-32372) "Resolved attribute(s) XXX missing" after dudup conflict references
[ https://issues.apache.org/jira/browse/SPARK-32372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32372: Assignee: (was: Apache Spark)
[jira] [Assigned] (SPARK-32372) "Resolved attribute(s) XXX missing" after dudup conflict references
[ https://issues.apache.org/jira/browse/SPARK-32372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32372: Assignee: Apache Spark
[jira] [Commented] (SPARK-32372) "Resolved attribute(s) XXX missing" after dudup conflict references
[ https://issues.apache.org/jira/browse/SPARK-32372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161333#comment-17161333 ] Apache Spark commented on SPARK-32372: -- User 'Ngone51' has created a pull request for this issue: https://github.com/apache/spark/pull/29166
[jira] [Created] (SPARK-32372) "Resolved attribute(s) XXX missing" after dudup conflict references
wuyi created SPARK-32372: Summary: "Resolved attribute(s) XXX missing" after dudup conflict references Key: SPARK-32372 URL: https://issues.apache.org/jira/browse/SPARK-32372 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.0, 2.4.6, 2.3.4, 2.2.2 Reporter: wuyi {code:java} // case class Person(id: Int, name: String, age: Int) sql("SELECT name, avg(age) as avg_age FROM person GROUP BY name").createOrReplaceTempView("person_a") sql("SELECT p1.name, p2.avg_age FROM person p1 JOIN person_a p2 ON p1.name = p2.name").createOrReplaceTempView("person_b") sql("SELECT * FROM person_a UNION SELECT * FROM person_b") .createOrReplaceTempView("person_c") sql("SELECT p1.name, p2.avg_age FROM person_c p1 JOIN person_c p2 ON p1.name = p2.name").show {code} error: {code:java} [info] Failed to analyze query: org.apache.spark.sql.AnalysisException: Resolved attribute(s) avg_age#235 missing from name#233,avg_age#231 in operator !Project [name#233, avg_age#235]. Attribute(s) with the same name appear in the operation: avg_age. Please check if the right attribute(s) are used.;; ...{code}
[jira] [Assigned] (SPARK-32330) Preserve shuffled hash join build side partitioning
[ https://issues.apache.org/jira/browse/SPARK-32330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-32330: --- Assignee: Cheng Su > Preserve shuffled hash join build side partitioning > --- > > Key: SPARK-32330 > URL: https://issues.apache.org/jira/browse/SPARK-32330 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Cheng Su >Assignee: Cheng Su >Priority: Trivial > Fix For: 3.1.0 > > > Currently `ShuffledHashJoin.outputPartitioning` inherits from > `HashJoin.outputPartitioning`, which only preserves stream side partitioning: > `HashJoin.scala` > {code:java} > override def outputPartitioning: Partitioning = > streamedPlan.outputPartitioning > {code} > This loses build side partitioning information, and causes extra shuffle if > there's another join / group-by after this join. > Example: > > {code:java} > // code placeholder > withSQLConf( > SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> "50", > SQLConf.SHUFFLE_PARTITIONS.key -> "2", > SQLConf.PREFER_SORTMERGEJOIN.key -> "false") { > val df1 = spark.range(10).select($"id".as("k1")) > val df2 = spark.range(30).select($"id".as("k2")) > Seq("inner", "cross").foreach(joinType => { > val plan = df1.join(df2, $"k1" === $"k2", joinType).groupBy($"k1").count() > .queryExecution.executedPlan > assert(plan.collect { case _: ShuffledHashJoinExec => true }.size === 1) > // No extra shuffle before aggregate > assert(plan.collect { case _: ShuffleExchangeExec => true }.size === 2) > }) > }{code} > > Current physical plan (having an extra shuffle on `k1` before aggregate) > > {code:java} > *(4) HashAggregate(keys=[k1#220L], functions=[count(1)], output=[k1#220L, > count#235L]) > +- Exchange hashpartitioning(k1#220L, 2), true, [id=#117] >+- *(3) HashAggregate(keys=[k1#220L], functions=[partial_count(1)], > output=[k1#220L, count#239L]) > +- *(3) Project [k1#220L] > +- ShuffledHashJoin [k1#220L], [k2#224L], Inner, BuildLeft > :- Exchange 
hashpartitioning(k1#220L, 2), true, [id=#109] > : +- *(1) Project [id#218L AS k1#220L] > : +- *(1) Range (0, 10, step=1, splits=2) > +- Exchange hashpartitioning(k2#224L, 2), true, [id=#111] >+- *(2) Project [id#222L AS k2#224L] > +- *(2) Range (0, 30, step=1, splits=2){code} > > Ideal physical plan (no shuffle on `k1` before aggregate) > {code:java} > *(3) HashAggregate(keys=[k1#220L], functions=[count(1)], output=[k1#220L, > count#235L]) > +- *(3) HashAggregate(keys=[k1#220L], functions=[partial_count(1)], > output=[k1#220L, count#239L]) >+- *(3) Project [k1#220L] > +- ShuffledHashJoin [k1#220L], [k2#224L], Inner, BuildLeft > :- Exchange hashpartitioning(k1#220L, 2), true, [id=#107] > : +- *(1) Project [id#218L AS k1#220L] > : +- *(1) Range (0, 10, step=1, splits=2) > +- Exchange hashpartitioning(k2#224L, 2), true, [id=#109] > +- *(2) Project [id#222L AS k2#224L] >+- *(2) Range (0, 30, step=1, splits=2){code} > > This can be fixed by overriding `outputPartitioning` method in > `ShuffledHashJoinExec`, similar to `SortMergeJoinExec`.
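The fix described for SPARK-32330 — reporting the build side's partitioning as well — could look roughly like the following, mirroring what `SortMergeJoinExec` already does. This is a hedged sketch against simplified Spark internals, not the merged patch; the exact class shape and join-type handling in the real change may differ:

```scala
// Hedged sketch: inside ShuffledHashJoinExec, report both children's
// partitioning for inner joins so that a downstream join or aggregate keyed
// on either side's columns can reuse the existing shuffle instead of
// inserting a new Exchange.
override def outputPartitioning: Partitioning = joinType match {
  case _: InnerLike =>
    // Rows satisfy both sides' hash partitioning after an inner join.
    PartitioningCollection(
      Seq(left.outputPartitioning, right.outputPartitioning))
  case LeftOuter  => left.outputPartitioning   // null-extended right rows break its guarantee
  case RightOuter => right.outputPartitioning
  case x =>
    throw new IllegalArgumentException(s"Unsupported join type: $x")
}
```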
[jira] [Resolved] (SPARK-32330) Preserve shuffled hash join build side partitioning
[ https://issues.apache.org/jira/browse/SPARK-32330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-32330. - Fix Version/s: 3.1.0 Resolution: Fixed Issue resolved by pull request 29130 [https://github.com/apache/spark/pull/29130] > Preserve shuffled hash join build side partitioning
[jira] [Resolved] (SPARK-31869) BroadcastHashJoinExe's outputPartitioning can utilize the build side
[ https://issues.apache.org/jira/browse/SPARK-31869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-31869. - Fix Version/s: 3.1.0 Assignee: Terry Kim Resolution: Fixed > BroadcastHashJoinExe's outputPartitioning can utilize the build side > > > Key: SPARK-31869 > URL: https://issues.apache.org/jira/browse/SPARK-31869 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Terry Kim >Assignee: Terry Kim >Priority: Minor > Fix For: 3.1.0 > > > Currently, the BroadcastHashJoinExec's outputPartitioning only uses the > streamed side's outputPartitioning. Thus, if the join key is from the build > side for the join where one side is BroadcastHashJoinExec: > {code:java} > spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "500") > val t1 = (0 until 100).map(i => (i % 5, i % 13)).toDF("i1", "j1") > val t2 = (0 until 100).map(i => (i % 5, i % 13)).toDF("i2", "j2") > val t3 = (0 until 20).map(i => (i % 7, i % 11)).toDF("i3", "j3") > val t4 = (0 until 100).map(i => (i % 5, i % 13)).toDF("i4", "j4") > // join1 is a sort merge join. > val join1 = t1.join(t2, t1("i1") === t2("i2")) > // join2 is a broadcast join where t3 is broadcasted. > val join2 = join1.join(t3, join1("i1") === t3("i3")) > // Join on the column from the broadcasted side (i3). 
> val join3 = join2.join(t4, join2("i3") === t4("i4")) > join3.explain > {code} > it produces Exchange hashpartitioning(i2#103, 200): > {code:java} > == Physical Plan == > *(6) SortMergeJoin [i3#29], [i4#40], Inner > :- *(4) Sort [i3#29 ASC NULLS FIRST], false, 0 > : +- Exchange hashpartitioning(i3#29, 200), true, [id=#55] > : +- *(3) BroadcastHashJoin [i1#7], [i3#29], Inner, BuildRight > ::- *(3) SortMergeJoin [i1#7], [i2#18], Inner > :: :- *(1) Sort [i1#7 ASC NULLS FIRST], false, 0 > :: : +- Exchange hashpartitioning(i1#7, 200), true, [id=#28] > :: : +- LocalTableScan [i1#7, j1#8] > :: +- *(2) Sort [i2#18 ASC NULLS FIRST], false, 0 > :: +- Exchange hashpartitioning(i2#18, 200), true, [id=#29] > ::+- LocalTableScan [i2#18, j2#19] > :+- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, > int, false] as bigint))), [id=#34] > : +- LocalTableScan [i3#29, j3#30] > +- *(5) Sort [i4#40 ASC NULLS FIRST], false, 0 >+- Exchange hashpartitioning(i4#40, 200), true, [id=#39] > +- LocalTableScan [i4#40, j4#41] > {code} > But, since BroadcastHashJoinExec is only for equi-join, if the streamed side > has HashPartitioning, BroadcastHashJoinExec can utilize the info to eliminate > the exchange.
[jira] [Resolved] (SPARK-32302) Partially push down disjunctive predicates through Join/Partitions
[ https://issues.apache.org/jira/browse/SPARK-32302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-32302. - Fix Version/s: 3.1.0 Resolution: Fixed Issue resolved by pull request 29101 [https://github.com/apache/spark/pull/29101] > Partially push down disjunctive predicates through Join/Partitions > -- > > Key: SPARK-32302 > URL: https://issues.apache.org/jira/browse/SPARK-32302 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > Fix For: 3.1.0 > > > In https://github.com/apache/spark/pull/28733, CNF conversion is used to push > down disjunctive predicates through join. > It's a good improvement; however: > 1. converting all the predicates to CNF can lead to a very long result, even > with grouping functions over expressions. > 2. the non-recursive implementation is not easy to understand. > Essentially, we just need to traverse the predicate and extract the > convertible sub-predicates, like what we did in > https://github.com/apache/spark/pull/24598. There is no need to maintain the > CNF result set.
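The traversal described for SPARK-32302 — extracting only the convertible sub-predicates from a disjunction instead of materializing a full CNF result — can be illustrated with a small standalone sketch. Everything here (`Pred`, `extract_pushable`, the column sets) is invented for illustration and is not Spark's internal API:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Pred:
    cols: frozenset  # columns this leaf predicate references
    text: str        # printable form of the predicate

def extract_pushable(disjuncts, target_cols):
    """disjuncts is an OR of branches; each branch is a list of AND-ed leaves.

    Per branch, keep only the conjuncts fully covered by target_cols
    (dropping a conjunct only weakens the condition, so the result is a
    sound filter). If any branch contributes nothing, the OR as a whole
    cannot be pushed down, so return None.
    """
    pushed = []
    for conjuncts in disjuncts:
        keep = [p.text for p in conjuncts if p.cols <= target_cols]
        if not keep:
            return None
        pushed.append(" AND ".join(keep))
    return " OR ".join(f"({branch})" for branch in pushed)

a1 = Pred(frozenset({"t1.x"}), "t1.x > 1")
b1 = Pred(frozenset({"t2.y"}), "t2.y < 5")
a2 = Pred(frozenset({"t1.x"}), "t1.x = 0")
b2 = Pred(frozenset({"t2.y"}), "t2.y = 9")

# (t1.x > 1 AND t2.y < 5) OR (t1.x = 0 AND t2.y = 9): only t1 conjuncts push to t1.
print(extract_pushable([[a1, b1], [a2, b2]], frozenset({"t1.x"})))
# -> (t1.x > 1) OR (t1.x = 0)
```

No CNF expansion is needed: the walk touches each leaf once, which is the point the resolution comment makes about PR 24598's approach.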
[jira] [Resolved] (SPARK-32324) Fix error messages during using PIVOT and lateral view
[ https://issues.apache.org/jira/browse/SPARK-32324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] philipse resolved SPARK-32324. -- Resolution: Not A Problem > Fix error messages during using PIVOT and lateral view > -- > > Key: SPARK-32324 > URL: https://issues.apache.org/jira/browse/SPARK-32324 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: philipse >Priority: Minor > > Currently when we use `lateral view` and `pivot` together in from clause, if > `lateral view` is before `pivot`, the error message is "LATERAL cannot be > used together with PIVOT in FROM clause".if if `lateral view` is after > `pivot`,the query will be normal ,So the error messages "LATERAL cannot be > used together with PIVOT in FROM clause" is not accurate, we may improve it. > > Steps to reproduce: > {code:java} > CREATE TABLE person (id INT, name STRING, age INT, class int, address STRING); > INSERT INTO person VALUES > (100, 'John', 30, 1, 'Street 1'), > (200, 'Mary', NULL, 1, 'Street 2'), > (300, 'Mike', 80, 3, 'Street 3'), > (400, 'Dan', 50, 4, 'Street 4'); > {code} > > Query1: > > {code:java} > SELECT * FROM person > lateral view outer explode(array(30,60)) tabelName as c_age > lateral view explode(array(40,80)) as d_age > PIVOT ( > count(distinct age) as a > for name in ('Mary','John') > ) > {code} > Result 1: > > {code:java} > Error: org.apache.spark.sql.catalyst.parser.ParseException: > LATERAL cannot be used together with PIVOT in FROM clause(line 1, pos 9) > == SQL == > SELECT * FROM person > -^^^ > lateral view outer explode(array(30,60)) tabelName as c_age > lateral view explode(array(40,80)) as d_age > PIVOT ( > count(distinct age) as a > for name in ('Mary','John') > ) (state=,code=0) > {code} > > > Query2: > > {code:java} > SELECT * FROM person > PIVOT ( > count(distinct age) as a > for name in ('Mary','John') > ) > lateral view outer explode(array(30,60)) tabelName as c_age > lateral view 
explode(array(40,80)) as d_age > {code} > > Result 2:
> {code}
> +---+----+----+-----+-----+
> | id|Mary|John|c_age|d_age|
> +---+----+----+-----+-----+
> |300|NULL|NULL|   30|   40|
> |300|NULL|NULL|   30|   80|
> |300|NULL|NULL|   60|   40|
> |300|NULL|NULL|   60|   80|
> |100|   0|NULL|   30|   40|
> |100|   0|NULL|   30|   80|
> |100|   0|NULL|   60|   40|
> |100|   0|NULL|   60|   80|
> |400|NULL|NULL|   30|   40|
> |400|NULL|NULL|   30|   80|
> |400|NULL|NULL|   60|   40|
> |400|NULL|NULL|   60|   80|
> |200|NULL|   1|   30|   40|
> |200|NULL|   1|   30|   80|
> |200|NULL|   1|   60|   40|
> |200|NULL|   1|   60|   80|
> +---+----+----+-----+-----+
> {code}
[jira] [Created] (SPARK-32371) Autodetect persistently failing executor pods and fail the application logging the cause.
Prashant Sharma created SPARK-32371: --- Summary: Autodetect persistently failing executor pods and fail the application logging the cause. Key: SPARK-32371 URL: https://issues.apache.org/jira/browse/SPARK-32371 Project: Spark Issue Type: Improvement Components: Kubernetes Affects Versions: 3.1.0 Reporter: Prashant Sharma {code:java} [root@kyok-test-1 ~]# kubectl get po -w NAME READY STATUS RESTARTS AGE spark-shell-a3962a736bf9e775-exec-36 1/1 Running 0 5s spark-shell-a3962a736bf9e775-exec-37 1/1 Running 0 3s spark-shell-a3962a736bf9e775-exec-36 0/1 Error 0 5s spark-shell-a3962a736bf9e775-exec-38 0/1 Pending 0 1s spark-shell-a3962a736bf9e775-exec-38 0/1 Pending 0 1s spark-shell-a3962a736bf9e775-exec-38 0/1 ContainerCreating 0 1s spark-shell-a3962a736bf9e775-exec-36 0/1 Terminating 0 6s spark-shell-a3962a736bf9e775-exec-36 0/1 Terminating 0 6s spark-shell-a3962a736bf9e775-exec-37 0/1 Error 0 5s spark-shell-a3962a736bf9e775-exec-38 1/1 Running 0 2s spark-shell-a3962a736bf9e775-exec-39 0/1 Pending 0 0s spark-shell-a3962a736bf9e775-exec-39 0/1 Pending 0 0s spark-shell-a3962a736bf9e775-exec-39 0/1 ContainerCreating 0 0s spark-shell-a3962a736bf9e775-exec-37 0/1 Terminating 0 6s spark-shell-a3962a736bf9e775-exec-37 0/1 Terminating 0 6s spark-shell-a3962a736bf9e775-exec-38 0/1 Error 0 4s spark-shell-a3962a736bf9e775-exec-39 1/1 Running 0 1s spark-shell-a3962a736bf9e775-exec-40 0/1 Pending 0 0s spark-shell-a3962a736bf9e775-exec-40 0/1 Pending 0 0s spark-shell-a3962a736bf9e775-exec-40 0/1 ContainerCreating 0 0s spark-shell-a3962a736bf9e775-exec-38 0/1 Terminating 0 5s spark-shell-a3962a736bf9e775-exec-38 0/1 Terminating 0 5s spark-shell-a3962a736bf9e775-exec-39 0/1 Error 0 3s spark-shell-a3962a736bf9e775-exec-40 1/1 Running 0 1s spark-shell-a3962a736bf9e775-exec-41 0/1 Pending 0 0s spark-shell-a3962a736bf9e775-exec-41 0/1 Pending 0 0s spark-shell-a3962a736bf9e775-exec-41 0/1 ContainerCreating 0 0s spark-shell-a3962a736bf9e775-exec-39 0/1 Terminating 0 4s 
spark-shell-a3962a736bf9e775-exec-39 0/1 Terminating 0 4s spark-shell-a3962a736bf9e775-exec-41 1/1 Running 0 2s spark-shell-a3962a736bf9e775-exec-40 0/1 Error 0 4s spark-shell-a3962a736bf9e775-exec-42 0/1 Pending 0 0s spark-shell-a3962a736bf9e775-exec-42 0/1 Pending 0 0s spark-shell-a3962a736bf9e775-exec-42 0/1 ContainerCreating 0 0s spark-shell-a3962a736bf9e775-exec-40 0/1 Terminating 0 4s spark-shell-a3962a736bf9e775-exec-40 0/1 Terminating 0 4s {code} A cascade of pods is created and terminated within 3-4 seconds, and it is difficult to see the logs of these constantly created and terminated pods. Thankfully, there is an option {code:java} spark.kubernetes.executor.deleteOnTermination false {code} to turn off the auto deletion of executor pods, which gives us the opportunity to diagnose the problem. However, this is not turned on by default, and sometimes one may need to guess what caused the problem in the previous run, work out the steps to reproduce it, and then re-run the application with the exact same setup. So it might be good if we could somehow detect this situation of pods failing as soon as they start, or failing on a particular task, capture the error that caused the pod to terminate, and relay it back to the driver to be logged. Alternatively, if we could auto-detect this situation, we could also stop creating more executor pods and fail with an appropriate error, retaining the last failed pod for the user's further investigation. So far it has not been evaluated how this can be achieved, but this feature might be useful as K8s grows as a preferred choice for deploying Spark. Logging this issue for further investigation and work.
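As a stopgap until such auto-detection exists, the option mentioned above can be set at submit time. A hedged example invocation — the master URL, namespace, image, and jar path are placeholders, not values from the issue:

```shell
# Keep failed executor pods around for post-mortem inspection
# (placeholders: adjust the master URL, namespace and image to your cluster).
spark-submit \
  --master k8s://https://my-apiserver.example.com:6443 \
  --deploy-mode cluster \
  --conf spark.kubernetes.namespace=spark-jobs \
  --conf spark.kubernetes.container.image=my-registry.example.com/spark:3.1.0 \
  --conf spark.kubernetes.executor.deleteOnTermination=false \
  --class org.apache.spark.examples.SparkPi \
  local:///opt/spark/examples/jars/spark-examples.jar

# Afterwards, read the failed executor's log before cleaning it up yourself:
#   kubectl logs <failed-executor-pod> -n spark-jobs
#   kubectl delete pod <failed-executor-pod> -n spark-jobs
```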
[jira] [Updated] (SPARK-32370) pyspark foreach/foreachPartition send http request failed
[ https://issues.apache.org/jira/browse/SPARK-32370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Liu updated SPARK-32370: Environment: spark 3.0 python3.7 macos 10.15 > pyspark foreach/foreachPartition send http request failed > - > > Key: SPARK-32370 > URL: https://issues.apache.org/jira/browse/SPARK-32370 > Project: Spark > Issue Type: Question > Components: PySpark >Affects Versions: 3.0.0 > Environment: spark 3.0 > python3.7 > macos 10.15 >Reporter: Tao Liu >Priority: Major > > I use urllib.request to send http request in foreach/foreachPartition. > pyspark throw error as follow:I use urllib.request to send http request in > foreach/foreachPartition. pyspark throw error as follow: > {color:#de350b}_objc[74094]: +[__NSPlaceholderDate initialize] may have been > in progress in another thread when fork() was called. We cannot safely call > it or ignore it in the fork() child process. Crashing instead. Set a > breakpoint on objc_initializeAfterForkError to debug.20/07/20 19:05:58 ERROR > Executor: Exception in task 7.0 in stage 0.0 (TID > 7)org.apache.spark.SparkException: Python worker exited unexpectedly > (crashed) at > org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:536) > at > org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:525) > at > scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:38) > at > org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:643) > at > org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:621) > at > org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:456) > at > org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37) > at scala.collection.Iterator.foreach(Iterator.scala:941) at > scala.collection.Iterator.foreach$(Iterator.scala:941) at > 
org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28) > at > scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62) at > scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53) > at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105) > at > scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49) > at scala.collection.TraversableOnce.to(TraversableOnce.scala:315) > at scala.collection.TraversableOnce.to$(TraversableOnce.scala:313) at > org.apache.spark.InterruptibleIterator.to(InterruptibleIterator.scala:28) > at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:307) > at > scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:307) > at > org.apache.spark.InterruptibleIterator.toBuffer(InterruptibleIterator.scala:28) > at > scala.collection.TraversableOnce.toArray(TraversableOnce.scala:294) > at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:288) > at > org.apache.spark.InterruptibleIterator.toArray(InterruptibleIterator.scala:28) > at org.apache.spark.rdd.RDD.$anonfun$collect$2(RDD.scala:1004) > at > org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2133)_{color} > when I call rdd.foreach(send_http), with > rdd = sc.parallelize(["http://192.168.1.1:5000/index.html"]), send_http is defined > as follows: > def send_http(url): > req = urllib.request.Request(url) > resp = urllib.request.urlopen(req) > Can anyone tell me the problem? Thanks.
[jira] [Created] (SPARK-32370) pyspark foreach/foreachPartition send http request failed
Tao Liu created SPARK-32370: --- Summary: pyspark foreach/foreachPartition send http request failed Key: SPARK-32370 URL: https://issues.apache.org/jira/browse/SPARK-32370 Project: Spark Issue Type: Question Components: PySpark Affects Versions: 3.0.0 Reporter: Tao Liu I use urllib.request to send http request in foreach/foreachPartition. pyspark throw error as follow:I use urllib.request to send http request in foreach/foreachPartition. pyspark throw error as follow: {color:#de350b}_objc[74094]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.20/07/20 19:05:58 ERROR Executor: Exception in task 7.0 in stage 0.0 (TID 7)org.apache.spark.SparkException: Python worker exited unexpectedly (crashed) at org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:536) at org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:525) at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:38) at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:643) at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:621) at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:456) at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37) at scala.collection.Iterator.foreach(Iterator.scala:941) at scala.collection.Iterator.foreach$(Iterator.scala:941) at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28) at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62) at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53) at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105) at 
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49) at scala.collection.TraversableOnce.to(TraversableOnce.scala:315) at scala.collection.TraversableOnce.to$(TraversableOnce.scala:313) at org.apache.spark.InterruptibleIterator.to(InterruptibleIterator.scala:28) at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:307) at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:307) at org.apache.spark.InterruptibleIterator.toBuffer(InterruptibleIterator.scala:28) at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:294) at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:288) at org.apache.spark.InterruptibleIterator.toArray(InterruptibleIterator.scala:28) at org.apache.spark.rdd.RDD.$anonfun$collect$2(RDD.scala:1004) at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2133)_{color} when I call rdd.foreach(send_http), with rdd = sc.parallelize(["http://192.168.1.1:5000/index.html"]), send_http is defined as follows: def send_http(url): req = urllib.request.Request(url) resp = urllib.request.urlopen(req) Can anyone tell me the problem? Thanks.
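The reporter's helper, with the mangled quoting fixed, boils down to a plain `urllib.request` GET. Run outside a forked PySpark worker it behaves fine, which points at the macOS Objective-C fork-safety abort (`+[__NSPlaceholderDate initialize] ... fork()`) rather than at the HTTP code; a commonly suggested workaround on macOS is to export `OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES` before starting the driver (an environment-level workaround, not a fix documented by Spark). A self-contained sketch with a throwaway local server — the handler and port are illustrative:

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

def send_http(url, timeout=5):
    # The reporter's helper with correct quoting: issue a GET and return the
    # decoded response body.
    req = urllib.request.Request(url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode()

class _Handler(BaseHTTPRequestHandler):
    # Minimal stand-in for the reporter's http://192.168.1.1:5000 endpoint.
    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):  # silence per-request logging
        pass

# Bind an ephemeral port and serve from a daemon thread.
server = HTTPServer(("127.0.0.1", 0), _Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

result = send_http(f"http://127.0.0.1:{server.server_port}/index.html")
print(result)  # -> ok
```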
[jira] [Created] (SPARK-32369) pyspark foreach/foreachPartition send http request failed
Tao Liu created SPARK-32369: --- Summary: pyspark foreach/foreachPartition send http request failed Key: SPARK-32369 URL: https://issues.apache.org/jira/browse/SPARK-32369 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 3.0.0 Reporter: Tao Liu I use urllib.request to send http request in foreach/foreachPartition. pyspark throw error as follow:I use urllib.request to send http request in foreach/foreachPartition. pyspark throw error as follow: _objc[74094]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.20/07/20 19:05:58 ERROR Executor: Exception in task 7.0 in stage 0.0 (TID 7)org.apache.spark.SparkException: Python worker exited unexpectedly (crashed) at org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:536) at org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:525) at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:38) at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:643) at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:621) at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:456) at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37) at scala.collection.Iterator.foreach(Iterator.scala:941) at scala.collection.Iterator.foreach$(Iterator.scala:941) at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28) at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62) at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53) at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105) at 
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49) at scala.collection.TraversableOnce.to(TraversableOnce.scala:315) at scala.collection.TraversableOnce.to$(TraversableOnce.scala:313) at org.apache.spark.InterruptibleIterator.to(InterruptibleIterator.scala:28) at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:307) at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:307) at org.apache.spark.InterruptibleIterator.toBuffer(InterruptibleIterator.scala:28) at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:294) at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:288) at org.apache.spark.InterruptibleIterator.toArray(InterruptibleIterator.scala:28) at org.apache.spark.rdd.RDD.$anonfun$collect$2(RDD.scala:1004) at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2133)_ This happens when I call rdd.foreach(send_http), with rdd = sc.parallelize(["http://192.168.1.1:5000/index.html"]) and send_http defined as follows: _def send_http(url):_ _req = urllib.request.Request(url)_ _resp = urllib.request.urlopen(req)_ Can anyone tell me the problem? Thanks. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
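For context, the `objc_initializeAfterForkError` crash above is macOS's Objective-C fork-safety check firing inside the forked Python worker, not a PySpark bug per se; a commonly cited workaround is exporting `OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES` before starting the driver. The sketch below (the URL is the hypothetical one from the report) constructs the request entirely inside the worker function so that no macOS networking state is initialized in the parent process before `fork()`; the actual `urlopen` call is left commented out so the sketch runs offline.

```python
import urllib.request

def send_http(url):
    # Build the request object lazily, inside the function that runs on
    # the worker, so nothing network-related is created in the driver
    # process before fork().
    req = urllib.request.Request(url)
    # resp = urllib.request.urlopen(req)  # the real call, disabled in this sketch
    return req

req = send_http("http://192.168.1.1:5000/index.html")
print(req.full_url)
```

Used with Spark, this function would be passed as `rdd.foreach(send_http)` exactly as in the report; the structural change is only *where* the request object is created.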
[jira] [Resolved] (SPARK-32366) Fix doc link of datetime pattern in 3.0 migration guide
[ https://issues.apache.org/jira/browse/SPARK-32366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-32366. -- Fix Version/s: 3.1.0 3.0.1 Resolution: Fixed Issue resolved by pull request 29162 [https://github.com/apache/spark/pull/29162] > Fix doc link of datetime pattern in 3.0 migration guide > --- > > Key: SPARK-32366 > URL: https://issues.apache.org/jira/browse/SPARK-32366 > Project: Spark > Issue Type: Improvement > Components: Documentation >Affects Versions: 3.0.1, 3.1.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > Fix For: 3.0.1, 3.1.0 > > > In http://spark.apache.org/docs/latest/sql-migration-guide.html#query-engine, > there is an invalid reference to the datetime pattern page > "sql-ref-datetime-pattern.md". We should fix it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-32368) Options in PartitioningAwareFileIndex should respect case insensitivity
[ https://issues.apache.org/jira/browse/SPARK-32368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32368: Assignee: Apache Spark > Options in PartitioningAwareFileIndex should respect case insensitivity > --- > > Key: SPARK-32368 > URL: https://issues.apache.org/jira/browse/SPARK-32368 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.1 >Reporter: Hyukjin Kwon >Assignee: Apache Spark >Priority: Minor > > The datasource options such as {{recursiveFileLookup}} or {{pathGlobFilter}} > currently don't respect case insensitivity. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-32368) Options in PartitioningAwareFileIndex should respect case insensitivity
[ https://issues.apache.org/jira/browse/SPARK-32368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32368: Assignee: (was: Apache Spark) > Options in PartitioningAwareFileIndex should respect case insensitivity > --- > > Key: SPARK-32368 > URL: https://issues.apache.org/jira/browse/SPARK-32368 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.1 >Reporter: Hyukjin Kwon >Priority: Minor > > The datasource options such as {{recursiveFileLookup}} or {{pathGlobFilter}} > currently don't respect case insensitivity. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-32368) Options in PartitioningAwareFileIndex should respect case insensitivity
[ https://issues.apache.org/jira/browse/SPARK-32368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161173#comment-17161173 ] Apache Spark commented on SPARK-32368: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/29165 > Options in PartitioningAwareFileIndex should respect case insensitivity > --- > > Key: SPARK-32368 > URL: https://issues.apache.org/jira/browse/SPARK-32368 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.1 >Reporter: Hyukjin Kwon >Priority: Minor > > The datasource options such as {{recursiveFileLookup}} or {{pathGlobFilter}} > currently don't respect case insensitivity. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-32368) Options in PartitioningAwareFileIndex should respect case insensitivity
Hyukjin Kwon created SPARK-32368: Summary: Options in PartitioningAwareFileIndex should respect case insensitivity Key: SPARK-32368 URL: https://issues.apache.org/jira/browse/SPARK-32368 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.1 Reporter: Hyukjin Kwon The datasource options such as {{recursiveFileLookup}} or {{pathGlobFilter}} currently don't respect case insensitivity. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
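The behavior the issue asks for can be sketched with a plain case-insensitive lookup. This is an illustrative Python model, not Spark's actual Scala implementation; `get_option` and the option values are assumptions for the example:

```python
def get_option(options, key, default=None):
    # Normalize all keys to lower case once so that "recursiveFileLookup",
    # "recursivefilelookup", and "RECURSIVEFILELOOKUP" resolve identically.
    lowered = {k.lower(): v for k, v in options.items()}
    return lowered.get(key.lower(), default)

opts = {"recursiveFileLookup": "true", "pathGlobFilter": "*.json"}
print(get_option(opts, "RECURSIVEFILELOOKUP"))  # -> true
```

With this scheme a user can pass the option under any casing and the file index still honors it, which is what "respect case insensitivity" means here.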
[jira] [Commented] (SPARK-32240) UDF parse_url with a URL that contains pipe(|) will give incorrect result
[ https://issues.apache.org/jira/browse/SPARK-32240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1716#comment-1716 ] Stanislav commented on SPARK-32240: --- I think the problem is that the pipe is not a valid URL character. {code:javascript} encodeURI("https://a.b.c/index.php?params1=a|b=x") "https://a.b.c/index.php?params1=a%7Cb=x" {code} It does work in Hive's {{parse_url}} UDF, to my knowledge, so I would re-classify this bug as a compatibility issue. > UDF parse_url with a URL that contains pipe(|) will give incorrect result > - > > Key: SPARK-32240 > URL: https://issues.apache.org/jira/browse/SPARK-32240 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.4, 3.0.0 >Reporter: Kritsada Limpawatkul >Priority: Major > > I try to get the host from the URL with the code below. > {code:java} > Seq( > "https://a.b.c/index.php?params1=a|b=x", > "https://a.b.c/index.php?params1=a" > ) > .toDF("url") > .withColumn("host", callUDF("parse_url", $"url", lit("HOST"))) > .show(false){code} > The result of the code is as follows. > {code:java} > +-------------------------------------+-----+ > |url                                  |host | > +-------------------------------------+-----+ > |https://a.b.c/index.php?params1=a|b=x|null | > |https://a.b.c/index.php?params1=a    |a.b.c| > +-------------------------------------+-----+ > {code} > It seems like the host becomes null when the URL contains any pipe(|) > character. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
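The workaround the comment implies, percent-encoding illegal characters before parsing, can be sketched in Python. `safe_host` is a hypothetical helper for illustration, not a Spark or Hive API:

```python
from urllib.parse import quote, urlsplit

def safe_host(url):
    # '|' is not a legal URL character, which is why strict java.net-style
    # parsers reject the whole URL. Percent-encode stray characters while
    # leaving the URL's own delimiters (: / ? & = # %) untouched.
    encoded = quote(url, safe=":/?&=#%")
    return urlsplit(encoded).hostname

print(safe_host("https://a.b.c/index.php?params1=a|b=x"))  # -> a.b.c
```

After encoding, the pipe becomes `%7C` and the host extracts normally, matching the `encodeURI` behavior shown in the comment.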
[jira] [Commented] (SPARK-32367) Correct the spelling of parameter in KubernetesTestComponents
[ https://issues.apache.org/jira/browse/SPARK-32367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161057#comment-17161057 ] Apache Spark commented on SPARK-32367: -- User 'merrily01' has created a pull request for this issue: https://github.com/apache/spark/pull/29164 > Correct the spelling of parameter in KubernetesTestComponents > - > > Key: SPARK-32367 > URL: https://issues.apache.org/jira/browse/SPARK-32367 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.0.0 >Reporter: merrily01 >Priority: Trivial > Fix For: 3.1.0 > > > Correct the spelling of parameter 'spark.executor.instances' in > KubernetesTestComponents -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-32367) Correct the spelling of parameter in KubernetesTestComponents
[ https://issues.apache.org/jira/browse/SPARK-32367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161056#comment-17161056 ] merrily01 commented on SPARK-32367: --- [https://github.com/apache/spark/pull/29164/files] > Correct the spelling of parameter in KubernetesTestComponents > - > > Key: SPARK-32367 > URL: https://issues.apache.org/jira/browse/SPARK-32367 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.0.0 >Reporter: merrily01 >Priority: Trivial > Fix For: 3.1.0 > > > Correct the spelling of parameter 'spark.executor.instances' in > KubernetesTestComponents -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-32367) Correct the spelling of parameter in KubernetesTestComponents
[ https://issues.apache.org/jira/browse/SPARK-32367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161059#comment-17161059 ] Apache Spark commented on SPARK-32367: -- User 'merrily01' has created a pull request for this issue: https://github.com/apache/spark/pull/29164 > Correct the spelling of parameter in KubernetesTestComponents > - > > Key: SPARK-32367 > URL: https://issues.apache.org/jira/browse/SPARK-32367 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.0.0 >Reporter: merrily01 >Priority: Trivial > Fix For: 3.1.0 > > > Correct the spelling of parameter 'spark.executor.instances' in > KubernetesTestComponents -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-32358) temp view not working after upgrading from 2.3.3 to 2.4.5
[ https://issues.apache.org/jira/browse/SPARK-32358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] philipse resolved SPARK-32358. -- Resolution: Won't Fix > temp view not working after upgrading from 2.3.3 to 2.4.5 > - > > Key: SPARK-32358 > URL: https://issues.apache.org/jira/browse/SPARK-32358 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.4.5 >Reporter: philipse >Priority: Major > > After upgrading from Spark 2.3.3 to 2.4.5, the temp view seems not to work. > Please correct me if I missed something. Thanks! > Steps to reproduce: > ``` > from pyspark.sql import SparkSession > from pyspark.sql import Row > spark = SparkSession \ > .builder \ > .appName('scenary_address_1') \ > .enableHiveSupport() \ > .getOrCreate() > > address_tok_result_df = spark.createDataFrame([Row(a=1, b='难', c=80), Row(a=2, b='v', c=81)]) > print("create dataframe finished") > address_tok_result_df.createOrReplaceTempView("scenery_address_test1") > print(spark.read.table('scenery_address_test1').dtypes) > spark.sql("select * from scenery_address_test1").show() > ``` > > In Spark 2.3.3 I can easily get the following result: > ``` > create dataframe finished > [('a', 'bigint'), ('b', 'string'), ('c', 'bigint')] > +---+---+---+ > |  a|  b|  c| > +---+---+---+ > |  1| 难| 80| > |  2|  v| 81| > +---+---+---+ > ``` > > But in 2.4.5 I only get the following, without the rows being shown: > create dataframe finished > [('a', 'bigint'), ('b', 'string'), ('c', 'bigint')] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-32358) temp view not working after upgrading from 2.3.3 to 2.4.5
[ https://issues.apache.org/jira/browse/SPARK-32358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] philipse resolved SPARK-32358. -- Resolution: Fixed > temp view not working after upgrading from 2.3.3 to 2.4.5 > - > > Key: SPARK-32358 > URL: https://issues.apache.org/jira/browse/SPARK-32358 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.4.5 >Reporter: philipse >Priority: Major > > After upgrading from Spark 2.3.3 to 2.4.5, the temp view seems not to work. > Please correct me if I missed something. Thanks! > Steps to reproduce: > ``` > from pyspark.sql import SparkSession > from pyspark.sql import Row > spark = SparkSession \ > .builder \ > .appName('scenary_address_1') \ > .enableHiveSupport() \ > .getOrCreate() > > address_tok_result_df = spark.createDataFrame([Row(a=1, b='难', c=80), Row(a=2, b='v', c=81)]) > print("create dataframe finished") > address_tok_result_df.createOrReplaceTempView("scenery_address_test1") > print(spark.read.table('scenery_address_test1').dtypes) > spark.sql("select * from scenery_address_test1").show() > ``` > > In Spark 2.3.3 I can easily get the following result: > ``` > create dataframe finished > [('a', 'bigint'), ('b', 'string'), ('c', 'bigint')] > +---+---+---+ > |  a|  b|  c| > +---+---+---+ > |  1| 难| 80| > |  2|  v| 81| > +---+---+---+ > ``` > > But in 2.4.5 I only get the following, without the rows being shown: > create dataframe finished > [('a', 'bigint'), ('b', 'string'), ('c', 'bigint')] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-32358) temp view not working after upgrading from 2.3.3 to 2.4.5
[ https://issues.apache.org/jira/browse/SPARK-32358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] philipse reopened SPARK-32358: -- > temp view not working after upgrading from 2.3.3 to 2.4.5 > - > > Key: SPARK-32358 > URL: https://issues.apache.org/jira/browse/SPARK-32358 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.4.5 >Reporter: philipse >Priority: Major > > After upgrading from Spark 2.3.3 to 2.4.5, the temp view seems not to work. > Please correct me if I missed something. Thanks! > Steps to reproduce: > ``` > from pyspark.sql import SparkSession > from pyspark.sql import Row > spark = SparkSession \ > .builder \ > .appName('scenary_address_1') \ > .enableHiveSupport() \ > .getOrCreate() > > address_tok_result_df = spark.createDataFrame([Row(a=1, b='难', c=80), Row(a=2, b='v', c=81)]) > print("create dataframe finished") > address_tok_result_df.createOrReplaceTempView("scenery_address_test1") > print(spark.read.table('scenery_address_test1').dtypes) > spark.sql("select * from scenery_address_test1").show() > ``` > > In Spark 2.3.3 I can easily get the following result: > ``` > create dataframe finished > [('a', 'bigint'), ('b', 'string'), ('c', 'bigint')] > +---+---+---+ > |  a|  b|  c| > +---+---+---+ > |  1| 难| 80| > |  2|  v| 81| > +---+---+---+ > ``` > > But in 2.4.5 I only get the following, without the rows being shown: > create dataframe finished > [('a', 'bigint'), ('b', 'string'), ('c', 'bigint')] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-32367) Correct the spelling of parameter in KubernetesTestComponents
[ https://issues.apache.org/jira/browse/SPARK-32367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32367: Assignee: Apache Spark > Correct the spelling of parameter in KubernetesTestComponents > - > > Key: SPARK-32367 > URL: https://issues.apache.org/jira/browse/SPARK-32367 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.0.0 >Reporter: merrily01 >Assignee: Apache Spark >Priority: Trivial > Fix For: 3.1.0 > > > Correct the spelling of parameter 'spark.executor.instances' in > KubernetesTestComponents -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-32367) Correct the spelling of parameter in KubernetesTestComponents
[ https://issues.apache.org/jira/browse/SPARK-32367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32367: Assignee: (was: Apache Spark) > Correct the spelling of parameter in KubernetesTestComponents > - > > Key: SPARK-32367 > URL: https://issues.apache.org/jira/browse/SPARK-32367 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.0.0 >Reporter: merrily01 >Priority: Trivial > Fix For: 3.1.0 > > > Correct the spelling of parameter 'spark.executor.instances' in > KubernetesTestComponents -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-32367) Correct the spelling of parameter in KubernetesTestComponents
[ https://issues.apache.org/jira/browse/SPARK-32367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] merrily01 reopened SPARK-32367: --- > Correct the spelling of parameter in KubernetesTestComponents > - > > Key: SPARK-32367 > URL: https://issues.apache.org/jira/browse/SPARK-32367 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.0.0 >Reporter: merrily01 >Priority: Trivial > Fix For: 3.1.0 > > > Correct the spelling of parameter 'spark.executor.instances' in > KubernetesTestComponents -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-32367) Correct the spelling of parameter in KubernetesTestComponents
[ https://issues.apache.org/jira/browse/SPARK-32367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161038#comment-17161038 ] Apache Spark commented on SPARK-32367: -- User 'merrily01' has created a pull request for this issue: https://github.com/apache/spark/pull/29163 > Correct the spelling of parameter in KubernetesTestComponents > - > > Key: SPARK-32367 > URL: https://issues.apache.org/jira/browse/SPARK-32367 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.0.0 >Reporter: merrily01 >Priority: Trivial > Fix For: 3.1.0 > > > Correct the spelling of parameter 'spark.executor.instances' in > KubernetesTestComponents -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-32367) Correct the spelling of parameter in KubernetesTestComponents
[ https://issues.apache.org/jira/browse/SPARK-32367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161037#comment-17161037 ] Apache Spark commented on SPARK-32367: -- User 'merrily01' has created a pull request for this issue: https://github.com/apache/spark/pull/29163 > Correct the spelling of parameter in KubernetesTestComponents > - > > Key: SPARK-32367 > URL: https://issues.apache.org/jira/browse/SPARK-32367 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.0.0 >Reporter: merrily01 >Priority: Trivial > Fix For: 3.1.0 > > > Correct the spelling of parameter 'spark.executor.instances' in > KubernetesTestComponents -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-32367) Correct the spelling of parameter in KubernetesTestComponents
[ https://issues.apache.org/jira/browse/SPARK-32367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] merrily01 resolved SPARK-32367. --- Resolution: Works for Me > Correct the spelling of parameter in KubernetesTestComponents > - > > Key: SPARK-32367 > URL: https://issues.apache.org/jira/browse/SPARK-32367 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.0.0 >Reporter: merrily01 >Priority: Trivial > Fix For: 3.1.0 > > > Correct the spelling of parameter 'spark.executor.instances' in > KubernetesTestComponents -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-32367) Correct the spelling of parameter in KubernetesTestComponents
merrily01 created SPARK-32367: - Summary: Correct the spelling of parameter in KubernetesTestComponents Key: SPARK-32367 URL: https://issues.apache.org/jira/browse/SPARK-32367 Project: Spark Issue Type: Improvement Components: Kubernetes Affects Versions: 3.0.0 Reporter: merrily01 Fix For: 3.1.0 Correct the spelling of parameter 'spark.executor.instances' in KubernetesTestComponents -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-32366) Fix doc link of datetime pattern in 3.0 migration guide
[ https://issues.apache.org/jira/browse/SPARK-32366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161029#comment-17161029 ] Apache Spark commented on SPARK-32366: -- User 'gengliangwang' has created a pull request for this issue: https://github.com/apache/spark/pull/29162 > Fix doc link of datetime pattern in 3.0 migration guide > --- > > Key: SPARK-32366 > URL: https://issues.apache.org/jira/browse/SPARK-32366 > Project: Spark > Issue Type: Improvement > Components: Documentation >Affects Versions: 3.0.1, 3.1.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > > In http://spark.apache.org/docs/latest/sql-migration-guide.html#query-engine, > there is an invalid reference to the datetime pattern page > "sql-ref-datetime-pattern.md". We should fix it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-32366) Fix doc link of datetime pattern in 3.0 migration guide
[ https://issues.apache.org/jira/browse/SPARK-32366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32366: Assignee: Apache Spark (was: Gengliang Wang) > Fix doc link of datetime pattern in 3.0 migration guide > --- > > Key: SPARK-32366 > URL: https://issues.apache.org/jira/browse/SPARK-32366 > Project: Spark > Issue Type: Improvement > Components: Documentation >Affects Versions: 3.0.1, 3.1.0 >Reporter: Gengliang Wang >Assignee: Apache Spark >Priority: Major > > In http://spark.apache.org/docs/latest/sql-migration-guide.html#query-engine, > there is an invalid reference to the datetime pattern page > "sql-ref-datetime-pattern.md". We should fix it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-32366) Fix doc link of datetime pattern in 3.0 migration guide
[ https://issues.apache.org/jira/browse/SPARK-32366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32366: Assignee: Gengliang Wang (was: Apache Spark) > Fix doc link of datetime pattern in 3.0 migration guide > --- > > Key: SPARK-32366 > URL: https://issues.apache.org/jira/browse/SPARK-32366 > Project: Spark > Issue Type: Improvement > Components: Documentation >Affects Versions: 3.0.1, 3.1.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > > In http://spark.apache.org/docs/latest/sql-migration-guide.html#query-engine, > there is an invalid reference to the datetime pattern page > "sql-ref-datetime-pattern.md". We should fix it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-32366) Fix doc link of datetime pattern in 3.0 migration guide
[ https://issues.apache.org/jira/browse/SPARK-32366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161028#comment-17161028 ] Apache Spark commented on SPARK-32366: -- User 'gengliangwang' has created a pull request for this issue: https://github.com/apache/spark/pull/29162 > Fix doc link of datetime pattern in 3.0 migration guide > --- > > Key: SPARK-32366 > URL: https://issues.apache.org/jira/browse/SPARK-32366 > Project: Spark > Issue Type: Improvement > Components: Documentation >Affects Versions: 3.0.1, 3.1.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > > In http://spark.apache.org/docs/latest/sql-migration-guide.html#query-engine, > there is an invalid reference to the datetime pattern page > "sql-ref-datetime-pattern.md". We should fix it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-32366) Fix doc link of datetime pattern in 3.0 migration guide
Gengliang Wang created SPARK-32366: -- Summary: Fix doc link of datetime pattern in 3.0 migration guide Key: SPARK-32366 URL: https://issues.apache.org/jira/browse/SPARK-32366 Project: Spark Issue Type: Improvement Components: Documentation Affects Versions: 3.0.1, 3.1.0 Reporter: Gengliang Wang Assignee: Gengliang Wang In http://spark.apache.org/docs/latest/sql-migration-guide.html#query-engine, there is an invalid reference to the datetime pattern page "sql-ref-datetime-pattern.md". We should fix it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-32364) `path` argument of DataFrame.load/save should override the existing options
[ https://issues.apache.org/jira/browse/SPARK-32364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-32364: -- Description: Although we introduced CaseInsensitiveMap and CaseInsensitiveStringMap (in DSv2), when a user has multiple options like `path`, `paTH`, and `PATH` for the same key `path`, `option()/options()` are non-deterministic because `extraOptions` is a `HashMap`. This issue aims to make load/save always respect their direct path argument and ignore the existing options. This is because the load/save function is independent of users' typos like `paTH` and is designed to be invoked as the last operation, so load/save should always work consistently and correctly. Please note that this doesn't aim to enforce case insensitivity on `option()/options()` or the `extraOptions` variable, because that might be considered a behavior change. {code} spark.read .option("paTh", "1") .option("PATH", "2") .option("Path", "3") .option("patH", "4") .load("5") ... org.apache.spark.sql.AnalysisException: Path does not exist: file:/.../1; {code} was: {code} spark.read .option("paTh", "1") .option("PATH", "2") .option("Path", "3") .option("patH", "4") .load("5") ... org.apache.spark.sql.AnalysisException: Path does not exist: file:/.../1; {code} {code} spark.read .option("PATH", "2") .option("Path", "3") .option("patH", "4") .load("5") ... 
org.apache.spark.sql.AnalysisException: Path does not exist: file:/.../4; {code} > `path` argument of DataFrame.load/save should override the existing options > --- > > Key: SPARK-32364 > URL: https://issues.apache.org/jira/browse/SPARK-32364 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.6, 3.0.0 >Reporter: Dongjoon Hyun >Priority: Major > > Although we introduced CaseInsensitiveMap and CaseInsensitiveStringMap (in > DSv2), when a user has multiple options like `path`, `paTH`, and `PATH` for > the same key `path`, `option()/options()` are non-deterministic because > `extraOptions` is a `HashMap`. This issue aims to make load/save always respect their > direct path argument and ignore the existing options. This is because > the load/save function is independent of users' typos like `paTH` and is > designed to be invoked as the last operation, so load/save should always work > consistently and correctly. > Please note that this doesn't aim to enforce case insensitivity on > `option()/options()` or the `extraOptions` variable, because that might be > considered a behavior change. > {code} > spark.read > .option("paTh", "1") > .option("PATH", "2") > .option("Path", "3") > .option("patH", "4") > .load("5") > ... > org.apache.spark.sql.AnalysisException: > Path does not exist: file:/.../1; > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-32364) `path` argument of DataFrame.load/save should override the existing options
[ https://issues.apache.org/jira/browse/SPARK-32364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-32364: -- Description: {code} spark.read .option("paTh", "1") .option("PATH", "2") .option("Path", "3") .option("patH", "4") .load("5") ... org.apache.spark.sql.AnalysisException: Path does not exist: file:/.../1; {code} {code} spark.read .option("PATH", "2") .option("Path", "3") .option("patH", "4") .load("5") ... org.apache.spark.sql.AnalysisException: Path does not exist: file:/.../4; {code} was: {code} spark.read .option("paTh", "1") .option("PATH", "2") .option("Path", "3") .option("patH", "4") .parquet("5") ... org.apache.spark.sql.AnalysisException: Path does not exist: file:/.../1; {code} {code} spark.read .option("PATH", "2") .option("Path", "3") .option("patH", "4") .parquet("5") ... org.apache.spark.sql.AnalysisException: Path does not exist: file:/.../4; {code} > `path` argument of DataFrame.load/save should override the existing options > --- > > Key: SPARK-32364 > URL: https://issues.apache.org/jira/browse/SPARK-32364 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.6, 3.0.0 >Reporter: Dongjoon Hyun >Priority: Major > > {code} > spark.read > .option("paTh", "1") > .option("PATH", "2") > .option("Path", "3") > .option("patH", "4") > .load("5") > ... > org.apache.spark.sql.AnalysisException: > Path does not exist: file:/.../1; > {code} > {code} > spark.read > .option("PATH", "2") > .option("Path", "3") > .option("patH", "4") > .load("5") > ... > org.apache.spark.sql.AnalysisException: > Path does not exist: file:/.../4; > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
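The non-determinism described above can be modeled with a small sketch: a map that compares keys case-insensitively and deterministically keeps the most recent value ("last wins"). This is an illustrative Python model of the intended behavior, not the Scala `CaseInsensitiveMap`:

```python
class CaseInsensitiveMap(dict):
    """Minimal sketch: addition is deterministic because any existing
    entry whose key matches case-insensitively is replaced (last wins)."""

    def __setitem__(self, key, value):
        # Collect matching keys first, then delete, to avoid mutating
        # the dict while iterating over it.
        for k in [k for k in self if k.lower() == key.lower()]:
            super().__delitem__(k)
        super().__setitem__(key, value)

    def __getitem__(self, key):
        for k, v in self.items():
            if k.lower() == key.lower():
                return v
        raise KeyError(key)

m = CaseInsensitiveMap()
for k, v in [("paTh", "1"), ("PATH", "2"), ("Path", "3"), ("patH", "4"), ("path", "5")]:
    m[k] = v
print(m["path"])  # the same value on every run, regardless of key casing
```

Contrast this with a hash map keyed on the raw strings, where the value returned for "path" depends on hash ordering, which is exactly the non-determinism the issue reports.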
[jira] [Updated] (SPARK-32344) Unevaluable expr is set to FIRST/LAST ignoreNullsExpr in distinct aggregates
[ https://issues.apache.org/jira/browse/SPARK-32344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-32344: -- Fix Version/s: 2.4.7 > Unevaluable expr is set to FIRST/LAST ignoreNullsExpr in distinct aggregates > > > Key: SPARK-32344 > URL: https://issues.apache.org/jira/browse/SPARK-32344 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.3, 2.3.4, 2.4.6, 3.0.0 >Reporter: Takeshi Yamamuro >Assignee: Takeshi Yamamuro >Priority: Minor > Fix For: 2.4.7, 3.0.1, 3.1.0 > > > {code} > scala> sql("SELECT FIRST(DISTINCT v) FROM VALUES 1, 2, 3 t(v)").show() > ... > Caused by: java.lang.UnsupportedOperationException: Cannot evaluate > expression: false#37 > at > org.apache.spark.sql.catalyst.expressions.Unevaluable$class.eval(Expression.scala:258) > at > org.apache.spark.sql.catalyst.expressions.AttributeReference.eval(namedExpressions.scala:226) > at > org.apache.spark.sql.catalyst.expressions.aggregate.First.ignoreNulls(First.scala:68) > at > org.apache.spark.sql.catalyst.expressions.aggregate.First.updateExpressions$lzycompute(First.scala:82) > at > org.apache.spark.sql.catalyst.expressions.aggregate.First.updateExpressions(First.scala:81) > at > org.apache.spark.sql.execution.aggregate.HashAggregateExec$$anonfun$15.apply(HashAggregateExec.scala:268) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
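The stack trace above shows FIRST's ignore-nulls flag being read from an unevaluable attribute rather than a literal boolean. A minimal Python model of the intended FIRST semantics, with the flag as a plain boolean (an illustrative sketch, not Spark's aggregate implementation):

```python
def first(values, ignore_nulls=False):
    # ignore_nulls is an ordinary boolean, fixed before evaluation --
    # the bug arose when a non-foldable expression was bound to this
    # flag while rewriting a distinct aggregate.
    for v in values:
        if v is not None or not ignore_nulls:
            return v
    return None

print(first([None, 1, 2], ignore_nulls=True))  # -> 1
```

With `ignore_nulls=False` the function returns the first element even if it is `None`; with `ignore_nulls=True` it skips leading nulls.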
[jira] [Assigned] (SPARK-32365) Fix java.lang.IndexOutOfBoundsException: No group -1
[ https://issues.apache.org/jira/browse/SPARK-32365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32365: Assignee: (was: Apache Spark) > Fix java.lang.IndexOutOfBoundsException: No group -1 > > > Key: SPARK-32365 > URL: https://issues.apache.org/jira/browse/SPARK-32365 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.1, 3.1.0 >Reporter: jiaan.geng >Priority: Major > > The current implementation of regexp_extract throws an unhandled exception, > shown below: > SELECT regexp_extract('1a 2b 14m', '\\d+', -1) > {code:java} > java.lang.IndexOutOfBoundsException: No group -1 > java.util.regex.Matcher.group(Matcher.java:538) > org.apache.spark.sql.catalyst.expressions.RegExpExtract.nullSafeEval(regexpExpressions.scala:455) > org.apache.spark.sql.catalyst.expressions.TernaryExpression.eval(Expression.scala:704) > org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$1$$anonfun$applyOrElse$1.applyOrElse(expressions.scala:52) > org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$1$$anonfun$applyOrElse$1.applyOrElse(expressions.scala:45) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-32365) Fix java.lang.IndexOutOfBoundsException: No group -1
[ https://issues.apache.org/jira/browse/SPARK-32365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17160942#comment-17160942 ] Apache Spark commented on SPARK-32365: -- User 'beliefer' has created a pull request for this issue: https://github.com/apache/spark/pull/29161 > Fix java.lang.IndexOutOfBoundsException: No group -1 > > > Key: SPARK-32365 > URL: https://issues.apache.org/jira/browse/SPARK-32365 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.1, 3.1.0 >Reporter: jiaan.geng >Priority: Major > > The current implementation of regexp_extract throws an unhandled exception, > shown below: > SELECT regexp_extract('1a 2b 14m', '\\d+', -1) > {code:java} > java.lang.IndexOutOfBoundsException: No group -1 > java.util.regex.Matcher.group(Matcher.java:538) > org.apache.spark.sql.catalyst.expressions.RegExpExtract.nullSafeEval(regexpExpressions.scala:455) > org.apache.spark.sql.catalyst.expressions.TernaryExpression.eval(Expression.scala:704) > org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$1$$anonfun$applyOrElse$1.applyOrElse(expressions.scala:52) > org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$1$$anonfun$applyOrElse$1.applyOrElse(expressions.scala:45) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-32365) Fix java.lang.IndexOutOfBoundsException: No group -1
[ https://issues.apache.org/jira/browse/SPARK-32365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32365: Assignee: Apache Spark > Fix java.lang.IndexOutOfBoundsException: No group -1 > > > Key: SPARK-32365 > URL: https://issues.apache.org/jira/browse/SPARK-32365 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.1, 3.1.0 >Reporter: jiaan.geng >Assignee: Apache Spark >Priority: Major > > The current implementation of regexp_extract throws an unhandled exception, > shown below: > SELECT regexp_extract('1a 2b 14m', '\\d+', -1) > {code:java} > java.lang.IndexOutOfBoundsException: No group -1 > java.util.regex.Matcher.group(Matcher.java:538) > org.apache.spark.sql.catalyst.expressions.RegExpExtract.nullSafeEval(regexpExpressions.scala:455) > org.apache.spark.sql.catalyst.expressions.TernaryExpression.eval(Expression.scala:704) > org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$1$$anonfun$applyOrElse$1.applyOrElse(expressions.scala:52) > org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$1$$anonfun$applyOrElse$1.applyOrElse(expressions.scala:45) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-32365) Fix java.lang.IndexOutOfBoundsException: No group -1
[ https://issues.apache.org/jira/browse/SPARK-32365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-32365: --- Description: The current implementation of regexp_extract throws an unhandled exception, shown below: SELECT regexp_extract('1a 2b 14m', '\\d+', -1) {code:java} java.lang.IndexOutOfBoundsException: No group -1 java.util.regex.Matcher.group(Matcher.java:538) org.apache.spark.sql.catalyst.expressions.RegExpExtract.nullSafeEval(regexpExpressions.scala:455) org.apache.spark.sql.catalyst.expressions.TernaryExpression.eval(Expression.scala:704) org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$1$$anonfun$applyOrElse$1.applyOrElse(expressions.scala:52) org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$1$$anonfun$applyOrElse$1.applyOrElse(expressions.scala:45) {code} was: The current implementation of regexp_extract throws an unhandled exception, shown below: SELECT regexp_extract('1a 2b 14m', '\\d+', -1) java.util.regex.Matcher.group(Matcher.java:538) > Fix java.lang.IndexOutOfBoundsException: No group -1 > > > Key: SPARK-32365 > URL: https://issues.apache.org/jira/browse/SPARK-32365 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.1, 3.1.0 >Reporter: jiaan.geng >Priority: Major > > The current implementation of regexp_extract throws an unhandled exception, > shown below: > SELECT regexp_extract('1a 2b 14m', '\\d+', -1) > {code:java} > java.lang.IndexOutOfBoundsException: No group -1 > java.util.regex.Matcher.group(Matcher.java:538) > org.apache.spark.sql.catalyst.expressions.RegExpExtract.nullSafeEval(regexpExpressions.scala:455) > org.apache.spark.sql.catalyst.expressions.TernaryExpression.eval(Expression.scala:704) > org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$1$$anonfun$applyOrElse$1.applyOrElse(expressions.scala:52) > org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$1$$anonfun$applyOrElse$1.applyOrElse(expressions.scala:45) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-32365) Fix java.lang.IndexOutOfBoundsException: No group -1
[ https://issues.apache.org/jira/browse/SPARK-32365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-32365: --- Description: The current implementation of regexp_extract throws an unhandled exception, shown below: SELECT regexp_extract('1a 2b 14m', '\\d+', -1) java.util.regex.Matcher.group(Matcher.java:538) > Fix java.lang.IndexOutOfBoundsException: No group -1 > > > Key: SPARK-32365 > URL: https://issues.apache.org/jira/browse/SPARK-32365 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.1, 3.1.0 >Reporter: jiaan.geng >Priority: Major > > The current implementation of regexp_extract throws an unhandled exception, > shown below: > SELECT regexp_extract('1a 2b 14m', '\\d+', -1) > java.util.regex.Matcher.group(Matcher.java:538) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-32365) Fix java.lang.IndexOutOfBoundsException: No group -1
jiaan.geng created SPARK-32365: -- Summary: Fix java.lang.IndexOutOfBoundsException: No group -1 Key: SPARK-32365 URL: https://issues.apache.org/jira/browse/SPARK-32365 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.1, 3.1.0 Reporter: jiaan.geng -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
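The `No group -1` error in this issue comes straight from `java.util.regex.Matcher.group(int)`, which `regexp_extract` calls without validating the group index first. The sketch below shows the kind of bounds check a fix could add; it is illustrative only — the method name and error message are assumptions, and the actual Spark patch may differ:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexpExtractGuard {
    // Core of a regexp_extract-like function with an explicit bounds check,
    // so an invalid group index surfaces as a clear error instead of the raw
    // java.util.regex IndexOutOfBoundsException.
    static String extract(String subject, String regex, int groupIdx) {
        Matcher m = Pattern.compile(regex).matcher(subject);
        if (!m.find()) {
            return "";  // regexp_extract returns "" when nothing matches
        }
        if (groupIdx < 0 || groupIdx > m.groupCount()) {
            throw new IllegalArgumentException(
                "The specified group index exceeds the group count: " + groupIdx);
        }
        String group = m.group(groupIdx);
        return group == null ? "" : group;
    }

    public static void main(String[] args) {
        System.out.println(extract("1a 2b 14m", "\\d+", 0));  // 1
        // extract("1a 2b 14m", "\\d+", -1) now fails fast with a clear message
        // instead of "java.lang.IndexOutOfBoundsException: No group -1".
    }
}
```

Validating against `Matcher.groupCount()` also rejects indices that are too large, not just negative ones, which the raw JDK call reports with the same confusing exception.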
[jira] [Updated] (SPARK-32344) Unevaluable expr is set to FIRST/LAST ignoreNullsExpr in distinct aggregates
[ https://issues.apache.org/jira/browse/SPARK-32344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-32344: -- Affects Version/s: 2.2.3 > Unevaluable expr is set to FIRST/LAST ignoreNullsExpr in distinct aggregates > > > Key: SPARK-32344 > URL: https://issues.apache.org/jira/browse/SPARK-32344 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.3, 2.3.4, 2.4.6, 3.0.0 >Reporter: Takeshi Yamamuro >Assignee: Takeshi Yamamuro >Priority: Minor > Fix For: 3.0.1, 3.1.0 > > > {code} > scala> sql("SELECT FIRST(DISTINCT v) FROM VALUES 1, 2, 3 t(v)").show() > ... > Caused by: java.lang.UnsupportedOperationException: Cannot evaluate > expression: false#37 > at > org.apache.spark.sql.catalyst.expressions.Unevaluable$class.eval(Expression.scala:258) > at > org.apache.spark.sql.catalyst.expressions.AttributeReference.eval(namedExpressions.scala:226) > at > org.apache.spark.sql.catalyst.expressions.aggregate.First.ignoreNulls(First.scala:68) > at > org.apache.spark.sql.catalyst.expressions.aggregate.First.updateExpressions$lzycompute(First.scala:82) > at > org.apache.spark.sql.catalyst.expressions.aggregate.First.updateExpressions(First.scala:81) > at > org.apache.spark.sql.execution.aggregate.HashAggregateExec$$anonfun$15.apply(HashAggregateExec.scala:268) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-32344) Unevaluable expr is set to FIRST/LAST ignoreNullsExpr in distinct aggregates
[ https://issues.apache.org/jira/browse/SPARK-32344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-32344: -- Affects Version/s: 2.3.4 > Unevaluable expr is set to FIRST/LAST ignoreNullsExpr in distinct aggregates > > > Key: SPARK-32344 > URL: https://issues.apache.org/jira/browse/SPARK-32344 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.4, 2.4.6, 3.0.0 >Reporter: Takeshi Yamamuro >Assignee: Takeshi Yamamuro >Priority: Minor > Fix For: 3.0.1, 3.1.0 > > > {code} > scala> sql("SELECT FIRST(DISTINCT v) FROM VALUES 1, 2, 3 t(v)").show() > ... > Caused by: java.lang.UnsupportedOperationException: Cannot evaluate > expression: false#37 > at > org.apache.spark.sql.catalyst.expressions.Unevaluable$class.eval(Expression.scala:258) > at > org.apache.spark.sql.catalyst.expressions.AttributeReference.eval(namedExpressions.scala:226) > at > org.apache.spark.sql.catalyst.expressions.aggregate.First.ignoreNulls(First.scala:68) > at > org.apache.spark.sql.catalyst.expressions.aggregate.First.updateExpressions$lzycompute(First.scala:82) > at > org.apache.spark.sql.catalyst.expressions.aggregate.First.updateExpressions(First.scala:81) > at > org.apache.spark.sql.execution.aggregate.HashAggregateExec$$anonfun$15.apply(HashAggregateExec.scala:268) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-32364) `path` argument of DataFrame.load/save should override the existing options
[ https://issues.apache.org/jira/browse/SPARK-32364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-32364: -- Description: {code} spark.read .option("paTh", "1") .option("PATH", "2") .option("Path", "3") .option("patH", "4") .parquet("5") ... org.apache.spark.sql.AnalysisException: Path does not exist: file:/.../1; {code} {code} spark.read .option("PATH", "2") .option("Path", "3") .option("patH", "4") .parquet("5") ... org.apache.spark.sql.AnalysisException: Path does not exist: file:/.../4; {code} was: {code} spark.read .option("paTh", "1") .option("PATH", "2") .option("Path", "3") .option("patH", "4") .parquet("5") {code} {code} org.apache.spark.sql.AnalysisException: Path does not exist: file:/Users/dongjoon/APACHE/spark-release/spark-3.0.0-bin-hadoop3.2/1; {code} > `path` argument of DataFrame.load/save should override the existing options > --- > > Key: SPARK-32364 > URL: https://issues.apache.org/jira/browse/SPARK-32364 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.6, 3.0.0 >Reporter: Dongjoon Hyun >Priority: Major > > {code} > spark.read > .option("paTh", "1") > .option("PATH", "2") > .option("Path", "3") > .option("patH", "4") > .parquet("5") > ... > org.apache.spark.sql.AnalysisException: > Path does not exist: file:/.../1; > {code} > {code} > spark.read > .option("PATH", "2") > .option("Path", "3") > .option("patH", "4") > .parquet("5") > ... > org.apache.spark.sql.AnalysisException: > Path does not exist: file:/.../4; > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-32364) `path` argument of DataFrame.load/save should override the existing options
[ https://issues.apache.org/jira/browse/SPARK-32364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32364: Assignee: Apache Spark > `path` argument of DataFrame.load/save should override the existing options > --- > > Key: SPARK-32364 > URL: https://issues.apache.org/jira/browse/SPARK-32364 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.6, 3.0.0 >Reporter: Dongjoon Hyun >Assignee: Apache Spark >Priority: Major > > {code} > spark.read > .option("paTh", "1") > .option("PATH", "2") > .option("Path", "3") > .option("patH", "4") > .parquet("5") > {code} > {code} > org.apache.spark.sql.AnalysisException: Path does not exist: > file:/Users/dongjoon/APACHE/spark-release/spark-3.0.0-bin-hadoop3.2/1; > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-32364) `path` argument of DataFrame.load/save should override the existing options
[ https://issues.apache.org/jira/browse/SPARK-32364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17160924#comment-17160924 ] Apache Spark commented on SPARK-32364: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/29160 > `path` argument of DataFrame.load/save should override the existing options > --- > > Key: SPARK-32364 > URL: https://issues.apache.org/jira/browse/SPARK-32364 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.6, 3.0.0 >Reporter: Dongjoon Hyun >Priority: Major > > {code} > spark.read > .option("paTh", "1") > .option("PATH", "2") > .option("Path", "3") > .option("patH", "4") > .parquet("5") > {code} > {code} > org.apache.spark.sql.AnalysisException: Path does not exist: > file:/Users/dongjoon/APACHE/spark-release/spark-3.0.0-bin-hadoop3.2/1; > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-32364) `path` argument of DataFrame.load/save should override the existing options
[ https://issues.apache.org/jira/browse/SPARK-32364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32364: Assignee: (was: Apache Spark) > `path` argument of DataFrame.load/save should override the existing options > --- > > Key: SPARK-32364 > URL: https://issues.apache.org/jira/browse/SPARK-32364 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.6, 3.0.0 >Reporter: Dongjoon Hyun >Priority: Major > > {code} > spark.read > .option("paTh", "1") > .option("PATH", "2") > .option("Path", "3") > .option("patH", "4") > .parquet("5") > {code} > {code} > org.apache.spark.sql.AnalysisException: Path does not exist: > file:/Users/dongjoon/APACHE/spark-release/spark-3.0.0-bin-hadoop3.2/1; > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org