[jira] [Commented] (SPARK-41270) Add Catalog tableExists and namespaceExists in Connect proto
[ https://issues.apache.org/jira/browse/SPARK-41270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17638805#comment-17638805 ]

Apache Spark commented on SPARK-41270:
--------------------------------------

User 'amaliujia' has created a pull request for this issue:
https://github.com/apache/spark/pull/38807
[jira] [Assigned] (SPARK-41270) Add Catalog tableExists and namespaceExists in Connect proto
[ https://issues.apache.org/jira/browse/SPARK-41270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-41270:
------------------------------------

    Assignee: Apache Spark
[jira] [Commented] (SPARK-41270) Add Catalog tableExists and namespaceExists in Connect proto
[ https://issues.apache.org/jira/browse/SPARK-41270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17638804#comment-17638804 ]

Apache Spark commented on SPARK-41270:
--------------------------------------

User 'amaliujia' has created a pull request for this issue:
https://github.com/apache/spark/pull/38807
[jira] [Assigned] (SPARK-41270) Add Catalog tableExists and namespaceExists in Connect proto
[ https://issues.apache.org/jira/browse/SPARK-41270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-41270:
------------------------------------

    Assignee: (was: Apache Spark)
[jira] [Created] (SPARK-41270) Add Catalog tableExists and namespaceExists in Connect proto
Rui Wang created SPARK-41270:
---------------------------------

             Summary: Add Catalog tableExists and namespaceExists in Connect proto
                 Key: SPARK-41270
                 URL: https://issues.apache.org/jira/browse/SPARK-41270
             Project: Spark
          Issue Type: Sub-task
          Components: Connect
    Affects Versions: 3.4.0
            Reporter: Rui Wang
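The ticket text does not include the proto definitions, so the sketch below only shows the session-side Catalog calls the new Connect messages would have to carry. {{tableExists}} and {{databaseExists}} are existing {{org.apache.spark.sql.catalog.Catalog}} methods; any mapping onto proto messages is an assumption here.

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()

// Existence checks the Connect proto needs to express: both return a plain
// Boolean, so the corresponding responses are presumably single bool fields.
val tableExists: Boolean = spark.catalog.tableExists("db1.table1")
val namespaceExists: Boolean = spark.catalog.databaseExists("db1")

println(s"table: $tableExists, namespace: $namespaceExists")
{code}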
[jira] [Created] (SPARK-41269) Move image matrix into version's workflow
Yikun Jiang created SPARK-41269:
--------------------------------

             Summary: Move image matrix into version's workflow
                 Key: SPARK-41269
                 URL: https://issues.apache.org/jira/browse/SPARK-41269
             Project: Spark
          Issue Type: Bug
          Components: Project Infra
    Affects Versions: 3.4.0
            Reporter: Yikun Jiang
[jira] [Resolved] (SPARK-41255) RemoteSparkSession should be called SparkSession
[ https://issues.apache.org/jira/browse/SPARK-41255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Herman van Hövell resolved SPARK-41255.
---------------------------------------
    Fix Version/s: 3.4.0
         Assignee: Martin Grund
       Resolution: Fixed

> RemoteSparkSession should be called SparkSession
> ------------------------------------------------
>
>                 Key: SPARK-41255
>                 URL: https://issues.apache.org/jira/browse/SPARK-41255
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Connect
>    Affects Versions: 3.4.0
>            Reporter: Martin Grund
>            Assignee: Martin Grund
>            Priority: Major
>             Fix For: 3.4.0
[jira] [Assigned] (SPARK-41268) Refactor "Column" for API Compatibility
[ https://issues.apache.org/jira/browse/SPARK-41268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-41268:
------------------------------------

    Assignee: Apache Spark
[jira] [Assigned] (SPARK-41268) Refactor "Column" for API Compatibility
[ https://issues.apache.org/jira/browse/SPARK-41268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-41268:
------------------------------------

    Assignee: (was: Apache Spark)
[jira] [Commented] (SPARK-41268) Refactor "Column" for API Compatibility
[ https://issues.apache.org/jira/browse/SPARK-41268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17638776#comment-17638776 ]

Apache Spark commented on SPARK-41268:
--------------------------------------

User 'amaliujia' has created a pull request for this issue:
https://github.com/apache/spark/pull/38806
[jira] [Created] (SPARK-41268) Refactor "Column" for API Compatibility
Rui Wang created SPARK-41268:
---------------------------------

             Summary: Refactor "Column" for API Compatibility
                 Key: SPARK-41268
                 URL: https://issues.apache.org/jira/browse/SPARK-41268
             Project: Spark
          Issue Type: Sub-task
          Components: Connect
    Affects Versions: 3.4.0
            Reporter: Rui Wang
[jira] [Commented] (SPARK-41267) Add unpivot / melt to SparkR
[ https://issues.apache.org/jira/browse/SPARK-41267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17638745#comment-17638745 ]

Apache Spark commented on SPARK-41267:
--------------------------------------

User 'zero323' has created a pull request for this issue:
https://github.com/apache/spark/pull/38804
[jira] [Assigned] (SPARK-41267) Add unpivot / melt to SparkR
[ https://issues.apache.org/jira/browse/SPARK-41267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-41267:
------------------------------------

    Assignee: (was: Apache Spark)
[jira] [Assigned] (SPARK-41267) Add unpivot / melt to SparkR
[ https://issues.apache.org/jira/browse/SPARK-41267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-41267:
------------------------------------

    Assignee: Apache Spark
[jira] [Commented] (SPARK-41267) Add unpivot / melt to SparkR
[ https://issues.apache.org/jira/browse/SPARK-41267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17638743#comment-17638743 ]

Apache Spark commented on SPARK-41267:
--------------------------------------

User 'zero323' has created a pull request for this issue:
https://github.com/apache/spark/pull/38804
[jira] [Created] (SPARK-41267) Add unpivot / melt to SparkR
Maciej Szymkiewicz created SPARK-41267:
---------------------------------------

             Summary: Add unpivot / melt to SparkR
                 Key: SPARK-41267
                 URL: https://issues.apache.org/jira/browse/SPARK-41267
             Project: Spark
          Issue Type: Improvement
          Components: R, SQL
    Affects Versions: 3.4.0
            Reporter: Maciej Szymkiewicz

Unpivot / melt operations have been implemented for Scala {{Dataset}} and core Python {{DataFrame}}, but are missing from SparkR. We should add these to achieve feature parity.
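For reference, a minimal sketch of the Scala {{Dataset.unpivot}} API that SparkR would gain parity with, assuming the Spark 3.4 signature (ids, values, variable column name, value column name); the toy schema is invented for illustration:

{code:scala}
import spark.implicits._

// Wide form: one row per id, one column per measurement.
val wide = Seq((1, 10.0, 20.0), (2, 30.0, 40.0)).toDF("id", "math", "physics")

// Long form: the value columns become (subject, score) pairs, "id" is kept.
val long = wide.unpivot(
  ids = Array($"id"),
  values = Array($"math", $"physics"),
  variableColumnName = "subject",
  valueColumnName = "score")

long.show()
{code}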
[jira] [Comment Edited] (SPARK-41236) The renamed field name cannot be recognized after group filtering
[ https://issues.apache.org/jira/browse/SPARK-41236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17638626#comment-17638626 ]

huldar chen edited comment on SPARK-41236 at 11/25/22 5:53 PM:
---------------------------------------------------------------

Fixing it the way I have in mind would cause two closed Jira issues, SPARK-31663 and SPARK-31519, to reappear. This may involve knowledge of the SQL standard, which I am not good at; I don't think I can fix this bug. :(

h4. [jingxiong zhong|https://issues.apache.org/jira/secure/ViewProfile.jspa?name=zhongjingxiong]

was (Author: huldar):
Fixing it the way I have in mind would cause two closed Jira issues, SPARK-31663 and SPARK-31663, to reappear. This may involve knowledge of the SQL standard, which I am not good at; I don't think I can fix this bug. :(

h4. [jingxiong zhong|https://issues.apache.org/jira/secure/ViewProfile.jspa?name=zhongjingxiong]

> The renamed field name cannot be recognized after group filtering
> ------------------------------------------------------------------
>
>                 Key: SPARK-41236
>                 URL: https://issues.apache.org/jira/browse/SPARK-41236
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: jingxiong zhong
>            Priority: Major
>
> {code:java}
> select collect_set(age) as age
> from db_table.table1
> group by name
> having size(age) > 1
> {code}
> A simple SQL query: it works in Spark 2.4 but fails in Spark 3.2.0. Is this a bug or a new standard?
> h3. *like this:*
> {code:sql}
> create table db1.table1(age int, name string);
> insert into db1.table1 values(1, 'a');
> insert into db1.table1 values(2, 'b');
> insert into db1.table1 values(3, 'c');
> -- then run sql like this
> select collect_set(age) as age from db1.table1 group by name having size(age) > 1;
> {code}
> h3. Stack Information
> org.apache.spark.sql.AnalysisException: cannot resolve 'age' given input columns: [age]; line 4 pos 12;
> 'Filter (size('age, true) > 1)
> +- Aggregate [name#2], [collect_set(age#1, 0, 0) AS age#0]
>    +- SubqueryAlias spark_catalog.db1.table1
>       +- HiveTableRelation [`db1`.`table1`, org.apache.hadoop.hive.ql.io.orc.OrcSerde, Data Cols: [age#1, name#2], Partition Cols: []]
> at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:54)
> at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:179)
> at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:175)
> at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$2(TreeNode.scala:535)
> at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
> at org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:535)
> at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$1(TreeNode.scala:532)
> at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1128)
> at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1127)
> at org.apache.spark.sql.catalyst.expressions.UnaryExpression.mapChildren(Expression.scala:467)
> at org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:532)
> at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$1(TreeNode.scala:532)
> at org.apache.spark.sql.catalyst.trees.BinaryLike.mapChildren(TreeNode.scala:1154)
> at org.apache.spark.sql.catalyst.trees.BinaryLike.mapChildren$(TreeNode.scala:1153)
> at org.apache.spark.sql.catalyst.expressions.BinaryExpression.mapChildren(Expression.scala:555)
> at org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:532)
> at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformExpressionsUpWithPruning$1(QueryPlan.scala:181)
> at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$1(QueryPlan.scala:193)
> at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
> at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:193)
> at org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:204)
> at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$4(QueryPlan.scala:214)
> at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:323)
> at org.apache.spark.sql.catalyst.plans.QueryPlan.mapExpressions(QueryPlan.scala:214)
> at org.apache.spark.sql.catalyst.plans.QueryPlan.tr
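The plan shows the HAVING condition failing to resolve the SELECT-list alias {{age}}. Two standard rewrites (a workaround sketch, not taken from the ticket) avoid relying on the alias entirely:

{code:scala}
// Workaround 1: repeat the aggregate expression in HAVING instead of its alias.
spark.sql("""
  SELECT collect_set(age) AS age
  FROM db1.table1
  GROUP BY name
  HAVING size(collect_set(age)) > 1
""").show()

// Workaround 2: filter in an outer query, where the alias is a real column.
spark.sql("""
  SELECT age
  FROM (SELECT collect_set(age) AS age FROM db1.table1 GROUP BY name)
  WHERE size(age) > 1
""").show()
{code}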
[jira] [Commented] (SPARK-41114) Support local data for LocalRelation
[ https://issues.apache.org/jira/browse/SPARK-41114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17638734#comment-17638734 ]

Apache Spark commented on SPARK-41114:
--------------------------------------

User 'grundprinzip' has created a pull request for this issue:
https://github.com/apache/spark/pull/38803

> Support local data for LocalRelation
> ------------------------------------
>
>                 Key: SPARK-41114
>                 URL: https://issues.apache.org/jira/browse/SPARK-41114
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Connect
>    Affects Versions: 3.4.0
>            Reporter: Deng Ziming
>            Assignee: Deng Ziming
>            Priority: Minor
>             Fix For: 3.4.0
[jira] [Commented] (SPARK-40539) PySpark readwriter API parity for Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-40539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17638666#comment-17638666 ]

Apache Spark commented on SPARK-40539:
--------------------------------------

User 'grundprinzip' has created a pull request for this issue:
https://github.com/apache/spark/pull/38801

> PySpark readwriter API parity for Spark Connect
> -----------------------------------------------
>
>                 Key: SPARK-40539
>                 URL: https://issues.apache.org/jira/browse/SPARK-40539
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Connect
>    Affects Versions: 3.4.0
>            Reporter: Martin Grund
>            Assignee: Rui Wang
>            Priority: Major
>             Fix For: 3.4.0
>
> Spark Connect / PySpark ReadWriter parity.
[jira] [Created] (SPARK-41266) Spark does not parse timestamp strings when using the IN operator
Laurens Versluis created SPARK-41266:
-------------------------------------

             Summary: Spark does not parse timestamp strings when using the IN operator
                 Key: SPARK-41266
                 URL: https://issues.apache.org/jira/browse/SPARK-41266
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.2.1
         Environment: Windows 10, Spark 3.2.1 with Java 11
            Reporter: Laurens Versluis

Likely affects more versions; tested only with 3.2.1.

Summary: Spark converts a timestamp string to a timestamp when using the equal operator (=), yet does not do this when using the IN operator.

Details: While debugging why a query returned no results, we found that when the equal symbol (=) is used in the WHERE clause on a TimestampType column, Spark converts the string to a timestamp and filters on it. With the IN operator (our query), it does not: it casts the column to string instead. We expected the behavior to be similar, or at least that Spark would recognize that the IN clause operates on a TimestampType column and attempt to convert to timestamp first before falling back to string comparison.

*Minimal reproducible example:*

Suppose we have a one-row dataset with the following contents and schema:
{noformat}
+-------------------+
|starttime          |
+-------------------+
|2019-08-11 19:33:05|
+-------------------+

root
 |-- starttime: timestamp (nullable = true)
{noformat}
Then if we fire the following queries, we get no results for the IN-clause one that uses a timestamp string with timezone information:
{code:java}
// Works - Spark casts the argument to a string and the internal representation of the time seems to match it...
singleCol.filter("starttime IN ('2019-08-11 19:33:05')").show();
// Works
singleCol.filter("starttime = '2019-08-11 19:33:05'").show();
// Works
singleCol.filter("starttime = '2019-08-11T19:33:05Z'").show();
// Doesn't work
singleCol.filter("starttime IN ('2019-08-11T19:33:05Z')").show();
// Works
singleCol.filter("starttime IN (to_timestamp('2019-08-11T19:33:05Z'))").show();
{code}
We can see from the output that a cast to string is taking place:
{noformat}
[...] isnotnull(starttime#59),(cast(starttime#59 as string) = 2019-08-11 19:33:05)
{noformat}
Since the = operator does work, it would be consistent if operators such as IN behaved the same way.
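A workaround sketch (not from the report): keep the IN-list elements typed as timestamps, either with a typed literal in SQL or {{Column.isin}} with a {{java.sql.Timestamp}} value, so the comparison never falls back to strings. {{singleCol}} is the DataFrame from the example above.

{code:scala}
import java.sql.Timestamp

// Typed timestamp literal: the IN-list element is already a timestamp,
// so the column is not cast to string.
singleCol.filter("starttime IN (timestamp'2019-08-11 19:33:05')").show()

// DataFrame API equivalent: isin with a java.sql.Timestamp argument.
singleCol.filter(
  singleCol.col("starttime").isin(Timestamp.valueOf("2019-08-11 19:33:05"))
).show()
{code}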
[jira] [Commented] (SPARK-33001) Why am I receiving this warning?
[ https://issues.apache.org/jira/browse/SPARK-33001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17638647#comment-17638647 ]

geekyouth commented on SPARK-33001:
-----------------------------------

I have set spark-3.0.0-bin-hadoop3.2\conf\spark-defaults.conf as follows:
{code:java}
spark.executor.processTreeMetrics.enabled false
{code}
then restarted the terminal and ran spark-shell, but the warning appears again.

> Why am I receiving this warning?
> --------------------------------
>
>                 Key: SPARK-33001
>                 URL: https://issues.apache.org/jira/browse/SPARK-33001
>             Project: Spark
>          Issue Type: Question
>          Components: Spark Core
>    Affects Versions: 3.0.1
>            Reporter: George Fotopoulos
>            Priority: Major
>
> I am running Apache Spark Core using Scala 2.12.12 on IntelliJ IDEA 2020.2 with Docker 2.3.0.5, on Windows 10 build 2004.
> Can somebody explain why I am receiving this warning and what I can do about it?
> I tried googling this warning, but all I found was people asking about it and no answers.
> [screenshot|https://user-images.githubusercontent.com/1548352/94319642-c8102c80-ff93-11ea-9fea-f58de8da2268.png]
> {code:scala}
> WARN ProcfsMetricsGetter: Exception when trying to compute pagesize, as a result reporting of ProcessTree metrics is stopped
> {code}
> Thanks in advance!
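For context, and offered as an observation rather than an answer from this ticket: the warning is raised by {{ProcfsMetricsGetter}} while probing the OS page size, which fails on platforms without procfs such as Windows, where these metrics could not be collected anyway. Below is a sketch of setting the flag at session construction, with the caveat that in some versions the probe still runs once during initialization, so the warning may be logged regardless:

{code:scala}
import org.apache.spark.sql.SparkSession

// Explicitly disable ProcessTree metrics reporting; on Windows the /proc
// filesystem does not exist, so the metrics are unavailable either way.
val spark = SparkSession.builder()
  .master("local[*]")
  .config("spark.executor.processTreeMetrics.enabled", "false")
  .getOrCreate()
{code}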
[jira] [Comment Edited] (SPARK-41236) The renamed field name cannot be recognized after group filtering
[ https://issues.apache.org/jira/browse/SPARK-41236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17638626#comment-17638626 ]

huldar chen edited comment on SPARK-41236 at 11/25/22 10:49 AM:
----------------------------------------------------------------

Fixing it the way I have in mind would cause two closed Jira issues, SPARK-31663 and SPARK-31663, to reappear. This may involve knowledge of the SQL standard, which I am not good at; I don't think I can fix this bug. :(

h4. [jingxiong zhong|https://issues.apache.org/jira/secure/ViewProfile.jspa?name=zhongjingxiong]

was (Author: huldar):
Fixing it the way I have in mind would cause two closed Jira issues, SPARK-31663 and SPARK-31663, to reappear. This may involve knowledge of the SQL standard, which I am not good at; I don't think I can fix this bug. :(
[jira] [Commented] (SPARK-41236) The renamed field name cannot be recognized after group filtering
[ https://issues.apache.org/jira/browse/SPARK-41236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17638626#comment-17638626 ]

huldar chen commented on SPARK-41236:
-------------------------------------

Fixing it the way I have in mind would cause two closed Jira issues, SPARK-31663 and SPARK-31663, to reappear. This may involve knowledge of the SQL standard, which I am not good at; I don't think I can fix this bug. :(
[jira] [Updated] (SPARK-41262) Enable canChangeCachedPlanOutputPartitioning by default
[ https://issues.apache.org/jira/browse/SPARK-41262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

XiDuo You updated SPARK-41262:
------------------------------
    Description: 
Remove the `internal` tag of `spark.sql.optimizer.canChangeCachedPlanOutputPartitioning`, and change its default from false to true so that AQE works with cached plans.

  was:
Tune spark.sql.optimizer.canChangeCachedPlanOutputPartitioning from false to true by default so that AQE works with cached plans.

> Enable canChangeCachedPlanOutputPartitioning by default
> --------------------------------------------------------
>
>                 Key: SPARK-41262
>                 URL: https://issues.apache.org/jira/browse/SPARK-41262
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: XiDuo You
>            Priority: Major
>
> Remove the `internal` tag of `spark.sql.optimizer.canChangeCachedPlanOutputPartitioning`, and change its default from false to true so that AQE works with cached plans.
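A minimal sketch (not from the ticket) of what the new default enables: with the flag on, AQE is allowed to re-plan a cached query, for example coalescing shuffle partitions, instead of the cached plan pinning its output partitioning.

{code:scala}
import spark.implicits._

// Until the default flips, the flag must be set by hand:
spark.conf.set("spark.sql.optimizer.canChangeCachedPlanOutputPartitioning", "true")

val cached = spark.range(0, 1000000)
  .groupBy(($"id" % 10).as("k"))
  .count()
  .cache()

cached.count()  // materializes the cache under AQE-adjusted partitioning
{code}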
[jira] [Resolved] (SPARK-41240) Upgrade Protobuf from 3.19.4 to 3.19.5
[ https://issues.apache.org/jira/browse/SPARK-41240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ruifeng Zheng resolved SPARK-41240.
-----------------------------------
    Fix Version/s: 3.4.0
       Resolution: Fixed

Issue resolved by pull request 38774
https://github.com/apache/spark/pull/38774

> Upgrade Protobuf from 3.19.4 to 3.19.5
> --------------------------------------
>
>                 Key: SPARK-41240
>                 URL: https://issues.apache.org/jira/browse/SPARK-41240
>             Project: Spark
>          Issue Type: Dependency upgrade
>          Components: Build, Connect
>    Affects Versions: 3.4.0
>            Reporter: Bjørn Jørgensen
>            Assignee: Bjørn Jørgensen
>            Priority: Minor
>             Fix For: 3.4.0
>
> [CVE-2022-1941|https://nvd.nist.gov/vuln/detail/CVE-2022-1941]
[jira] [Updated] (SPARK-41265) Check and upgrade buf.build/protocolbuffers/plugins/python to 3.19.5
[ https://issues.apache.org/jira/browse/SPARK-41265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ruifeng Zheng updated SPARK-41265:
----------------------------------
    Description: 
https://github.com/apache/spark/pull/38774
https://buf.build/protocolbuffers/plugins/python

Once a new version >= 3.19.5 is available, we should upgrade.

  was:
https://github.com/apache/spark/pull/38774

Once a new version >= 3.19.5 is available, we should upgrade.
[jira] [Assigned] (SPARK-41240) Upgrade Protobuf from 3.19.4 to 3.19.5
[ https://issues.apache.org/jira/browse/SPARK-41240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ruifeng Zheng reassigned SPARK-41240:
-------------------------------------

    Assignee: Bjørn Jørgensen
[jira] [Created] (SPARK-41265) Check and upgrade buf.build/protocolbuffers/plugins/python to 3.19.5
Ruifeng Zheng created SPARK-41265:
----------------------------------

             Summary: Check and upgrade buf.build/protocolbuffers/plugins/python to 3.19.5
                 Key: SPARK-41265
                 URL: https://issues.apache.org/jira/browse/SPARK-41265
             Project: Spark
          Issue Type: Sub-task
          Components: Connect, Project Infra
    Affects Versions: 3.4.0
            Reporter: Ruifeng Zheng

https://github.com/apache/spark/pull/38774

Once a new version >= 3.19.5 is available, we should upgrade.
[jira] [Resolved] (SPARK-41245) Upgrade postgresql from 42.5.0 to 42.5.1
[ https://issues.apache.org/jira/browse/SPARK-41245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ruifeng Zheng resolved SPARK-41245.
-----------------------------------
    Fix Version/s: 3.4.0
       Resolution: Fixed

Issue resolved by pull request 38791
https://github.com/apache/spark/pull/38791

> Upgrade postgresql from 42.5.0 to 42.5.1
> -----------------------------------------
>
>                 Key: SPARK-41245
>                 URL: https://issues.apache.org/jira/browse/SPARK-41245
>             Project: Spark
>          Issue Type: Dependency upgrade
>          Components: Build
>    Affects Versions: 3.4.0
>            Reporter: Bjørn Jørgensen
>            Assignee: Bjørn Jørgensen
>            Priority: Minor
>             Fix For: 3.4.0
>
> [CVE-2022-41946|https://nvd.nist.gov/vuln/detail/CVE-2022-41946]
[jira] [Assigned] (SPARK-41245) Upgrade postgresql from 42.5.0 to 42.5.1
[ https://issues.apache.org/jira/browse/SPARK-41245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ruifeng Zheng reassigned SPARK-41245:
-------------------------------------

    Assignee: Bjørn Jørgensen
[jira] [Assigned] (SPARK-41252) Upgrade arrow from 10.0.0 to 10.0.1
[ https://issues.apache.org/jira/browse/SPARK-41252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ruifeng Zheng reassigned SPARK-41252:
-------------------------------------

    Assignee: BingKun Pan
[jira] [Resolved] (SPARK-41252) Upgrade arrow from 10.0.0 to 10.0.1
[ https://issues.apache.org/jira/browse/SPARK-41252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ruifeng Zheng resolved SPARK-41252.
-----------------------------------
    Fix Version/s: 3.4.0
       Resolution: Fixed

Issue resolved by pull request 38788
https://github.com/apache/spark/pull/38788

> Upgrade arrow from 10.0.0 to 10.0.1
> ------------------------------------
>
>                 Key: SPARK-41252
>                 URL: https://issues.apache.org/jira/browse/SPARK-41252
>             Project: Spark
>          Issue Type: Improvement
>          Components: Build
>    Affects Versions: 3.4.0
>            Reporter: BingKun Pan
>            Assignee: BingKun Pan
>            Priority: Minor
>             Fix For: 3.4.0
[jira] [Commented] (SPARK-41264) Make Literal support more datatypes
[ https://issues.apache.org/jira/browse/SPARK-41264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17638567#comment-17638567 ]

Apache Spark commented on SPARK-41264:
--------------------------------------

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/38800
[jira] [Commented] (SPARK-41264) Make Literal support more datatypes
[ https://issues.apache.org/jira/browse/SPARK-41264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17638565#comment-17638565 ]

Apache Spark commented on SPARK-41264:
--------------------------------------

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/38800
[jira] [Assigned] (SPARK-41264) Make Literal support more datatypes
[ https://issues.apache.org/jira/browse/SPARK-41264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-41264:
------------------------------------

    Assignee: (was: Apache Spark)
[jira] [Assigned] (SPARK-41264) Make Literal support more datatypes
[ https://issues.apache.org/jira/browse/SPARK-41264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-41264:
------------------------------------

    Assignee: Apache Spark
[jira] (SPARK-41236) The renamed field name cannot be recognized after group filtering
[ https://issues.apache.org/jira/browse/SPARK-41236 ]

huldar chen deleted comment on SPARK-41236:
---------------------------------------------

was (Author: huldar):
ok, let's make it my first pr :)
[jira] [Created] (SPARK-41264) Make Literal support more datatypes
Ruifeng Zheng created SPARK-41264:
----------------------------------

             Summary: Make Literal support more datatypes
                 Key: SPARK-41264
                 URL: https://issues.apache.org/jira/browse/SPARK-41264
             Project: Spark
          Issue Type: Sub-task
          Components: Connect, PySpark
    Affects Versions: 3.4.0
            Reporter: Ruifeng Zheng
[jira] [Commented] (SPARK-41263) Upgrade buf to v1.9.0
[ https://issues.apache.org/jira/browse/SPARK-41263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17638556#comment-17638556 ]

Apache Spark commented on SPARK-41263:
--------------------------------------

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/38797
[jira] [Assigned] (SPARK-41263) Upgrade buf to v1.9.0
[ https://issues.apache.org/jira/browse/SPARK-41263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-41263:
------------------------------------

    Assignee: Apache Spark
[jira] [Assigned] (SPARK-41263) Upgrade buf to v1.9.0
[ https://issues.apache.org/jira/browse/SPARK-41263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-41263:
------------------------------------

    Assignee: (was: Apache Spark)
[jira] [Created] (SPARK-41263) Upgrade buf to v1.9.0
Ruifeng Zheng created SPARK-41263:
----------------------------------

             Summary: Upgrade buf to v1.9.0
                 Key: SPARK-41263
                 URL: https://issues.apache.org/jira/browse/SPARK-41263
             Project: Spark
          Issue Type: Sub-task
          Components: Connect, Project Infra
    Affects Versions: 3.4.0
            Reporter: Ruifeng Zheng
[jira] [Commented] (SPARK-41236) The renamed field name cannot be recognized after group filtering
[ https://issues.apache.org/jira/browse/SPARK-41236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17638551#comment-17638551 ]

huldar chen commented on SPARK-41236:
-------------------------------------

ok, let's make it my first pr :)