[jira] [Assigned] (SPARK-42829) Added Identifier to the cached RDD operator on the Stages page
[ https://issues.apache.org/jira/browse/SPARK-42829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42829:
------------------------------------

    Assignee: Apache Spark

> Added Identifier to the cached RDD operator on the Stages page
> ---------------------------------------------------------------
>
>                 Key: SPARK-42829
>                 URL: https://issues.apache.org/jira/browse/SPARK-42829
>             Project: Spark
>          Issue Type: Improvement
>          Components: Web UI
>    Affects Versions: 3.3.2
>            Reporter: Yian Liou
>            Assignee: Apache Spark
>            Priority: Major
>         Attachments: Screen Shot 2023-03-20 at 3.55.40 PM.png
>
> On the stages page in the Web UI, there is no distinction for which cached
> RDD is being executed in a particular stage. This Jira aims to add a repeat
> identifier to distinguish which cached RDD is being executed.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42829) Added Identifier to the cached RDD operator on the Stages page
[ https://issues.apache.org/jira/browse/SPARK-42829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17703039#comment-17703039 ]

Apache Spark commented on SPARK-42829:
--------------------------------------

User 'yliou' has created a pull request for this issue:
https://github.com/apache/spark/pull/40502

> Added Identifier to the cached RDD operator on the Stages page
> ---------------------------------------------------------------
>
>                 Key: SPARK-42829
>                 URL: https://issues.apache.org/jira/browse/SPARK-42829
>             Project: Spark
>          Issue Type: Improvement
>          Components: Web UI
>    Affects Versions: 3.3.2
>            Reporter: Yian Liou
>            Priority: Major
>         Attachments: Screen Shot 2023-03-20 at 3.55.40 PM.png
>
> On the stages page in the Web UI, there is no distinction for which cached
> RDD is being executed in a particular stage. This Jira aims to add a repeat
> identifier to distinguish which cached RDD is being executed.
[jira] [Assigned] (SPARK-42829) Added Identifier to the cached RDD operator on the Stages page
[ https://issues.apache.org/jira/browse/SPARK-42829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42829:
------------------------------------

    Assignee: (was: Apache Spark)

> Added Identifier to the cached RDD operator on the Stages page
> ---------------------------------------------------------------
>
>                 Key: SPARK-42829
>                 URL: https://issues.apache.org/jira/browse/SPARK-42829
>             Project: Spark
>          Issue Type: Improvement
>          Components: Web UI
>    Affects Versions: 3.3.2
>            Reporter: Yian Liou
>            Priority: Major
>         Attachments: Screen Shot 2023-03-20 at 3.55.40 PM.png
>
> On the stages page in the Web UI, there is no distinction for which cached
> RDD is being executed in a particular stage. This Jira aims to add a repeat
> identifier to distinguish which cached RDD is being executed.
[jira] [Commented] (SPARK-42829) Added Identifier to the cached RDD operator on the Stages page
[ https://issues.apache.org/jira/browse/SPARK-42829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17703038#comment-17703038 ]

Yian Liou commented on SPARK-42829:
-----------------------------------

Opened PR at [https://github.com/apache/spark/pull/40502] and included screenshot there. [~gurwls223]

> Added Identifier to the cached RDD operator on the Stages page
> ---------------------------------------------------------------
>
>                 Key: SPARK-42829
>                 URL: https://issues.apache.org/jira/browse/SPARK-42829
>             Project: Spark
>          Issue Type: Improvement
>          Components: Web UI
>    Affects Versions: 3.3.2
>            Reporter: Yian Liou
>            Priority: Major
>         Attachments: Screen Shot 2023-03-20 at 3.55.40 PM.png
>
> On the stages page in the Web UI, there is no distinction for which cached
> RDD is being executed in a particular stage. This Jira aims to add a repeat
> identifier to distinguish which cached RDD is being executed.
[jira] [Resolved] (SPARK-42864) Review and fix issues in MLlib API docs
[ https://issues.apache.org/jira/browse/SPARK-42864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xinrong Meng resolved SPARK-42864.
----------------------------------
    Fix Version/s: 3.4.0
       Resolution: Fixed

Issue resolved by pull request 40500
[https://github.com/apache/spark/pull/40500]

> Review and fix issues in MLlib API docs
> ----------------------------------------
>
>                 Key: SPARK-42864
>                 URL: https://issues.apache.org/jira/browse/SPARK-42864
>             Project: Spark
>          Issue Type: Sub-task
>          Components: MLlib
>    Affects Versions: 3.4.0
>            Reporter: Xinrong Meng
>            Assignee: Ruifeng Zheng
>            Priority: Major
>             Fix For: 3.4.0
>
[jira] [Commented] (SPARK-42864) Review and fix issues in MLlib API docs
[ https://issues.apache.org/jira/browse/SPARK-42864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17703026#comment-17703026 ]

Apache Spark commented on SPARK-42864:
--------------------------------------

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/40501

> Review and fix issues in MLlib API docs
> ----------------------------------------
>
>                 Key: SPARK-42864
>                 URL: https://issues.apache.org/jira/browse/SPARK-42864
>             Project: Spark
>          Issue Type: Sub-task
>          Components: MLlib
>    Affects Versions: 3.4.0
>            Reporter: Xinrong Meng
>            Assignee: Ruifeng Zheng
>            Priority: Major
>
[jira] [Commented] (SPARK-42864) Review and fix issues in MLlib API docs
[ https://issues.apache.org/jira/browse/SPARK-42864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17703017#comment-17703017 ]

Apache Spark commented on SPARK-42864:
--------------------------------------

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/40500

> Review and fix issues in MLlib API docs
> ----------------------------------------
>
>                 Key: SPARK-42864
>                 URL: https://issues.apache.org/jira/browse/SPARK-42864
>             Project: Spark
>          Issue Type: Sub-task
>          Components: MLlib
>    Affects Versions: 3.4.0
>            Reporter: Xinrong Meng
>            Assignee: Ruifeng Zheng
>            Priority: Major
>
[jira] [Commented] (SPARK-42864) Review and fix issues in MLlib API docs
[ https://issues.apache.org/jira/browse/SPARK-42864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17703018#comment-17703018 ]

Apache Spark commented on SPARK-42864:
--------------------------------------

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/40500

> Review and fix issues in MLlib API docs
> ----------------------------------------
>
>                 Key: SPARK-42864
>                 URL: https://issues.apache.org/jira/browse/SPARK-42864
>             Project: Spark
>          Issue Type: Sub-task
>          Components: MLlib
>    Affects Versions: 3.4.0
>            Reporter: Xinrong Meng
>            Assignee: Ruifeng Zheng
>            Priority: Major
>
[jira] [Assigned] (SPARK-42864) Review and fix issues in MLlib API docs
[ https://issues.apache.org/jira/browse/SPARK-42864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42864:
------------------------------------

    Assignee: Apache Spark  (was: Ruifeng Zheng)

> Review and fix issues in MLlib API docs
> ----------------------------------------
>
>                 Key: SPARK-42864
>                 URL: https://issues.apache.org/jira/browse/SPARK-42864
>             Project: Spark
>          Issue Type: Sub-task
>          Components: MLlib
>    Affects Versions: 3.4.0
>            Reporter: Xinrong Meng
>            Assignee: Apache Spark
>            Priority: Major
>
[jira] [Assigned] (SPARK-42864) Review and fix issues in MLlib API docs
[ https://issues.apache.org/jira/browse/SPARK-42864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42864:
------------------------------------

    Assignee: Ruifeng Zheng  (was: Apache Spark)

> Review and fix issues in MLlib API docs
> ----------------------------------------
>
>                 Key: SPARK-42864
>                 URL: https://issues.apache.org/jira/browse/SPARK-42864
>             Project: Spark
>          Issue Type: Sub-task
>          Components: MLlib
>    Affects Versions: 3.4.0
>            Reporter: Xinrong Meng
>            Assignee: Ruifeng Zheng
>            Priority: Major
>
[jira] [Assigned] (SPARK-42864) Review and fix issues in MLlib API docs
[ https://issues.apache.org/jira/browse/SPARK-42864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ruifeng Zheng reassigned SPARK-42864:
-------------------------------------

    Assignee: Ruifeng Zheng

> Review and fix issues in MLlib API docs
> ----------------------------------------
>
>                 Key: SPARK-42864
>                 URL: https://issues.apache.org/jira/browse/SPARK-42864
>             Project: Spark
>          Issue Type: Sub-task
>          Components: MLlib
>    Affects Versions: 3.4.0
>            Reporter: Xinrong Meng
>            Assignee: Ruifeng Zheng
>            Priority: Major
>
[jira] [Resolved] (SPARK-42875) Fix toPandas to handle timezone and map types properly.
[ https://issues.apache.org/jira/browse/SPARK-42875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ruifeng Zheng resolved SPARK-42875.
-----------------------------------
    Fix Version/s: 3.4.0
       Resolution: Fixed

Issue resolved by pull request 40497
[https://github.com/apache/spark/pull/40497]

> Fix toPandas to handle timezone and map types properly.
> --------------------------------------------------------
>
>                 Key: SPARK-42875
>                 URL: https://issues.apache.org/jira/browse/SPARK-42875
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Connect
>    Affects Versions: 3.4.0
>            Reporter: Takuya Ueshin
>            Assignee: Takuya Ueshin
>            Priority: Major
>             Fix For: 3.4.0
>
[jira] [Assigned] (SPARK-42875) Fix toPandas to handle timezone and map types properly.
[ https://issues.apache.org/jira/browse/SPARK-42875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ruifeng Zheng reassigned SPARK-42875:
-------------------------------------

    Assignee: Takuya Ueshin

> Fix toPandas to handle timezone and map types properly.
> --------------------------------------------------------
>
>                 Key: SPARK-42875
>                 URL: https://issues.apache.org/jira/browse/SPARK-42875
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Connect
>    Affects Versions: 3.4.0
>            Reporter: Takuya Ueshin
>            Assignee: Takuya Ueshin
>            Priority: Major
>
[jira] [Created] (SPARK-42877) Implement DataFrame.foreach
Xinrong Meng created SPARK-42877:
---------------------------------

             Summary: Implement DataFrame.foreach
                 Key: SPARK-42877
                 URL: https://issues.apache.org/jira/browse/SPARK-42877
             Project: Spark
          Issue Type: Improvement
          Components: Connect, PySpark
    Affects Versions: 3.5.0
            Reporter: Xinrong Meng

Maybe we can leverage UDFs to implement that, such as
`df.select(udf(*df.schema)).count()`.
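The UDF-based idea in the description can be sketched without Spark. Below is an illustrative, Spark-free mock (the names `foreach_via_udf`, `rows`, and `seen` are hypothetical, not part of any Spark API): wrap the user's side-effecting function in a "UDF" that returns nothing, then force evaluation by counting, mirroring `df.select(udf(*df.schema)).count()`.

```python
def foreach_via_udf(rows, f):
    """Apply side-effecting f to every row, the way the proposed
    df.select(udf(*df.schema)).count() would: wrap f in a 'UDF' that
    discards its result, then force evaluation by counting rows."""
    udf = lambda *cols: f(cols)  # the wrapped user function
    # `count()` stands in for the action that forces the lazy plan to run.
    return sum(1 for row in rows if udf(*row) or True)

seen = []
n = foreach_via_udf([(1, "a"), (2, "b")], seen.append)
# n == 2; seen now holds both rows, in order
```

The design point is that `foreach` has no return value, so any action that fully evaluates the plan (here, the count) is enough to trigger the side effects.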
[jira] [Assigned] (SPARK-42876) DataType's physicalDataType should be private[sql]
[ https://issues.apache.org/jira/browse/SPARK-42876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42876:
------------------------------------

    Assignee: Rui Wang  (was: Apache Spark)

> DataType's physicalDataType should be private[sql]
> ---------------------------------------------------
>
>                 Key: SPARK-42876
>                 URL: https://issues.apache.org/jira/browse/SPARK-42876
>             Project: Spark
>          Issue Type: Task
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Rui Wang
>            Assignee: Rui Wang
>            Priority: Major
>
[jira] [Assigned] (SPARK-42876) DataType's physicalDataType should be private[sql]
[ https://issues.apache.org/jira/browse/SPARK-42876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42876:
------------------------------------

    Assignee: Apache Spark  (was: Rui Wang)

> DataType's physicalDataType should be private[sql]
> ---------------------------------------------------
>
>                 Key: SPARK-42876
>                 URL: https://issues.apache.org/jira/browse/SPARK-42876
>             Project: Spark
>          Issue Type: Task
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Rui Wang
>            Assignee: Apache Spark
>            Priority: Major
>
[jira] [Commented] (SPARK-42876) DataType's physicalDataType should be private[sql]
[ https://issues.apache.org/jira/browse/SPARK-42876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17702978#comment-17702978 ]

Apache Spark commented on SPARK-42876:
--------------------------------------

User 'amaliujia' has created a pull request for this issue:
https://github.com/apache/spark/pull/40499

> DataType's physicalDataType should be private[sql]
> ---------------------------------------------------
>
>                 Key: SPARK-42876
>                 URL: https://issues.apache.org/jira/browse/SPARK-42876
>             Project: Spark
>          Issue Type: Task
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Rui Wang
>            Assignee: Rui Wang
>            Priority: Major
>
[jira] [Created] (SPARK-42876) DataType's physicalDataType should be private[sql]
Rui Wang created SPARK-42876:
-----------------------------

             Summary: DataType's physicalDataType should be private[sql]
                 Key: SPARK-42876
                 URL: https://issues.apache.org/jira/browse/SPARK-42876
             Project: Spark
          Issue Type: Task
          Components: SQL
    Affects Versions: 3.4.0
            Reporter: Rui Wang
            Assignee: Rui Wang
[jira] [Updated] (SPARK-42829) Added Identifier to the cached RDD operator on the Stages page
[ https://issues.apache.org/jira/browse/SPARK-42829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yian Liou updated SPARK-42829:
------------------------------
    Attachment: Screen Shot 2023-03-20 at 3.55.40 PM.png

> Added Identifier to the cached RDD operator on the Stages page
> ---------------------------------------------------------------
>
>                 Key: SPARK-42829
>                 URL: https://issues.apache.org/jira/browse/SPARK-42829
>             Project: Spark
>          Issue Type: Improvement
>          Components: Web UI
>    Affects Versions: 3.3.2
>            Reporter: Yian Liou
>            Priority: Major
>         Attachments: Screen Shot 2023-03-20 at 3.55.40 PM.png
>
> On the stages page in the Web UI, there is no distinction for which cached
> RDD is being executed in a particular stage. This Jira aims to add a repeat
> identifier to distinguish which cached RDD is being executed.
[jira] [Commented] (SPARK-42411) Better support for Istio service mesh while running Spark on Kubernetes
[ https://issues.apache.org/jira/browse/SPARK-42411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17702929#comment-17702929 ]

Puneet commented on SPARK-42411:
--------------------------------

Should be able to create a PR hopefully by next week.

> Better support for Istio service mesh while running Spark on Kubernetes
> ------------------------------------------------------------------------
>
>                 Key: SPARK-42411
>                 URL: https://issues.apache.org/jira/browse/SPARK-42411
>             Project: Spark
>          Issue Type: New Feature
>          Components: Kubernetes
>    Affects Versions: 3.2.3
>            Reporter: Puneet
>            Priority: Major
>
> h3. Support for Strict MTLS
> In strict MTLS Peer Authentication, Istio requires each pod to be associated
> with a service identity (as this allows listeners to use the correct cert and
> chain). Without the service identity, communication goes through the
> passthrough cluster, which is not permitted in strict mode. The community is
> still investigating communication through IPs with strict MTLS
> [https://github.com/istio/istio/issues/37431#issuecomment-1412831780].
> Today the Spark backend creates a service record for the driver; executor
> pods, however, register with the driver using their Pod IPs. In this model
> the TLS handshake would therefore fail between driver and executor, and also
> between executors. As part of this Jira we want to similarly add service
> records for the executor pods. This can be achieved by adding an
> ExecutorServiceFeatureStep similar to the existing DriverServiceFeatureStep.
> h3. Allowing binding to all IPs
> Before Istio 1.10, the istio-proxy sidecar forwarded outside traffic to the
> localhost of the pod, so if the application container binds only to the Pod
> IP, the traffic is not forwarded to it. This was addressed in 1.10
> [https://istio.io/latest/blog/2021/upcoming-networking-changes]. However,
> the old behavior is still accessible by disabling the feature flag
> PILOT_ENABLE_INBOUND_PASSTHROUGH, and the request to remove it has had some
> push back [https://github.com/istio/istio/issues/37642]. In the current
> implementation the Spark K8s backend does not allow passing a bind address
> for the driver
> [https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/DriverServiceFeatureStep.scala#L35];
> as part of this Jira we want to allow passing a bind address even in
> Kubernetes mode, so long as the bind address is 0.0.0.0. This lets the user
> choose the behavior depending on the state of
> PILOT_ENABLE_INBOUND_PASSTHROUGH in her Istio cluster.
> h3. Better support for istio-proxy sidecar lifecycle management
> In an Istio-enabled cluster, istio-proxy sidecars are auto-injected into
> driver/executor pods. If the application is ephemeral, the driver and
> executor containers exit while the istio-proxy container continues to run,
> which causes the driver/executor pods to enter the NotReady state. As part
> of this Jira we want the ability to run a post-stop cleanup after the
> driver/executor container completes. Similarly, we also want to add support
> for a pre-startup script, which can ensure, for example, that the
> istio-sidecar is up before the executor/driver container starts.
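If the proposal landed, opting out of Istio's passthrough behavior might look like the following submission fragment. This is a hypothetical usage sketch, not merged behavior; `spark.driver.bindAddress` is an existing Spark configuration key, but accepting it in Kubernetes mode is exactly what this Jira proposes.

```shell
# Hypothetical once SPARK-42411 lands: let the driver bind to all interfaces
# in Kubernetes mode so the istio-proxy sidecar can forward inbound traffic.
spark-submit \
  --master k8s://https://<k8s-apiserver>:6443 \
  --deploy-mode cluster \
  --conf spark.driver.bindAddress=0.0.0.0 \
  ...
```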
[jira] [Assigned] (SPARK-42875) Fix toPandas to handle timezone and map types properly.
[ https://issues.apache.org/jira/browse/SPARK-42875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42875:
------------------------------------

    Assignee: Apache Spark

> Fix toPandas to handle timezone and map types properly.
> --------------------------------------------------------
>
>                 Key: SPARK-42875
>                 URL: https://issues.apache.org/jira/browse/SPARK-42875
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Connect
>    Affects Versions: 3.4.0
>            Reporter: Takuya Ueshin
>            Assignee: Apache Spark
>            Priority: Major
>
[jira] [Commented] (SPARK-42875) Fix toPandas to handle timezone and map types properly.
[ https://issues.apache.org/jira/browse/SPARK-42875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17702922#comment-17702922 ]

Apache Spark commented on SPARK-42875:
--------------------------------------

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/40497

> Fix toPandas to handle timezone and map types properly.
> --------------------------------------------------------
>
>                 Key: SPARK-42875
>                 URL: https://issues.apache.org/jira/browse/SPARK-42875
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Connect
>    Affects Versions: 3.4.0
>            Reporter: Takuya Ueshin
>            Priority: Major
>
[jira] [Commented] (SPARK-42875) Fix toPandas to handle timezone and map types properly.
[ https://issues.apache.org/jira/browse/SPARK-42875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17702921#comment-17702921 ]

Apache Spark commented on SPARK-42875:
--------------------------------------

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/40497

> Fix toPandas to handle timezone and map types properly.
> --------------------------------------------------------
>
>                 Key: SPARK-42875
>                 URL: https://issues.apache.org/jira/browse/SPARK-42875
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Connect
>    Affects Versions: 3.4.0
>            Reporter: Takuya Ueshin
>            Priority: Major
>
[jira] [Assigned] (SPARK-42875) Fix toPandas to handle timezone and map types properly.
[ https://issues.apache.org/jira/browse/SPARK-42875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42875:
------------------------------------

    Assignee: (was: Apache Spark)

> Fix toPandas to handle timezone and map types properly.
> --------------------------------------------------------
>
>                 Key: SPARK-42875
>                 URL: https://issues.apache.org/jira/browse/SPARK-42875
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Connect
>    Affects Versions: 3.4.0
>            Reporter: Takuya Ueshin
>            Priority: Major
>
[jira] [Resolved] (SPARK-36180) Support TimestampNTZ type in Hive
[ https://issues.apache.org/jira/browse/SPARK-36180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gengliang Wang resolved SPARK-36180.
------------------------------------
    Resolution: Won't Fix

> Support TimestampNTZ type in Hive
> ----------------------------------
>
>                 Key: SPARK-36180
>                 URL: https://issues.apache.org/jira/browse/SPARK-36180
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Kent Yao
>            Priority: Major
>
> {code:java}
> [info] Caused by: java.lang.IllegalArgumentException: Error: type expected at
> the position 0 of 'timestamp_ntz:timestamp' but 'timestamp_ntz' is found.
> [info]   at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.expect(TypeInfoUtils.java:372)
> [info]   at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.expect(TypeInfoUtils.java:355)
> [info]   at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.parseType(TypeInfoUtils.java:416)
> [info]   at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.parseTypeInfos(TypeInfoUtils.java:329)
> [info]   at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils.getTypeInfosFromTypeString(TypeInfoUtils.java:814)
> [info]   at org.apache.hadoop.hive.serde2.lazy.LazySerDeParameters.extractColumnInfo(LazySerDeParameters.java:162)
> [info]   at org.apache.hadoop.hive.serde2.lazy.LazySerDeParameters.<init>(LazySerDeParameters.java:91)
> [info]   at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.initialize(LazySimpleSerDe.java:116)
> [info]   at org.apache.hadoop.hive.serde2.AbstractSerDe.initialize(AbstractSerDe.java:54)
> [info]   at org.apache.hadoop.hive.serde2.SerDeUtils.initializeSerDe(SerDeUtils.java:533)
> [info]   at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:453)
> [info]   at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:440)
> [info]   at org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:281)
> [info]   at org.apache.hadoop.hive.ql.metadata.Table.checkValidity(Table.java:199)
> [info]   at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:842)
> [info]   ... 63 more
> [info]   at org.apache.hive.jdbc.HiveStatement.waitForOperationToComplete(HiveStatement.java:385)
> [info]   at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:254)
> [info]   at org.apache.spark.sql.hive.thriftserver.SparkMetadataOperationSuite.$anonfun$new$145(SparkMetadataOperationSuite.scala:666)
> [info]   at org.apache.spark.sql.hive.thriftserver.SparkMetadataOperationSuite.$anonfun$new$145$adapted(SparkMetadataOperationSuite.scala:665)
> [info]   at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2TestBase.$anonfun$withMultipleConnectionJdbcStatement$4(HiveThriftServer2Suites.scala:1422)
> [info]   at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2TestBase.$anonfun$withMultipleConnectionJdbcStatement$4$adapted(HiveThriftServer2Suites.scala:1422)
> [info]   at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
> [info]   at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
> [info]   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
> [info]   at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2TestBase.$anonfun$withMultipleConnectionJdbcStatement$1(HiveThriftServer2Suites.scala:1422)
> [info]   at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2TestBase.tryCaptureSysLog(HiveThriftServer2Suites.scala:1407)
> [info]   at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2TestBase.withMultipleConnectionJdbcStatement(HiveThriftServer2Suites.scala:1416)
> [info]   at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2TestBase.withJdbcStatement(HiveThriftServer2Suites.scala:1454)
> [info]   at org.apache.spark.sql.hive.thriftserver.SparkMetadataOperationSuite.$anonfun$new$144(SparkMetadataOperationSuite.scala:665)
> [info]   at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
> [info]   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
> [info]   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
> [info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
> [info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
> [info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
> [info]   at org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226)
> [info]   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:190
> [info]   at
[jira] [Resolved] (SPARK-36045) TO_UTC_TIMESTAMP and FROM_UTC_TIMESTAMP should return TimestampNTZ
[ https://issues.apache.org/jira/browse/SPARK-36045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gengliang Wang resolved SPARK-36045.
------------------------------------
    Resolution: Won't Do

> TO_UTC_TIMESTAMP and FROM_UTC_TIMESTAMP should return TimestampNTZ
> -------------------------------------------------------------------
>
>                 Key: SPARK-36045
>                 URL: https://issues.apache.org/jira/browse/SPARK-36045
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Gengliang Wang
>            Priority: Major
>
> Currently, the SQL function to_utc_timestamp is confusing: it just takes the
> timestamp value in the local timezone, pretends it's in the provided
> timezone, and then returns the UTC value, but the result is still treated as
> local timezone!
> The same issue happens in from_utc_timestamp as well.
> We even tried to deprecate it in the OSS community:
> https://github.com/apache/spark/commit/c5e83ab92c0cb514963209dc3e70ba0e24570082
> We should make TO_UTC_TIMESTAMP and FROM_UTC_TIMESTAMP return TimestampNTZ,
> which makes a lot of sense: converting the current local time to/from UTC
> local time.
> The functions should accept both Timestamp types:
> 1. given TimestampLTZ, convert it to TimestampNTZ and continue step #2
> 2. given TimestampNTZ, convert it as to/from UTC local time.
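The confusing semantics described above can be reproduced with stdlib Python (a sketch of what `to_utc_timestamp('2023-03-20 12:00:00', 'America/Los_Angeles')` effectively computes today; the variable names are illustrative, and no Spark is involved):

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # Python 3.9+ stdlib

# to_utc_timestamp takes the session-local wall clock, *pretends* it is in
# the provided zone, and returns the corresponding UTC wall clock -- which
# Spark then still renders in the session time zone.
wall = datetime(2023, 3, 20, 12, 0)  # the local wall-clock rendering
pretended = wall.replace(tzinfo=ZoneInfo("America/Los_Angeles"))
utc_wall = pretended.astimezone(ZoneInfo("UTC")).replace(tzinfo=None)
print(utc_wall)  # 2023-03-20 19:00:00 (Los Angeles is UTC-7 on this date)
```

Returning TimestampNTZ, as proposed, would make this an honest wall-clock-to-wall-clock conversion instead of a value that silently re-enters the session time zone.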
[jira] [Commented] (SPARK-35662) Support Timestamp without time zone data type
[ https://issues.apache.org/jira/browse/SPARK-35662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17702920#comment-17702920 ]

Gengliang Wang commented on SPARK-35662:
----------------------------------------

[~beliefer] [~ivan.sadikov] [~gurwls223] [~sarutak] [~cloud_fan] Thanks for the work! Marking this one as resolved :)
[~wrschneider99] Yes, it will be available in Spark 3.4.0

> Support Timestamp without time zone data type
> ----------------------------------------------
>
>                 Key: SPARK-35662
>                 URL: https://issues.apache.org/jira/browse/SPARK-35662
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Gengliang Wang
>            Assignee: Apache Spark
>            Priority: Major
>             Fix For: 3.4.0
>
> Spark SQL today supports the TIMESTAMP data type. However the semantics
> provided actually match TIMESTAMP WITH LOCAL TIMEZONE as defined by Oracle.
> Timestamps embedded in a SQL query or passed through JDBC are presumed to be
> in session local timezone and cast to UTC before being processed.
> These are desirable semantics in many cases, such as when dealing with
> calendars.
> In many (more) other cases, such as when dealing with log files, it is
> desirable that the provided timestamps not be altered.
> SQL users expect that they can model either behavior, and do so by using
> TIMESTAMP WITHOUT TIME ZONE for time zone insensitive data and TIMESTAMP WITH
> LOCAL TIME ZONE for time zone sensitive data.
> Most traditional RDBMS map TIMESTAMP to TIMESTAMP WITHOUT TIME ZONE and will
> be surprised to see TIMESTAMP WITH LOCAL TIME ZONE, a feature that does not
> exist in the standard.
> In this new feature, we will introduce TIMESTAMP WITH LOCAL TIMEZONE to
> describe the existing timestamp type and add TIMESTAMP WITHOUT TIME ZONE for
> standard semantics. Using these two types will provide clarity.
> We will also allow users to set the default behavior for TIMESTAMP to either
> TIMESTAMP WITH LOCAL TIME ZONE or TIMESTAMP WITHOUT TIME ZONE.
> h3. Milestone 1 – Spark Timestamp equivalency (the new Timestamp type
> TimestampWithoutTZ meets or exceeds all function of the existing SQL
> Timestamp):
> * Add a new DataType implementation for TimestampWithoutTZ.
> * Support TimestampWithoutTZ in Dataset/UDF.
> * TimestampWithoutTZ literals
> * TimestampWithoutTZ arithmetic (e.g. TimestampWithoutTZ -
> TimestampWithoutTZ, TimestampWithoutTZ - Date)
> * Datetime functions/operators: dayofweek, weekofyear, year, etc.
> * Cast to and from TimestampWithoutTZ, cast String/Timestamp to
> TimestampWithoutTZ, cast TimestampWithoutTZ to string (pretty
> printing)/Timestamp, with the SQL syntax to specify the types
> * Support sorting TimestampWithoutTZ.
> h3. Milestone 2 – Persistence:
> * Ability to create tables of type TimestampWithoutTZ
> * Ability to write to common file formats such as Parquet and JSON.
> * INSERT, SELECT, UPDATE, MERGE
> * Discovery
> h3. Milestone 3 – Client support
> * JDBC support
> * Hive Thrift server
> h3. Milestone 4 – PySpark and Spark R integration
> * Python UDF can take and return TimestampWithoutTZ
> * DataFrame support
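The two semantics can be contrasted with stdlib Python (a sketch only; `Asia/Seoul` stands in for a hypothetical `spark.sql.session.timeZone` setting, and no Spark API is used):

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # Python 3.9+ stdlib

session_tz = ZoneInfo("Asia/Seoul")  # stand-in for the session time zone

# TIMESTAMP WITH LOCAL TIME ZONE: the *instant* is fixed (stored UTC-based);
# what the user sees depends on the session time zone.
instant = datetime(2023, 3, 20, 12, 0, tzinfo=ZoneInfo("UTC"))
shown_ltz = instant.astimezone(session_tz).replace(tzinfo=None)

# TIMESTAMP WITHOUT TIME ZONE: the *wall clock* is fixed; the session time
# zone is ignored entirely, as one would want for log-file timestamps.
shown_ntz = datetime(2023, 3, 20, 12, 0)

print(shown_ltz)  # 2023-03-20 21:00:00 (Seoul is UTC+9)
print(shown_ntz)  # 2023-03-20 12:00:00
```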
[jira] [Assigned] (SPARK-42870) Move `toCatalystValue` to connect-common
[ https://issues.apache.org/jira/browse/SPARK-42870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-42870: Assignee: Ruifeng Zheng > Move `toCatalystValue` to connect-common > > > Key: SPARK-42870 > URL: https://issues.apache.org/jira/browse/SPARK-42870 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42870) Move `toCatalystValue` to connect-common
[ https://issues.apache.org/jira/browse/SPARK-42870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-42870. -- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 40485 [https://github.com/apache/spark/pull/40485] > Move `toCatalystValue` to connect-common > > > Key: SPARK-42870 > URL: https://issues.apache.org/jira/browse/SPARK-42870 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42875) Fix toPandas to handle timezone and map types properly.
Takuya Ueshin created SPARK-42875: - Summary: Fix toPandas to handle timezone and map types properly. Key: SPARK-42875 URL: https://issues.apache.org/jira/browse/SPARK-42875 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Takuya Ueshin -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42702) Support parameterized CTE
[ https://issues.apache.org/jira/browse/SPARK-42702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-42702: -- Fix Version/s: 3.4.1 (was: 3.4.0) > Support parameterized CTE > - > > Key: SPARK-42702 > URL: https://issues.apache.org/jira/browse/SPARK-42702 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Assignee: Wenchen Fan >Priority: Major > Fix For: 3.4.1 > > > Support named parameters in named common table expressions (CTE). At the > moment, such queries fail: > {code:java} > CREATE TABLE tbl(namespace STRING) USING parquet > INSERT INTO tbl SELECT 'abc' > WITH transitions AS ( > SELECT * FROM tbl WHERE namespace = :namespace > ) SELECT * FROM transitions {code} > with the following error: > {code:java} > [UNBOUND_SQL_PARAMETER] Found the unbound parameter: `namespace`. Please, fix > `args` and provide a mapping of the parameter to a SQL literal.; line 3 pos > 38; > 'WithCTE > :- 'CTERelationDef 0, false > : +- 'SubqueryAlias transitions > : +- 'Project [*] > : +- 'Filter (namespace#3 = parameter(namespace)) > : +- SubqueryAlias spark_catalog.default.tbl > : +- Relation spark_catalog.default.tbl[namespace#3] parquet > +- 'Project [*] > +- 'SubqueryAlias transitions > +- 'CTERelationRef 0, false org.apache.spark.sql.AnalysisException: > [UNBOUND_SQL_PARAMETER] Found the unbound parameter: `namespace`. 
Please, fix > `args` and provide a mapping of the parameter to a SQL literal.; line 3 pos > 38; > 'WithCTE > :- 'CTERelationDef 0, false > : +- 'SubqueryAlias transitions > : +- 'Project [*] > : +- 'Filter (namespace#3 = parameter(namespace)) > : +- SubqueryAlias spark_catalog.default.tbl > : +- Relation spark_catalog.default.tbl[namespace#3] parquet > +- 'Project [*] > +- 'SubqueryAlias transitions > +- 'CTERelationRef 0, false at > org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:52) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$5(CheckAnalysis.scala:339) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$5$adapted(CheckAnalysis.scala:244) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
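The binding the unbound-parameter error asks for — supplying a value for `:namespace` — can be illustrated with stdlib `sqlite3`, which accepts the same named-parameter syntax inside a CTE. This is an analogy only, not Spark's API; the table and value mirror the ones in the quoted reproduction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl(namespace TEXT)")
conn.execute("INSERT INTO tbl VALUES ('abc')")

# Passing a mapping for :namespace resolves the parameter inside the CTE,
# which is what the UNBOUND_SQL_PARAMETER error above says is missing.
rows = conn.execute(
    """
    WITH transitions AS (
        SELECT * FROM tbl WHERE namespace = :namespace
    ) SELECT * FROM transitions
    """,
    {"namespace": "abc"},
).fetchall()
print(rows)  # [('abc',)]
```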
[jira] [Commented] (SPARK-42702) Support parameterized CTE
[ https://issues.apache.org/jira/browse/SPARK-42702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17702898#comment-17702898 ] Dongjoon Hyun commented on SPARK-42702: --- I changed the Fixed Version to 3.4.1 because there is no Apache Spark 3.4.0 RC yet with this patch. > Support parameterized CTE > - > > Key: SPARK-42702 > URL: https://issues.apache.org/jira/browse/SPARK-42702 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Assignee: Wenchen Fan >Priority: Major > Fix For: 3.4.1 > > > Support named parameters in named common table expressions (CTE). At the > moment, such queries fail: > {code:java} > CREATE TABLE tbl(namespace STRING) USING parquet > INSERT INTO tbl SELECT 'abc' > WITH transitions AS ( > SELECT * FROM tbl WHERE namespace = :namespace > ) SELECT * FROM transitions {code} > with the following error: > {code:java} > [UNBOUND_SQL_PARAMETER] Found the unbound parameter: `namespace`. Please, fix > `args` and provide a mapping of the parameter to a SQL literal.; line 3 pos > 38; > 'WithCTE > :- 'CTERelationDef 0, false > : +- 'SubqueryAlias transitions > : +- 'Project [*] > : +- 'Filter (namespace#3 = parameter(namespace)) > : +- SubqueryAlias spark_catalog.default.tbl > : +- Relation spark_catalog.default.tbl[namespace#3] parquet > +- 'Project [*] > +- 'SubqueryAlias transitions > +- 'CTERelationRef 0, false org.apache.spark.sql.AnalysisException: > [UNBOUND_SQL_PARAMETER] Found the unbound parameter: `namespace`. 
Please, fix > `args` and provide a mapping of the parameter to a SQL literal.; line 3 pos > 38; > 'WithCTE > :- 'CTERelationDef 0, false > : +- 'SubqueryAlias transitions > : +- 'Project [*] > : +- 'Filter (namespace#3 = parameter(namespace)) > : +- SubqueryAlias spark_catalog.default.tbl > : +- Relation spark_catalog.default.tbl[namespace#3] parquet > +- 'Project [*] > +- 'SubqueryAlias transitions > +- 'CTERelationRef 0, false at > org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:52) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$5(CheckAnalysis.scala:339) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$5$adapted(CheckAnalysis.scala:244) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42818) Implement DataFrameReader/Writer.jdbc
[ https://issues.apache.org/jira/browse/SPARK-42818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-42818: -- Fix Version/s: 3.4.1 (was: 3.4.0) > Implement DataFrameReader/Writer.jdbc > - > > Key: SPARK-42818 > URL: https://issues.apache.org/jira/browse/SPARK-42818 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Takuya Ueshin >Assignee: Takuya Ueshin >Priority: Major > Fix For: 3.4.1 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42767) Add check condition to start connect server fallback with `in-memory` and automatically ignore some tests that strongly depend on Hive
[ https://issues.apache.org/jira/browse/SPARK-42767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-42767: -- Fix Version/s: 3.4.1 (was: 3.4.0) > Add check condition to start connect server fallback with `in-memory` and > automatically ignore some tests that strongly depend on Hive > - > > Key: SPARK-42767 > URL: https://issues.apache.org/jira/browse/SPARK-42767 > Project: Spark > Issue Type: Improvement > Components: Connect, Tests >Affects Versions: 3.4.0, 3.5.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Fix For: 3.4.1 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42812) client_type is missing from AddArtifactsRequest proto message
[ https://issues.apache.org/jira/browse/SPARK-42812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-42812. --- Fix Version/s: 3.4.1 Assignee: Venkata Sai Akhil Gudesa Resolution: Fixed > client_type is missing from AddArtifactsRequest proto message > - > > Key: SPARK-42812 > URL: https://issues.apache.org/jira/browse/SPARK-42812 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.4.0 >Reporter: Venkata Sai Akhil Gudesa >Assignee: Venkata Sai Akhil Gudesa >Priority: Major > Fix For: 3.4.1 > > > The client_type is missing from AddArtifactsRequest proto message -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42817) Spark driver logs are filled with Initializing service data for shuffle service using name
[ https://issues.apache.org/jira/browse/SPARK-42817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-42817: -- Fix Version/s: 3.4.1 (was: 3.4.0) > Spark driver logs are filled with Initializing service data for shuffle > service using name > -- > > Key: SPARK-42817 > URL: https://issues.apache.org/jira/browse/SPARK-42817 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: Chandni Singh >Assignee: Chandni Singh >Priority: Major > Fix For: 3.4.1 > > > With SPARK-34828, we added the ability to make the shuffle service name > configurable and we added a log > [here|https://github.com/apache/spark/blob/8860f69455e5a722626194c4797b4b42cccd4510/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnable.scala#L118] > that will log the shuffle service name. However, this log is printed in the > driver logs whenever there is new executor launched and pollutes the log. > {code} > 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for > shuffle service using name 'spark_shuffle_311' > 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for > shuffle service using name 'spark_shuffle_311' > 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for > shuffle service using name 'spark_shuffle_311' > 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for > shuffle service using name 'spark_shuffle_311' > 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for > shuffle service using name 'spark_shuffle_311' > 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for > shuffle service using name 'spark_shuffle_311' > 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for > shuffle service using name 'spark_shuffle_311' > 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for > shuffle service using name 'spark_shuffle_311' > 22/08/03 20:42:07 INFO 
ExecutorRunnable: Initializing service data for > shuffle service using name 'spark_shuffle_311' > 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for > shuffle service using name 'spark_shuffle_311' > 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for > shuffle service using name 'spark_shuffle_311' > 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for > shuffle service using name 'spark_shuffle_311' > 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for > shuffle service using name 'spark_shuffle_311' > 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for > shuffle service using name 'spark_shuffle_311' > 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for > shuffle service using name 'spark_shuffle_311' > 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for > shuffle service using name 'spark_shuffle_311' > 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for > shuffle service using name 'spark_shuffle_311' > 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for > shuffle service using name 'spark_shuffle_311' > 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for > shuffle service using name 'spark_shuffle_311' > 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for > shuffle service using name 'spark_shuffle_311' > {code} > We can just log this once in the driver. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
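The proposed fix — "just log this once in the driver" — can be sketched in Python (the driver's actual code is Scala; the function name and set-based guard here are illustrative assumptions, not Spark's implementation):

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("ExecutorRunnable")

_seen_service_names = set()

def init_service_data(service_name: str) -> bool:
    """Log the shuffle service name only the first time it is seen.

    Returns True when the message was actually emitted, so the
    deduplication is observable.
    """
    if service_name in _seen_service_names:
        return False
    _seen_service_names.add(service_name)
    logger.info(
        "Initializing service data for shuffle service using name '%s'",
        service_name,
    )
    return True

# Launching many executors now yields a single log line instead of one each,
# which is exactly the flood shown in the driver log above.
logged = [init_service_data("spark_shuffle_311") for _ in range(8)]
print(sum(logged))  # 1
```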
[jira] [Updated] (SPARK-42826) Add migration notes for update to supported pandas version.
[ https://issues.apache.org/jira/browse/SPARK-42826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-42826: -- Fix Version/s: 3.4.1 (was: 3.4.0) > Add migration notes for update to supported pandas version. > --- > > Key: SPARK-42826 > URL: https://issues.apache.org/jira/browse/SPARK-42826 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Fix For: 3.4.1 > > > We deprecated & removed some APIs in > https://issues.apache.org/jira/browse/SPARK-42593 to follow pandas. > We should mention this in the migration guide. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42020) createDataFrame with UDT
[ https://issues.apache.org/jira/browse/SPARK-42020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-42020: -- Fix Version/s: 3.4.1 (was: 3.4.0) > createDataFrame with UDT > > > Key: SPARK-42020 > URL: https://issues.apache.org/jira/browse/SPARK-42020 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Assignee: Takuya Ueshin >Priority: Major > Fix For: 3.4.1 > > > {code} > pyspark/sql/tests/test_types.py:596 > (TypesParityTests.test_apply_schema_with_udt) > self = testMethod=test_apply_schema_with_udt> > def test_apply_schema_with_udt(self): > row = (1.0, ExamplePoint(1.0, 2.0)) > schema = StructType( > [ > StructField("label", DoubleType(), False), > StructField("point", ExamplePointUDT(), False), > ] > ) > > df = self.spark.createDataFrame([row], schema) > ../test_types.py:605: > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > ../../connect/session.py:282: in createDataFrame > _table = pa.Table.from_pylist([dict(zip(_cols, list(item))) for item in > _data]) > pyarrow/table.pxi:3700: in pyarrow.lib.Table.from_pylist > ??? > pyarrow/table.pxi:5221: in pyarrow.lib._from_pylist > ??? > pyarrow/table.pxi:3575: in pyarrow.lib.Table.from_arrays > ??? > pyarrow/table.pxi:1383: in pyarrow.lib._sanitize_arrays > ??? > pyarrow/table.pxi:1364: in pyarrow.lib._schema_from_arrays > ??? > pyarrow/array.pxi:320: in pyarrow.lib.array > ??? > pyarrow/array.pxi:39: in pyarrow.lib._sequence_to_array > ??? > pyarrow/error.pxi:144: in pyarrow.lib.pyarrow_internal_check_status > ??? > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > > ??? 
> E pyarrow.lib.ArrowInvalid: Could not convert ExamplePoint(1.0,2.0) with > type ExamplePoint: did not recognize Python value type when inferring an > Arrow data type > pyarrow/error.pxi:100: ArrowInvalid > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
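The ArrowInvalid above arises because generic type inference cannot handle an arbitrary Python object. The same failure mode, and the explicit-converter fix that a UDT's serializer provides in Spark, can be illustrated with stdlib `json` (an analogy only; `ExamplePoint` here is a hypothetical stand-in for the test class, not Spark's):

```python
import json

class ExamplePoint:
    # Hypothetical stand-in for the UDT instance in the traceback above.
    def __init__(self, x: float, y: float):
        self.x, self.y = x, y

row = {"label": 1.0, "point": ExamplePoint(1.0, 2.0)}

# Generic conversion cannot infer a serialization for the custom type...
try:
    json.dumps(row)
    inferred = True
except TypeError:
    inferred = False
print(inferred)  # False

# ...so the custom type must be converted to a plain representation first,
# which is the role a UDT's serialize() plays before handing data to Arrow.
encoded = json.dumps(row, default=lambda p: [p.x, p.y])
print(encoded)  # {"label": 1.0, "point": [1.0, 2.0]}
```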
[jira] [Updated] (SPARK-41843) Implement SparkSession.udf
[ https://issues.apache.org/jira/browse/SPARK-41843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-41843: -- Fix Version/s: 3.4.1 (was: 3.4.0) > Implement SparkSession.udf > -- > > Key: SPARK-41843 > URL: https://issues.apache.org/jira/browse/SPARK-41843 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Priority: Major > Fix For: 3.4.1 > > > {code:java} > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", > line 2331, in pyspark.sql.connect.functions.call_udf > Failed example: > _ = spark.udf.register("intX2", lambda i: i * 2, IntegerType()) > Exception raised: > Traceback (most recent call last): > File > "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", > line 1350, in __run > exec(compile(example.source, filename, "single", > File "", line 1, in > > _ = spark.udf.register("intX2", lambda i: i * 2, IntegerType()) > AttributeError: 'SparkSession' object has no attribute 'udf'{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42848) Implement DataFrame.registerTempTable
[ https://issues.apache.org/jira/browse/SPARK-42848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-42848: -- Fix Version/s: 3.4.1 (was: 3.4.0) > Implement DataFrame.registerTempTable > - > > Key: SPARK-42848 > URL: https://issues.apache.org/jira/browse/SPARK-42848 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Takuya Ueshin >Assignee: Takuya Ueshin >Priority: Major > Fix For: 3.4.1 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-41818) Support DataFrameWriter.saveAsTable
[ https://issues.apache.org/jira/browse/SPARK-41818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-41818: -- Fix Version/s: 3.4.1 (was: 3.4.0) > Support DataFrameWriter.saveAsTable > --- > > Key: SPARK-41818 > URL: https://issues.apache.org/jira/browse/SPARK-41818 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Assignee: Takuya Ueshin >Priority: Major > Fix For: 3.4.1 > > > {code:java} > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/readwriter.py", > line 369, in pyspark.sql.connect.readwriter.DataFrameWriter.insertInto > Failed example: > df.write.saveAsTable("tblA") > Exception raised: > Traceback (most recent call last): > File > "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", > line 1350, in __run > exec(compile(example.source, filename, "single", > File " pyspark.sql.connect.readwriter.DataFrameWriter.insertInto[2]>", line 1, in > > df.write.saveAsTable("tblA") > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/readwriter.py", > line 350, in saveAsTable > > self._spark.client.execute_command(self._write.command(self._spark.client)) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 459, in execute_command > self._execute(req) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 547, in _execute > self._handle_error(rpc_error) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 623, in _handle_error > raise SparkConnectException(status.message, info.reason) from None > pyspark.sql.connect.client.SparkConnectException: > (java.lang.ClassNotFoundException) .DefaultSource{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: 
issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42824) Provide a clear error message for unsupported JVM attributes.
[ https://issues.apache.org/jira/browse/SPARK-42824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-42824: -- Fix Version/s: 3.4.1 (was: 3.4.0) > Provide a clear error message for unsupported JVM attributes. > - > > Key: SPARK-42824 > URL: https://issues.apache.org/jira/browse/SPARK-42824 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Fix For: 3.4.1 > > > There are attributes, such as "_jvm", that were accessible in PySpark but > cannot be accessed in Spark Connect. We need to display appropriate error > messages for these cases. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42778) QueryStageExec should respect supportsRowBased
[ https://issues.apache.org/jira/browse/SPARK-42778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-42778: -- Fix Version/s: 3.4.1 (was: 3.4.0) > QueryStageExec should respect supportsRowBased > -- > > Key: SPARK-42778 > URL: https://issues.apache.org/jira/browse/SPARK-42778 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: XiDuo You >Assignee: XiDuo You >Priority: Major > Fix For: 3.4.1 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42247) Standardize `returnType` property of UserDefinedFunction
[ https://issues.apache.org/jira/browse/SPARK-42247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-42247: -- Fix Version/s: 3.4.1 (was: 3.4.0) > Standardize `returnType` property of UserDefinedFunction > > > Key: SPARK-42247 > URL: https://issues.apache.org/jira/browse/SPARK-42247 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Assignee: Takuya Ueshin >Priority: Major > Fix For: 3.4.1 > > > There are checks -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42852) Revert NamedLambdaVariable related changes from EquivalentExpressions
[ https://issues.apache.org/jira/browse/SPARK-42852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-42852: -- Fix Version/s: 3.4.1 (was: 3.4.0) > Revert NamedLambdaVariable related changes from EquivalentExpressions > - > > Key: SPARK-42852 > URL: https://issues.apache.org/jira/browse/SPARK-42852 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Peter Toth >Assignee: Peter Toth >Priority: Major > Fix For: 3.4.1 > > > See discussion > https://github.com/apache/spark/pull/40473#issuecomment-1474848224 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-41585) The Spark exclude node functionality for YARN should work independently of dynamic allocation
[ https://issues.apache.org/jira/browse/SPARK-41585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-41585. --- Fix Version/s: 3.5.0 Target Version/s: 3.5.0 Assignee: Luca Canali Resolution: Fixed > The Spark exclude node functionality for YARN should work independently of > dynamic allocation > - > > Key: SPARK-41585 > URL: https://issues.apache.org/jira/browse/SPARK-41585 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 3.0.3, 3.1.3, 3.2.2, 3.3.1 >Reporter: Luca Canali >Assignee: Luca Canali >Priority: Minor > Fix For: 3.5.0 > > > The Spark exclude node functionality for Spark on YARN, introduced in > SPARK-26688, allows users to specify a list of node names that are excluded > from resource allocation. This is done using the configuration parameter: > {{spark.yarn.exclude.nodes}} > The feature currently works only for executors allocated via dynamic > allocation. To use the feature on Spark 3.3.1, for example, one may set the > configurations {{{}spark.dynamicAllocation.enabled{}}}=true, > spark.dynamicAllocation.minExecutors=0 and spark.executor.instances=0, thus > making Spark spawning executors only via dynamic allocation. > This proposes to document this behavior for the current Spark release and > also proposes an improvement of this feature by extending the scope of Spark > exclude node functionality for YARN beyond dynamic allocation, which I > believe makes it more generally useful. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41971) `toPandas` should support duplicate field names when arrow-optimization is on
[ https://issues.apache.org/jira/browse/SPARK-41971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17702892#comment-17702892 ] Niket Jain commented on SPARK-41971: Can I work on this issue? > `toPandas` should support duplicate field names when arrow-optimization is on > - > > Key: SPARK-41971 > URL: https://issues.apache.org/jira/browse/SPARK-41971 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Minor > > toPandas supports duplicate column names, but for a struct column, it does not > support duplicate field names. > {code:java} > In [27]: spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", False) > In [28]: spark.sql("select 1 v, 1 v").toPandas() > Out[28]: >v v > 0 1 1 > In [29]: spark.sql("select struct(1 v, 1 v)").toPandas() > Out[29]: > struct(1 AS v, 1 AS v) > 0 (1, 1) > In [30]: spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", True) > In [31]: spark.sql("select 1 v, 1 v").toPandas() > Out[31]: >v v > 0 1 1 > In [32]: spark.sql("select struct(1 v, 1 v)").toPandas() > /Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/pandas/conversion.py:204: > UserWarning: toPandas attempted Arrow optimization because > 'spark.sql.execution.arrow.pyspark.enabled' is set to true, but has reached > the error below and can not continue. Note that > 'spark.sql.execution.arrow.pyspark.fallback.enabled' does not have an effect > on failures in the middle of computation. 
> Ran out of field metadata, likely malformed > warn(msg) > --- > ArrowInvalid Traceback (most recent call last) > Cell In[32], line 1 > > 1 spark.sql("select struct(1 v, 1 v)").toPandas() > File ~/Dev/spark/python/pyspark/sql/pandas/conversion.py:143, in > PandasConversionMixin.toPandas(self) > 141 tmp_column_names = ["col_{}".format(i) for i in > range(len(self.columns))] > 142 self_destruct = jconf.arrowPySparkSelfDestructEnabled() > --> 143 batches = self.toDF(*tmp_column_names)._collect_as_arrow( > 144 split_batches=self_destruct > 145 ) > 146 if len(batches) > 0: > 147 table = pyarrow.Table.from_batches(batches) > File ~/Dev/spark/python/pyspark/sql/pandas/conversion.py:358, in > PandasConversionMixin._collect_as_arrow(self, split_batches) > 356 results.append(batch_or_indices) > 357 else: > --> 358 results = list(batch_stream) > 359 finally: > 360 # Join serving thread and raise any exceptions from > collectAsArrowToPython > 361 jsocket_auth_server.getResult() > File ~/Dev/spark/python/pyspark/sql/pandas/serializers.py:55, in > ArrowCollectSerializer.load_stream(self, stream) > 50 """ > 51 Load a stream of un-ordered Arrow RecordBatches, where the last > iteration yields > 52 a list of indices that can be used to put the RecordBatches in the > correct order. 
> 53 """ > 54 # load the batches > ---> 55 for batch in self.serializer.load_stream(stream): > 56 yield batch > 58 # load the batch order indices or propagate any error that occurred > in the JVM > File ~/Dev/spark/python/pyspark/sql/pandas/serializers.py:98, in > ArrowStreamSerializer.load_stream(self, stream) > 95 import pyarrow as pa > 97 reader = pa.ipc.open_stream(stream) > ---> 98 for batch in reader: > 99 yield batch > File > ~/.dev/miniconda3/envs/spark_dev/lib/python3.9/site-packages/pyarrow/ipc.pxi:638, > in __iter__() > File > ~/.dev/miniconda3/envs/spark_dev/lib/python3.9/site-packages/pyarrow/ipc.pxi:674, > in pyarrow.lib.RecordBatchReader.read_next_batch() > File > ~/.dev/miniconda3/envs/spark_dev/lib/python3.9/site-packages/pyarrow/error.pxi:100, > in pyarrow.lib.check_status() > ArrowInvalid: Ran out of field metadata, likely malformed > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42874) Enable new golden file test framework for analysis for all input files
[ https://issues.apache.org/jira/browse/SPARK-42874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17702888#comment-17702888 ] Apache Spark commented on SPARK-42874: -- User 'dtenedor' has created a pull request for this issue: https://github.com/apache/spark/pull/40496 > Enable new golden file test framework for analysis for all input files > -- > > Key: SPARK-42874 > URL: https://issues.apache.org/jira/browse/SPARK-42874 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Daniel >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42874) Enable new golden file test framework for analysis for all input files
[ https://issues.apache.org/jira/browse/SPARK-42874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42874: Assignee: Apache Spark > Enable new golden file test framework for analysis for all input files > -- > > Key: SPARK-42874 > URL: https://issues.apache.org/jira/browse/SPARK-42874 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Daniel >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42874) Enable new golden file test framework for analysis for all input files
[ https://issues.apache.org/jira/browse/SPARK-42874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42874: Assignee: (was: Apache Spark) > Enable new golden file test framework for analysis for all input files > -- > > Key: SPARK-42874 > URL: https://issues.apache.org/jira/browse/SPARK-42874 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Daniel >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42874) Enable new golden file test framework for analysis for all input files
Daniel created SPARK-42874: -- Summary: Enable new golden file test framework for analysis for all input files Key: SPARK-42874 URL: https://issues.apache.org/jira/browse/SPARK-42874 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.4.0 Reporter: Daniel -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-35662) Support Timestamp without time zone data type
[ https://issues.apache.org/jira/browse/SPARK-35662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-35662. Fix Version/s: 3.4.0 Resolution: Fixed > Support Timestamp without time zone data type > - > > Key: SPARK-35662 > URL: https://issues.apache.org/jira/browse/SPARK-35662 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.4.0 >Reporter: Gengliang Wang >Assignee: Apache Spark >Priority: Major > Fix For: 3.4.0 > > > Spark SQL today supports the TIMESTAMP data type. However the semantics > provided actually match TIMESTAMP WITH LOCAL TIMEZONE as defined by Oracle. > Timestamps embedded in a SQL query or passed through JDBC are presumed to be > in session local timezone and cast to UTC before being processed. > These are desirable semantics in many cases, such as when dealing with > calendars. > In many (more) other cases, such as when dealing with log files it is > desirable that the provided timestamps not be altered. > SQL users expect that they can model either behavior and do so by using > TIMESTAMP WITHOUT TIME ZONE for time zone insensitive data and TIMESTAMP WITH > LOCAL TIME ZONE for time zone sensitive data. > Most traditional RDBMS map TIMESTAMP to TIMESTAMP WITHOUT TIME ZONE and will > be surprised to see TIMESTAMP WITH LOCAL TIME ZONE, a feature that does not > exist in the standard. > In this new feature, we will introduce TIMESTAMP WITH LOCAL TIMEZONE to > describe the existing timestamp type and add TIMESTAMP WITHOUT TIME ZONE for > standard semantic. > Using these two types will provide clarity. > We will also allow users to set the default behavior for TIMESTAMP to either > use TIMESTAMP WITH LOCAL TIME ZONE or TIMESTAMP WITHOUT TIME ZONE. > h3. Milestone 1 – Spark Timestamp equivalency ( The new Timestamp type > TimestampWithoutTZ meets or exceeds all function of the existing SQL > Timestamp): > * Add a new DataType implementation for TimestampWithoutTZ. 
> * Support TimestampWithoutTZ in Dataset/UDF. > * TimestampWithoutTZ literals > * TimestampWithoutTZ arithmetic(e.g. TimestampWithoutTZ - > TimestampWithoutTZ, TimestampWithoutTZ - Date) > * Datetime functions/operators: dayofweek, weekofyear, year, etc > * Cast to and from TimestampWithoutTZ, cast String/Timestamp to > TimestampWithoutTZ, cast TimestampWithoutTZ to string (pretty > printing)/Timestamp, with the SQL syntax to specify the types > * Support sorting TimestampWithoutTZ. > h3. Milestone 2 – Persistence: > * Ability to create tables of type TimestampWithoutTZ > * Ability to write to common file formats such as Parquet and JSON. > * INSERT, SELECT, UPDATE, MERGE > * Discovery > h3. Milestone 3 – Client support > * JDBC support > * Hive Thrift server > h3. Milestone 4 – PySpark and Spark R integration > * Python UDF can take and return TimestampWithoutTZ > * DataFrame support -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
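[Editorial note] The distinction the ticket above draws between TIMESTAMP WITH LOCAL TIME ZONE and TIMESTAMP WITHOUT TIME ZONE can be illustrated with a small, hedged sketch. This is plain Python `datetime` code, not Spark code; the session time zone value is an illustrative assumption, not taken from the ticket.

```python
from datetime import datetime, timezone, timedelta

# Session time zone, analogous in spirit to Spark's spark.sql.session.timeZone
# (illustrative value; not taken from the ticket).
SESSION_TZ = timezone(timedelta(hours=-8))

def parse_with_local_tz(ts: str) -> datetime:
    """TIMESTAMP WITH LOCAL TIME ZONE semantics: the literal is presumed
    to be in the session time zone and normalized to UTC before processing."""
    naive = datetime.fromisoformat(ts)
    return naive.replace(tzinfo=SESSION_TZ).astimezone(timezone.utc)

def parse_without_tz(ts: str) -> datetime:
    """TIMESTAMP WITHOUT TIME ZONE semantics: the literal is kept exactly
    as written, with no time zone attached and no adjustment."""
    return datetime.fromisoformat(ts)

local = parse_with_local_tz("2023-03-21 10:00:00")
plain = parse_without_tz("2023-03-21 10:00:00")
print(local.isoformat())  # 2023-03-21T18:00:00+00:00 (shifted to UTC)
print(plain.isoformat())  # 2023-03-21T10:00:00 (unaltered, e.g. log-file data)
```

The first behavior suits calendar-style data interpreted in the user's zone; the second suits log files whose timestamps must not be altered, matching the two use cases the ticket describes.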
[jira] [Assigned] (SPARK-42839) Assign a name to the error class _LEGACY_ERROR_TEMP_2003
[ https://issues.apache.org/jira/browse/SPARK-42839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42839: Assignee: (was: Apache Spark) > Assign a name to the error class _LEGACY_ERROR_TEMP_2003 > > > Key: SPARK-42839 > URL: https://issues.apache.org/jira/browse/SPARK-42839 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Max Gekk >Priority: Minor > Labels: starter > Attachments: Screenshot from 2023-03-21 00-20-11.png > > > Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2003* defined in > {*}core/src/main/resources/error/error-classes.json{*}. The name should be > short but complete (look at the example in error-classes.json). > Add a test which triggers the error from user code if such test still doesn't > exist. Check exception fields by using {*}checkError(){*}. The last function > checks valuable error fields only, and avoids dependencies from error text > message. In this way, tech editors can modify error format in > error-classes.json, and don't worry of Spark's internal tests. Migrate other > tests that might trigger the error onto checkError(). > If you cannot reproduce the error from user space (using SQL query), replace > the error by an internal error, see {*}SparkException.internalError(){*}. > Improve the error message format in error-classes.json if the current is not > clear. Propose a solution to users how to avoid and fix such kind of errors. > Please, look at the PR below as examples: > * [https://github.com/apache/spark/pull/38685] > * [https://github.com/apache/spark/pull/38656] > * [https://github.com/apache/spark/pull/38490] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42839) Assign a name to the error class _LEGACY_ERROR_TEMP_2003
[ https://issues.apache.org/jira/browse/SPARK-42839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42839: Assignee: Apache Spark > Assign a name to the error class _LEGACY_ERROR_TEMP_2003 > > > Key: SPARK-42839 > URL: https://issues.apache.org/jira/browse/SPARK-42839 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Max Gekk >Assignee: Apache Spark >Priority: Minor > Labels: starter > Attachments: Screenshot from 2023-03-21 00-20-11.png > > > Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2003* defined in > {*}core/src/main/resources/error/error-classes.json{*}. The name should be > short but complete (look at the example in error-classes.json). > Add a test which triggers the error from user code if such test still doesn't > exist. Check exception fields by using {*}checkError(){*}. The last function > checks valuable error fields only, and avoids dependencies from error text > message. In this way, tech editors can modify error format in > error-classes.json, and don't worry of Spark's internal tests. Migrate other > tests that might trigger the error onto checkError(). > If you cannot reproduce the error from user space (using SQL query), replace > the error by an internal error, see {*}SparkException.internalError(){*}. > Improve the error message format in error-classes.json if the current is not > clear. Propose a solution to users how to avoid and fix such kind of errors. > Please, look at the PR below as examples: > * [https://github.com/apache/spark/pull/38685] > * [https://github.com/apache/spark/pull/38656] > * [https://github.com/apache/spark/pull/38490] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42839) Assign a name to the error class _LEGACY_ERROR_TEMP_2003
[ https://issues.apache.org/jira/browse/SPARK-42839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17702841#comment-17702841 ] Apache Spark commented on SPARK-42839: -- User 'ruilibuaa' has created a pull request for this issue: https://github.com/apache/spark/pull/40493 > Assign a name to the error class _LEGACY_ERROR_TEMP_2003 > > > Key: SPARK-42839 > URL: https://issues.apache.org/jira/browse/SPARK-42839 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Max Gekk >Priority: Minor > Labels: starter > Attachments: Screenshot from 2023-03-21 00-20-11.png > > > Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2003* defined in > {*}core/src/main/resources/error/error-classes.json{*}. The name should be > short but complete (look at the example in error-classes.json). > Add a test which triggers the error from user code if such test still doesn't > exist. Check exception fields by using {*}checkError(){*}. The last function > checks valuable error fields only, and avoids dependencies from error text > message. In this way, tech editors can modify error format in > error-classes.json, and don't worry of Spark's internal tests. Migrate other > tests that might trigger the error onto checkError(). > If you cannot reproduce the error from user space (using SQL query), replace > the error by an internal error, see {*}SparkException.internalError(){*}. > Improve the error message format in error-classes.json if the current is not > clear. Propose a solution to users how to avoid and fix such kind of errors. > Please, look at the PR below as examples: > * [https://github.com/apache/spark/pull/38685] > * [https://github.com/apache/spark/pull/38656] > * [https://github.com/apache/spark/pull/38490] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42773) Minor grammatical change to "Supports Spark Connect" message
[ https://issues.apache.org/jira/browse/SPARK-42773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen updated SPARK-42773: - Priority: Trivial (was: Major) > Minor grammatical change to "Supports Spark Connect" message > > > Key: SPARK-42773 > URL: https://issues.apache.org/jira/browse/SPARK-42773 > Project: Spark > Issue Type: Documentation > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Allan Folting >Assignee: Allan Folting >Priority: Trivial > Fix For: 3.4.1 > > > Changing "Support Spark Connect" to "Supports Spark Connect" in the 3.4.0 > version change message which is also used in the documentation: > > .. versionchanged:: 3.4.0 > Supports Spark Connect. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42839) Assign a name to the error class _LEGACY_ERROR_TEMP_2003
[ https://issues.apache.org/jira/browse/SPARK-42839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] LI RUI updated SPARK-42839: --- Attachment: Screenshot from 2023-03-21 00-20-11.png > Assign a name to the error class _LEGACY_ERROR_TEMP_2003 > > > Key: SPARK-42839 > URL: https://issues.apache.org/jira/browse/SPARK-42839 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Max Gekk >Priority: Minor > Labels: starter > Attachments: Screenshot from 2023-03-21 00-20-11.png > > > Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2003* defined in > {*}core/src/main/resources/error/error-classes.json{*}. The name should be > short but complete (look at the example in error-classes.json). > Add a test which triggers the error from user code if such test still doesn't > exist. Check exception fields by using {*}checkError(){*}. The last function > checks valuable error fields only, and avoids dependencies from error text > message. In this way, tech editors can modify error format in > error-classes.json, and don't worry of Spark's internal tests. Migrate other > tests that might trigger the error onto checkError(). > If you cannot reproduce the error from user space (using SQL query), replace > the error by an internal error, see {*}SparkException.internalError(){*}. > Improve the error message format in error-classes.json if the current is not > clear. Propose a solution to users how to avoid and fix such kind of errors. > Please, look at the PR below as examples: > * [https://github.com/apache/spark/pull/38685] > * [https://github.com/apache/spark/pull/38656] > * [https://github.com/apache/spark/pull/38490] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42839) Assign a name to the error class _LEGACY_ERROR_TEMP_2003
[ https://issues.apache.org/jira/browse/SPARK-42839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17702838#comment-17702838 ] LI RUI commented on SPARK-42839: Hey, Max~ I am trying to complete this task. I have submitted a commit on GitHub where I made the following changes: 1) I replaced "_LEGACY_ERROR_TEMP_2003" with "CANNOT_ZIP_MAPS". 2) I created a new test case where I attempted to use checkError() and added a new exception definition in AlreadyExistException.scala. However, I found that instead of throwing an AnalysisException, it was throwing a SparkException. So, I switched to using assert instead. I'm not sure if this is the correct approach, could you please provide some guidance? The results of the execution are attached. > Assign a name to the error class _LEGACY_ERROR_TEMP_2003 > > > Key: SPARK-42839 > URL: https://issues.apache.org/jira/browse/SPARK-42839 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Max Gekk >Priority: Minor > Labels: starter > Attachments: Screenshot from 2023-03-21 00-20-11.png > > > Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2003* defined in > {*}core/src/main/resources/error/error-classes.json{*}. The name should be > short but complete (look at the example in error-classes.json). > Add a test which triggers the error from user code if such test still doesn't > exist. Check exception fields by using {*}checkError(){*}. The last function > checks valuable error fields only, and avoids dependencies from error text > message. In this way, tech editors can modify error format in > error-classes.json, and don't worry of Spark's internal tests. Migrate other > tests that might trigger the error onto checkError(). > If you cannot reproduce the error from user space (using SQL query), replace > the error by an internal error, see {*}SparkException.internalError(){*}. 
> Improve the error message format in error-classes.json if the current is not > clear. Propose a solution to users how to avoid and fix such kind of errors. > Please, look at the PR below as examples: > * [https://github.com/apache/spark/pull/38685] > * [https://github.com/apache/spark/pull/38656] > * [https://github.com/apache/spark/pull/38490] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
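[Editorial note] The checkError() workflow the ticket recommends, asserting on an exception's stable fields (error class and message parameters) rather than on the rendered message text, can be sketched as follows. This is an illustrative Python analogue, not Spark's actual Scala `checkError()`; the `CANNOT_ZIP_MAPS` name comes from the comment above, but the parameter names are hypothetical.

```python
# Hypothetical sketch of the checkError() idea: assert on an error's class and
# parameters, not on the rendered message, so the message template in
# error-classes.json can be edited without breaking tests.

class SparkError(Exception):
    def __init__(self, error_class, parameters):
        self.error_class = error_class
        self.parameters = parameters
        super().__init__(f"[{error_class}] {parameters}")

def check_error(exc, error_class, parameters):
    # Compare only the stable, machine-readable fields.
    assert exc.error_class == error_class, exc.error_class
    assert exc.parameters == parameters, exc.parameters

try:
    raise SparkError("CANNOT_ZIP_MAPS", {"left": "map1", "right": "map2"})
except SparkError as e:
    check_error(e, "CANNOT_ZIP_MAPS", {"left": "map1", "right": "map2"})
```

This is why the ticket says tech editors can rewrite message formats in error-classes.json without worrying about Spark's internal tests: nothing asserts on the message string itself.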
[jira] [Created] (SPARK-42873) Define Spark SQL types as keywords
Max Gekk created SPARK-42873: Summary: Define Spark SQL types as keywords Key: SPARK-42873 URL: https://issues.apache.org/jira/browse/SPARK-42873 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.5.0 Reporter: Max Gekk Assignee: Max Gekk Currently, Spark SQL defines primitive types as: {code} | identifier (LEFT_PAREN INTEGER_VALUE (COMMA INTEGER_VALUE)* RIGHT_PAREN)? #primitiveDataType {code} where identifier is parsed later by visitPrimitiveDataType(): {code:scala} override def visitPrimitiveDataType(ctx: PrimitiveDataTypeContext): DataType = withOrigin(ctx) { val dataType = ctx.identifier.getText.toLowerCase(Locale.ROOT) (dataType, ctx.INTEGER_VALUE().asScala.toList) match { case ("boolean", Nil) => BooleanType case ("tinyint" | "byte", Nil) => ByteType case ("smallint" | "short", Nil) => ShortType case ("int" | "integer", Nil) => IntegerType case ("bigint" | "long", Nil) => LongType case ("float" | "real", Nil) => FloatType ... {code} So, the types are not Spark SQL keywords, and this causes some inconveniences while analysing/transforming the lexer tree. For example, while forming the stable column aliases. Need to define Spark SQL types in SqlBaseLexer.g4. Also, typed literals have the same issue. The types "DATE", "TIMESTAMP_NTZ", "TIMESTAMP", "TIMESTAMP_LTZ", "INTERVAL", and "X" should be defined as base lexer tokens. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
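[Editorial note] The `visitPrimitiveDataType()` dispatch quoted in the ticket above can be sketched in Python to show the consequence of parsing type names as generic identifiers: the mapping to a concrete DataType happens after parsing, in visitor code, which is what makes the lexer tree awkward to analyze. A hedged, simplified sketch (the case list is abbreviated, mirroring the `...` in the ticket):

```python
# Sketch of identifier-based type dispatch: because the grammar treats a type
# name as a plain identifier, resolution to a DataType happens post-parse.
def resolve_primitive_type(name: str, args: list) -> str:
    name = name.lower()
    if not args:
        if name == "boolean":
            return "BooleanType"
        if name in ("tinyint", "byte"):
            return "ByteType"
        if name in ("smallint", "short"):
            return "ShortType"
        if name in ("int", "integer"):
            return "IntegerType"
        if name in ("bigint", "long"):
            return "LongType"
        if name in ("float", "real"):
            return "FloatType"
    raise ValueError(f"unsupported type: {name}")

print(resolve_primitive_type("INTEGER", []))  # IntegerType
```

Defining the types as lexer keywords in SqlBaseLexer.g4, as the ticket proposes, would instead make each type its own token, visible directly in the parse tree.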
[jira] [Commented] (SPARK-42791) Create golden file test framework for analysis
[ https://issues.apache.org/jira/browse/SPARK-42791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17702811#comment-17702811 ] Apache Spark commented on SPARK-42791: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/40492 > Create golden file test framework for analysis > -- > > Key: SPARK-42791 > URL: https://issues.apache.org/jira/browse/SPARK-42791 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Daniel >Assignee: Daniel >Priority: Major > Fix For: 3.5.0 > > > Here we track the work to add new golden file test support for the Spark > analyzer. Each golden file can contain a list of SQL queries followed by the > string representations of their analyzed logical plans. > > This can be similar to Spark's existing `SQLQueryTestSuite` [1], but stopping > after analysis and listing analyzed plans as the results instead of fully > executing queries end-to-end. As another example, ZetaSQL has analyzer-based > golden file testing like this as well [2]. > > This way, any changes to analysis will show up as test diffs, which are easy > to spot in review and also easy to update automatically. This could help the > community collectively maintain the quality of Apache Spark's query analysis. > > [1] > [https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala] > > [2] > [https://github.com/google/zetasql/blob/master/zetasql/analyzer/testdata/limit.test]. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
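[Editorial note] The golden-file testing pattern described in the ticket above can be sketched generically. Spark's actual framework is the Scala `SQLQueryTestSuite` (regenerated with `SPARK_GENERATE_GOLDEN_FILES=1`); this Python sketch only illustrates the core idea: run the step under test, compare its textual output to a stored file, and regenerate the file on demand so diffs show up in review.

```python
import os
import tempfile

def run_golden_test(query, analyze, golden_path, regenerate=False):
    """Compare analyze(query) against the stored golden output. With
    regenerate=True (or on first run) the golden file is (re)written,
    analogous to rerunning a Spark golden suite in generate mode."""
    actual = analyze(query)
    if regenerate or not os.path.exists(golden_path):
        with open(golden_path, "w") as f:
            f.write(actual)
        return True
    with open(golden_path) as f:
        expected = f.read()
    return actual == expected

# Toy "analyzer": pretend the analyzed plan is the uppercased query.
analyze = lambda q: f"Plan: {q.upper()}"
path = os.path.join(tempfile.mkdtemp(), "limit.test.out")
assert run_golden_test("select 1", analyze, path)      # first run writes the golden file
assert run_golden_test("select 1", analyze, path)      # unchanged analysis matches
assert not run_golden_test("select 2", analyze, path)  # a plan change surfaces as a diff
```

Because the expected output lives in a checked-in file, any analyzer change produces a reviewable diff and can be accepted by regenerating, exactly the workflow the ticket motivates.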
[jira] [Created] (SPARK-42872) Spark SQL reads unnecessary nested fields
Jiri Humpolicek created SPARK-42872: --- Summary: Spark SQL reads unnecessary nested fields Key: SPARK-42872 URL: https://issues.apache.org/jira/browse/SPARK-42872 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.3.2 Reporter: Jiri Humpolicek When we use higher-order functions in a Spark SQL query, it would be great if it were somehow possible to write the following example in a way that makes Spark read only the necessary nested fields. Example: 1) Loading data {code:scala} val jsonStr = """{ "items": [ {"itemId": 1, "itemData": "a"}, {"itemId": 2, "itemData": "b"} ] }""" val df = spark.read.json(Seq(jsonStr).toDS) df.write.format("parquet").mode("overwrite").saveAsTable("persisted") {code} 2) Read query with explain {code:scala} val read = spark.table("persisted") spark.conf.set("spark.sql.optimizer.nestedSchemaPruning.enabled", true) read.select(transform($"items", i=>i.getItem("itemId")).as('itemIds)).explain(true) // ReadSchema: struct<items:array<struct<itemData:string,itemId:bigint>>> {code} We use only the *itemId* field from the structure in the array, but the read schema contains all fields of the structure. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
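[Editorial note] What the ticket above is asking for, a read schema narrowed to only the accessed nested fields, can be sketched abstractly. This is an illustrative Python sketch of schema pruning over a toy schema representation, not Spark's `nestedSchemaPruning` implementation; the function name and representation are hypothetical.

```python
# Illustrative sketch (not Spark code): narrow a nested schema to just the
# field paths a query touches, e.g. items.itemId in the example above.
def prune_schema(schema: dict, wanted: set, prefix: str = "") -> dict:
    pruned = {}
    for field, sub in schema.items():
        path = f"{prefix}{field}"
        if isinstance(sub, dict):
            kept = prune_schema(sub, wanted, prefix=path + ".")
            if kept:  # keep a struct only if it still has needed children
                pruned[field] = kept
        elif path in wanted:
            pruned[field] = sub
    return pruned

full = {"items": {"itemId": "bigint", "itemData": "string"}}
print(prune_schema(full, {"items.itemId"}))  # {'items': {'itemId': 'bigint'}}
```

A pruned ReadSchema like `struct<items:array<struct<itemId:bigint>>>` is the behavior the ticket requests when the `transform` lambda touches only `itemId`.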
[jira] [Resolved] (SPARK-42790) Abstract the excluded method for better test for JDBC docker tests.
[ https://issues.apache.org/jira/browse/SPARK-42790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-42790. -- Fix Version/s: 3.5.0 Assignee: jiaan.geng Resolution: Fixed Resolved by https://github.com/apache/spark/pull/40418 > Abstract the excluded method for better test for JDBC docker tests. > --- > > Key: SPARK-42790 > URL: https://issues.apache.org/jira/browse/SPARK-42790 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41006) ConfigMap has the same name when launching two pods on the same namespace
[ https://issues.apache.org/jira/browse/SPARK-41006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41006: Assignee: Apache Spark > ConfigMap has the same name when launching two pods on the same namespace > - > > Key: SPARK-41006 > URL: https://issues.apache.org/jira/browse/SPARK-41006 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.1.0, 3.2.0, 3.3.0 >Reporter: Eric >Assignee: Apache Spark >Priority: Minor > > If we use the Spark Launcher to launch our spark apps in k8s: > {code:java} > val sparkLauncher = new InProcessLauncher() > .setMaster(k8sMaster) > .setDeployMode(deployMode) > .setAppName(appName) > .setVerbose(true) > sparkLauncher.startApplication(new SparkAppHandle.Listener { ...{code} > We have an issue when we launch another spark driver in the same namespace > where other spark app was running: > {code:java} > kp -n audit-exporter-eee5073aac -w > NAME READY STATUS RESTARTS AGE > audit-exporter-71489e843d8085c0-driver 1/1 Running 0 > 9m54s > audit-exporter-7e6b8b843d80b9e6-exec-1 1/1 Running 0 > 9m40s > data-io-120204843d899567-driver 0/1 Terminating 0 1s > data-io-120204843d899567-driver 0/1 Terminating 0 2s > data-io-120204843d899567-driver 0/1 Terminating 0 3s > data-io-120204843d899567-driver 0/1 Terminating 0 > 3s{code} > The error is: > {code:java} > {"time":"2022-11-03T12:49:45.626Z","lvl":"WARN","logger":"o.a.s.l.InProcessAppHandle","thread":"spark-app-38: > 'data-io'","msg":"Application failed with > exception.","stack_trace":"io.fabric8.kubernetes.client.KubernetesClientException: > Failure executing: PUT at: > https://kubernetes.default/api/v1/namespaces/audit-exporter-eee5073aac/configmaps/spark-drv-d19c37843d80350c-conf-map. > Message: ConfigMap \"spark-drv-d19c37843d80350c-conf-map\" is invalid: data: > Forbidden: field is immutable when `immutable` is set. 
Received status: > Status(apiVersion=v1, code=422, > details=StatusDetails(causes=[StatusCause(field=data, message=Forbidden: > field is immutable when `immutable` is set, reason=FieldValueForbidden, > additionalProperties={})], group=null, kind=ConfigMap, > name=spark-drv-d19c37843d80350c-conf-map, retryAfterSeconds=null, uid=null, > additionalProperties={}), kind=Status, message=ConfigMap > \"spark-drv-d19c37843d80350c-conf-map\" is invalid: data: Forbidden: field is > immutable when `immutable` is set, metadata=ListMeta(_continue=null, > remainingItemCount=null, resourceVersion=null, selfLink=null, > additionalProperties={}), reason=Invalid, status=Failure, > additionalProperties={}).\n\tat > io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:682)\n\tat > > io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:661)\n\tat > > io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:612)\n\tat > > io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:555)\n\tat > > io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:518)\n\tat > > io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleUpdate(OperationSupport.java:342)\n\tat > > io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleUpdate(OperationSupport.java:322)\n\tat > > io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleUpdate(BaseOperation.java:649)\n\tat > > io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.lambda$replace$1(HasMetadataOperation.java:195)\n\tat > > io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation$$Lambda$5360/00.apply(Unknown > Source)\n\tat > io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.replace(HasMetadataOperation.java:200)\n\tat > > io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.replace(HasMetadataOperation.java:141)\n\tat > > 
io.fabric8.kubernetes.client.dsl.base.BaseOperation$$Lambda$4618/00.apply(Unknown > Source)\n\tat > io.fabric8.kubernetes.client.utils.CreateOrReplaceHelper.replace(CreateOrReplaceHelper.java:69)\n\tat > > io.fabric8.kubernetes.client.utils.CreateOrReplaceHelper.createOrReplace(CreateOrReplaceHelper.java:61)\n\tat > > io.fabric8.kubernetes.client.dsl.base.BaseOperation.createOrReplace(BaseOperation.java:318)\n\tat > > io.fabric8.kubernetes.client.dsl.base.BaseOperation.createOrReplace(BaseOperation.java:83)\n\tat > >
[jira] [Commented] (SPARK-41006) ConfigMap has the same name when launching two pods on the same namespace
[ https://issues.apache.org/jira/browse/SPARK-41006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17702710#comment-17702710 ] Apache Spark commented on SPARK-41006: -- User 'DHKold' has created a pull request for this issue: https://github.com/apache/spark/pull/40491 > ConfigMap has the same name when launching two pods on the same namespace > - > > Key: SPARK-41006 > URL: https://issues.apache.org/jira/browse/SPARK-41006 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.1.0, 3.2.0, 3.3.0 >Reporter: Eric >Priority: Minor > > If we use the Spark Launcher to launch our spark apps in k8s: > {code:java} > val sparkLauncher = new InProcessLauncher() > .setMaster(k8sMaster) > .setDeployMode(deployMode) > .setAppName(appName) > .setVerbose(true) > sparkLauncher.startApplication(new SparkAppHandle.Listener { ...{code} > We have an issue when we launch another spark driver in the same namespace > where other spark app was running: > {code:java} > kp -n audit-exporter-eee5073aac -w > NAME READY STATUS RESTARTS AGE > audit-exporter-71489e843d8085c0-driver 1/1 Running 0 > 9m54s > audit-exporter-7e6b8b843d80b9e6-exec-1 1/1 Running 0 > 9m40s > data-io-120204843d899567-driver 0/1 Terminating 0 1s > data-io-120204843d899567-driver 0/1 Terminating 0 2s > data-io-120204843d899567-driver 0/1 Terminating 0 3s > data-io-120204843d899567-driver 0/1 Terminating 0 > 3s{code} > The error is: > {code:java} > {"time":"2022-11-03T12:49:45.626Z","lvl":"WARN","logger":"o.a.s.l.InProcessAppHandle","thread":"spark-app-38: > 'data-io'","msg":"Application failed with > exception.","stack_trace":"io.fabric8.kubernetes.client.KubernetesClientException: > Failure executing: PUT at: > https://kubernetes.default/api/v1/namespaces/audit-exporter-eee5073aac/configmaps/spark-drv-d19c37843d80350c-conf-map. > Message: ConfigMap \"spark-drv-d19c37843d80350c-conf-map\" is invalid: data: > Forbidden: field is immutable when `immutable` is set. 
Received status: > Status(apiVersion=v1, code=422, > details=StatusDetails(causes=[StatusCause(field=data, message=Forbidden: > field is immutable when `immutable` is set, reason=FieldValueForbidden, > additionalProperties={})], group=null, kind=ConfigMap, > name=spark-drv-d19c37843d80350c-conf-map, retryAfterSeconds=null, uid=null, > additionalProperties={}), kind=Status, message=ConfigMap > \"spark-drv-d19c37843d80350c-conf-map\" is invalid: data: Forbidden: field is > immutable when `immutable` is set, metadata=ListMeta(_continue=null, > remainingItemCount=null, resourceVersion=null, selfLink=null, > additionalProperties={}), reason=Invalid, status=Failure, > additionalProperties={}).\n\tat > io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:682)\n\tat > > io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:661)\n\tat > > io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:612)\n\tat > > io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:555)\n\tat > > io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:518)\n\tat > > io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleUpdate(OperationSupport.java:342)\n\tat > > io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleUpdate(OperationSupport.java:322)\n\tat > > io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleUpdate(BaseOperation.java:649)\n\tat > > io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.lambda$replace$1(HasMetadataOperation.java:195)\n\tat > > io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation$$Lambda$5360/00.apply(Unknown > Source)\n\tat > io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.replace(HasMetadataOperation.java:200)\n\tat > > io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.replace(HasMetadataOperation.java:141)\n\tat > > 
io.fabric8.kubernetes.client.dsl.base.BaseOperation$$Lambda$4618/00.apply(Unknown > Source)\n\tat > io.fabric8.kubernetes.client.utils.CreateOrReplaceHelper.replace(CreateOrReplaceHelper.java:69)\n\tat > > io.fabric8.kubernetes.client.utils.CreateOrReplaceHelper.createOrReplace(CreateOrReplaceHelper.java:61)\n\tat > > io.fabric8.kubernetes.client.dsl.base.BaseOperation.createOrReplace(BaseOperation.java:318)\n\tat > > io.fabric8.kubernetes.client.dsl.base.BaseOperation.createOrReplace(BaseOperation.java:83)\n\tat > >
[jira] [Assigned] (SPARK-41006) ConfigMap has the same name when launching two pods on the same namespace
[ https://issues.apache.org/jira/browse/SPARK-41006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41006: Assignee: (was: Apache Spark) > ConfigMap has the same name when launching two pods on the same namespace > - > > Key: SPARK-41006 > URL: https://issues.apache.org/jira/browse/SPARK-41006 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.1.0, 3.2.0, 3.3.0 >Reporter: Eric >Priority: Minor > > If we use the Spark Launcher to launch our spark apps in k8s: > {code:java} > val sparkLauncher = new InProcessLauncher() > .setMaster(k8sMaster) > .setDeployMode(deployMode) > .setAppName(appName) > .setVerbose(true) > sparkLauncher.startApplication(new SparkAppHandle.Listener { ...{code} > We have an issue when we launch another spark driver in the same namespace > where other spark app was running: > {code:java} > kp -n audit-exporter-eee5073aac -w > NAME READY STATUS RESTARTS AGE > audit-exporter-71489e843d8085c0-driver 1/1 Running 0 > 9m54s > audit-exporter-7e6b8b843d80b9e6-exec-1 1/1 Running 0 > 9m40s > data-io-120204843d899567-driver 0/1 Terminating 0 1s > data-io-120204843d899567-driver 0/1 Terminating 0 2s > data-io-120204843d899567-driver 0/1 Terminating 0 3s > data-io-120204843d899567-driver 0/1 Terminating 0 > 3s{code} > The error is: > {code:java} > {"time":"2022-11-03T12:49:45.626Z","lvl":"WARN","logger":"o.a.s.l.InProcessAppHandle","thread":"spark-app-38: > 'data-io'","msg":"Application failed with > exception.","stack_trace":"io.fabric8.kubernetes.client.KubernetesClientException: > Failure executing: PUT at: > https://kubernetes.default/api/v1/namespaces/audit-exporter-eee5073aac/configmaps/spark-drv-d19c37843d80350c-conf-map. > Message: ConfigMap \"spark-drv-d19c37843d80350c-conf-map\" is invalid: data: > Forbidden: field is immutable when `immutable` is set. 
Received status: > Status(apiVersion=v1, code=422, > details=StatusDetails(causes=[StatusCause(field=data, message=Forbidden: > field is immutable when `immutable` is set, reason=FieldValueForbidden, > additionalProperties={})], group=null, kind=ConfigMap, > name=spark-drv-d19c37843d80350c-conf-map, retryAfterSeconds=null, uid=null, > additionalProperties={}), kind=Status, message=ConfigMap > \"spark-drv-d19c37843d80350c-conf-map\" is invalid: data: Forbidden: field is > immutable when `immutable` is set, metadata=ListMeta(_continue=null, > remainingItemCount=null, resourceVersion=null, selfLink=null, > additionalProperties={}), reason=Invalid, status=Failure, > additionalProperties={}).\n\tat > io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:682)\n\tat > > io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:661)\n\tat > > io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:612)\n\tat > > io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:555)\n\tat > > io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:518)\n\tat > > io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleUpdate(OperationSupport.java:342)\n\tat > > io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleUpdate(OperationSupport.java:322)\n\tat > > io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleUpdate(BaseOperation.java:649)\n\tat > > io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.lambda$replace$1(HasMetadataOperation.java:195)\n\tat > > io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation$$Lambda$5360/00.apply(Unknown > Source)\n\tat > io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.replace(HasMetadataOperation.java:200)\n\tat > > io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.replace(HasMetadataOperation.java:141)\n\tat > > 
io.fabric8.kubernetes.client.dsl.base.BaseOperation$$Lambda$4618/00.apply(Unknown > Source)\n\tat > io.fabric8.kubernetes.client.utils.CreateOrReplaceHelper.replace(CreateOrReplaceHelper.java:69)\n\tat > > io.fabric8.kubernetes.client.utils.CreateOrReplaceHelper.createOrReplace(CreateOrReplaceHelper.java:61)\n\tat > > io.fabric8.kubernetes.client.dsl.base.BaseOperation.createOrReplace(BaseOperation.java:318)\n\tat > > io.fabric8.kubernetes.client.dsl.base.BaseOperation.createOrReplace(BaseOperation.java:83)\n\tat > >
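The collision described in SPARK-41006 can be sketched with a small, purely hypothetical Python model (this is not Spark's Kubernetes resource-naming code; all function names here are illustrative): if the driver ConfigMap name is derived deterministically from the application, two drivers launched into the same namespace resolve to the same name, and the second launch's PUT against the existing immutable ConfigMap fails with "field is immutable". Adding per-launch entropy to the name avoids the clash.

```python
# Hypothetical sketch, not Spark's actual code: models how a deterministic
# ConfigMap name can collide across two launches in one namespace, and how a
# per-launch nonce avoids it.
import hashlib
import uuid

def conf_map_name(app_id: str) -> str:
    # Deterministic: same app id -> same name, so a second driver launch
    # attempts a PUT on an existing immutable ConfigMap and is rejected.
    digest = hashlib.sha1(app_id.encode()).hexdigest()[:16]
    return f"spark-drv-{digest}-conf-map"

def conf_map_name_unique(app_id: str) -> str:
    # Launch-unique entropy gives each driver its own ConfigMap.
    digest = hashlib.sha1(app_id.encode()).hexdigest()[:16]
    return f"spark-drv-{digest}-{uuid.uuid4().hex[:8]}-conf-map"

# Two launches of the same app into the same namespace:
assert conf_map_name("data-io") == conf_map_name("data-io")                # name clash
assert conf_map_name_unique("data-io") != conf_map_name_unique("data-io")  # no clash
```

The real naming scheme in Spark differs; the sketch only captures why determinism plus ConfigMap immutability produces the 422 error in the quoted log.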
[jira] [Assigned] (SPARK-42536) Upgrade log4j2 to 2.20.0
[ https://issues.apache.org/jira/browse/SPARK-42536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42536: Assignee: (was: Apache Spark) > Upgrade log4j2 to 2.20.0 > > > Key: SPARK-42536 > URL: https://issues.apache.org/jira/browse/SPARK-42536 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: Yang Jie >Priority: Minor > > [https://logging.apache.org/log4j/2.x/release-notes/2.20.0.html] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42536) Upgrade log4j2 to 2.20.0
[ https://issues.apache.org/jira/browse/SPARK-42536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42536: Assignee: Apache Spark > Upgrade log4j2 to 2.20.0 > > > Key: SPARK-42536 > URL: https://issues.apache.org/jira/browse/SPARK-42536 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Minor > > [https://logging.apache.org/log4j/2.x/release-notes/2.20.0.html] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42536) Upgrade log4j2 to 2.20.0
[ https://issues.apache.org/jira/browse/SPARK-42536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17702667#comment-17702667 ] Apache Spark commented on SPARK-42536: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/40490 > Upgrade log4j2 to 2.20.0 > > > Key: SPARK-42536 > URL: https://issues.apache.org/jira/browse/SPARK-42536 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: Yang Jie >Priority: Minor > > [https://logging.apache.org/log4j/2.x/release-notes/2.20.0.html] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42871) Upgrade slf4j to 2.0.7
[ https://issues.apache.org/jira/browse/SPARK-42871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17702662#comment-17702662 ] Apache Spark commented on SPARK-42871: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/40489 > Upgrade slf4j to 2.0.7 > -- > > Key: SPARK-42871 > URL: https://issues.apache.org/jira/browse/SPARK-42871 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: Yang Jie >Priority: Major > > https://www.slf4j.org/news.html#2.0.7 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42536) Upgrade log4j2 to 2.20.0
[ https://issues.apache.org/jira/browse/SPARK-42536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-42536: - Description: [https://logging.apache.org/log4j/2.x/release-notes/2.20.0.html] (was: Need wait upgrade slf4j 2.0.7 first * [https://logging.apache.org/log4j/2.x/release-notes/2.20.0.html] * https://jira.qos.ch/browse/SLF4J-511) > Upgrade log4j2 to 2.20.0 > > > Key: SPARK-42536 > URL: https://issues.apache.org/jira/browse/SPARK-42536 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: Yang Jie >Priority: Minor > > [https://logging.apache.org/log4j/2.x/release-notes/2.20.0.html] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42871) Upgrade slf4j to 2.0.7
[ https://issues.apache.org/jira/browse/SPARK-42871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42871: Assignee: (was: Apache Spark) > Upgrade slf4j to 2.0.7 > -- > > Key: SPARK-42871 > URL: https://issues.apache.org/jira/browse/SPARK-42871 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: Yang Jie >Priority: Major > > https://www.slf4j.org/news.html#2.0.7 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42871) Upgrade slf4j to 2.0.7
[ https://issues.apache.org/jira/browse/SPARK-42871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42871: Assignee: Apache Spark > Upgrade slf4j to 2.0.7 > -- > > Key: SPARK-42871 > URL: https://issues.apache.org/jira/browse/SPARK-42871 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Major > > https://www.slf4j.org/news.html#2.0.7 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42871) Upgrade slf4j to 2.0.7
[ https://issues.apache.org/jira/browse/SPARK-42871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17702661#comment-17702661 ] Apache Spark commented on SPARK-42871: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/40489 > Upgrade slf4j to 2.0.7 > -- > > Key: SPARK-42871 > URL: https://issues.apache.org/jira/browse/SPARK-42871 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: Yang Jie >Priority: Major > > https://www.slf4j.org/news.html#2.0.7 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42871) Upgrade slf4j to 2.0.7
Yang Jie created SPARK-42871: Summary: Upgrade slf4j to 2.0.7 Key: SPARK-42871 URL: https://issues.apache.org/jira/browse/SPARK-42871 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 3.5.0 Reporter: Yang Jie https://www.slf4j.org/news.html#2.0.7 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42851) EquivalentExpressions methods need to be consistently guarded by supportedExpression
[ https://issues.apache.org/jira/browse/SPARK-42851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17702658#comment-17702658 ] Apache Spark commented on SPARK-42851: -- User 'peter-toth' has created a pull request for this issue: https://github.com/apache/spark/pull/40488 > EquivalentExpressions methods need to be consistently guarded by > supportedExpression > > > Key: SPARK-42851 > URL: https://issues.apache.org/jira/browse/SPARK-42851 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.2, 3.4.0 >Reporter: Kris Mok >Priority: Major > > SPARK-41468 tried to fix a bug but introduced a new regression. Its change to > {{EquivalentExpressions}} added a {{supportedExpression()}} guard to the > {{addExprTree()}} and {{getExprState()}} methods, but didn't add the same > guard to the other "add" entry point -- {{addExpr()}}. > As such, usages that add single expressions to CSE via {{addExpr()}} may > succeed, but upon retrieval via {{getExprState()}} it'd inconsistently get a > {{None}} due to failing the guard. > We need to make sure the "add" and "get" methods are consistent. It could be > done by one of: > 1. Adding the same {{supportedExpression()}} guard to {{addExpr()}}, or > 2. Removing the guard from {{getExprState()}}, relying solely on the guard on > the "add" path to make sure only intended state is added. > (or other alternative refactorings to fuse the guard into various methods to > make it more efficient) > There are pros and cons to the two directions above, because {{addExpr()}} > used to allow more (potentially incorrect) expressions to get CSE'd; making > it more restrictive may cause performance regressions (for the cases that > happened to work). 
> Example: > {code:sql} > select max(transform(array(id), x -> x)), max(transform(array(id), x -> x)) > from range(2) > {code} > Running this query on Spark 3.2 branch returns the correct value: > {code} > scala> spark.sql("select max(transform(array(id), x -> x)), > max(transform(array(id), x -> x)) from range(2)").collect > res0: Array[org.apache.spark.sql.Row] = > Array([WrappedArray(1),WrappedArray(1)]) > {code} > Here, {{transform(array(id), x -> x)}} is an {{AggregateExpression}} that was > (potentially unsafely) recognized by {{addExpr()}} as a common subexpression, > and {{getExprState()}} doesn't do extra guarding, so during physical > planning, in {{PhysicalAggregation}} this expression gets CSE'd in both the > aggregation expression list and the result expressions list. > {code} > AdaptiveSparkPlan isFinalPlan=false > +- SortAggregate(key=[], functions=[max(transform(array(id#0L), > lambdafunction(lambda x#1L, lambda x#1L, false)))]) >+- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=11] > +- SortAggregate(key=[], functions=[partial_max(transform(array(id#0L), > lambdafunction(lambda x#1L, lambda x#1L, false)))]) > +- Range (0, 2, step=1, splits=16) > {code} > Running the same query on current master triggers an error when binding the > result expression to the aggregate expression in the Aggregate operators (for > a WSCG-enabled operator like {{HashAggregateExec}}, the same error would show > up during codegen): > {code} > ERROR TaskSetManager: Task 0 in stage 2.0 failed 1 times; aborting job > org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in > stage 2.0 failed 1 times, most recent failure: Lost task 0.0 in stage 2.0 > (TID 16) (ip-10-110-16-93.us-west-2.compute.internal executor driver): > java.lang.IllegalStateException: Couldn't find max(transform(array(id#0L), > lambdafunction(lambda x#2L, lambda x#2L, false)))#4 in > [max(transform(array(id#0L), lambdafunction(lambda x#1L, lambda x#1L, > false)))#3] > at > 
org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:80) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:73) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:512) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:104) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:512) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$3(TreeNode.scala:517) > at > org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1249) > at > org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1248) > at > org.apache.spark.sql.catalyst.expressions.UnaryExpression.mapChildren(Expression.scala:532) > at >
[jira] [Commented] (SPARK-42851) EquivalentExpressions methods need to be consistently guarded by supportedExpression
[ https://issues.apache.org/jira/browse/SPARK-42851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17702657#comment-17702657 ] Apache Spark commented on SPARK-42851: -- User 'peter-toth' has created a pull request for this issue: https://github.com/apache/spark/pull/40488 > EquivalentExpressions methods need to be consistently guarded by > supportedExpression > > > Key: SPARK-42851 > URL: https://issues.apache.org/jira/browse/SPARK-42851 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.2, 3.4.0 >Reporter: Kris Mok >Priority: Major > > SPARK-41468 tried to fix a bug but introduced a new regression. Its change to > {{EquivalentExpressions}} added a {{supportedExpression()}} guard to the > {{addExprTree()}} and {{getExprState()}} methods, but didn't add the same > guard to the other "add" entry point -- {{addExpr()}}. > As such, usages that add single expressions to CSE via {{addExpr()}} may > succeed, but upon retrieval via {{getExprState()}} it'd inconsistently get a > {{None}} due to failing the guard. > We need to make sure the "add" and "get" methods are consistent. It could be > done by one of: > 1. Adding the same {{supportedExpression()}} guard to {{addExpr()}}, or > 2. Removing the guard from {{getExprState()}}, relying solely on the guard on > the "add" path to make sure only intended state is added. > (or other alternative refactorings to fuse the guard into various methods to > make it more efficient) > There are pros and cons to the two directions above, because {{addExpr()}} > used to allow more (potentially incorrect) expressions to get CSE'd; making > it more restrictive may cause performance regressions (for the cases that > happened to work). 
> Example: > {code:sql} > select max(transform(array(id), x -> x)), max(transform(array(id), x -> x)) > from range(2) > {code} > Running this query on Spark 3.2 branch returns the correct value: > {code} > scala> spark.sql("select max(transform(array(id), x -> x)), > max(transform(array(id), x -> x)) from range(2)").collect > res0: Array[org.apache.spark.sql.Row] = > Array([WrappedArray(1),WrappedArray(1)]) > {code} > Here, {{transform(array(id), x -> x)}} is an {{AggregateExpression}} that was > (potentially unsafely) recognized by {{addExpr()}} as a common subexpression, > and {{getExprState()}} doesn't do extra guarding, so during physical > planning, in {{PhysicalAggregation}} this expression gets CSE'd in both the > aggregation expression list and the result expressions list. > {code} > AdaptiveSparkPlan isFinalPlan=false > +- SortAggregate(key=[], functions=[max(transform(array(id#0L), > lambdafunction(lambda x#1L, lambda x#1L, false)))]) >+- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=11] > +- SortAggregate(key=[], functions=[partial_max(transform(array(id#0L), > lambdafunction(lambda x#1L, lambda x#1L, false)))]) > +- Range (0, 2, step=1, splits=16) > {code} > Running the same query on current master triggers an error when binding the > result expression to the aggregate expression in the Aggregate operators (for > a WSCG-enabled operator like {{HashAggregateExec}}, the same error would show > up during codegen): > {code} > ERROR TaskSetManager: Task 0 in stage 2.0 failed 1 times; aborting job > org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in > stage 2.0 failed 1 times, most recent failure: Lost task 0.0 in stage 2.0 > (TID 16) (ip-10-110-16-93.us-west-2.compute.internal executor driver): > java.lang.IllegalStateException: Couldn't find max(transform(array(id#0L), > lambdafunction(lambda x#2L, lambda x#2L, false)))#4 in > [max(transform(array(id#0L), lambdafunction(lambda x#1L, lambda x#1L, > false)))#3] > at > 
org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:80) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:73) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:512) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:104) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:512) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$3(TreeNode.scala:517) > at > org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1249) > at > org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1248) > at > org.apache.spark.sql.catalyst.expressions.UnaryExpression.mapChildren(Expression.scala:532) > at >
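The add/get inconsistency reported in SPARK-42851 can be illustrated with a toy model (hypothetical names only; this is not the Catalyst implementation). It sketches option 1 from the description: applying the same supportedExpression()-style guard on both the "add" and the "get" path, so a caller can never add an expression that it later fails to retrieve.

```python
# Toy sketch of consistent guarding (hypothetical names, not Catalyst code).
class EquivalentExprsSketch:
    def __init__(self, supported):
        self._supported = supported  # guard predicate, e.g. rejects lambda functions
        self._states = {}            # expression -> use count

    def add_expr(self, expr):
        if not self._supported(expr):      # same guard as get_expr_state
            return False
        self._states[expr] = self._states.get(expr, 0) + 1
        return self._states[expr] > 1      # True once it is a common subexpression

    def get_expr_state(self, expr):
        if not self._supported(expr):      # consistent with add_expr
            return None
        return self._states.get(expr)

no_lambdas = lambda e: "lambda" not in e
cse = EquivalentExprsSketch(no_lambdas)

# A lambda-bearing expression is rejected on add AND on get -- no mismatch:
cse.add_expr("max(transform(array(id), lambda x: x))")
assert cse.get_expr_state("max(transform(array(id), lambda x: x))") is None

# An ordinary expression round-trips as expected:
cse.add_expr("id + 1"); cse.add_expr("id + 1")
assert cse.get_expr_state("id + 1") == 2
```

The regression in the report corresponds to add_expr skipping the guard while get_expr_state applies it, so the first assertion would observe state on add but None on get.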
[jira] [Resolved] (SPARK-42720) Refactor the withSequenceColumn
[ https://issues.apache.org/jira/browse/SPARK-42720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-42720. -- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 40456 [https://github.com/apache/spark/pull/40456] > Refactor the withSequenceColumn > --- > > Key: SPARK-42720 > URL: https://issues.apache.org/jira/browse/SPARK-42720 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark >Affects Versions: 3.5.0 >Reporter: Haejoon Lee >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42720) Refactor the withSequenceColumn
[ https://issues.apache.org/jira/browse/SPARK-42720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-42720: Assignee: Hyukjin Kwon > Refactor the withSequenceColumn > --- > > Key: SPARK-42720 > URL: https://issues.apache.org/jira/browse/SPARK-42720 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark >Affects Versions: 3.5.0 >Reporter: Haejoon Lee >Assignee: Hyukjin Kwon >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42791) Create golden file test framework for analysis
[ https://issues.apache.org/jira/browse/SPARK-42791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-42791: --- Assignee: Daniel > Create golden file test framework for analysis > -- > > Key: SPARK-42791 > URL: https://issues.apache.org/jira/browse/SPARK-42791 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Daniel >Assignee: Daniel >Priority: Major > Fix For: 3.5.0 > > > Here we track the work to add new golden file test support for the Spark > analyzer. Each golden file can contain a list of SQL queries followed by the > string representations of their analyzed logical plans. > > This can be similar to Spark's existing `SQLQueryTestSuite` [1], but stopping > after analysis and listing analyzed plans as the results instead of fully > executing queries end-to-end. As another example, ZetaSQL has analyzer-based > golden file testing like this as well [2]. > > This way, any changes to analysis will show up as test diffs, which are easy > to spot in review and also easy to update automatically. This could help the > community together maintain the quality of Apache Spark's query analysis. > > [1] > [https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala] > > [2] > [https://github.com/google/zetasql/blob/master/zetasql/analyzer/testdata/limit.test]. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42791) Create golden file test framework for analysis
[ https://issues.apache.org/jira/browse/SPARK-42791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-42791. - Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 40449 [https://github.com/apache/spark/pull/40449] > Create golden file test framework for analysis > -- > > Key: SPARK-42791 > URL: https://issues.apache.org/jira/browse/SPARK-42791 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Daniel >Priority: Major > Fix For: 3.5.0 > > > Here we track the work to add new golden file test support for the Spark > analyzer. Each golden file can contain a list of SQL queries followed by the > string representations of their analyzed logical plans. > > This can be similar to Spark's existing `SQLQueryTestSuite` [1], but stopping > after analysis and listing analyzed plans as the results instead of fully > executing queries end-to-end. As another example, ZetaSQL has analyzer-based > golden file testing like this as well [2]. > > This way, any changes to analysis will show up as test diffs, which are easy > to spot in review and also easy to update automatically. This could help the > community together maintain the quality of Apache Spark's query analysis. > > [1] > [https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala] > > [2] > [https://github.com/google/zetasql/blob/master/zetasql/analyzer/testdata/limit.test]. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
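The golden-file workflow described in SPARK-42791 can be sketched in a few lines (hypothetical helper names; the real framework is Scala code modeled on SQLQueryTestSuite): run each query through analysis, compare the plan's string form against a stored golden file, and rewrite the file only when regeneration is explicitly requested, so every analyzer change surfaces as a reviewable diff.

```python
# Minimal sketch of golden-file testing for analysis results.
# fake_analyze is a stand-in for the analyzer; names are illustrative.
import pathlib
import tempfile

def fake_analyze(sql: str) -> str:
    # Stand-in for the analyzer: returns a deterministic plan string.
    return f"AnalyzedPlan({sql.strip()})"

def check_golden(sql: str, golden: pathlib.Path, regenerate: bool = False) -> bool:
    actual = fake_analyze(sql)
    if regenerate or not golden.exists():
        golden.write_text(actual)        # (re)record the expected plan
        return True
    return golden.read_text() == actual  # a mismatch is the test diff

with tempfile.TemporaryDirectory() as d:
    g = pathlib.Path(d) / "limit.test.out"
    assert check_golden("SELECT 1 LIMIT 1", g)      # first run records the golden
    assert check_golden("SELECT 1 LIMIT 1", g)      # unchanged analysis matches
    assert not check_golden("SELECT 2 LIMIT 1", g)  # changed plan shows as a diff
```

In the real suite the "regenerate" path corresponds to rerunning the tests with a regeneration flag so updated plans can be committed alongside the analyzer change.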
[jira] [Commented] (SPARK-42340) Implement GroupedData.applyInPandas
[ https://issues.apache.org/jira/browse/SPARK-42340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17702632#comment-17702632 ] Apache Spark commented on SPARK-42340: -- User 'xinrong-meng' has created a pull request for this issue: https://github.com/apache/spark/pull/40486 > Implement GroupedData.applyInPandas > --- > > Key: SPARK-42340 > URL: https://issues.apache.org/jira/browse/SPARK-42340 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Takuya Ueshin >Assignee: Xinrong Meng >Priority: Major > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42340) Implement GroupedData.applyInPandas
[ https://issues.apache.org/jira/browse/SPARK-42340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-42340. -- Assignee: Xinrong Meng Resolution: Fixed Fixed in https://github.com/apache/spark/pull/40405 > Implement GroupedData.applyInPandas > --- > > Key: SPARK-42340 > URL: https://issues.apache.org/jira/browse/SPARK-42340 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Takuya Ueshin >Assignee: Xinrong Meng >Priority: Major > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42340) Implement GroupedData.applyInPandas
[ https://issues.apache.org/jira/browse/SPARK-42340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-42340: - Fix Version/s: 3.5.0 > Implement GroupedData.applyInPandas > --- > > Key: SPARK-42340 > URL: https://issues.apache.org/jira/browse/SPARK-42340 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Takuya Ueshin >Priority: Major > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42870) Move `toCatalystValue` to connect-common
[ https://issues.apache.org/jira/browse/SPARK-42870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42870: Assignee: (was: Apache Spark) > Move `toCatalystValue` to connect-common > > > Key: SPARK-42870 > URL: https://issues.apache.org/jira/browse/SPARK-42870 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42870) Move `toCatalystValue` to connect-common
[ https://issues.apache.org/jira/browse/SPARK-42870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17702598#comment-17702598 ] Apache Spark commented on SPARK-42870: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/40485 > Move `toCatalystValue` to connect-common > > > Key: SPARK-42870 > URL: https://issues.apache.org/jira/browse/SPARK-42870 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42870) Move `toCatalystValue` to connect-common
[ https://issues.apache.org/jira/browse/SPARK-42870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42870: Assignee: Apache Spark > Move `toCatalystValue` to connect-common > > > Key: SPARK-42870 > URL: https://issues.apache.org/jira/browse/SPARK-42870 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42870) Move `toCatalystValue` to connect-common
Ruifeng Zheng created SPARK-42870: - Summary: Move `toCatalystValue` to connect-common Key: SPARK-42870 URL: https://issues.apache.org/jira/browse/SPARK-42870 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Ruifeng Zheng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42869) can not analyze window exp on sub query
[ https://issues.apache.org/jira/browse/SPARK-42869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] GuangWeiHong updated SPARK-42869: - Description: CREATE TABLE test_noindex_table(`name` STRING,`age` INT,`city` STRING) PARTITIONED BY (`date` STRING); SELECT * FROM ( SELECT *, COUNT(1) OVER itr AS grp_size FROM test_noindex_table WINDOW itr AS (PARTITION BY city) ) tbl WINDOW itr2 AS (PARTITION BY city ) Window specification itr is not defined in the WINDOW clause. !image-2023-03-20-18-00-40-578.png|width=560,height=361! was: SELECT * FROM ( SELECT *, COUNT(1) OVER itr AS grp_size FROM test WINDOW itr AS (PARTITION BY model) ) tbl WINDOW itr2 AS (PARTITION BY model ) Window specification itr is not defined in the WINDOW clause. !image-2023-03-20-18-00-40-578.png|width=560,height=361! > can not analyze window exp on sub query > --- > > Key: SPARK-42869 > URL: https://issues.apache.org/jira/browse/SPARK-42869 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.1 >Reporter: GuangWeiHong >Priority: Major > Attachments: image-2023-03-20-18-00-40-578.png > > > > CREATE TABLE test_noindex_table(`name` STRING,`age` INT,`city` STRING) > PARTITIONED BY (`date` STRING); > > SELECT > * > FROM > ( > SELECT *, COUNT(1) OVER itr AS grp_size > FROM test_noindex_table > WINDOW itr AS (PARTITION BY city) > ) tbl > WINDOW itr2 AS (PARTITION BY > city > ) > > Window specification itr is not defined in the WINDOW clause. > !image-2023-03-20-18-00-40-578.png|width=560,height=361! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42869) can not analyze window exp on sub query
[ https://issues.apache.org/jira/browse/SPARK-42869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] GuangWeiHong updated SPARK-42869: - Description: SELECT * FROM ( SELECT *, COUNT(1) OVER itr AS grp_size FROM test WINDOW itr AS (PARTITION BY model) ) tbl WINDOW itr2 AS (PARTITION BY model ) Window specification itr is not defined in the WINDOW clause. !image-2023-03-20-18-00-40-578.png|width=560,height=361! was: SELECT * FROM ( SELECT *, COUNT(1) OVER itr AS grp_size FROM test WINDOW itr AS (PARTITION BY model) ) tbl WINDOW itr2 AS (PARTITION BY model ) Window specification itr is not defined in the WINDOW clause. > can not analyze window exp on sub query > --- > > Key: SPARK-42869 > URL: https://issues.apache.org/jira/browse/SPARK-42869 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.1 >Reporter: GuangWeiHong >Priority: Major > Attachments: image-2023-03-20-18-00-40-578.png > > > > SELECT * FROM ( SELECT *, COUNT(1) OVER itr AS grp_size FROM test WINDOW itr > AS (PARTITION BY model) ) tbl WINDOW itr2 AS (PARTITION BY model ) > > Window specification itr is not defined in the WINDOW clause. > !image-2023-03-20-18-00-40-578.png|width=560,height=361! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42869) can not analyze window exp on sub query
[ https://issues.apache.org/jira/browse/SPARK-42869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] GuangWeiHong updated SPARK-42869: - Attachment: image-2023-03-20-18-00-40-578.png > can not analyze window exp on sub query > --- > > Key: SPARK-42869 > URL: https://issues.apache.org/jira/browse/SPARK-42869 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.1 >Reporter: GuangWeiHong >Priority: Major > Attachments: image-2023-03-20-18-00-40-578.png > > > > SELECT * FROM ( SELECT *, COUNT(1) OVER itr AS grp_size FROM test WINDOW itr > AS (PARTITION BY model) ) tbl WINDOW itr2 AS (PARTITION BY model ) > > Window specification itr is not defined in the WINDOW clause. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42869) can not analyze window exp on sub query
GuangWeiHong created SPARK-42869: Summary: can not analyze window exp on sub query Key: SPARK-42869 URL: https://issues.apache.org/jira/browse/SPARK-42869 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.2.1 Reporter: GuangWeiHong SELECT * FROM ( SELECT *, COUNT(1) OVER itr AS grp_size FROM test WINDOW itr AS (PARTITION BY model) ) tbl WINDOW itr2 AS (PARTITION BY model ) Window specification itr is not defined in the WINDOW clause. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
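The scoping behavior that SPARK-42869 reports can be modeled abstractly (this is a hypothetical Python model, not the Spark analyzer): named WINDOW definitions belong to the query block that declares them, so a reference like `OVER itr` inside the subquery should resolve against the subquery's own WINDOW clause. The error in the report is consistent with the reference being checked against a scope that lacks the inner definition.

```python
# Hypothetical model of named-window resolution across nested query scopes.
def resolve_window(name, scopes):
    # scopes: innermost-first list of {window_name: spec} dicts
    for scope in scopes:
        if name in scope:
            return scope[name]
    raise ValueError(
        f"Window specification {name} is not defined in the WINDOW clause.")

outer = {"itr2": "PARTITION BY city"}   # outer query's WINDOW clause
inner = {"itr": "PARTITION BY city"}    # subquery's WINDOW clause

# Inside the subquery, "itr" resolves from the inner scope:
assert resolve_window("itr", [inner, outer]) == "PARTITION BY city"

# If only the outer scope is consulted, lookup fails -- matching the
# "Window specification itr is not defined" error in the report:
try:
    resolve_window("itr", [outer])
except ValueError as e:
    assert "itr" in str(e)
```

Whether Spark's analyzer actually drops the inner scope in this code path is what the bug report asks to be investigated; the model only frames the expected behavior.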
[jira] [Commented] (SPARK-38973) When push-based shuffle is enabled, a stage may not complete when retried
[ https://issues.apache.org/jira/browse/SPARK-38973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17702556#comment-17702556 ] Li Ying commented on SPARK-38973: - [~csingh] Should this bugfix be merged into the 3.2.x branches? > When push-based shuffle is enabled, a stage may not complete when retried > - > > Key: SPARK-38973 > URL: https://issues.apache.org/jira/browse/SPARK-38973 > Project: Spark > Issue Type: Bug > Components: Shuffle, Spark Core >Affects Versions: 3.2.0 >Reporter: Chandni Singh >Assignee: Chandni Singh >Priority: Major > Fix For: 3.3.0 > > > With push-based shuffle enabled and adaptive merge finalization, there are > scenarios where a re-attempt of a ShuffleMapStage may not complete. > With adaptive merge finalization, a stage may be triggered for finalization > when it is in the following state: > # The stage is *not* running ({*}not{*} in the _running_ set of the > DAGScheduler) - it had failed, been canceled, or is waiting, and > # The stage has no pending partitions (all the tasks completed at least > once). > For such a stage, when the finalization completes, the stage will still not be > marked as {_}mergeFinalized{_}. > The state of the stage will be: > * _stage.shuffleDependency.mergeFinalized = false_ > * _stage.shuffleDependency.getFinalizeTask = finalizeTask_ > * Merged statuses of the stage are unregistered > > When the stage is resubmitted, the newer attempt of the stage will never > complete even though its tasks may have completed. This is because the newer > attempt of the stage will have {_}shuffleMergeEnabled = true{_}, since with > the previous attempt the stage was never marked as {_}mergeFinalized{_}, and > the _finalizeTask_ is present (from the finalization attempt for the previous > stage attempt). 
> > So, when all the tasks of the newer attempt complete, these conditions > will be true: > * The stage will be running > * There will be no pending partitions, since all the tasks completed > * _stage.shuffleDependency.shuffleMergeEnabled = true_ > * _stage.shuffleDependency.shuffleMergeFinalized = false_ > * _stage.shuffleDependency.getFinalizeTask_ is not empty > This leads the DAGScheduler to try to schedule finalization instead of > triggering the completion of the stage. However, because of the last > condition, it never even schedules the finalization, so the stage never completes.
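The deadlock described in the report can be modeled with a small toy sketch. This is not Spark code (the real logic lives in Scala in the DAGScheduler); all names here are hypothetical, and it only illustrates how the combination of conditions above leaves the retried stage permanently incomplete:

```python
# Toy model of the stage-completion decision described in SPARK-38973.
# Hypothetical names; Spark's actual DAGScheduler logic is more involved.
from dataclasses import dataclass
from typing import Optional

@dataclass
class ShuffleDependency:
    shuffle_merge_enabled: bool
    shuffle_merge_finalized: bool
    finalize_task: Optional[object]  # leftover from a previous stage attempt

def on_all_tasks_done(running: bool, pending_partitions: int,
                      dep: ShuffleDependency) -> str:
    """What the scheduler does when the last task of a stage attempt finishes."""
    if not running or pending_partitions > 0:
        return "wait"
    if dep.shuffle_merge_enabled and not dep.shuffle_merge_finalized:
        # The scheduler wants to finalize the merge before completing the
        # stage, but (per the report) it skips scheduling finalization when a
        # finalize task already exists from the previous attempt.
        if dep.finalize_task is not None:
            return "stuck"  # finalization never scheduled, stage never completes
        return "schedule-finalization"
    return "complete-stage"

# State of the retried attempt described in the report:
dep = ShuffleDependency(shuffle_merge_enabled=True,
                        shuffle_merge_finalized=False,
                        finalize_task=object())
print(on_all_tasks_done(running=True, pending_partitions=0, dep=dep))  # stuck
```

In this model the fix direction is apparent: either clear the stale `finalize_task` when the stage is resubmitted, or allow finalization to be rescheduled for the new attempt.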
[jira] [Commented] (SPARK-42868) Support eliminate sorts in AQE Optimizer
[ https://issues.apache.org/jira/browse/SPARK-42868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17702545#comment-17702545 ] Apache Spark commented on SPARK-42868: -- User 'wangyum' has created a pull request for this issue: https://github.com/apache/spark/pull/40484 > Support eliminate sorts in AQE Optimizer > > > Key: SPARK-42868 > URL: https://issues.apache.org/jira/browse/SPARK-42868 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: Yuming Wang >Priority: Major >
[jira] [Assigned] (SPARK-42868) Support eliminate sorts in AQE Optimizer
[ https://issues.apache.org/jira/browse/SPARK-42868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42868: Assignee: Apache Spark > Support eliminate sorts in AQE Optimizer > > > Key: SPARK-42868 > URL: https://issues.apache.org/jira/browse/SPARK-42868 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: Yuming Wang >Assignee: Apache Spark >Priority: Major >
[jira] [Assigned] (SPARK-42868) Support eliminate sorts in AQE Optimizer
[ https://issues.apache.org/jira/browse/SPARK-42868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42868: Assignee: (was: Apache Spark) > Support eliminate sorts in AQE Optimizer > > > Key: SPARK-42868 > URL: https://issues.apache.org/jira/browse/SPARK-42868 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: Yuming Wang >Priority: Major >
[jira] [Updated] (SPARK-42852) Revert NamedLambdaVariable related changes from EquivalentExpressions
[ https://issues.apache.org/jira/browse/SPARK-42852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Toth updated SPARK-42852: --- Affects Version/s: (was: 3.3.2) > Revert NamedLambdaVariable related changes from EquivalentExpressions > - > > Key: SPARK-42852 > URL: https://issues.apache.org/jira/browse/SPARK-42852 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Peter Toth >Assignee: Peter Toth >Priority: Major > Fix For: 3.4.0 > > > See discussion > https://github.com/apache/spark/pull/40473#issuecomment-1474848224
[jira] [Resolved] (SPARK-42827) Support `functions#array_prepend`
[ https://issues.apache.org/jira/browse/SPARK-42827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-42827. --- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 40481 [https://github.com/apache/spark/pull/40481] > Support `functions#array_prepend` > - > > Key: SPARK-42827 > URL: https://issues.apache.org/jira/browse/SPARK-42827 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.5.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Fix For: 3.5.0 > > > Wait for SPARK-41233
[jira] [Assigned] (SPARK-42827) Support `functions#array_prepend`
[ https://issues.apache.org/jira/browse/SPARK-42827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-42827: - Assignee: Yang Jie > Support `functions#array_prepend` > - > > Key: SPARK-42827 > URL: https://issues.apache.org/jira/browse/SPARK-42827 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.5.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > > Wait for SPARK-41233
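For context, the function being exposed through the Connect client was added to Spark SQL by SPARK-41233 (which this ticket waits on). A hedged sketch of its SQL-level semantics, assuming the 3.5.0 behavior:

```sql
-- Illustrative only: array_prepend(array, element) returns a new array with
-- the element added at the front (semantics per SPARK-41233; verify against
-- the Spark 3.5.0 function reference).
SELECT array_prepend(array(2, 3, 4), 1);
-- expected: [1, 2, 3, 4]
```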
[jira] [Created] (SPARK-42868) Support eliminate sorts in AQE Optimizer
Yuming Wang created SPARK-42868: --- Summary: Support eliminate sorts in AQE Optimizer Key: SPARK-42868 URL: https://issues.apache.org/jira/browse/SPARK-42868 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.5.0 Reporter: Yuming Wang
[jira] [Assigned] (SPARK-42864) Review and fix issues in MLlib API docs
[ https://issues.apache.org/jira/browse/SPARK-42864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng reassigned SPARK-42864: Assignee: (was: Ruifeng Zheng) > Review and fix issues in MLlib API docs > --- > > Key: SPARK-42864 > URL: https://issues.apache.org/jira/browse/SPARK-42864 > Project: Spark > Issue Type: Sub-task > Components: MLlib >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Priority: Major >