[jira] [Commented] (SPARK-48016) Fix a bug in try_divide function when with decimals
[ https://issues.apache.org/jira/browse/SPARK-48016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842200#comment-17842200 ] Dongjoon Hyun commented on SPARK-48016: --- Thank you so much! > Fix a bug in try_divide function when with decimals > --- > > Key: SPARK-48016 > URL: https://issues.apache.org/jira/browse/SPARK-48016 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0, 3.5.2 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > Labels: pull-request-available > > Binary Arithmetic operators should include the evalMode during makeCopy. Otherwise, the following query will throw a DIVIDE_BY_ZERO error instead of returning null: > > {code:java} > SELECT try_divide(1, decimal(0)); {code} > This is caused by the rule DecimalPrecision: > {code:java} > case b @ BinaryOperator(left, right) if left.dataType != right.dataType => > (left, right) match { > ... > case (l: Literal, r) if r.dataType.isInstanceOf[DecimalType] && > l.dataType.isInstanceOf[IntegralType] && > literalPickMinimumPrecision => > b.makeCopy(Array(Cast(l, DataTypeUtils.fromLiteral(l)), r)) {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
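For context, here is a minimal, standalone Scala sketch of the failure mode described above. The classes below (EvalMode, Divide) are stand-ins, not Spark's actual TreeNode/BinaryArithmetic machinery; the sketch only models how a copy helper that drops the evaluation mode silently downgrades TRY to the default, which is why try_divide hit the DIVIDE_BY_ZERO path after the DecimalPrecision rewrite.

{code:java}
// Stand-in model only -- Spark's real fix is in BinaryArithmetic.makeCopy.
object EvalMode extends Enumeration { val LEGACY, ANSI, TRY = Value }

case class Divide(left: Any, right: Any,
                  evalMode: EvalMode.Value = EvalMode.LEGACY) {
  // Buggy copy: rebuilds the node from the new children only, so evalMode
  // silently falls back to the default instead of keeping TRY.
  def makeCopyBuggy(newChildren: Array[Any]): Divide =
    Divide(newChildren(0), newChildren(1))

  // Fixed copy: the evaluation mode travels with the node.
  def makeCopyFixed(newChildren: Array[Any]): Divide =
    Divide(newChildren(0), newChildren(1), evalMode)
}

object TryDivideCopyDemo extends App {
  val tryDiv = Divide(1, BigDecimal(0), EvalMode.TRY)
  val newChildren: Array[Any] = Array("CAST(1 AS DECIMAL(1,0))", BigDecimal(0))
  println(tryDiv.makeCopyBuggy(newChildren).evalMode) // LEGACY -> error path
  println(tryDiv.makeCopyFixed(newChildren).evalMode) // TRY    -> returns null
}
{code}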
[jira] [Commented] (SPARK-48016) Fix a bug in try_divide function when with decimals
[ https://issues.apache.org/jira/browse/SPARK-48016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842191#comment-17842191 ] Dongjoon Hyun commented on SPARK-48016: --- Hi, [~Gengliang.Wang]. - I updated the JIRA title according to the commit title. - The umbrella Jira issue was completed in Apache Spark 3.4.0. To give this more visibility, shall we move it to SPARK-44111, since recent ANSI JIRA issues are there? > Fix a bug in try_divide function when with decimals > --- > > Key: SPARK-48016 > URL: https://issues.apache.org/jira/browse/SPARK-48016 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0, 3.5.2 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > Labels: pull-request-available > > Binary Arithmetic operators should include the evalMode during makeCopy. Otherwise, the following query will throw a DIVIDE_BY_ZERO error instead of returning null: > > {code:java} > SELECT try_divide(1, decimal(0)); {code} > This is caused by the rule DecimalPrecision: > {code:java} > case b @ BinaryOperator(left, right) if left.dataType != right.dataType => > (left, right) match { > ... > case (l: Literal, r) if r.dataType.isInstanceOf[DecimalType] && > l.dataType.isInstanceOf[IntegralType] && > literalPickMinimumPrecision => > b.makeCopy(Array(Cast(l, DataTypeUtils.fromLiteral(l)), r)) {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48016) Fix a bug in try_divide function when with decimals
[ https://issues.apache.org/jira/browse/SPARK-48016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-48016: -- Summary: Fix a bug in try_divide function when with decimals (was: Binary Arithmetic operators should include the evalMode when makeCopy) > Fix a bug in try_divide function when with decimals > --- > > Key: SPARK-48016 > URL: https://issues.apache.org/jira/browse/SPARK-48016 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0, 3.5.2 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > Labels: pull-request-available > > Binary Arithmetic operators should include the evalMode during makeCopy. Otherwise, the following query will throw a DIVIDE_BY_ZERO error instead of returning null: > > {code:java} > SELECT try_divide(1, decimal(0)); {code} > This is caused by the rule DecimalPrecision: > {code:java} > case b @ BinaryOperator(left, right) if left.dataType != right.dataType => > (left, right) match { > ... > case (l: Literal, r) if r.dataType.isInstanceOf[DecimalType] && > l.dataType.isInstanceOf[IntegralType] && > literalPickMinimumPrecision => > b.makeCopy(Array(Cast(l, DataTypeUtils.fromLiteral(l)), r)) {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48042) Don't use a copy of timestamp formatter with a new override zone for each value
[ https://issues.apache.org/jira/browse/SPARK-48042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-48042. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46282 [https://github.com/apache/spark/pull/46282] > Don't use a copy of timestamp formatter with a new override zone for each > value > --- > > Key: SPARK-48042 > URL: https://issues.apache.org/jira/browse/SPARK-48042 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48044) Cache `DataFrame.isStreaming`
[ https://issues.apache.org/jira/browse/SPARK-48044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-48044. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46281 [https://github.com/apache/spark/pull/46281] > Cache `DataFrame.isStreaming` > - > > Key: SPARK-48044 > URL: https://issues.apache.org/jira/browse/SPARK-48044 > Project: Spark > Issue Type: Improvement > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48046) Remove `clock` parameter from `DriverServiceFeatureStep`
[ https://issues.apache.org/jira/browse/SPARK-48046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-48046. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46284 [https://github.com/apache/spark/pull/46284] > Remove `clock` parameter from `DriverServiceFeatureStep` > > > Key: SPARK-48046 > URL: https://issues.apache.org/jira/browse/SPARK-48046 > Project: Spark > Issue Type: Task > Components: Kubernetes >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48046) Remove `clock` parameter from `DriverServiceFeatureStep`
[ https://issues.apache.org/jira/browse/SPARK-48046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-48046: - Assignee: Dongjoon Hyun > Remove `clock` parameter from `DriverServiceFeatureStep` > > > Key: SPARK-48046 > URL: https://issues.apache.org/jira/browse/SPARK-48046 > Project: Spark > Issue Type: Task > Components: Kubernetes >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48046) Remove `clock` parameter from `DriverServiceFeatureStep`
Dongjoon Hyun created SPARK-48046: - Summary: Remove `clock` parameter from `DriverServiceFeatureStep` Key: SPARK-48046 URL: https://issues.apache.org/jira/browse/SPARK-48046 Project: Spark Issue Type: Task Components: Kubernetes Affects Versions: 4.0.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48038) Promote driverServiceName to KubernetesDriverConf
[ https://issues.apache.org/jira/browse/SPARK-48038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-48038: - Assignee: Cheng Pan > Promote driverServiceName to KubernetesDriverConf > - > > Key: SPARK-48038 > URL: https://issues.apache.org/jira/browse/SPARK-48038 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 4.0.0 >Reporter: Cheng Pan >Assignee: Cheng Pan >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48038) Promote driverServiceName to KubernetesDriverConf
[ https://issues.apache.org/jira/browse/SPARK-48038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-48038. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46276 [https://github.com/apache/spark/pull/46276] > Promote driverServiceName to KubernetesDriverConf > - > > Key: SPARK-48038 > URL: https://issues.apache.org/jira/browse/SPARK-48038 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 4.0.0 >Reporter: Cheng Pan >Assignee: Cheng Pan >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48036) Update `sql-ref-ansi-compliance.md` and `sql-ref-identifier.md`
[ https://issues.apache.org/jira/browse/SPARK-48036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-48036. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46271 [https://github.com/apache/spark/pull/46271] > Update `sql-ref-ansi-compliance.md` and `sql-ref-identifier.md` > --- > > Key: SPARK-48036 > URL: https://issues.apache.org/jira/browse/SPARK-48036 > Project: Spark > Issue Type: Sub-task > Components: Documentation >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48029) Update the packages name removed in building the spark docker image
[ https://issues.apache.org/jira/browse/SPARK-48029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-48029. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46258 [https://github.com/apache/spark/pull/46258] > Update the packages name removed in building the spark docker image > --- > > Key: SPARK-48029 > URL: https://issues.apache.org/jira/browse/SPARK-48029 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48029) Update the packages name removed in building the spark docker image
[ https://issues.apache.org/jira/browse/SPARK-48029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-48029: - Assignee: BingKun Pan > Update the packages name removed in building the spark docker image > --- > > Key: SPARK-48029 > URL: https://issues.apache.org/jira/browse/SPARK-48029 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48036) Update `sql-ref-ansi-compliance.md` and `sql-ref-identifier.md`
Dongjoon Hyun created SPARK-48036: - Summary: Update `sql-ref-ansi-compliance.md` and `sql-ref-identifier.md` Key: SPARK-48036 URL: https://issues.apache.org/jira/browse/SPARK-48036 Project: Spark Issue Type: Sub-task Components: Documentation Affects Versions: 4.0.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48032) Upgrade `commons-codec` to 1.17.0
[ https://issues.apache.org/jira/browse/SPARK-48032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-48032. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46268 [https://github.com/apache/spark/pull/46268] > Upgrade `commons-codec` to 1.17.0 > - > > Key: SPARK-48032 > URL: https://issues.apache.org/jira/browse/SPARK-48032 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47730) Support APP_ID and EXECUTOR_ID placeholder in labels
[ https://issues.apache.org/jira/browse/SPARK-47730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-47730: -- Parent: SPARK-44111 Issue Type: Sub-task (was: Improvement) > Support APP_ID and EXECUTOR_ID placeholder in labels > > > Key: SPARK-47730 > URL: https://issues.apache.org/jira/browse/SPARK-47730 > Project: Spark > Issue Type: Sub-task > Components: k8s >Affects Versions: 3.5.1 >Reporter: Xi Chen >Assignee: Xi Chen >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47730) Support APP_ID and EXECUTOR_ID placeholder in labels
[ https://issues.apache.org/jira/browse/SPARK-47730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-47730. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46149 [https://github.com/apache/spark/pull/46149] > Support APP_ID and EXECUTOR_ID placeholder in labels > > > Key: SPARK-47730 > URL: https://issues.apache.org/jira/browse/SPARK-47730 > Project: Spark > Issue Type: Improvement > Components: k8s >Affects Versions: 3.5.1 >Reporter: Xi Chen >Assignee: Xi Chen >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47730) Support APP_ID and EXECUTOR_ID placeholder in labels
[ https://issues.apache.org/jira/browse/SPARK-47730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-47730: - Assignee: Xi Chen > Support APP_ID and EXECUTOR_ID placeholder in labels > > > Key: SPARK-47730 > URL: https://issues.apache.org/jira/browse/SPARK-47730 > Project: Spark > Issue Type: Improvement > Components: k8s >Affects Versions: 3.5.1 >Reporter: Xi Chen >Assignee: Xi Chen >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48021) Add `--add-modules=jdk.incubator.vector` to `JavaModuleOptions`
[ https://issues.apache.org/jira/browse/SPARK-48021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-48021: - Assignee: BingKun Pan > Add `--add-modules=jdk.incubator.vector` to `JavaModuleOptions` > --- > > Key: SPARK-48021 > URL: https://issues.apache.org/jira/browse/SPARK-48021 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48021) Add `--add-modules=jdk.incubator.vector` to `JavaModuleOptions`
[ https://issues.apache.org/jira/browse/SPARK-48021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-48021. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46246 [https://github.com/apache/spark/pull/46246] > Add `--add-modules=jdk.incubator.vector` to `JavaModuleOptions` > --- > > Key: SPARK-48021 > URL: https://issues.apache.org/jira/browse/SPARK-48021 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47408) Fix mathExpressions that use StringType
[ https://issues.apache.org/jira/browse/SPARK-47408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-47408. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46227 [https://github.com/apache/spark/pull/46227] > Fix mathExpressions that use StringType > --- > > Key: SPARK-47408 > URL: https://issues.apache.org/jira/browse/SPARK-47408 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47943) Add Operator CI Task for Java Build and Test
[ https://issues.apache.org/jira/browse/SPARK-47943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-47943: -- Fix Version/s: kubernetes-operator-0.1.0 (was: 4.0.0) > Add Operator CI Task for Java Build and Test > > > Key: SPARK-47943 > URL: https://issues.apache.org/jira/browse/SPARK-47943 > Project: Spark > Issue Type: Sub-task > Components: k8s >Affects Versions: kubernetes-operator-0.1.0 >Reporter: Zhou JIANG >Assignee: Zhou JIANG >Priority: Major > Labels: pull-request-available > Fix For: kubernetes-operator-0.1.0 > > > We need to add a CI task to build and test Java code for upcoming operator pull requests. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48015) Update `build.gradle` to fix deprecation warnings
[ https://issues.apache.org/jira/browse/SPARK-48015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-48015: - Assignee: Dongjoon Hyun > Update `build.gradle` to fix deprecation warnings > - > > Key: SPARK-48015 > URL: https://issues.apache.org/jira/browse/SPARK-48015 > Project: Spark > Issue Type: Sub-task > Components: Build, Kubernetes >Affects Versions: kubernetes-operator-0.1.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Trivial > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48015) Update `build.gradle` to fix deprecation warnings
[ https://issues.apache.org/jira/browse/SPARK-48015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-48015. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 9 [https://github.com/apache/spark-kubernetes-operator/pull/9] > Update `build.gradle` to fix deprecation warnings > - > > Key: SPARK-48015 > URL: https://issues.apache.org/jira/browse/SPARK-48015 > Project: Spark > Issue Type: Sub-task > Components: Build, Kubernetes >Affects Versions: kubernetes-operator-0.1.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Trivial > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47929) Setup Static Analysis for Operator
[ https://issues.apache.org/jira/browse/SPARK-47929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-47929: -- Fix Version/s: kubernetes-operator-0.1.0 (was: 4.0.0) > Setup Static Analysis for Operator > -- > > Key: SPARK-47929 > URL: https://issues.apache.org/jira/browse/SPARK-47929 > Project: Spark > Issue Type: Sub-task > Components: k8s >Affects Versions: kubernetes-operator-0.1.0 >Reporter: Zhou JIANG >Assignee: Zhou JIANG >Priority: Major > Labels: pull-request-available > Fix For: kubernetes-operator-0.1.0 > > > Add common analysis tasks, including checkstyle, spotbugs, and jacoco. Also include spotless for style fixes. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47950) Add Java API Module for Spark Operator
[ https://issues.apache.org/jira/browse/SPARK-47950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-47950: -- Fix Version/s: kubernetes-operator-0.1.0 (was: 4.0.0) > Add Java API Module for Spark Operator > -- > > Key: SPARK-47950 > URL: https://issues.apache.org/jira/browse/SPARK-47950 > Project: Spark > Issue Type: Sub-task > Components: k8s >Affects Versions: kubernetes-operator-0.1.0 >Reporter: Zhou JIANG >Assignee: Zhou JIANG >Priority: Major > Labels: pull-request-available > Fix For: kubernetes-operator-0.1.0 > > > Spark Operator API refers to the [CustomResourceDefinition|https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/] that represents the spec for a Spark Application in k8s. > This aims to add a Java API library for the Spark Operator, with the ability to generate YAML specs. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48015) Update `build.gradle` to fix deprecation warnings
[ https://issues.apache.org/jira/browse/SPARK-48015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-48015: -- Fix Version/s: kubernetes-operator-0.1.0 (was: 4.0.0) > Update `build.gradle` to fix deprecation warnings > - > > Key: SPARK-48015 > URL: https://issues.apache.org/jira/browse/SPARK-48015 > Project: Spark > Issue Type: Sub-task > Components: Build, Kubernetes >Affects Versions: kubernetes-operator-0.1.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Trivial > Labels: pull-request-available > Fix For: kubernetes-operator-0.1.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48015) Update `build.gradle` to fix deprecation warnings
Dongjoon Hyun created SPARK-48015: - Summary: Update `build.gradle` to fix deprecation warnings Key: SPARK-48015 URL: https://issues.apache.org/jira/browse/SPARK-48015 Project: Spark Issue Type: Sub-task Components: Build, Kubernetes Affects Versions: kubernetes-operator-0.1.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47950) Add Java API Module for Spark Operator
[ https://issues.apache.org/jira/browse/SPARK-47950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-47950: - Assignee: Zhou JIANG > Add Java API Module for Spark Operator > -- > > Key: SPARK-47950 > URL: https://issues.apache.org/jira/browse/SPARK-47950 > Project: Spark > Issue Type: Sub-task > Components: k8s >Affects Versions: kubernetes-operator-0.1.0 >Reporter: Zhou JIANG >Assignee: Zhou JIANG >Priority: Major > Labels: pull-request-available > > Spark Operator API refers to the [CustomResourceDefinition|https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/] that represents the spec for a Spark Application in k8s. > This aims to add a Java API library for the Spark Operator, with the ability to generate YAML specs. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47950) Add Java API Module for Spark Operator
[ https://issues.apache.org/jira/browse/SPARK-47950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-47950. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 8 [https://github.com/apache/spark-kubernetes-operator/pull/8] > Add Java API Module for Spark Operator > -- > > Key: SPARK-47950 > URL: https://issues.apache.org/jira/browse/SPARK-47950 > Project: Spark > Issue Type: Sub-task > Components: k8s >Affects Versions: kubernetes-operator-0.1.0 >Reporter: Zhou JIANG >Assignee: Zhou JIANG >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Spark Operator API refers to the [CustomResourceDefinition|https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/] that represents the spec for a Spark Application in k8s. > This aims to add a Java API library for the Spark Operator, with the ability to generate YAML specs. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48010) Avoid repeated calls to conf.resolver in resolveExpression
[ https://issues.apache.org/jira/browse/SPARK-48010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-48010. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46248 [https://github.com/apache/spark/pull/46248] > Avoid repeated calls to conf.resolver in resolveExpression > -- > > Key: SPARK-48010 > URL: https://issues.apache.org/jira/browse/SPARK-48010 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.3 >Reporter: Nikhil Sheoran >Assignee: Nikhil Sheoran >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Consider a view with a large number of columns (~1000s). When resolving this view and looking at the flamegraph, we observed repeated initializations of `conf` to obtain the `resolver` for each column of the view. > This can be easily optimized to reuse the same resolver (obtained once) for the various calls to `innerResolve` in `resolveExpression`. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
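The underlying pattern is plain hoisting of an invariant lookup out of a per-column loop. A standalone sketch follows; the Resolver alias and names echo Spark's analyzer, but nothing below is Spark's actual code.

{code:java}
// Standalone Scala sketch of the SPARK-48010 optimization.
object ResolverHoistingSketch extends App {
  type Resolver = (String, String) => Boolean

  // Stand-in for conf.resolver: imagine each call re-reads the SQL conf.
  def resolverFromConf(caseSensitive: Boolean): Resolver =
    if (caseSensitive) (a, b) => a == b
    else (a, b) => a.equalsIgnoreCase(b)

  val columns = Seq.fill(5000)("other_col") :+ "Target"

  // Before: a fresh resolver per column, i.e. one conf lookup per iteration.
  val before = columns.count(c => resolverFromConf(caseSensitive = false)(c, "target"))

  // After: obtain the resolver once and reuse it across all columns.
  val resolver = resolverFromConf(caseSensitive = false)
  val after = columns.count(c => resolver(c, "target"))

  println((before, after)) // same answer; the second form avoids repeated lookups
}
{code}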
[jira] [Assigned] (SPARK-46122) Set `spark.sql.legacy.createHiveTableByDefault` to `false` by default
[ https://issues.apache.org/jira/browse/SPARK-46122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-46122: - Assignee: Dongjoon Hyun > Set `spark.sql.legacy.createHiveTableByDefault` to `false` by default > - > > Key: SPARK-46122 > URL: https://issues.apache.org/jira/browse/SPARK-46122 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Yuming Wang >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48005) Enable `DefaultIndexParityTests. test_index_distributed_sequence_cleanup`
[ https://issues.apache.org/jira/browse/SPARK-48005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-48005. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46242 [https://github.com/apache/spark/pull/46242] > Enable `DefaultIndexParityTests. test_index_distributed_sequence_cleanup` > - > > Key: SPARK-48005 > URL: https://issues.apache.org/jira/browse/SPARK-48005 > Project: Spark > Issue Type: Sub-task > Components: Connect, PS >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48007) MsSQLServer: upgrade mssql.jdbc.version to 12.6.1.jre11
[ https://issues.apache.org/jira/browse/SPARK-48007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-48007. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46244 [https://github.com/apache/spark/pull/46244] > MsSQLServer: upgrade mssql.jdbc.version to 12.6.1.jre11 > --- > > Key: SPARK-48007 > URL: https://issues.apache.org/jira/browse/SPARK-48007 > Project: Spark > Issue Type: Sub-task > Components: Build, Tests >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47991) Arrange the test cases for window frames and window functions.
[ https://issues.apache.org/jira/browse/SPARK-47991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-47991. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46226 [https://github.com/apache/spark/pull/46226] > Arrange the test cases for window frames and window functions. > -- > > Key: SPARK-47991 > URL: https://issues.apache.org/jira/browse/SPARK-47991 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 4.0.0 >Reporter: Jiaan Geng >Assignee: Jiaan Geng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-22231) Support of map, filter, withField, dropFields in nested list of structures
[ https://issues.apache.org/jira/browse/SPARK-22231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840934#comment-17840934 ] Dongjoon Hyun commented on SPARK-22231: --- I removed the outdated target version from this issue. > Support of map, filter, withField, dropFields in nested list of structures > -- > > Key: SPARK-22231 > URL: https://issues.apache.org/jira/browse/SPARK-22231 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 2.2.0 >Reporter: DB Tsai >Priority: Major > > At Netflix's algorithm team, we work on ranking problems to find great content that fulfills the unique tastes of our members. Before building recommendation algorithms, we need to prepare the training, testing, and validation datasets in Apache Spark. Due to the nature of ranking problems, we have a nested list of items to be ranked in one column, and the top level is the context describing the setting in which a model is to be used (e.g. profiles, country, time, device, etc.). Here is a blog post describing the details, [Distributed Time Travel for Feature Generation|https://medium.com/netflix-techblog/distributed-time-travel-for-feature-generation-389cccdd3907]. > > To be more concrete, for the ranks of videos for a given profile_id in a given country, our data schema can look like this: > {code:java} > root > |-- profile_id: long (nullable = true) > |-- country_iso_code: string (nullable = true) > |-- items: array (nullable = false) > ||-- element: struct (containsNull = false) > |||-- title_id: integer (nullable = true) > |||-- scores: double (nullable = true) > ... > {code} > We oftentimes need to work on the nested list of structs by applying some functions on them. Sometimes we're dropping or adding new columns in the nested list of structs. Currently, there is no easy solution in open source Apache Spark to perform those operations using SQL primitives; many people just convert the data into RDDs to work on the nested level of data, and then reconstruct the new dataframe as a workaround. This is extremely inefficient because all the optimizations like predicate pushdown in SQL cannot be performed, we cannot leverage the columnar format, and the serialization and deserialization cost becomes really huge even when we just want to add a new column at the nested level. > We built a solution internally at Netflix which we're very happy with. We plan to open source it in Spark upstream. We would like to socialize the API design to see if we missed any use cases. > The first API we added is *mapItems* on dataframe, which takes a function from *Column* to *Column* and applies the function on the nested dataframe. Here is an example, > {code:java} > case class Data(foo: Int, bar: Double, items: Seq[Double]) > val df: Dataset[Data] = spark.createDataset(Seq( > Data(10, 10.0, Seq(10.1, 10.2, 10.3, 10.4)), > Data(20, 20.0, Seq(20.1, 20.2, 20.3, 20.4)) > )) > val result = df.mapItems("items") { > item => item * 2.0 > } > result.printSchema() > // root > // |-- foo: integer (nullable = false) > // |-- bar: double (nullable = false) > // |-- items: array (nullable = true) > // ||-- element: double (containsNull = true) > result.show() > // +---+----+--------------------+ > // |foo| bar| items| > // +---+----+--------------------+ > // | 10|10.0|[20.2, 20.4, 20.6...| > // | 20|20.0|[40.2, 40.4, 40.6...| > // +---+----+--------------------+ > {code} > Now, with the ability to apply a function on the nested dataframe, we can add a new function, *withColumn* in *Column*, to add or replace the existing column that has the same name in the nested list of structs. Here are two examples demonstrating the API together with *mapItems*; the first one replaces the existing column, > {code:java} > case class Item(a: Int, b: Double) > case class Data(foo: Int, bar: Double, items: Seq[Item]) > val df: Dataset[Data] = spark.createDataset(Seq( > Data(10, 10.0, Seq(Item(10, 10.0), Item(11, 11.0))), > Data(20, 20.0, Seq(Item(20, 20.0), Item(21, 21.0))) > )) > val result = df.mapItems("items") { > item => item.withColumn(item("b") + 1 as "b") > } > result.printSchema > // root > // |-- foo: integer (nullable = false) > // |-- bar: double (nullable = false) > // |-- items: array (nullable = true) > // ||-- element: struct (containsNull = true) > // |||-- a: integer (nullable = true) > // |||-- b: double (nullable = true) > result.show(false) > // +---+----+----------------------+ > // |foo|bar |items | > // +---+----+----------------------+ > // |10 |10.0|[[10,11.0], [11,12.0]]| > // |20 |20.0|[[20,21.0], [21,22.0]]| > //
[jira] [Updated] (SPARK-22231) Support of map, filter, withField, dropFields in nested list of structures
[ https://issues.apache.org/jira/browse/SPARK-22231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-22231: -- Target Version/s: (was: 3.2.0) > Support of map, filter, withField, dropFields in nested list of structures > -- > > Key: SPARK-22231 > URL: https://issues.apache.org/jira/browse/SPARK-22231 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 2.2.0 >Reporter: DB Tsai >Priority: Major > > At Netflix's algorithm team, we work on ranking problems to find great content that fulfills the unique tastes of our members. Before building recommendation algorithms, we need to prepare the training, testing, and validation datasets in Apache Spark. Due to the nature of ranking problems, we have a nested list of items to be ranked in one column, and the top level is the context describing the setting in which a model is to be used (e.g. profiles, country, time, device, etc.). Here is a blog post describing the details, [Distributed Time Travel for Feature Generation|https://medium.com/netflix-techblog/distributed-time-travel-for-feature-generation-389cccdd3907]. > > To be more concrete, for the ranks of videos for a given profile_id in a given country, our data schema can look like this: > {code:java} > root > |-- profile_id: long (nullable = true) > |-- country_iso_code: string (nullable = true) > |-- items: array (nullable = false) > ||-- element: struct (containsNull = false) > |||-- title_id: integer (nullable = true) > |||-- scores: double (nullable = true) > ... > {code} > We oftentimes need to work on the nested list of structs by applying some functions on them. Sometimes we're dropping or adding new columns in the nested list of structs. Currently, there is no easy solution in open source Apache Spark to perform those operations using SQL primitives; many people just convert the data into RDDs to work on the nested level of data, and then reconstruct the new dataframe as a workaround. This is extremely inefficient because all the optimizations like predicate pushdown in SQL cannot be performed, we cannot leverage the columnar format, and the serialization and deserialization cost becomes really huge even when we just want to add a new column at the nested level. > We built a solution internally at Netflix which we're very happy with. We plan to open source it in Spark upstream. We would like to socialize the API design to see if we missed any use cases. > The first API we added is *mapItems* on dataframe, which takes a function from *Column* to *Column* and applies the function on the nested dataframe. Here is an example, > {code:java} > case class Data(foo: Int, bar: Double, items: Seq[Double]) > val df: Dataset[Data] = spark.createDataset(Seq( > Data(10, 10.0, Seq(10.1, 10.2, 10.3, 10.4)), > Data(20, 20.0, Seq(20.1, 20.2, 20.3, 20.4)) > )) > val result = df.mapItems("items") { > item => item * 2.0 > } > result.printSchema() > // root > // |-- foo: integer (nullable = false) > // |-- bar: double (nullable = false) > // |-- items: array (nullable = true) > // ||-- element: double (containsNull = true) > result.show() > // +---+----+--------------------+ > // |foo| bar| items| > // +---+----+--------------------+ > // | 10|10.0|[20.2, 20.4, 20.6...| > // | 20|20.0|[40.2, 40.4, 40.6...| > // +---+----+--------------------+ > {code} > Now, with the ability to apply a function on the nested dataframe, we can add a new function, *withColumn* in *Column*, to add or replace the existing column that has the same name in the nested list of structs. Here are two examples demonstrating the API together with *mapItems*; the first one replaces the existing column, > {code:java} > case class Item(a: Int, b: Double) > case class Data(foo: Int, bar: Double, items: Seq[Item]) > val df: Dataset[Data] = spark.createDataset(Seq( > Data(10, 10.0, Seq(Item(10, 10.0), Item(11, 11.0))), > Data(20, 20.0, Seq(Item(20, 20.0), Item(21, 21.0))) > )) > val result = df.mapItems("items") { > item => item.withColumn(item("b") + 1 as "b") > } > result.printSchema > // root > // |-- foo: integer (nullable = false) > // |-- bar: double (nullable = false) > // |-- items: array (nullable = true) > // ||-- element: struct (containsNull = true) > // |||-- a: integer (nullable = true) > // |||-- b: double (nullable = true) > result.show(false) > // +---+----+----------------------+ > // |foo|bar |items | > // +---+----+----------------------+ > // |10 |10.0|[[10,11.0], [11,12.0]]| > // |20 |20.0|[[20,21.0], [21,22.0]]| > // +---+----+----------------------+ > {code} > and the second
[jira] [Updated] (SPARK-24941) Add RDDBarrier.coalesce() function
[ https://issues.apache.org/jira/browse/SPARK-24941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-24941: -- Target Version/s: (was: 3.2.0) > Add RDDBarrier.coalesce() function > -- > > Key: SPARK-24941 > URL: https://issues.apache.org/jira/browse/SPARK-24941 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.1.0 >Reporter: Xingbo Jiang >Priority: Major > > https://github.com/apache/spark/pull/21758#discussion_r204917245 > The number of partitions from the input data can be unexpectedly large, e.g. if you do > {code} > sc.textFile(...).barrier().mapPartitions() > {code} > The number of input partitions is based on the HDFS input splits. We shall provide a way in RDDBarrier to enable users to specify the number of tasks in a barrier stage. Maybe something like RDDBarrier.coalesce(numPartitions: Int). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
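RDDBarrier has no coalesce() today (that is exactly what this ticket proposes), but the intent can be approximated with existing API by fixing the partition count before entering barrier mode. A hedged sketch, assuming a local master purely for illustration:

{code:java}
import org.apache.spark.sql.SparkSession

object BarrierCoalesceSketch extends App {
  val spark = SparkSession.builder()
    .master("local[4]").appName("barrier-coalesce-sketch").getOrCreate()
  val sc = spark.sparkContext

  // 100 input partitions stand in for "one partition per HDFS split".
  val input = sc.parallelize(1 to 1000, numSlices = 100)

  val n = input
    .coalesce(2)                 // cap the barrier stage at 2 simultaneous tasks
    .barrier()                   // the proposed RDDBarrier.coalesce() would fold
    .mapPartitions(iter => iter) // these two steps into the barrier API itself
    .count()

  println(n) // 1000
  spark.stop()
}
{code}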
[jira] [Updated] (SPARK-25383) Image data source supports sample pushdown
[ https://issues.apache.org/jira/browse/SPARK-25383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-25383: -- Target Version/s: (was: 3.2.0) > Image data source supports sample pushdown > -- > > Key: SPARK-25383 > URL: https://issues.apache.org/jira/browse/SPARK-25383 > Project: Spark > Issue Type: New Feature > Components: ML, SQL >Affects Versions: 3.1.0 >Reporter: Xiangrui Meng >Priority: Major > > After SPARK-25349, we should update image data source to support sampling. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25752) Add trait to easily whitelist logical operators that produce named output from CleanupAliases
[ https://issues.apache.org/jira/browse/SPARK-25752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-25752: -- Target Version/s: (was: 3.2.0) > Add trait to easily whitelist logical operators that produce named output > from CleanupAliases > - > > Key: SPARK-25752 > URL: https://issues.apache.org/jira/browse/SPARK-25752 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Tathagata Das >Assignee: Tathagata Das >Priority: Minor > > The rule `CleanupAliases` cleans up aliases from logical operators that do not match a whitelist. This whitelist is hardcoded inside the rule, which is cumbersome. This PR is to clean that up by adding a trait `HasNamedOutput` that will be ignored by `CleanupAliases`; operators that require aliases to be preserved should extend it. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28629) Capture the missing rules in HiveSessionStateBuilder
[ https://issues.apache.org/jira/browse/SPARK-28629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840928#comment-17840928 ] Dongjoon Hyun commented on SPARK-28629: --- I removed the outdated target version from this issue. > Capture the missing rules in HiveSessionStateBuilder > > > Key: SPARK-28629 > URL: https://issues.apache.org/jira/browse/SPARK-28629 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Xiao Li >Priority: Major > > A general mistake for new contributors is to forget to add the corresponding rules to extendedResolutionRules, postHocResolutionRules, or extendedCheckRules in HiveSessionStateBuilder. We need to either avoid missing the rules or capture them. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-27780) Shuffle server & client should be versioned to enable smoother upgrade
[ https://issues.apache.org/jira/browse/SPARK-27780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840930#comment-17840930 ] Dongjoon Hyun commented on SPARK-27780: --- I removed the outdated target version from this issue. > Shuffle server & client should be versioned to enable smoother upgrade > -- > > Key: SPARK-27780 > URL: https://issues.apache.org/jira/browse/SPARK-27780 > Project: Spark > Issue Type: New Feature > Components: Shuffle, Spark Core >Affects Versions: 3.1.0 >Reporter: Imran Rashid >Priority: Major > > The external shuffle service is often upgraded at a different time than Spark itself. However, this causes problems when the protocol changes between the shuffle service and the Spark runtime -- this forces users to upgrade everything simultaneously. > We should add versioning to the shuffle client & server, so they know what messages the other will support. This would allow better handling of mixed versions, from better error messages to allowing some mismatched versions (with reduced capabilities). > This originally came up in a discussion here: > https://github.com/apache/spark/pull/24565#issuecomment-493496466 > There are a few ways we could do the versioning, which we still need to discuss: > 1) Version specified by config. This allows for mixed versions across the cluster and rolling upgrades. It will also let a Spark 3.0 client talk to a 2.4 shuffle service. But it may be a nuisance for users to get this right. > 2) Auto-detection during registration with local shuffle service. This makes the versioning easy for the end user, and can even handle a 2.4 shuffle service even though it does not support the new versioning. However, it will not handle a rolling upgrade correctly -- if the local shuffle service has been upgraded, but other nodes in the cluster have not, it will get the version wrong. > 3) Exchange versions per-connection. When a connection is opened, the server & client could first exchange messages with their versions, so they know how to continue communication after that. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
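Of the three options above, the per-connection exchange (option 3) is the most self-describing. A toy sketch of that negotiation, with a purely illustrative message shape that is not Spark's actual shuffle protocol:

{code:java}
object ShuffleVersionHandshakeSketch extends App {
  case class ProtocolVersion(major: Int, minor: Int) {
    def <=(other: ProtocolVersion): Boolean =
      major < other.major || (major == other.major && minor <= other.minor)
  }

  // Each side advertises its version when the connection opens; both then
  // continue with the lower version, degrading capabilities instead of failing.
  def negotiate(client: ProtocolVersion, server: ProtocolVersion): ProtocolVersion =
    if (client <= server) client else server

  println(negotiate(ProtocolVersion(3, 0), ProtocolVersion(2, 4))) // ProtocolVersion(2,4)
}
{code}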
[jira] [Updated] (SPARK-28629) Capture the missing rules in HiveSessionStateBuilder
[ https://issues.apache.org/jira/browse/SPARK-28629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-28629: -- Target Version/s: (was: 3.2.0) > Capture the missing rules in HiveSessionStateBuilder > > > Key: SPARK-28629 > URL: https://issues.apache.org/jira/browse/SPARK-28629 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Xiao Li >Priority: Major > > A general mistake for new contributors is to forget to add the corresponding rules to extendedResolutionRules, postHocResolutionRules, or extendedCheckRules in HiveSessionStateBuilder. We need to either avoid missing the rules or capture them. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27780) Shuffle server & client should be versioned to enable smoother upgrade
[ https://issues.apache.org/jira/browse/SPARK-27780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-27780: -- Target Version/s: (was: 3.2.0) > Shuffle server & client should be versioned to enable smoother upgrade > -- > > Key: SPARK-27780 > URL: https://issues.apache.org/jira/browse/SPARK-27780 > Project: Spark > Issue Type: New Feature > Components: Shuffle, Spark Core >Affects Versions: 3.1.0 >Reporter: Imran Rashid >Priority: Major > > The external shuffle service is often upgraded at a different time than Spark itself. However, this causes problems when the protocol changes between the shuffle service and the Spark runtime -- this forces users to upgrade everything simultaneously. > We should add versioning to the shuffle client & server, so they know what messages the other will support. This would allow better handling of mixed versions, from better error messages to allowing some mismatched versions (with reduced capabilities). > This originally came up in a discussion here: > https://github.com/apache/spark/pull/24565#issuecomment-493496466 > There are a few ways we could do the versioning, which we still need to discuss: > 1) Version specified by config. This allows for mixed versions across the cluster and rolling upgrades. It will also let a Spark 3.0 client talk to a 2.4 shuffle service. But it may be a nuisance for users to get this right. > 2) Auto-detection during registration with local shuffle service. This makes the versioning easy for the end user, and can even handle a 2.4 shuffle service even though it does not support the new versioning. However, it will not handle a rolling upgrade correctly -- if the local shuffle service has been upgraded, but other nodes in the cluster have not, it will get the version wrong. > 3) Exchange versions per-connection. When a connection is opened, the server & client could first exchange messages with their versions, so they know how to continue communication after that. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-30324) Simplify API for JSON access in DataFrames/SQL
[ https://issues.apache.org/jira/browse/SPARK-30324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840927#comment-17840927 ] Dongjoon Hyun commented on SPARK-30324: --- I removed the outdated target version from this issue. > Simplify API for JSON access in DataFrames/SQL > -- > > Key: SPARK-30324 > URL: https://issues.apache.org/jira/browse/SPARK-30324 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 2.4.4 >Reporter: Burak Yavuz >Priority: Major > > get_json_object() is a UDF to parse JSON fields. It is verbose and hard to use, e.g. I wasn't expecting the path to a field to have to start with "$.". > We can simplify all of this when a column is of StringType, and a nested field is requested. In the query planner, this API sugar will be rewritten as get_json_object. > This nested access can then be extended in the future to other semi-structured formats. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
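To make the proposed rewrite concrete, here is a sketch contrasting it with today's API. The get_json_object call below is real Spark API; the `$"raw.field"` form on a string column is only this ticket's proposal and does not work yet.

{code:java}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.get_json_object

object JsonAccessSketch extends App {
  val spark = SparkSession.builder()
    .master("local[1]").appName("json-access-sketch").getOrCreate()
  import spark.implicits._

  val df = Seq("""{"field":"value"}""").toDF("raw")

  // Today: an explicit UDF call with a "$."-prefixed path.
  df.select(get_json_object($"raw", "$.field").as("field")).show()

  // Proposed (hypothetical): df.select($"raw.field") on a StringType column
  // would be rewritten by the planner into the get_json_object call above.
  spark.stop()
}
{code}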
[jira] [Updated] (SPARK-30324) Simplify API for JSON access in DataFrames/SQL
[ https://issues.apache.org/jira/browse/SPARK-30324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-30324: -- Target Version/s: (was: 3.2.0) > Simplify API for JSON access in DataFrames/SQL > -- > > Key: SPARK-30324 > URL: https://issues.apache.org/jira/browse/SPARK-30324 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 2.4.4 >Reporter: Burak Yavuz >Priority: Major > > get_json_object() is a UDF to parse JSON fields. It is verbose and hard to use, e.g. I wasn't expecting the path to a field to have to start with "$.". > We can simplify all of this when a column is of StringType, and a nested field is requested. In the query planner, this API sugar will be rewritten as get_json_object. > This nested access can then be extended in the future to other semi-structured formats. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-30334) Add metadata around semi-structured columns to Spark
[ https://issues.apache.org/jira/browse/SPARK-30334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-30334: -- Target Version/s: (was: 3.2.0) > Add metadata around semi-structured columns to Spark > > > Key: SPARK-30334 > URL: https://issues.apache.org/jira/browse/SPARK-30334 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 2.4.4 >Reporter: Burak Yavuz >Priority: Major > > Semi-structured data is used widely in the data industry for reporting events in a wide variety of formats. Click events in product analytics can be stored as json. Some application logs can be in the form of delimited key=value text. Some data may be in xml. > The goal of this project is to be able to signal Spark that such a column exists. This will then enable Spark to "auto-parse" these columns on the fly. > The proposal is to store this information as part of the column metadata, in the fields: > - format: The format of the semi-structured column, e.g. json, xml, avro > - options: Options for parsing these columns > Then imagine having the following data: > {code:java} > +------------+-------+-------------------+ > | ts | event | raw | > +------------+-------+-------------------+ > | 2019-10-12 | click | {"field":"value"} | > +------------+-------+-------------------+ {code} > SELECT raw.field FROM data > will return "value" > or the following data > {code:java} > +------------+-------+---------------------+ > | ts | event | raw | > +------------+-------+---------------------+ > | 2019-10-12 | click | field1=v1|field2=v2 | > +------------+-------+---------------------+ {code} > SELECT raw.field1 FROM data > will return v1. > > As a first step, we will introduce the function "as_json", which accomplishes this for JSON columns. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
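The proposed "format"/"options" fields can already be attached through the public column-metadata API, even though Spark does not interpret these keys today (they are only this ticket's proposed convention). A hedged sketch:

{code:java}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.MetadataBuilder

object SemiStructuredMetadataSketch extends App {
  val spark = SparkSession.builder()
    .master("local[1]").appName("semi-structured-sketch").getOrCreate()
  import spark.implicits._

  val df = Seq("""{"field":"value"}""").toDF("raw")

  // Tag the column as JSON via column metadata; the "format" key is the
  // ticket's proposed convention, not something Spark currently reads.
  val tagged = df.select(
    $"raw".as("raw", new MetadataBuilder().putString("format", "json").build()))

  println(tagged.schema("raw").metadata.json) // {"format":"json"}
  spark.stop()
}
{code}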
[jira] [Commented] (SPARK-30334) Add metadata around semi-structured columns to Spark
[ https://issues.apache.org/jira/browse/SPARK-30334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840926#comment-17840926 ] Dongjoon Hyun commented on SPARK-30334: --- I removed the outdated target version from this issue. > Add metadata around semi-structured columns to Spark > > > Key: SPARK-30334 > URL: https://issues.apache.org/jira/browse/SPARK-30334 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 2.4.4 >Reporter: Burak Yavuz >Priority: Major > > Semi-structured data is used widely in the data industry for reporting events in a wide variety of formats. Click events in product analytics can be stored as json. Some application logs can be in the form of delimited key=value text. Some data may be in xml. > The goal of this project is to be able to signal Spark that such a column exists. This will then enable Spark to "auto-parse" these columns on the fly. > The proposal is to store this information as part of the column metadata, in the fields: > - format: The format of the semi-structured column, e.g. json, xml, avro > - options: Options for parsing these columns > Then imagine having the following data: > {code:java} > +------------+-------+-------------------+ > | ts | event | raw | > +------------+-------+-------------------+ > | 2019-10-12 | click | {"field":"value"} | > +------------+-------+-------------------+ {code} > SELECT raw.field FROM data > will return "value" > or the following data > {code:java} > +------------+-------+---------------------+ > | ts | event | raw | > +------------+-------+---------------------+ > | 2019-10-12 | click | field1=v1|field2=v2 | > +------------+-------+---------------------+ {code} > SELECT raw.field1 FROM data > will return v1. > > As a first step, we will introduce the function "as_json", which accomplishes this for JSON columns. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24942) Improve cluster resource management with jobs containing barrier stage
[ https://issues.apache.org/jira/browse/SPARK-24942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840913#comment-17840913 ] Dongjoon Hyun commented on SPARK-24942: --- I removed the outdated target version, `3.2.0`, from this Jira. For now, the Apache Spark community has no target version for this issue. > Improve cluster resource management with jobs containing barrier stage > -- > > Key: SPARK-24942 > URL: https://issues.apache.org/jira/browse/SPARK-24942 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.1.0 >Reporter: Xingbo Jiang >Priority: Major > > https://github.com/apache/spark/pull/21758#discussion_r205652317 > We shall improve cluster resource management to address the following issues: > - With dynamic resource allocation enabled, it may happen that we acquire > some executors (but not enough to launch all the tasks in a barrier stage) > and later release them when the executor idle timeout expires, and then > acquire them again. > - There can be a deadlock between two concurrent applications. Each application > may acquire some resources, but not enough to launch all the tasks in a > barrier stage. And after hitting the idle timeout and releasing them, they > may acquire resources again, but just continually trade resources with > each other. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-24942) Improve cluster resource management with jobs containing barrier stage
[ https://issues.apache.org/jira/browse/SPARK-24942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-24942: -- Target Version/s: (was: 3.2.0) > Improve cluster resource management with jobs containing barrier stage > -- > > Key: SPARK-24942 > URL: https://issues.apache.org/jira/browse/SPARK-24942 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.1.0 >Reporter: Xingbo Jiang >Priority: Major > > https://github.com/apache/spark/pull/21758#discussion_r205652317 > We shall improve cluster resource management to address the following issues: > - With dynamic resource allocation enabled, it may happen that we acquire > some executors (but not enough to launch all the tasks in a barrier stage) > and later release them when the executor idle timeout expires, and then > acquire them again. > - There can be a deadlock between two concurrent applications. Each application > may acquire some resources, but not enough to launch all the tasks in a > barrier stage. And after hitting the idle timeout and releasing them, they > may acquire resources again, but just continually trade resources with > each other. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
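A minimal sketch of the combination at issue; the config keys are real Spark settings, but the values and job shape are illustrative only. Note that, as far as I know, current Spark fails fast when a barrier stage runs with dynamic allocation enabled (citing this very ticket), which is the conservative behavior the issue proposes to improve:

{code:java}
import org.apache.spark.sql.SparkSession

// A barrier stage requires all of its tasks to launch together, so executors
// released by the idle timeout can leave a partially provisioned stage stuck.
val spark = SparkSession.builder()
  .appName("barrier-demo")
  .config("spark.dynamicAllocation.enabled", "true")
  .config("spark.dynamicAllocation.shuffleTracking.enabled", "true")
  .config("spark.dynamicAllocation.executorIdleTimeout", "60s")
  .getOrCreate()

// barrier() forces gang scheduling: the stage launches only when slots for
// all 10 tasks are available at the same time. With dynamic allocation on,
// Spark currently rejects this job up front rather than risk the deadlock
// described above.
spark.sparkContext
  .parallelize(1 to 100, numSlices = 10)
  .barrier()
  .mapPartitions(iter => iter)
  .count()
{code}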
[jira] [Commented] (SPARK-44111) Prepare Apache Spark 4.0.0
[ https://issues.apache.org/jira/browse/SPARK-44111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840853#comment-17840853 ] Dongjoon Hyun commented on SPARK-44111: --- Yes, we will provide `4.0.0-preview` in advance, [~fbiville]. Here is the discussion thread on the Apache Spark dev mailing list. * [https://lists.apache.org/thread/nxmvz2j7kp96otzlnl3kd277knlb6qgb] [~cloud_fan] is the release manager who is leading the Apache Spark 4.0.0 release (including the preview). > Prepare Apache Spark 4.0.0 > -- > > Key: SPARK-44111 > URL: https://issues.apache.org/jira/browse/SPARK-44111 > Project: Spark > Issue Type: Umbrella > Components: Build >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Critical > Labels: pull-request-available > > For now, this issue aims to collect ideas for planning Apache Spark 4.0.0. > We will add more items that will be excluded from Apache Spark 3.5.0 > (Feature Freeze: July 16th, 2023). > {code} > Spark 1: 2014.05 (1.0.0) ~ 2016.11 (1.6.3) > Spark 2: 2016.07 (2.0.0) ~ 2021.05 (2.4.8) > Spark 3: 2020.06 (3.0.0) ~ 2026.xx (3.5.x) > Spark 4: 2024.06 (4.0.0, NEW) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47987) Enable `ArrowParityTests.test_createDataFrame_empty_partition`
[ https://issues.apache.org/jira/browse/SPARK-47987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-47987. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46220 [https://github.com/apache/spark/pull/46220] > Enable `ArrowParityTests.test_createDataFrame_empty_partition` > -- > > Key: SPARK-47987 > URL: https://issues.apache.org/jira/browse/SPARK-47987 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47990) Upgrade `zstd-jni` to 1.5.6-3
[ https://issues.apache.org/jira/browse/SPARK-47990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-47990. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46225 [https://github.com/apache/spark/pull/46225] > Upgrade `zstd-jni` to 1.5.6-3 > - > > Key: SPARK-47990 > URL: https://issues.apache.org/jira/browse/SPARK-47990 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-46122) Set `spark.sql.legacy.createHiveTableByDefault` to `false` by default
[ https://issues.apache.org/jira/browse/SPARK-46122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840644#comment-17840644 ] Dongjoon Hyun commented on SPARK-46122: --- I sent the discussion thread for this issue. - [https://lists.apache.org/thread/ylk96fg4lvn6klxhj6t6yh42lyqb8wmd] > Set `spark.sql.legacy.createHiveTableByDefault` to `false` by default > - > > Key: SPARK-46122 > URL: https://issues.apache.org/jira/browse/SPARK-46122 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Yuming Wang >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46122) Set `spark.sql.legacy.createHiveTableByDefault` to `false` by default
[ https://issues.apache.org/jira/browse/SPARK-46122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-46122: -- Summary: Set `spark.sql.legacy.createHiveTableByDefault` to `false` by default (was: Set `spark.sql.legacy.createHiveTableByDefault` to false by default) > Set `spark.sql.legacy.createHiveTableByDefault` to `false` by default > - > > Key: SPARK-46122 > URL: https://issues.apache.org/jira/browse/SPARK-46122 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Yuming Wang >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46122) Set `spark.sql.legacy.createHiveTableByDefault` to false by default
[ https://issues.apache.org/jira/browse/SPARK-46122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-46122: -- Summary: Set `spark.sql.legacy.createHiveTableByDefault` to false by default (was: Disable spark.sql.legacy.createHiveTableByDefault by default) > Set `spark.sql.legacy.createHiveTableByDefault` to false by default > --- > > Key: SPARK-46122 > URL: https://issues.apache.org/jira/browse/SPARK-46122 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Yuming Wang >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
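A sketch of the behavior change, assuming the semantics discussed in the dev-list thread referenced above: with the flag set to false, a CREATE TABLE without a USING clause creates a native data source table (per spark.sql.sources.default, parquet by default) instead of a Hive SerDe table. The session setup below is illustrative:

{code:java}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("create-table-default")
  .enableHiveSupport()
  // The flag under discussion; false selects the native data source path.
  .config("spark.sql.legacy.createHiveTableByDefault", "false")
  .getOrCreate()

// No USING clause: with the new default this becomes a parquet table
// (spark.sql.sources.default) rather than a Hive SerDe table.
spark.sql("CREATE TABLE t(id INT)")
spark.sql("DESCRIBE TABLE EXTENDED t").show(100, truncate = false)
{code}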
[jira] [Resolved] (SPARK-47979) Use Hive tables explicitly for Hive table capability tests
[ https://issues.apache.org/jira/browse/SPARK-47979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-47979. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46211 [https://github.com/apache/spark/pull/46211] > Use Hive tables explicitly for Hive table capability tests > -- > > Key: SPARK-47979 > URL: https://issues.apache.org/jira/browse/SPARK-47979 > Project: Spark > Issue Type: Sub-task > Components: SQL, Tests >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47979) Use Hive table explicitly for Hive table capability tests
Dongjoon Hyun created SPARK-47979: - Summary: Use Hive table explicitly for Hive table capability tests Key: SPARK-47979 URL: https://issues.apache.org/jira/browse/SPARK-47979 Project: Spark Issue Type: Test Components: SQL, Tests Affects Versions: 4.0.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47979) Use Hive tables explicitly for Hive table capability tests
[ https://issues.apache.org/jira/browse/SPARK-47979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-47979: -- Summary: Use Hive tables explicitly for Hive table capability tests (was: Use Hive table explicitly for Hive table capability tests) > Use Hive tables explicitly for Hive table capability tests > -- > > Key: SPARK-47979 > URL: https://issues.apache.org/jira/browse/SPARK-47979 > Project: Spark > Issue Type: Test > Components: SQL, Tests >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45265) Support Hive 4.0 metastore
[ https://issues.apache.org/jira/browse/SPARK-45265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-45265: - Assignee: (was: Attila Zsolt Piros) > Support Hive 4.0 metastore > -- > > Key: SPARK-45265 > URL: https://issues.apache.org/jira/browse/SPARK-45265 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Attila Zsolt Piros >Priority: Major > Labels: pull-request-available > > Although Hive 4.0.0 is still beta, I would like to work on this, as Hive 4.0.0 > will support the pushdown of partition column filters with > VARCHAR/CHAR types. > For details, please see HIVE-26661: Support partition filter for char and > varchar types on Hive metastore -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
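For illustration, the kind of query HIVE-26661 affects; the table and values below are made up. With a VARCHAR partition column, older metastores cannot evaluate the partition filter server-side, so Spark has to list all partitions and prune client-side:

{code:java}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("hms-partition-filter").enableHiveSupport().getOrCreate()

spark.sql(
  """CREATE TABLE events (msg STRING)
    |PARTITIONED BY (day VARCHAR(10))
    |STORED AS PARQUET""".stripMargin)

// The predicate on the VARCHAR partition column is the filter that a
// Hive 4.0 metastore could push down for server-side partition pruning.
spark.sql("SELECT * FROM events WHERE day = '2024-04-25'").explain()
{code}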
[jira] [Updated] (SPARK-44677) Drop legacy Hive-based ORC file format
[ https://issues.apache.org/jira/browse/SPARK-44677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-44677: -- Parent: (was: SPARK-44111) Issue Type: Task (was: Sub-task) > Drop legacy Hive-based ORC file format > -- > > Key: SPARK-44677 > URL: https://issues.apache.org/jira/browse/SPARK-44677 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Cheng Pan >Priority: Major > > Currently, Spark allows using spark.sql.orc.impl=native/hive to switch the > ORC FileFormat implementation. > SPARK-23456 (2.4) switched the default value of spark.sql.orc.impl from "hive" > to "native" and prepared to drop the "hive" implementation in the future: > > ... eventually, Apache Spark will drop old Hive-based ORC code. > The native implementation has worked well throughout the Spark 3.x period, so > it's a good time to consider dropping the "hive" one in Spark 4.0. > Also, we should take care of backward compatibility during the change. > > BTW, IIRC, there was a difference in the Hive ORC CHAR implementation before. > > So, we couldn't remove it due to backward-compatibility issues. Since Spark > > implements many CHAR features, we need to re-verify that the {{native}} > > implementation has all legacy Hive-based ORC features -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
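A minimal sketch of the switch described above; spark.sql.orc.impl accepts "native" (the default since Spark 2.4) and "hive" (the legacy path proposed for removal). The output path is a placeholder:

{code:java}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("orc-impl").master("local[*]").getOrCreate()

// "native" uses Spark's built-in ORC reader/writer; "hive" is the legacy
// Hive-based implementation this ticket proposes to drop.
spark.conf.set("spark.sql.orc.impl", "native")

spark.range(10).write.mode("overwrite").orc("/tmp/orc_native_demo")  // placeholder path
spark.read.orc("/tmp/orc_native_demo").show()
{code}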
[jira] [Commented] (SPARK-47499) Reuse `test_help_command` in Connect
[ https://issues.apache.org/jira/browse/SPARK-47499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840514#comment-17840514 ] Dongjoon Hyun commented on SPARK-47499: --- Thank you for adding this to the umbrella Jira, [~podongfeng]. > Reuse `test_help_command` in Connect > > > Key: SPARK-47499 > URL: https://issues.apache.org/jira/browse/SPARK-47499 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47633) Cache miss for queries using JOIN LATERAL with join condition
[ https://issues.apache.org/jira/browse/SPARK-47633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-47633: - Assignee: Bruce Robbins > Cache miss for queries using JOIN LATERAL with join condition > - > > Key: SPARK-47633 > URL: https://issues.apache.org/jira/browse/SPARK-47633 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.2, 4.0.0, 3.5.1 >Reporter: Bruce Robbins >Assignee: Bruce Robbins >Priority: Major > Labels: pull-request-available > > For example: > {noformat} > CREATE or REPLACE TEMP VIEW t1(c1, c2) AS VALUES (0, 1), (1, 2); > CREATE or REPLACE TEMP VIEW t2(c1, c2) AS VALUES (0, 1), (1, 2); > create or replace temp view v1 as > select * > from t1 > join lateral ( > select c1 as a, c2 as b > from t2) > on c1 = a; > cache table v1; > explain select * from v1; > == Physical Plan == > AdaptiveSparkPlan isFinalPlan=false > +- BroadcastHashJoin [c1#180], [a#173], Inner, BuildRight, false >:- LocalTableScan [c1#180, c2#181] >+- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, > false] as bigint)),false), [plan_id=113] > +- LocalTableScan [a#173, b#174] > {noformat} > Note that there is no {{InMemoryRelation}}. > However, if you move the join condition into the subquery, the cached plan is > used: > {noformat} > CREATE or REPLACE TEMP VIEW t1(c1, c2) AS VALUES (0, 1), (1, 2); > CREATE or REPLACE TEMP VIEW t2(c1, c2) AS VALUES (0, 1), (1, 2); > create or replace temp view v2 as > select * > from t1 > join lateral ( > select c1 as a, c2 as b > from t2 > where t1.c1 = t2.c1); > cache table v2; > explain select * from v2; > == Physical Plan == > AdaptiveSparkPlan isFinalPlan=false > +- Scan In-memory table v2 [c1#176, c2#177, a#178, b#179] > +- InMemoryRelation [c1#176, c2#177, a#178, b#179], StorageLevel(disk, > memory, deserialized, 1 replicas) > +- AdaptiveSparkPlan isFinalPlan=true >+- == Final Plan == > *(1) Project [c1#26, c2#27, a#19, b#20] > +- *(1) BroadcastHashJoin [c1#26], [c1#30], Inner, > BuildLeft, false > :- BroadcastQueryStage 0 > : +- BroadcastExchange > HashedRelationBroadcastMode(List(cast(input[0, int, false] as > bigint)),false), [plan_id=37] > : +- LocalTableScan [c1#26, c2#27] > +- *(1) LocalTableScan [a#19, b#20, c1#30] >+- == Initial Plan == > Project [c1#26, c2#27, a#19, b#20] > +- BroadcastHashJoin [c1#26], [c1#30], Inner, BuildLeft, > false > :- BroadcastExchange > HashedRelationBroadcastMode(List(cast(input[0, int, false] as > bigint)),false), [plan_id=37] > : +- LocalTableScan [c1#26, c2#27] > +- LocalTableScan [a#19, b#20, c1#30] > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47633) Cache miss for queries using JOIN LATERAL with join condition
[ https://issues.apache.org/jira/browse/SPARK-47633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-47633. --- Fix Version/s: 3.5.2 Resolution: Fixed Issue resolved by pull request 46190 [https://github.com/apache/spark/pull/46190] > Cache miss for queries using JOIN LATERAL with join condition > - > > Key: SPARK-47633 > URL: https://issues.apache.org/jira/browse/SPARK-47633 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.2, 4.0.0, 3.5.1 >Reporter: Bruce Robbins >Assignee: Bruce Robbins >Priority: Major > Labels: pull-request-available > Fix For: 3.5.2 > > > For example: > {noformat} > CREATE or REPLACE TEMP VIEW t1(c1, c2) AS VALUES (0, 1), (1, 2); > CREATE or REPLACE TEMP VIEW t2(c1, c2) AS VALUES (0, 1), (1, 2); > create or replace temp view v1 as > select * > from t1 > join lateral ( > select c1 as a, c2 as b > from t2) > on c1 = a; > cache table v1; > explain select * from v1; > == Physical Plan == > AdaptiveSparkPlan isFinalPlan=false > +- BroadcastHashJoin [c1#180], [a#173], Inner, BuildRight, false >:- LocalTableScan [c1#180, c2#181] >+- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, > false] as bigint)),false), [plan_id=113] > +- LocalTableScan [a#173, b#174] > {noformat} > Note that there is no {{InMemoryRelation}}. > However, if you move the join condition into the subquery, the cached plan is > used: > {noformat} > CREATE or REPLACE TEMP VIEW t1(c1, c2) AS VALUES (0, 1), (1, 2); > CREATE or REPLACE TEMP VIEW t2(c1, c2) AS VALUES (0, 1), (1, 2); > create or replace temp view v2 as > select * > from t1 > join lateral ( > select c1 as a, c2 as b > from t2 > where t1.c1 = t2.c1); > cache table v2; > explain select * from v2; > == Physical Plan == > AdaptiveSparkPlan isFinalPlan=false > +- Scan In-memory table v2 [c1#176, c2#177, a#178, b#179] > +- InMemoryRelation [c1#176, c2#177, a#178, b#179], StorageLevel(disk, > memory, deserialized, 1 replicas) > +- AdaptiveSparkPlan isFinalPlan=true >+- == Final Plan == > *(1) Project [c1#26, c2#27, a#19, b#20] > +- *(1) BroadcastHashJoin [c1#26], [c1#30], Inner, > BuildLeft, false > :- BroadcastQueryStage 0 > : +- BroadcastExchange > HashedRelationBroadcastMode(List(cast(input[0, int, false] as > bigint)),false), [plan_id=37] > : +- LocalTableScan [c1#26, c2#27] > +- *(1) LocalTableScan [a#19, b#20, c1#30] >+- == Initial Plan == > Project [c1#26, c2#27, a#19, b#20] > +- BroadcastHashJoin [c1#26], [c1#30], Inner, BuildLeft, > false > :- BroadcastExchange > HashedRelationBroadcastMode(List(cast(input[0, int, false] as > bigint)),false), [plan_id=37] > : +- LocalTableScan [c1#26, c2#27] > +- LocalTableScan [a#19, b#20, c1#30] > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47974) Remove install_scala from build/mvn
[ https://issues.apache.org/jira/browse/SPARK-47974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-47974: -- Parent: SPARK-44111 Issue Type: Sub-task (was: Improvement) > Remove install_scala from build/mvn > --- > > Key: SPARK-47974 > URL: https://issues.apache.org/jira/browse/SPARK-47974 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Cheng Pan >Assignee: Cheng Pan >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47974) Remove install_scala from build/mvn
[ https://issues.apache.org/jira/browse/SPARK-47974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-47974. --- Fix Version/s: 4.0.0 Assignee: Cheng Pan Resolution: Fixed This is resolved via [https://github.com/apache/spark/pull/46204] > Remove install_scala from build/mvn > --- > > Key: SPARK-47974 > URL: https://issues.apache.org/jira/browse/SPARK-47974 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: Cheng Pan >Assignee: Cheng Pan >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47969) Make `test_creation_index` deterministic
[ https://issues.apache.org/jira/browse/SPARK-47969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-47969. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46200 [https://github.com/apache/spark/pull/46200] > Make `test_creation_index` deterministic > > > Key: SPARK-47969 > URL: https://issues.apache.org/jira/browse/SPARK-47969 > Project: Spark > Issue Type: Test > Components: PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47956) sanity check for unresolved LCA reference
[ https://issues.apache.org/jira/browse/SPARK-47956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-47956. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46185 [https://github.com/apache/spark/pull/46185] > sanity check for unresolved LCA reference > - > > Key: SPARK-47956 > URL: https://issues.apache.org/jira/browse/SPARK-47956 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47956) sanity check for unresolved LCA reference
[ https://issues.apache.org/jira/browse/SPARK-47956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-47956: - Assignee: Wenchen Fan > sanity check for unresolved LCA reference > - > > Key: SPARK-47956 > URL: https://issues.apache.org/jira/browse/SPARK-47956 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47948) Upgrade the minimum Pandas version to 2.0.0
[ https://issues.apache.org/jira/browse/SPARK-47948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-47948: - Assignee: Haejoon Lee > Upgrade the minimum Pandas version to 2.0.0 > --- > > Key: SPARK-47948 > URL: https://issues.apache.org/jira/browse/SPARK-47948 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Labels: pull-request-available > > Bump up the minimum version of Pandas from 1.4.4 to 2.0.0 to support Pandas > API on Spark from Apache Spark 4.0.0. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47948) Upgrade the minimum Pandas version to 2.0.0
[ https://issues.apache.org/jira/browse/SPARK-47948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-47948. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46175 [https://github.com/apache/spark/pull/46175] > Upgrade the minimum Pandas version to 2.0.0 > --- > > Key: SPARK-47948 > URL: https://issues.apache.org/jira/browse/SPARK-47948 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Bump up the minimum version of Pandas from 1.4.4 to 2.0.0 to support Pandas > API on Spark from Apache Spark 4.0.0. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47948) Upgrade the minimum Pandas version to 2.0.0
[ https://issues.apache.org/jira/browse/SPARK-47948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-47948: -- Summary: Upgrade the minimum Pandas version to 2.0.0 (was: Bump Pandas to 2.0.0) > Upgrade the minimum Pandas version to 2.0.0 > --- > > Key: SPARK-47948 > URL: https://issues.apache.org/jira/browse/SPARK-47948 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Priority: Major > Labels: pull-request-available > > Bump up the minimum version of Pandas from 1.4.4 to 2.0.0 to support Pandas > API on Spark from Apache Spark 4.0.0. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47949) MsSQLServer: Bump up docker image version to 2022-CU12-GDR1-ubuntu-22.04
[ https://issues.apache.org/jira/browse/SPARK-47949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-47949. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46176 [https://github.com/apache/spark/pull/46176] > MsSQLServer: Bump up docker image version to 2022-CU12-GDR1-ubuntu-22.04 > --- > > Key: SPARK-47949 > URL: https://issues.apache.org/jira/browse/SPARK-47949 > Project: Spark > Issue Type: Sub-task > Components: Spark Docker >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > https://mcr.microsoft.com/en-us/product/mssql/server/tags -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47949) MsSQLServer: Bump up docker image version to 2022-CU12-GDR1-ubuntu-22.04
[ https://issues.apache.org/jira/browse/SPARK-47949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-47949: - Assignee: Kent Yao > MsSQLServer: Bump up docker image version to 2022-CU12-GDR1-ubuntu-22.04 > --- > > Key: SPARK-47949 > URL: https://issues.apache.org/jira/browse/SPARK-47949 > Project: Spark > Issue Type: Sub-task > Components: Spark Docker >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > > https://mcr.microsoft.com/en-us/product/mssql/server/tags -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47953) MsSQLServer: Document Mapping Spark SQL Data Types to Microsoft SQL Server
[ https://issues.apache.org/jira/browse/SPARK-47953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-47953: - Assignee: Kent Yao > MsSQLServer: Document Mapping Spark SQL Data Types to Microsoft SQL Server > -- > > Key: SPARK-47953 > URL: https://issues.apache.org/jira/browse/SPARK-47953 > Project: Spark > Issue Type: Sub-task > Components: Documentation >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47953) MsSQLServer: Document Mapping Spark SQL Data Types to Microsoft SQL Server
[ https://issues.apache.org/jira/browse/SPARK-47953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-47953. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46177 [https://github.com/apache/spark/pull/46177] > MsSQLServer: Document Mapping Spark SQL Data Types to Microsoft SQL Server > -- > > Key: SPARK-47953 > URL: https://issues.apache.org/jira/browse/SPARK-47953 > Project: Spark > Issue Type: Sub-task > Components: Documentation >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47943) Add Operator CI Task for Java Build and Test
[ https://issues.apache.org/jira/browse/SPARK-47943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-47943: - Assignee: Zhou JIANG > Add Operator CI Task for Java Build and Test > > > Key: SPARK-47943 > URL: https://issues.apache.org/jira/browse/SPARK-47943 > Project: Spark > Issue Type: Sub-task > Components: k8s >Affects Versions: kubernetes-operator-0.1.0 >Reporter: Zhou JIANG >Assignee: Zhou JIANG >Priority: Major > Labels: pull-request-available > > We need to add a CI task to build and test Java code for upcoming operator pull > requests. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47943) Add Operator CI Task for Java Build and Test
[ https://issues.apache.org/jira/browse/SPARK-47943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-47943. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 7 [https://github.com/apache/spark-kubernetes-operator/pull/7] > Add Operator CI Task for Java Build and Test > > > Key: SPARK-47943 > URL: https://issues.apache.org/jira/browse/SPARK-47943 > Project: Spark > Issue Type: Sub-task > Components: k8s >Affects Versions: kubernetes-operator-0.1.0 >Reporter: Zhou JIANG >Assignee: Zhou JIANG >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > We need to add a CI task to build and test Java code for upcoming operator pull > requests. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47929) Setup Static Analysis for Operator
[ https://issues.apache.org/jira/browse/SPARK-47929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-47929. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 6 [https://github.com/apache/spark-kubernetes-operator/pull/6] > Setup Static Analysis for Operator > -- > > Key: SPARK-47929 > URL: https://issues.apache.org/jira/browse/SPARK-47929 > Project: Spark > Issue Type: Sub-task > Components: k8s >Affects Versions: kubernetes-operator-0.1.0 >Reporter: Zhou JIANG >Assignee: Zhou JIANG >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Add common analysis tasks, including checkstyle, spotbugs, and jacoco. Also > include spotless for style fixes. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47938) MsSQLServer: Cannot find data type BYTE error
[ https://issues.apache.org/jira/browse/SPARK-47938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-47938. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46164 [https://github.com/apache/spark/pull/46164] > MsSQLServer: Cannot find data type BYTE error > - > > Key: SPARK-47938 > URL: https://issues.apache.org/jira/browse/SPARK-47938 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47937) Fix docstring of `hll_sketch_agg`
[ https://issues.apache.org/jira/browse/SPARK-47937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-47937. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46163 [https://github.com/apache/spark/pull/46163] > Fix docstring of `hll_sketch_agg` > - > > Key: SPARK-47937 > URL: https://issues.apache.org/jira/browse/SPARK-47937 > Project: Spark > Issue Type: Improvement > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47937) Fix docstring of `hll_sketch_agg`
[ https://issues.apache.org/jira/browse/SPARK-47937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-47937: - Assignee: Ruifeng Zheng > Fix docstring of `hll_sketch_agg` > - > > Key: SPARK-47937 > URL: https://issues.apache.org/jira/browse/SPARK-47937 > Project: Spark > Issue Type: Improvement > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47904) Preserve case in Avro schema when using enableStableIdentifiersForUnionType
[ https://issues.apache.org/jira/browse/SPARK-47904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-47904. --- Fix Version/s: 3.5.2 Resolution: Fixed Issue resolved by pull request 46169 [https://github.com/apache/spark/pull/46169] > Preserve case in Avro schema when using enableStableIdentifiersForUnionType > --- > > Key: SPARK-47904 > URL: https://issues.apache.org/jira/browse/SPARK-47904 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0, 3.5.2 >Reporter: Ivan Sadikov >Assignee: Ivan Sadikov >Priority: Major > Labels: pull-request-available > Fix For: 3.5.2 > > > When enableStableIdentifiersForUnionType is enabled, all of the types are > lowercased which creates a problem when field types are case-sensitive: > {code:java} > Schema.createEnum("myENUM", "", null, List[String]("E1", "e2").asJava), > Schema.createRecord("myRecord2", "", null, false, List[Schema.Field](new > Schema.Field("F", Schema.create(Type.FLOAT))).asJava){code} > would become > {code:java} > struct> {code} > but instead should be > {code:java} > struct> {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47904) Preserve case in Avro schema when using enableStableIdentifiersForUnionType
[ https://issues.apache.org/jira/browse/SPARK-47904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-47904: -- Fix Version/s: 4.0.0 > Preserve case in Avro schema when using enableStableIdentifiersForUnionType > --- > > Key: SPARK-47904 > URL: https://issues.apache.org/jira/browse/SPARK-47904 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0, 3.5.2 >Reporter: Ivan Sadikov >Assignee: Ivan Sadikov >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0, 3.5.2 > > > When enableStableIdentifiersForUnionType is enabled, all of the types are > lowercased which creates a problem when field types are case-sensitive: > {code:java} > Schema.createEnum("myENUM", "", null, List[String]("E1", "e2").asJava), > Schema.createRecord("myRecord2", "", null, false, List[Schema.Field](new > Schema.Field("F", Schema.create(Type.FLOAT))).asJava){code} > would become > {code:java} > struct> {code} > but instead should be > {code:java} > struct> {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47904) Preserve case in Avro schema when using enableStableIdentifiersForUnionType
[ https://issues.apache.org/jira/browse/SPARK-47904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-47904: - Assignee: Ivan Sadikov > Preserve case in Avro schema when using enableStableIdentifiersForUnionType > --- > > Key: SPARK-47904 > URL: https://issues.apache.org/jira/browse/SPARK-47904 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0, 3.5.2 >Reporter: Ivan Sadikov >Assignee: Ivan Sadikov >Priority: Major > Labels: pull-request-available > > When enableStableIdentifiersForUnionType is enabled, all of the types are > lowercased which creates a problem when field types are case-sensitive: > {code:java} > Schema.createEnum("myENUM", "", null, List[String]("E1", "e2").asJava), > Schema.createRecord("myRecord2", "", null, false, List[Schema.Field](new > Schema.Field("F", Schema.create(Type.FLOAT))).asJava){code} > would become > {code:java} > struct> {code} > but instead should be > {code:java} > struct> {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
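For reference, a sketch of how the reader option involved in this fix is used; the load path is a placeholder, and the schema comment reflects the behavior described in the issue:

{code:java}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("avro-stable-ids").getOrCreate()

// Requires the spark-avro module on the classpath.
val df = spark.read
  .format("avro")
  .option("enableStableIdentifiersForUnionType", "true")
  .load("/path/to/records-with-union.avro")   // placeholder path

// With the fix, stable union member names keep the original type-name case
// (e.g. a member derived from "myENUM" is no longer lowercased).
df.printSchema()
{code}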
[jira] [Resolved] (SPARK-47942) Drop K8s v1.26 Support
[ https://issues.apache.org/jira/browse/SPARK-47942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-47942. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46168 [https://github.com/apache/spark/pull/46168] > Drop K8s v1.26 Support > -- > > Key: SPARK-47942 > URL: https://issues.apache.org/jira/browse/SPARK-47942 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47942) Drop K8s v1.26 Support
[ https://issues.apache.org/jira/browse/SPARK-47942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-47942: - Assignee: Dongjoon Hyun > Drop K8s v1.26 Support > -- > > Key: SPARK-47942 > URL: https://issues.apache.org/jira/browse/SPARK-47942 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47942) Drop K8s v1.26 Support
Dongjoon Hyun created SPARK-47942: - Summary: Drop K8s v1.26 Support Key: SPARK-47942 URL: https://issues.apache.org/jira/browse/SPARK-47942 Project: Spark Issue Type: Sub-task Components: Kubernetes Affects Versions: 4.0.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47940) Upgrade `guava` dependency to `33.1.0-jre` in Docker IT
[ https://issues.apache.org/jira/browse/SPARK-47940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-47940: -- Reporter: Cheng Pan (was: Dongjoon Hyun) > Upgrade `guava` dependency to `33.1.0-jre` in Docker IT > --- > > Key: SPARK-47940 > URL: https://issues.apache.org/jira/browse/SPARK-47940 > Project: Spark > Issue Type: Sub-task > Components: Build, Tests >Affects Versions: 4.0.0 >Reporter: Cheng Pan >Assignee: Cheng Pan >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47940) Upgrade `guava` dependency to `33.1.0-jre` in Docker IT
[ https://issues.apache.org/jira/browse/SPARK-47940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-47940. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46167 [https://github.com/apache/spark/pull/46167] > Upgrade `guava` dependency to `33.1.0-jre` in Docker IT > --- > > Key: SPARK-47940 > URL: https://issues.apache.org/jira/browse/SPARK-47940 > Project: Spark > Issue Type: Sub-task > Components: Build, Tests >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Cheng Pan >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47940) Upgrade `guava` dependency to `33.1.0-jre` in Docker IT
Dongjoon Hyun created SPARK-47940: - Summary: Upgrade `guava` dependency to `33.1.0-jre` in Docker IT Key: SPARK-47940 URL: https://issues.apache.org/jira/browse/SPARK-47940 Project: Spark Issue Type: Sub-task Components: Build, Tests Affects Versions: 4.0.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47935) Pin pandas==2.0.3 for pypy3.8
[ https://issues.apache.org/jira/browse/SPARK-47935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-47935: - Assignee: Ruifeng Zheng > Pin pandas==2.0.3 for pypy3.8 > - > > Key: SPARK-47935 > URL: https://issues.apache.org/jira/browse/SPARK-47935 > Project: Spark > Issue Type: Improvement > Components: Project Infra, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47935) Pin pandas==2.0.3 for pypy3.8
[ https://issues.apache.org/jira/browse/SPARK-47935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-47935. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46159 [https://github.com/apache/spark/pull/46159] > Pin pandas==2.0.3 for pypy3.8 > - > > Key: SPARK-47935 > URL: https://issues.apache.org/jira/browse/SPARK-47935 > Project: Spark > Issue Type: Improvement > Components: Project Infra, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47930) Upgrade RoaringBitmap to 1.0.6
[ https://issues.apache.org/jira/browse/SPARK-47930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-47930. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46152 [https://github.com/apache/spark/pull/46152] > Upgrade RoaringBitmap to 1.0.6 > -- > > Key: SPARK-47930 > URL: https://issues.apache.org/jira/browse/SPARK-47930 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47925) Mark `BloomFilterAggregateQuerySuite` as `ExtendedSQLTest`
[ https://issues.apache.org/jira/browse/SPARK-47925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-47925. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46145 [https://github.com/apache/spark/pull/46145] > Mark `BloomFilterAggregateQuerySuite` as `ExtendedSQLTest` > -- > > Key: SPARK-47925 > URL: https://issues.apache.org/jira/browse/SPARK-47925 > Project: Spark > Issue Type: Test > Components: SQL, Tests >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Trivial > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47925) Mark `BloomFilterAggregateQuerySuite` as `ExtendedSQLTest`
[ https://issues.apache.org/jira/browse/SPARK-47925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-47925: - Assignee: Dongjoon Hyun > Mark `BloomFilterAggregateQuerySuite` as `ExtendedSQLTest` > -- > > Key: SPARK-47925 > URL: https://issues.apache.org/jira/browse/SPARK-47925 > Project: Spark > Issue Type: Test > Components: SQL, Tests >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Trivial > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47924) Add a debug log to `DiskStore.moveFileToBlock`
[ https://issues.apache.org/jira/browse/SPARK-47924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-47924. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46144 [https://github.com/apache/spark/pull/46144] > Add a debug log to `DiskStore.moveFileToBlock` > -- > > Key: SPARK-47924 > URL: https://issues.apache.org/jira/browse/SPARK-47924 > Project: Spark > Issue Type: Task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Trivial > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47924) Add a debug log to `DiskStore.moveFileToBlock`
[ https://issues.apache.org/jira/browse/SPARK-47924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-47924: - Assignee: Dongjoon Hyun > Add a debug log to `DiskStore.moveFileToBlock` > -- > > Key: SPARK-47924 > URL: https://issues.apache.org/jira/browse/SPARK-47924 > Project: Spark > Issue Type: Task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Trivial > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47923) Upgrade the minimum version of `arrow` R package to 10.0.0
[ https://issues.apache.org/jira/browse/SPARK-47923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-47923. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46142 [https://github.com/apache/spark/pull/46142] > Upgrade the minimum version of `arrow` R package to 10.0.0 > -- > > Key: SPARK-47923 > URL: https://issues.apache.org/jira/browse/SPARK-47923 > Project: Spark > Issue Type: Sub-task > Components: R >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47925) Mark `BloomFilterAggregateQuerySuite` as `ExtendedSQLTest`
Dongjoon Hyun created SPARK-47925: - Summary: Mark `BloomFilterAggregateQuerySuite` as `ExtendedSQLTest` Key: SPARK-47925 URL: https://issues.apache.org/jira/browse/SPARK-47925 Project: Spark Issue Type: Test Components: SQL, Tests Affects Versions: 4.0.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org