date:20220925

[GitHub] [spark] khalidmammadov opened a new pull request, #37988: [SPARK-40142][PYTHON][SQL][FOLLOW-UP] Make pyspark.sql.functions examples self-contained (FINAL)

2022-09-25 Thread GitBox

khalidmammadov opened a new pull request, #37988: URL: https://github.com/apache/spark/pull/37988 ### What changes were proposed in this pull request? It's part of the Pyspark docstrings improvement series (https://github.com/apache/spark/pull/37592, https://github.com/apache/spark/pull/

[GitHub] [spark] AmplabJenkins commented on pull request #37988: [SPARK-40142][PYTHON][SQL][FOLLOW-UP] Make pyspark.sql.functions examples self-contained (FINAL)

2022-09-25 Thread GitBox

AmplabJenkins commented on PR #37988: URL: https://github.com/apache/spark/pull/37988#issuecomment-1257145678 Can one of the admins verify this patch? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] khalidmammadov commented on pull request #37988: [SPARK-40142][PYTHON][SQL][FOLLOW-UP] Make pyspark.sql.functions examples self-contained (FINAL)

2022-09-25 Thread GitBox

khalidmammadov commented on PR #37988: URL: https://github.com/apache/spark/pull/37988#issuecomment-1257156072 @srowen @itholic @HyukjinKwon Please review

[GitHub] [spark] HyukjinKwon opened a new pull request, #37989: [SPARK-40096][CORE][TESTS][FOLLOW-UP] Explicitly check the element and length

2022-09-25 Thread GitBox

HyukjinKwon opened a new pull request, #37989: URL: https://github.com/apache/spark/pull/37989 ### What changes were proposed in this pull request? This PR is a followup of https://github.com/apache/spark/pull/37533 that works around the test failure by explicitly checking the element

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #37989: [SPARK-40096][CORE][TESTS][FOLLOW-UP] Explicitly check the element and length

2022-09-25 Thread GitBox

HyukjinKwon commented on code in PR #37989: URL: https://github.com/apache/spark/pull/37989#discussion_r979395539 ## core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala: ## @@ -4469,7 +4469,7 @@ class DAGSchedulerSuite extends SparkFunSuite with TempLocalSpar

[GitHub] [spark] HyukjinKwon commented on pull request #37989: [SPARK-40096][CORE][TESTS][FOLLOW-UP] Explicitly check the element and length

2022-09-25 Thread GitBox

HyukjinKwon commented on PR #37989: URL: https://github.com/apache/spark/pull/37989#issuecomment-1257175536 cc @mridulm FYI

[GitHub] [spark] HyukjinKwon commented on pull request #37988: [SPARK-40142][PYTHON][SQL][FOLLOW-UP] Make pyspark.sql.functions examples self-contained (FINAL)

2022-09-25 Thread GitBox

HyukjinKwon commented on PR #37988: URL: https://github.com/apache/spark/pull/37988#issuecomment-1257175625 Thanks for finishing this work, @khalidmammadov.

[GitHub] [spark] cchantep commented on pull request #36030: Draft: [SPARK-38715] Configurable client ID for Kafka Spark SQL producer

2022-09-25 Thread GitBox

cchantep commented on PR #36030: URL: https://github.com/apache/spark/pull/36030#issuecomment-1257182398 Closing because nobody reviewed it in a timely manner?

[GitHub] [spark] roczei commented on pull request #37679: [SPARK-35242][SQL] Support changing session catalog's default database

2022-09-25 Thread GitBox

roczei commented on PR #37679: URL: https://github.com/apache/spark/pull/37679#issuecomment-1257184837 Hi @cloud-fan, All build issues have been fixed and all of your feedback has been implemented. Latest state: ``` $ bin/spark-shell --conf spark.sql.catalog.spark_catalog

[GitHub] [spark] EvgenyZamyatin commented on pull request #37967: Scalable SkipGram-Word2Vec implementation

2022-09-25 Thread GitBox

EvgenyZamyatin commented on PR #37967: URL: https://github.com/apache/spark/pull/37967#issuecomment-1257192550 @zhengruifeng Hi! Could you please review my changes?

[GitHub] [spark] HyukjinKwon commented on pull request #37985: [SPARK-40548][BUILD] Upgrade rocksdbjni from 7.5.3 to 7.6.0

2022-09-25 Thread GitBox

HyukjinKwon commented on PR #37985: URL: https://github.com/apache/spark/pull/37985#issuecomment-1257193303 Merged to master.

[GitHub] [spark] HyukjinKwon closed pull request #37985: [SPARK-40548][BUILD] Upgrade rocksdbjni from 7.5.3 to 7.6.0

2022-09-25 Thread GitBox

HyukjinKwon closed pull request #37985: [SPARK-40548][BUILD] Upgrade rocksdbjni from 7.5.3 to 7.6.0 URL: https://github.com/apache/spark/pull/37985

[GitHub] [spark] khalidmammadov commented on pull request #37988: [SPARK-40142][PYTHON][SQL][FOLLOW-UP] Make pyspark.sql.functions examples self-contained (FINAL)

2022-09-25 Thread GitBox

khalidmammadov commented on PR #37988: URL: https://github.com/apache/spark/pull/37988#issuecomment-1257199499 > Thanks for finishing this work, @khalidmammadov. Happy to help

[GitHub] [spark] lvshaokang commented on pull request #37986: [SPARK-40357][SQL] Migrate window type check failures onto error classes

2022-09-25 Thread GitBox

lvshaokang commented on PR #37986: URL: https://github.com/apache/spark/pull/37986#issuecomment-1257206608 @MaxGekk Thanks for your review. I'm already addressing it.

[GitHub] [spark] ivoson commented on pull request #37268: [SPARK-39853][CORE] Support stage level task resource profile for standalone cluster when dynamic allocation disabled

2022-09-25 Thread GitBox

ivoson commented on PR #37268: URL: https://github.com/apache/spark/pull/37268#issuecomment-1257212450 > Mostly looks good. We need to update the docs like: https://github.com/apache/spark/blob/master/docs/configuration.md#stage-level-scheduling-overview It says "the current implementation

[GitHub] [spark] attilapiros opened a new pull request, #37990: [WIP][SPARK-40458] Bump Kubernetes Client Version to 6.1.1

2022-09-25 Thread GitBox

attilapiros opened a new pull request, #37990: URL: https://github.com/apache/spark/pull/37990 ### What changes were proposed in this pull request? Bump kubernetes-client version from 5.12.3 to 6.1.1 and clean up all the deprecations. ### Why are the changes needed?

[GitHub] [spark] attilapiros commented on pull request #37990: [WIP][SPARK-40458] Bump Kubernetes Client Version to 6.1.1

2022-09-25 Thread GitBox

attilapiros commented on PR #37990: URL: https://github.com/apache/spark/pull/37990#issuecomment-1257240658 The `inNamespace` calls are added because of the [namespace changes](https://github.com/fabric8io/kubernetes-client/blob/master/doc/MIGRATION-v6.md#namespace-changes) and to have more

[GitHub] [spark] attilapiros commented on a diff in pull request #37990: [WIP][SPARK-40458][K8S] Bump Kubernetes Client Version to 6.1.1

2022-09-25 Thread GitBox

attilapiros commented on code in PR #37990: URL: https://github.com/apache/spark/pull/37990#discussion_r979435775 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/SparkKubernetesClientFactory.scala: ## @@ -115,7 +115,10 @@ private[spark] object Spa

[GitHub] [spark] attilapiros commented on a diff in pull request #37990: [WIP][SPARK-40458][K8S] Bump Kubernetes Client Version to 6.1.1

2022-09-25 Thread GitBox

attilapiros commented on code in PR #37990: URL: https://github.com/apache/spark/pull/37990#discussion_r979435892 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/K8sSubmitOps.scala: ## @@ -144,14 +134,13 @@ private[spark] class K8SSparkSubm

[GitHub] [spark] attilapiros commented on a diff in pull request #37990: [WIP][SPARK-40458][K8S] Bump Kubernetes Client Version to 6.1.1

2022-09-25 Thread GitBox

attilapiros commented on code in PR #37990: URL: https://github.com/apache/spark/pull/37990#discussion_r979436243 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsLifecycleManager.scala: ## @@ -168,23 +171,19 @@ private[spark

[GitHub] [spark] bjornjorgensen opened a new pull request, #37991: [SPARK-40552][] Upgrade `protobuf-python` to 4.21.6

2022-09-25 Thread GitBox

bjornjorgensen opened a new pull request, #37991: URL: https://github.com/apache/spark/pull/37991 ### What changes were proposed in this pull request? Upgrade protobuf-python from 4.21.5 to 4.21.6 ### Why are the changes needed? [CVE-2022-1941](https://nvd.nist.gov/vuln/detai

[GitHub] [spark] srowen commented on pull request #37988: [SPARK-40142][PYTHON][SQL][FOLLOW-UP] Make pyspark.sql.functions examples self-contained (FINAL)

2022-09-25 Thread GitBox

srowen commented on PR #37988: URL: https://github.com/apache/spark/pull/37988#issuecomment-1257243173 Merged to master

[GitHub] [spark] srowen closed pull request #37988: [SPARK-40142][PYTHON][SQL][FOLLOW-UP] Make pyspark.sql.functions examples self-contained (FINAL)

2022-09-25 Thread GitBox

srowen closed pull request #37988: [SPARK-40142][PYTHON][SQL][FOLLOW-UP] Make pyspark.sql.functions examples self-contained (FINAL) URL: https://github.com/apache/spark/pull/37988

[GitHub] [spark] attilapiros commented on a diff in pull request #37990: [WIP][SPARK-40458][K8S] Bump Kubernetes Client Version to 6.1.1

2022-09-25 Thread GitBox

attilapiros commented on code in PR #37990: URL: https://github.com/apache/spark/pull/37990#discussion_r979437063 ## resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/submit/K8sSubmitOpSuite.scala: ## @@ -101,18 +114,19 @@ class K8sSubmitOpSuite extend

[GitHub] [spark] attilapiros commented on a diff in pull request #37990: [WIP][SPARK-40458][K8S] Bump Kubernetes Client Version to 6.1.1

2022-09-25 Thread GitBox

attilapiros commented on code in PR #37990: URL: https://github.com/apache/spark/pull/37990#discussion_r979439689 ## resource-managers/kubernetes/core/pom.xml: ## @@ -75,6 +75,11 @@ test + Review Comment: https://github.com/fabric8io/kubernetes-client/blo

[GitHub] [spark] AmplabJenkins commented on pull request #37991: [SPARK-40552][BUILD] Upgrade `protobuf-python` to 4.21.6

2022-09-25 Thread GitBox

AmplabJenkins commented on PR #37991: URL: https://github.com/apache/spark/pull/37991#issuecomment-1257250279 Can one of the admins verify this patch?

[GitHub] [spark] bjornjorgensen commented on pull request #37991: [SPARK-40552][BUILD] Upgrade `protobuf-python` to 4.21.6

2022-09-25 Thread GitBox

bjornjorgensen commented on PR #37991: URL: https://github.com/apache/spark/pull/37991#issuecomment-1257271472 cc @grundprinzip

[GitHub] [spark] itholic commented on pull request #37988: [SPARK-40142][PYTHON][SQL][FOLLOW-UP] Make pyspark.sql.functions examples self-contained (FINAL)

2022-09-25 Thread GitBox

itholic commented on PR #37988: URL: https://github.com/apache/spark/pull/37988#issuecomment-1257289652 Thanks for your efforts to finish this work!

[GitHub] [spark] srowen commented on pull request #37989: [SPARK-40096][CORE][TESTS][FOLLOW-UP] Explicitly check the element and length

2022-09-25 Thread GitBox

srowen commented on PR #37989: URL: https://github.com/apache/spark/pull/37989#issuecomment-1257299340 Merged to master

[GitHub] [spark] srowen closed pull request #37989: [SPARK-40096][CORE][TESTS][FOLLOW-UP] Explicitly check the element and length

2022-09-25 Thread GitBox

srowen closed pull request #37989: [SPARK-40096][CORE][TESTS][FOLLOW-UP] Explicitly check the element and length URL: https://github.com/apache/spark/pull/37989

[GitHub] [spark] zhengruifeng closed pull request #37923: [SPARK-40334][PS] Implement `GroupBy.prod`

2022-09-25 Thread GitBox

zhengruifeng closed pull request #37923: [SPARK-40334][PS] Implement `GroupBy.prod` URL: https://github.com/apache/spark/pull/37923

[GitHub] [spark] zhengruifeng commented on pull request #37923: [SPARK-40334][PS] Implement `GroupBy.prod`

2022-09-25 Thread GitBox

zhengruifeng commented on PR #37923: URL: https://github.com/apache/spark/pull/37923#issuecomment-1257315237 Merged into master, thank you @ayudovin for working on it!

[GitHub] [spark] nkronenfeld commented on pull request #36613: [WIP][SPARK-30983] Support typed select in Datasets up to the max tuple size

2022-09-25 Thread GitBox

nkronenfeld commented on PR #36613: URL: https://github.com/apache/spark/pull/36613#issuecomment-1257319350 I haven't done anything on the branch because I was waiting for comments - but as far as I know, no one even looked at it. Am I missing something for it to get considered in the firs

[GitHub] [spark] nkronenfeld commented on pull request #36613: [WIP][SPARK-30983] Support typed select in Datasets up to the max tuple size

2022-09-25 Thread GitBox

nkronenfeld commented on PR #36613: URL: https://github.com/apache/spark/pull/36613#issuecomment-1257319840 also, I don't see a button to re-open it - does anyone know where that is?

[GitHub] [spark] zhengruifeng commented on a diff in pull request #37978: [SPARK-40330][PS] Implement `Series.searchsorted`

2022-09-25 Thread GitBox

zhengruifeng commented on code in PR #37978: URL: https://github.com/apache/spark/pull/37978#discussion_r979482634 ## python/pyspark/pandas/series.py: ## @@ -6610,6 +6610,78 @@ def compare( ) return DataFrame(internal) +# todo: 1, support array-like 'valu

[GitHub] [spark] zhengruifeng commented on a diff in pull request #37978: [SPARK-40330][PS] Implement `Series.searchsorted`

2022-09-25 Thread GitBox

zhengruifeng commented on code in PR #37978: URL: https://github.com/apache/spark/pull/37978#discussion_r979482727 ## python/pyspark/pandas/series.py: ## @@ -6610,6 +6610,78 @@ def compare( ) return DataFrame(internal) +# todo: 1, support array-like 'valu

[GitHub] [spark] github-actions[bot] commented on pull request #36829: [SPARK-39438][SQL] Add a threshold to not in line CTE

2022-09-25 Thread GitBox

github-actions[bot] commented on PR #36829: URL: https://github.com/apache/spark/pull/36829#issuecomment-1257322956 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] closed pull request #36301: [SPARK-21697][SQL] NPE & ExceptionInInitializerError trying to load UDF from HDFS

2022-09-25 Thread GitBox

github-actions[bot] closed pull request #36301: [SPARK-21697][SQL] NPE & ExceptionInInitializerError trying to load UDF from HDFS URL: https://github.com/apache/spark/pull/36301

[GitHub] [spark] github-actions[bot] commented on pull request #36265: [SPARK-38951][SQL] Aggregate aliases override field names in ResolveAggregateFunctions

2022-09-25 Thread GitBox

github-actions[bot] commented on PR #36265: URL: https://github.com/apache/spark/pull/36265#issuecomment-1257322967 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] closed pull request #36208: [SPARK-38911][CORE] Fix the potential resource profile id mess up issue

2022-09-25 Thread GitBox

github-actions[bot] closed pull request #36208: [SPARK-38911][CORE] Fix the potential resource profile id mess up issue URL: https://github.com/apache/spark/pull/36208

[GitHub] [spark] github-actions[bot] closed pull request #36180: [SPARK-38887][SQL] Support switch inner join side for sort merge join

2022-09-25 Thread GitBox

github-actions[bot] closed pull request #36180: [SPARK-38887][SQL] Support switch inner join side for sort merge join URL: https://github.com/apache/spark/pull/36180

[GitHub] [spark] github-actions[bot] closed pull request #36151: WIP: [SPARK-27998] [SQL] Add support for double-quoted named expressions

2022-09-25 Thread GitBox

github-actions[bot] closed pull request #36151: WIP: [SPARK-27998] [SQL] Add support for double-quoted named expressions URL: https://github.com/apache/spark/pull/36151

[GitHub] [spark] github-actions[bot] closed pull request #36874: [SPARK-39475][SQL] Pull out complex join keys for shuffled join

2022-09-25 Thread GitBox

github-actions[bot] closed pull request #36874: [SPARK-39475][SQL] Pull out complex join keys for shuffled join URL: https://github.com/apache/spark/pull/36874

[GitHub] [spark] github-actions[bot] closed pull request #36128: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan

2022-09-25 Thread GitBox

github-actions[bot] closed pull request #36128: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan URL: https://github.com/apache/spark/pull/36128

[GitHub] [spark] github-actions[bot] closed pull request #36088: [SPARK-38805][SHUFFLE] Automatically remove an expired indexFilePath from the ESS shuffleIndexCache or the PBS indexCache to save memory.

2022-09-25 Thread GitBox

github-actions[bot] closed pull request #36088: [SPARK-38805][SHUFFLE] Automatically remove an expired indexFilePath from the ESS shuffleIndexCache or the PBS indexCache to save memory. URL: https://github.com/apache/spark/pull/36088

[GitHub] [spark] github-actions[bot] commented on pull request #35858: [SPARK-38448] [YARN] [CORE] Sending Available Resources in Yarn Cluster Information to Spark Driver

2022-09-25 Thread GitBox

github-actions[bot] commented on PR #35858: URL: https://github.com/apache/spark/pull/35858#issuecomment-1257323009 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] closed pull request #35927: [WIP] Simplify the rule of auto-generated alias name

2022-09-25 Thread GitBox

github-actions[bot] closed pull request #35927: [WIP] Simplify the rule of auto-generated alias name URL: https://github.com/apache/spark/pull/35927

[GitHub] [spark] github-actions[bot] commented on pull request #35845: [SPARK-38520][SQL] ANSI interval overflow when reading CSV

2022-09-25 Thread GitBox

github-actions[bot] commented on PR #35845: URL: https://github.com/apache/spark/pull/35845#issuecomment-1257323023 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] commented on pull request #35867: [SPARK-38559][SQL][WEBUI]Display the number of empty partitions on spark ui

2022-09-25 Thread GitBox

github-actions[bot] commented on PR #35867: URL: https://github.com/apache/spark/pull/35867#issuecomment-1257323002 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] commented on pull request #35808: [WIP][SPARK-38512] Rebased traversal order from "pre-order" to "post-order" for `ResolveFunctions` Rule

2022-09-25 Thread GitBox

github-actions[bot] commented on PR #35808: URL: https://github.com/apache/spark/pull/35808#issuecomment-1257323034 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] commented on pull request #35764: [SPARK-38444][SQL]Automatically calculate the upper and lower bounds of partitions when no specified partition related params

2022-09-25 Thread GitBox

github-actions[bot] commented on PR #35764: URL: https://github.com/apache/spark/pull/35764#issuecomment-1257323067 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] commented on pull request #35799: [SPARK-38498][STREAM] Support customized StreamingListener by configuration

2022-09-25 Thread GitBox

github-actions[bot] commented on PR #35799: URL: https://github.com/apache/spark/pull/35799#issuecomment-1257323057 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] commented on pull request #35763: [SPARK-38433][BUILD] change the shell code style with shellcheck

2022-09-25 Thread GitBox

github-actions[bot] commented on PR #35763: URL: https://github.com/apache/spark/pull/35763#issuecomment-1257323076 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] commented on pull request #35806: [SPARK-38505][SQL] Make partial aggregation adaptive

2022-09-25 Thread GitBox

github-actions[bot] commented on PR #35806: URL: https://github.com/apache/spark/pull/35806#issuecomment-1257323038 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] bersprockets commented on a diff in pull request #37825: [SPARK-40382][SQL] Group distinct aggregate expressions by semantically equivalent children in `RewriteDistinctAggregates`

2022-09-25 Thread GitBox

bersprockets commented on code in PR #37825: URL: https://github.com/apache/spark/pull/37825#discussion_r979487585 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala: ## @@ -291,7 +298,8 @@ object RewriteDistinctAggregates exte

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #37991: [SPARK-40552][BUILD] Upgrade `protobuf-python` to 4.21.6

2022-09-25 Thread GitBox

HyukjinKwon commented on code in PR #37991: URL: https://github.com/apache/spark/pull/37991#discussion_r979489138 ## dev/requirements.txt: ## @@ -48,4 +48,4 @@ black==22.6.0 # Spark Connect grpcio==1.48.1 -protobuf==4.21.5 \ No newline at end of file +protobuf==4.21.6 Revie

[GitHub] [spark] bersprockets commented on a diff in pull request #37825: [SPARK-40382][SQL] Group distinct aggregate expressions by semantically equivalent children in `RewriteDistinctAggregates`

2022-09-25 Thread GitBox

bersprockets commented on code in PR #37825: URL: https://github.com/apache/spark/pull/37825#discussion_r979489753 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala: ## @@ -213,7 +213,16 @@ object RewriteDistinctAggregates ext

[GitHub] [spark] Kwafoor closed pull request #37951: [SPARK-40506]Spark Streaming metrics name doesn't need application name

2022-09-25 Thread GitBox

Kwafoor closed pull request #37951: [SPARK-40506]Spark Streaming metrics name doesn't need application name URL: https://github.com/apache/spark/pull/37951

[GitHub] [spark] mridulm commented on pull request #37989: [SPARK-40096][CORE][TESTS][FOLLOW-UP] Explicitly check the element and length

2022-09-25 Thread GitBox

mridulm commented on PR #37989: URL: https://github.com/apache/spark/pull/37989#issuecomment-1257374954 This is very interesting behavior! Thanks for fixing this @HyukjinKwon

[GitHub] [spark] weixiuli commented on a diff in pull request #37922: [WIP][SPARK-40480][SHUFFLE] Remove push-based shuffle data after query finished

2022-09-25 Thread GitBox

weixiuli commented on code in PR #37922: URL: https://github.com/apache/spark/pull/37922#discussion_r979507631 ## core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala: ## @@ -2543,16 +2541,13 @@ private[spark] class DAGScheduler( shuffleIdToMapStage.filter {

[GitHub] [spark] HeartSaVioR commented on pull request #37285: [POC][PYTHON][SS] Arbitrary stateful processing in Structured Streaming with Python

2022-09-25 Thread GitBox

HeartSaVioR commented on PR #37285: URL: https://github.com/apache/spark/pull/37285#issuecomment-1257400388 We can close this now.

[GitHub] [spark] HeartSaVioR closed pull request #37285: [POC][PYTHON][SS] Arbitrary stateful processing in Structured Streaming with Python

2022-09-25 Thread GitBox

HeartSaVioR closed pull request #37285: [POC][PYTHON][SS] Arbitrary stateful processing in Structured Streaming with Python URL: https://github.com/apache/spark/pull/37285

[GitHub] [spark] HeartSaVioR commented on pull request #37935: [SPARK-40492][SS] Do maintenance before streaming StateStore unload

2022-09-25 Thread GitBox

HeartSaVioR commented on PR #37935: URL: https://github.com/apache/spark/pull/37935#issuecomment-1257401965 Thanks! Merging to master.

[GitHub] [spark] LuciferYang commented on pull request #37979: [SPARK-40545][SQL][TESTS] Clean up `metastorePath` after `SparkSQLEnvSuite` execution

2022-09-25 Thread GitBox

LuciferYang commented on PR #37979: URL: https://github.com/apache/spark/pull/37979#issuecomment-1257402939 thanks @wangyum

[GitHub] [spark] LuciferYang commented on pull request #37976: [SPARK-40544][SQL][TESTS] Restore the file appender log level threshold of the hive UTs to info

2022-09-25 Thread GitBox

LuciferYang commented on PR #37976: URL: https://github.com/apache/spark/pull/37976#issuecomment-1257403015 thanks @wangyum

[GitHub] [spark] HeartSaVioR closed pull request #37935: [SPARK-40492][SS] Do maintenance before streaming StateStore unload

2022-09-25 Thread GitBox

HeartSaVioR closed pull request #37935: [SPARK-40492][SS] Do maintenance before streaming StateStore unload URL: https://github.com/apache/spark/pull/37935

[GitHub] [spark] HeartSaVioR commented on pull request #37935: [SPARK-40492][SS] Do maintenance before streaming StateStore unload

2022-09-25 Thread GitBox

HeartSaVioR commented on PR #37935: URL: https://github.com/apache/spark/pull/37935#issuecomment-1257403721 Thanks @chaoqin-li1123 for the contribution! I merged this to master.

[GitHub] [spark] zhengruifeng opened a new pull request, #37992: [SPARK-40554][PS] Make `ddof` in `DataFrame.sem` and `Series.sem` accept arbitrary integers

2022-09-25 Thread GitBox

zhengruifeng opened a new pull request, #37992: URL: https://github.com/apache/spark/pull/37992 ### What changes were proposed in this pull request? Make `ddof` in `DataFrame.sem` and `Series.sem` accept arbitrary integers ### Why are the changes needed? for API coverage

[GitHub] [spark] beliefer commented on a diff in pull request #37825: [SPARK-40382][SQL] Group distinct aggregate expressions by semantically equivalent children in `RewriteDistinctAggregates`

2022-09-25 Thread GitBox

beliefer commented on code in PR #37825: URL: https://github.com/apache/spark/pull/37825#discussion_r979537901 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala: ## @@ -218,9 +218,16 @@ object RewriteDistinctAggregates extends

[GitHub] [spark] grundprinzip commented on pull request #37710: [SPARK-40448][CONNECT] Spark Connect build as Driver Plugin with Shaded Dependencies

2022-09-25 Thread GitBox

grundprinzip commented on PR #37710: URL: https://github.com/apache/spark/pull/37710#issuecomment-1257437750 Ack, I will regenerate the protos and update.

[GitHub] [spark] HyukjinKwon commented on pull request #37978: [SPARK-40330][PS] Implement `Series.searchsorted`

2022-09-25 Thread GitBox

HyukjinKwon commented on PR #37978: URL: https://github.com/apache/spark/pull/37978#issuecomment-1257439757 Merged to master.

[GitHub] [spark] HyukjinKwon closed pull request #37978: [SPARK-40330][PS] Implement `Series.searchsorted`

2022-09-25 Thread GitBox

HyukjinKwon closed pull request #37978: [SPARK-40330][PS] Implement `Series.searchsorted` URL: https://github.com/apache/spark/pull/37978

[GitHub] [spark] zhengruifeng commented on pull request #37978: [SPARK-40330][PS] Implement `Series.searchsorted`

2022-09-25 Thread GitBox

zhengruifeng commented on PR #37978: URL: https://github.com/apache/spark/pull/37978#issuecomment-1257440961 @HyukjinKwon @itholic Thanks for the reviews!

[GitHub] [spark] grundprinzip opened a new pull request, #37993: [Cleanup] Update generated proto files for Spark Connect

2022-09-25 Thread GitBox

grundprinzip opened a new pull request, #37993: URL: https://github.com/apache/spark/pull/37993 ### What changes were proposed in this pull request? This patch cleans up the generated proto files from the initial Spark Connect import. The previous files had a Databricks specific g

[GitHub] [spark] beliefer commented on pull request #37825: [SPARK-40382][SQL] Group distinct aggregate expressions by semantically equivalent children in `RewriteDistinctAggregates`

2022-09-25 Thread GitBox

beliefer commented on PR #37825: URL: https://github.com/apache/spark/pull/37825#issuecomment-1257447790 It seems a little complex. I have an idea to simplify the binary expressions in other optimizer rule. Please reference `SimplifyBinaryComparison`.

[GitHub] [spark] HyukjinKwon commented on pull request #37993: [CONNECT] [Cleanup] Update generated proto files for Spark Connect

2022-09-25 Thread GitBox

HyukjinKwon commented on PR #37993: URL: https://github.com/apache/spark/pull/37993#issuecomment-1257456288 (Should probably need a separate JIRA for this)

[GitHub] [spark] lvshaokang commented on pull request #37986: [SPARK-40357][SQL] Migrate window type check failures onto error classes

2022-09-25 Thread GitBox

lvshaokang commented on PR #37986: URL: https://github.com/apache/spark/pull/37986#issuecomment-1257478094 @MaxGekk I have addressed, please take a look, thk!

[GitHub] [spark] cloud-fan commented on pull request #36700: [SPARK-39318][SQL] Remove tpch-plan-stability WithStats golden files

2022-09-25 Thread GitBox

cloud-fan commented on PR #36700: URL: https://github.com/apache/spark/pull/36700#issuecomment-1257481093 sorry I missed this PR. @ulysses-you can you do a rebase?

[GitHub] [spark] ulysses-you opened a new pull request, #36700: [SPARK-39318][SQL] Remove tpch-plan-stability WithStats golden files

2022-09-25 Thread GitBox

ulysses-you opened a new pull request, #36700: URL: https://github.com/apache/spark/pull/36700 ### What changes were proposed in this pull request? Remove all TPCH with stats golden files. ### Why are the changes needed? It's a dead golden files since we have no s

[GitHub] [spark] cloud-fan commented on a diff in pull request #36265: [SPARK-38951][SQL] Aggregate aliases override field names in ResolveAggregateFunctions

2022-09-25 Thread GitBox

cloud-fan commented on code in PR #36265: URL: https://github.com/apache/spark/pull/36265#discussion_r979562710 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -2594,6 +2601,31 @@ class Analyzer(override val catalogManager: CatalogMan

[GitHub] [spark] cloud-fan commented on pull request #37982: [SPARK-38717][SQL][3.3] Handle Hive's bucket spec case preserving behaviour

2022-09-25 Thread GitBox

cloud-fan commented on PR #37982: URL: https://github.com/apache/spark/pull/37982#issuecomment-1257487700 all tests passed: https://github.com/peter-toth/spark/runs/8514875267 merging to 3.3, thanks!

[GitHub] [spark] cloud-fan closed pull request #37982: [SPARK-38717][SQL][3.3] Handle Hive's bucket spec case preserving behaviour

2022-09-25 Thread GitBox

cloud-fan closed pull request #37982: [SPARK-38717][SQL][3.3] Handle Hive's bucket spec case preserving behaviour URL: https://github.com/apache/spark/pull/37982

[GitHub] [spark] cloud-fan commented on a diff in pull request #35789: [SPARK-32268][SQL] Row-level Runtime Filtering

2022-09-25 Thread GitBox

cloud-fan commented on code in PR #35789: URL: https://github.com/apache/spark/pull/35789#discussion_r979569276 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InjectRuntimeFilter.scala: ## @@ -0,0 +1,303 @@ +/* + * Licensed to the Apache Software Foundati

[GitHub] [spark] cloud-fan commented on a diff in pull request #37679: [SPARK-35242][SQL] Support changing session catalog's default database

2022-09-25 Thread GitBox

cloud-fan commented on code in PR #37679: URL: https://github.com/apache/spark/pull/37679#discussion_r979572317 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala: ## @@ -1932,6 +1932,13 @@ private[sql] object QueryExecutionErrors extends Quer

[GitHub] [spark] cloud-fan commented on a diff in pull request #37679: [SPARK-35242][SQL] Support changing session catalog's default database

2022-09-25 Thread GitBox

cloud-fan commented on code in PR #37679: URL: https://github.com/apache/spark/pull/37679#discussion_r979572820 ## sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala: ## @@ -148,13 +148,19 @@ private[sql] class SharedState( val externalCatalog = SharedS

[GitHub] [spark] Ngone51 commented on a diff in pull request #37268: [SPARK-39853][CORE] Support stage level task resource profile for standalone cluster when dynamic allocation disabled

2022-09-25 Thread GitBox

Ngone51 commented on code in PR #37268: URL: https://github.com/apache/spark/pull/37268#discussion_r979579541 ## core/src/main/scala/org/apache/spark/resource/ResourceProfileManager.scala: ## @@ -59,35 +59,65 @@ private[spark] class ResourceProfileManager(sparkConf: SparkConf,

[GitHub] [spark] beliefer commented on a diff in pull request #35789: [SPARK-32268][SQL] Row-level Runtime Filtering

2022-09-25 Thread GitBox

beliefer commented on code in PR #35789: URL: https://github.com/apache/spark/pull/35789#discussion_r979579903 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InjectRuntimeFilter.scala: ## @@ -0,0 +1,303 @@ +/* + * Licensed to the Apache Software Foundatio

[GitHub] [spark] cloud-fan commented on a diff in pull request #37825: [SPARK-40382][SQL] Group distinct aggregate expressions by semantically equivalent children in `RewriteDistinctAggregates`

2022-09-25 Thread GitBox

cloud-fan commented on code in PR #37825: URL: https://github.com/apache/spark/pull/37825#discussion_r979582134 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala: ## @@ -218,9 +218,16 @@ object RewriteDistinctAggregates extend

[GitHub] [spark] amaliujia opened a new pull request, #37994: [SPARK-40454] Initial DSL framework for protobuf testing

2022-09-25 Thread GitBox

amaliujia opened a new pull request, #37994: URL: https://github.com/apache/spark/pull/37994 ### What changes were proposed in this pull request? Implement an approach to testing the proto to Scala conversion with a DSL according to the proposal in [spark-connect-testing-

[GitHub] [spark] amaliujia commented on pull request #37994: [SPARK-40454][Connect]Initial DSL framework for protobuf testing

2022-09-25 Thread GitBox

amaliujia commented on PR #37994: URL: https://github.com/apache/spark/pull/37994#issuecomment-1257516316 @cloud-fan @HyukjinKwon @@grundprinzip

[GitHub] [spark] zhengruifeng opened a new pull request, #37995: [SPARK-40556][PS][SQL] Clean the intermediate cached datasets created in `AttachDistributedSequenceExec`

2022-09-25 Thread GitBox

zhengruifeng opened a new pull request, #37995: URL: https://github.com/apache/spark/pull/37995 ### What changes were proposed in this pull request? to clean the intermediate cached datasets created in `AttachDistributedSequenceExec` 1, persist the input dataset on the python side;

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #37994: [SPARK-40454][CONNECT] Initial DSL framework for protobuf testing

2022-09-25 Thread GitBox

HyukjinKwon commented on code in PR #37994: URL: https://github.com/apache/spark/pull/37994#discussion_r979588104 ## connect/src/main/scala/org/apache/spark/sql/connect/package.scala: ## @@ -0,0 +1,39 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #37994: [SPARK-40454][CONNECT] Initial DSL framework for protobuf testing

2022-09-25 Thread GitBox

HyukjinKwon commented on code in PR #37994: URL: https://github.com/apache/spark/pull/37994#discussion_r979588691 ## connect/src/test/scala/org/apache/spark/sql/connect/planner/SparkConnectProtoSuite.scala: ## @@ -0,0 +1,62 @@ +/* + * Licensed to the Apache Software Foundation (

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #37994: [SPARK-40454][CONNECT] Initial DSL framework for protobuf testing

2022-09-25 Thread GitBox

HyukjinKwon commented on code in PR #37994: URL: https://github.com/apache/spark/pull/37994#discussion_r979588835 ## connect/src/test/scala/org/apache/spark/sql/connect/planner/SparkConnectProtoSuite.scala: ## @@ -0,0 +1,62 @@ +/* + * Licensed to the Apache Software Foundation (

[GitHub] [spark] grundprinzip commented on pull request #37993: [SPARK-40557] [CONNECT] [Cleanup] Update generated proto files for Spark Connect

2022-09-25 Thread GitBox

grundprinzip commented on PR #37993: URL: https://github.com/apache/spark/pull/37993#issuecomment-1257533571 Created [SPARK-40557] to track.

[GitHub] [spark] mskapilks opened a new pull request, #37996: [SPARK-40558][SQL] Add Reusable Exchange in Bloom creation side plan

2022-09-25 Thread GitBox

mskapilks opened a new pull request, #37996: URL: https://github.com/apache/spark/pull/37996 ### What changes were proposed in this pull request? Currently we allow only a specific pattern in bloom creation side plan (consecutive Filter/Project/Scan nodes) with added column pr