[GitHub] [spark] aminebag commented on pull request #41423: [SPARK-43523][CORE] Fix Spark UI LiveTask memory leak

2023-06-04 Thread via GitHub
aminebag commented on PR #41423: URL: https://github.com/apache/spark/pull/41423#issuecomment-1575691248 @srowen > Are you saying that these objects are not actually usable and will never be collected because events are dropped? Yes, that's exactly what I'm saying. For example,
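The leak mechanism argued in this thread (cleanup depends on "task end" events that a saturated listener bus silently drops) can be sketched in plain Python. All class and method names here are hypothetical illustrations, not Spark's actual `LiveTask`/listener-bus code:

```python
class ToyListenerBus:
    """Bounded event queue that silently drops events once full, mimicking a
    saturated listener bus. A listener can only evict a live task when it sees
    the matching "end" event, so dropped end events leave entries behind."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = []
        self.dropped = 0

    def post(self, event):
        if len(self.queue) >= self.capacity:
            self.dropped += 1          # silently dropped, as on a full bus
        else:
            self.queue.append(event)

    def deliver_all(self, listener):
        for kind, task_id in self.queue:
            if kind == "start":
                listener.on_task_start(task_id)
            else:
                listener.on_task_end(task_id)
        self.queue.clear()


class ToyStatusListener:
    def __init__(self):
        self.live_tasks = {}           # task_id -> state kept for the UI

    def on_task_start(self, task_id):
        self.live_tasks[task_id] = {"id": task_id}

    def on_task_end(self, task_id):
        # Eviction happens only here; a dropped end event means a leak.
        self.live_tasks.pop(task_id, None)
```

Posting 100 start events and then 100 end events through a 150-slot bus drops the last 50 end events, so 50 live-task entries are never evicted and accumulate, which is the shape of the leak being discussed.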

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #41440: [SPARK-43952][CORE][CONNECT] Add SparkContext APIs for query cancellation by tag

2023-06-04 Thread via GitHub
HyukjinKwon commented on code in PR #41440: URL: https://github.com/apache/spark/pull/41440#discussion_r1217271474 ## core/src/main/scala/org/apache/spark/SparkContext.scala: ## @@ -2851,6 +2907,14 @@ object SparkContext extends Logging { */ private[spark] val

[GitHub] [spark] siying commented on pull request #41409: [SPARK-43901][SQL] Avro to Support custom decimal type backed by Long

2023-06-04 Thread via GitHub
siying commented on PR #41409: URL: https://github.com/apache/spark/pull/41409#issuecomment-1575861442 @dongjoon-hyun done. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [spark] aokolnychyi commented on a diff in pull request #41448: [SPARK-43885][SQL] DataSource V2: Handle MERGE commands for delta-based sources

2023-06-04 Thread via GitHub
aokolnychyi commented on code in PR #41448: URL: https://github.com/apache/spark/pull/41448#discussion_r1216255732 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/MergeRowsExec.scala: ## @@ -0,0 +1,216 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] aokolnychyi commented on a diff in pull request #41448: [SPARK-43885][SQL] DataSource V2: Handle MERGE commands for delta-based sources

2023-06-04 Thread via GitHub
aokolnychyi commented on code in PR #41448: URL: https://github.com/apache/spark/pull/41448#discussion_r1216253601 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/MergeRows.scala: ## @@ -0,0 +1,86 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] LorenzoMartini commented on pull request #40018: [SPARK-42439][SQL] In v2 writes, make createJobDescription in FileWrite.toBatch not lazy

2023-06-04 Thread via GitHub
LorenzoMartini commented on PR #40018: URL: https://github.com/apache/spark/pull/40018#issuecomment-1575674518 Maybe @MaxGekk ?

[GitHub] [spark] HyukjinKwon commented on pull request #41450: [SPARK-43960][PS][TESTS] DataFrameConversionTestsMixin is not tested properly

2023-06-04 Thread via GitHub
HyukjinKwon commented on PR #41450: URL: https://github.com/apache/spark/pull/41450#issuecomment-1575790826 Merged to master.

[GitHub] [spark] HyukjinKwon closed pull request #41450: [SPARK-43960][PS][TESTS] DataFrameConversionTestsMixin is not tested properly

2023-06-04 Thread via GitHub
HyukjinKwon closed pull request #41450: [SPARK-43960][PS][TESTS] DataFrameConversionTestsMixin is not tested properly URL: https://github.com/apache/spark/pull/41450

[GitHub] [spark] aokolnychyi commented on a diff in pull request #41448: [SPARK-43885][SQL] DataSource V2: Handle MERGE commands for delta-based sources

2023-06-04 Thread via GitHub
aokolnychyi commented on code in PR #41448: URL: https://github.com/apache/spark/pull/41448#discussion_r1217358049 ## sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala: ## @@ -92,6 +93,67 @@ class JoinSuite extends QueryTest with SharedSparkSession with

[GitHub] [spark] itholic opened a new pull request, #41450: [SPARK-43960][PS][TESTS] DataFrameConversionTestsMixin is not tested properly

2023-06-04 Thread via GitHub
itholic opened a new pull request, #41450: URL: https://github.com/apache/spark/pull/41450 ### What changes were proposed in this pull request? This PR proposes to fix a test which is not tested properly. ### Why are the changes needed? To test properly. ###

[GitHub] [spark] aminebag commented on pull request #41423: [SPARK-43523][CORE] Fix Spark UI LiveTask memory leak

2023-06-04 Thread via GitHub
aminebag commented on PR #41423: URL: https://github.com/apache/spark/pull/41423#issuecomment-1575506059 @srowen I have applied and tested the modification you suggested (replacing toList with toArray) and it had no observable impact on the issue. I can still include it in the

[GitHub] [spark] bjornjorgensen commented on pull request #40420: [SPARK-42617][PS] Support `isocalendar` from the pandas 2.0.0

2023-06-04 Thread via GitHub
bjornjorgensen commented on PR #40420: URL: https://github.com/apache/spark/pull/40420#issuecomment-1575528446 @dzhigimont we have upgraded main branch to pandas 2.0.2 now.

[GitHub] [spark] HyukjinKwon closed pull request #41421: [SPARK-43881][SQL][PYTHON][CONNECT] Add optional pattern for Catalog.listDatabases

2023-06-04 Thread via GitHub
HyukjinKwon closed pull request #41421: [SPARK-43881][SQL][PYTHON][CONNECT] Add optional pattern for Catalog.listDatabases URL: https://github.com/apache/spark/pull/41421

[GitHub] [spark] HyukjinKwon commented on pull request #41421: [SPARK-43881][SQL][PYTHON][CONNECT] Add optional pattern for Catalog.listDatabases

2023-06-04 Thread via GitHub
HyukjinKwon commented on PR #41421: URL: https://github.com/apache/spark/pull/41421#issuecomment-1575788667 Merged to master.

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #41415: [SPARK-43906][PYTHON][CONNECT] Implement the file support in SparkSession.addArtifacts

2023-06-04 Thread via GitHub
HyukjinKwon commented on code in PR #41415: URL: https://github.com/apache/spark/pull/41415#discussion_r1217286686 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/artifact/SparkConnectArtifactManager.scala: ## @@ -154,6 +154,8 @@ class

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #41415: [SPARK-43906][PYTHON][CONNECT] Implement the file support in SparkSession.addArtifacts

2023-06-04 Thread via GitHub
HyukjinKwon commented on code in PR #41415: URL: https://github.com/apache/spark/pull/41415#discussion_r1217286896 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/artifact/SparkConnectArtifactManager.scala: ## @@ -154,6 +154,8 @@ class

[GitHub] [spark] beliefer commented on pull request #41421: [SPARK-43881][SQL][PYTHON][CONNECT] Add optional pattern for Catalog.listDatabases

2023-06-04 Thread via GitHub
beliefer commented on PR #41421: URL: https://github.com/apache/spark/pull/41421#issuecomment-1575891405 @HyukjinKwon Thank you.

[GitHub] [spark] beliefer commented on a diff in pull request #41444: [SPARK-43916][SQL][PYTHON][CONNECT] Add percentile like functions to Scala and Python API

2023-06-04 Thread via GitHub
beliefer commented on code in PR #41444: URL: https://github.com/apache/spark/pull/41444#discussion_r1217334172 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala: ## @@ -812,6 +812,69 @@ object functions { */ def min_by(e: Column, ord:

[GitHub] [spark] wForget commented on pull request #41332: [SPARK-43801][SQL] Support unwrap date type to string type in UnwrapCastInBinaryComparison

2023-06-04 Thread via GitHub
wForget commented on PR #41332: URL: https://github.com/apache/spark/pull/41332#issuecomment-1575952608 I found the same problem in the Hive data source: the partition filter cannot be pushed down due to the cast expression. Can we push down date type filter in
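The rewrite this thread is about can be illustrated with a toy expression tree: a predicate written as `CAST(date_col AS STRING) = '2023-06-04'` cannot be pushed down as a partition filter, but the equivalent `date_col = DATE'2023-06-04'` can. The function and tuple encoding below are hypothetical illustrations, not Spark's optimizer API:

```python
from datetime import date

def unwrap_cast_eq(pred):
    """Toy sketch of unwrapping a cast in an equality comparison.
    pred is encoded as ("eq", lhs, rhs), where lhs may be
    ("cast_to_string", column_name) and rhs a string literal."""
    op, lhs, rhs = pred
    if op == "eq" and isinstance(lhs, tuple) and lhs[0] == "cast_to_string":
        try:
            # Move the cast to the literal side so the raw column is exposed
            # and the predicate becomes eligible for pushdown.
            return ("eq", lhs[1], date.fromisoformat(rhs))
        except ValueError:
            return pred  # literal is not a valid date; keep the cast
    return pred
```

The key design point, mirroring the PR title, is that the unwrap is only safe when the literal round-trips losslessly to the column's type; otherwise the original predicate must be kept.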

[GitHub] [spark] dongjoon-hyun commented on pull request #41136: [SPARK-43356][K8S] Migrate deprecated createOrReplace to serverSideApply

2023-06-04 Thread via GitHub
dongjoon-hyun commented on PR #41136: URL: https://github.com/apache/spark/pull/41136#issuecomment-1575607354 We are now using `6.7.0`. Could you rebase it to the master branch?

[GitHub] [spark] gengliangwang commented on a diff in pull request #41191: [SPARK-43529][SQL] Support general constant expressions as CREATE/REPLACE TABLE OPTIONS values

2023-06-04 Thread via GitHub
gengliangwang commented on code in PR #41191: URL: https://github.com/apache/spark/pull/41191#discussion_r1217103535 ## sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala: ## @@ -158,30 +158,37 @@ class ResolveSessionCatalog(val

[GitHub] [spark] HyukjinKwon commented on pull request #41440: [SPARK-43952][CORE][CONNECT] Add SparkContext APIs for query cancellation by tag

2023-06-04 Thread via GitHub
HyukjinKwon commented on PR #41440: URL: https://github.com/apache/spark/pull/41440#issuecomment-1575805298 > If we don't want to add public APIs like that, I'm also fine with having all that as private[spark]; my planned use of it is inside Spark in Spark Connect. I am fine with
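The "query cancellation by tag" idea under discussion amounts to a registry mapping active jobs to user-supplied tags, where cancellation addresses every job carrying a given tag rather than a single job ID. A minimal plain-Python sketch of that bookkeeping (class and method names are hypothetical, not the proposed SparkContext API):

```python
class ToyJobRegistry:
    """Jobs are registered with a set of tags; cancel_jobs_with_tag removes
    and returns every active job that carries the given tag."""

    def __init__(self):
        self.jobs = {}   # job_id -> set of tags

    def add_job(self, job_id, tags):
        self.jobs[job_id] = set(tags)

    def cancel_jobs_with_tag(self, tag):
        cancelled = [j for j, tags in self.jobs.items() if tag in tags]
        for j in cancelled:
            del self.jobs[j]       # a real system would also interrupt the job
        return cancelled
```

This shape explains why the feature is useful for Spark Connect: the server can tag all jobs spawned by one remote query and cancel them as a group without tracking individual job IDs.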

[GitHub] [spark] gengliangwang commented on a diff in pull request #41191: [SPARK-43529][SQL] Support general constant expressions as CREATE/REPLACE TABLE OPTIONS values

2023-06-04 Thread via GitHub
gengliangwang commented on code in PR #41191: URL: https://github.com/apache/spark/pull/41191#discussion_r1217279661 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableSpec.scala: ## @@ -0,0 +1,73 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] HyukjinKwon opened a new pull request, #41452: [DO-NOT-MERGE] Testing revert 1

2023-06-04 Thread via GitHub
HyukjinKwon opened a new pull request, #41452: URL: https://github.com/apache/spark/pull/41452 ### What changes were proposed in this pull request? TBD ### Why are the changes needed? TBD ### Does this PR introduce _any_ user-facing change? TBD ### How was

[GitHub] [spark] HyukjinKwon opened a new pull request, #41453: [DO-NOT-MERGE] Testing revert 2

2023-06-04 Thread via GitHub
HyukjinKwon opened a new pull request, #41453: URL: https://github.com/apache/spark/pull/41453 ### What changes were proposed in this pull request? TBD ### Why are the changes needed? TBD ### Does this PR introduce _any_ user-facing change? TBD ### How was

[GitHub] [spark] gengliangwang commented on a diff in pull request #41385: [SPARK-43205][SQL][FOLLOWUP] add ExpressionWithUnresolvedIdentifier to simplify code

2023-06-04 Thread via GitHub
gengliangwang commented on code in PR #41385: URL: https://github.com/apache/spark/pull/41385#discussion_r1217279071 ## sql/core/src/test/resources/sql-tests/analyzer-results/ansi/double-quoted-identifiers-disabled.sql.out: ## @@ -218,8 +218,8 @@

[GitHub] [spark] wangyum commented on a diff in pull request #41370: [SPARK-43866] Partition filter condition should pushed down to metastore query if it is equivalence Predicate

2023-06-04 Thread via GitHub
wangyum commented on code in PR #41370: URL: https://github.com/apache/spark/pull/41370#discussion_r1217318011 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala: ## @@ -1041,6 +1053,9 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {

[GitHub] [spark] aokolnychyi commented on a diff in pull request #41448: [SPARK-43885][SQL] DataSource V2: Handle MERGE commands for delta-based sources

2023-06-04 Thread via GitHub
aokolnychyi commented on code in PR #41448: URL: https://github.com/apache/spark/pull/41448#discussion_r1216251279 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/RewriteMergeIntoTable.scala: ## @@ -0,0 +1,347 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] aminebag commented on pull request #41423: [SPARK-43523][CORE] Fix Spark UI LiveTask memory leak

2023-06-04 Thread via GitHub
aminebag commented on PR #41423: URL: https://github.com/apache/spark/pull/41423#issuecomment-1575726835 I don't think the answer is more resources. If we had more memory or CPU power, we would just delay the issue, not prevent it. Also, if we had more CPU power the listener would

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #41444: [WIP][SPARK-43916][SQL][PYTHON][CONNECT] Add percentile like functions to Scala and Python API

2023-06-04 Thread via GitHub
HyukjinKwon commented on code in PR #41444: URL: https://github.com/apache/spark/pull/41444#discussion_r1217278535 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala: ## @@ -812,6 +812,69 @@ object functions { */ def min_by(e: Column,

[GitHub] [spark] gengliangwang commented on a diff in pull request #41385: [SPARK-43205][SQL][FOLLOWUP] add ExpressionWithUnresolvedIdentifier to simplify code

2023-06-04 Thread via GitHub
gengliangwang commented on code in PR #41385: URL: https://github.com/apache/spark/pull/41385#discussion_r1217278997 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveIdentifierClause.scala: ## @@ -18,39 +18,51 @@ package

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #41444: [SPARK-43916][SQL][PYTHON][CONNECT] Add percentile like functions to Scala and Python API

2023-06-04 Thread via GitHub
HyukjinKwon commented on code in PR #41444: URL: https://github.com/apache/spark/pull/41444#discussion_r1217343932 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala: ## @@ -812,6 +812,69 @@ object functions { */ def min_by(e: Column,

[GitHub] [spark] ivoson commented on a diff in pull request #40610: [SPARK-42626][CONNECT] Add Destructive Iterator for SparkResult

2023-06-04 Thread via GitHub
ivoson commented on code in PR #40610: URL: https://github.com/apache/spark/pull/40610#discussion_r1216559940 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/SparkResult.scala: ## @@ -46,7 +46,14 @@ private[sql] class SparkResult[T](
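The "destructive iterator" being reviewed here is an iterator that hands out each buffered batch of results exactly once and drops its internal reference as it goes, so consumed batches become collectible instead of staying pinned until the whole result is closed. A plain-Python sketch of that idea (names are hypothetical, not the actual `SparkResult` API):

```python
class ToyResult:
    """Holds result rows as a list of batches, mimicking a client-side
    buffered result set."""

    def __init__(self, batches):
        self.batches = list(batches)

    def destructive_iterator(self):
        # Pop each batch before yielding its rows: once a batch has been
        # consumed, this object no longer references it, so its memory can
        # be reclaimed even while iteration continues.
        while self.batches:
            batch = self.batches.pop(0)
            yield from batch
```

The trade-off, as with any destructive consumption, is that the result can only be iterated once; a second pass sees nothing.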

[GitHub] [spark] beliefer commented on a diff in pull request #41424: [SPARK-43913][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[2426-2432]

2023-06-04 Thread via GitHub
beliefer commented on code in PR #41424: URL: https://github.com/apache/spark/pull/41424#discussion_r1216524958 ## core/src/main/resources/error/error-classes.json: ## @@ -1834,6 +1849,11 @@ ], "sqlState" : "42K05" }, + "RESOLVED_ATTRIBUTE_MISSING_FROM_INPUT" : {

[GitHub] [spark] panbingkun commented on pull request #41451: [SPARK-43948][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[0050|0057|0058|0059]

2023-06-04 Thread via GitHub
panbingkun commented on PR #41451: URL: https://github.com/apache/spark/pull/41451#issuecomment-1575582851 cc @MaxGekk

[GitHub] [spark] beliefer commented on a diff in pull request #41444: [SPARK-43916][SQL][PYTHON][CONNECT] Add percentile like functions to Scala and Python API

2023-06-04 Thread via GitHub
beliefer commented on code in PR #41444: URL: https://github.com/apache/spark/pull/41444#discussion_r1217345052 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala: ## @@ -812,6 +812,69 @@ object functions { */ def min_by(e: Column, ord:

[GitHub] [spark] gengliangwang commented on a diff in pull request #41191: [SPARK-43529][SQL] Support general constant expressions as CREATE/REPLACE TABLE OPTIONS values

2023-06-04 Thread via GitHub
gengliangwang commented on code in PR #41191: URL: https://github.com/apache/spark/pull/41191#discussion_r1217099067 ## sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala: ## @@ -158,30 +158,37 @@ class ResolveSessionCatalog(val

[GitHub] [spark] beliefer commented on pull request #41444: [WIP][SPARK-43916][SQL][PYTHON][CONNECT] Add percentile like functions to Scala and Python API

2023-06-04 Thread via GitHub
beliefer commented on PR #41444: URL: https://github.com/apache/spark/pull/41444#issuecomment-1575920159 ping @cloud-fan @HyukjinKwon @zhengruifeng The GA failure is unrelated to this PR.

[GitHub] [spark] panbingkun opened a new pull request, #41451: [SPARK-43948][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[0050|0057|0058|0059]

2023-06-04 Thread via GitHub
panbingkun opened a new pull request, #41451: URL: https://github.com/apache/spark/pull/41451 ### What changes were proposed in this pull request? The pr aims to assign names to the error class `_LEGACY_ERROR_TEMP_[0050|0057|0058|0059]`, details as follows: - _LEGACY_ERROR_TEMP_0050

[GitHub] [spark] srowen commented on pull request #41423: [SPARK-43523][CORE] Fix Spark UI LiveTask memory leak

2023-06-04 Thread via GitHub
srowen commented on PR #41423: URL: https://github.com/apache/spark/pull/41423#issuecomment-1575682910 I'm still not understanding why you analyze this as a leak. Are you saying that these objects are not actually usable and will never be collected because events are dropped? That would be

[GitHub] [spark] aokolnychyi commented on a diff in pull request #41448: [SPARK-43885][SQL] DataSource V2: Handle MERGE commands for delta-based sources

2023-06-04 Thread via GitHub
aokolnychyi commented on code in PR #41448: URL: https://github.com/apache/spark/pull/41448#discussion_r1216250498 ## core/src/main/resources/error/error-classes.json: ## @@ -1513,6 +1513,13 @@ "Parse Mode: . To process malformed records as null result, try setting the

[GitHub] [spark] ulysses-you opened a new pull request, #41454: [SPARK-43376][SQL][FOLLOWUP] lazy construct subquery to improve reuse subquery

2023-06-04 Thread via GitHub
ulysses-you opened a new pull request, #41454: URL: https://github.com/apache/spark/pull/41454 ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/41046 make `ReuseAdaptiveSubquery` become not idempotent. This pr reverts the change in

[GitHub] [spark] beliefer commented on pull request #41446: [SPARK-43956][SQL][3.3] Fix the bug doesn't display column's sql for Percentile[Cont|Disc]

2023-06-04 Thread via GitHub
beliefer commented on PR #41446: URL: https://github.com/apache/spark/pull/41446#issuecomment-1575508970 @dongjoon-hyun @MaxGekk Thank you!

[GitHub] [spark] beliefer commented on pull request #41445: [SPARK-43956][SQL][3.4] Fix the bug doesn't display column's sql for Percentile[Cont|Disc]

2023-06-04 Thread via GitHub
beliefer commented on PR #41445: URL: https://github.com/apache/spark/pull/41445#issuecomment-1575508866 @MaxGekk Thank you.

[GitHub] [spark-connect-go] HyukjinKwon closed pull request #8: [SPARK-43958] Adding support for Channel Builder

2023-06-04 Thread via GitHub
HyukjinKwon closed pull request #8: [SPARK-43958] Adding support for Channel Builder URL: https://github.com/apache/spark-connect-go/pull/8

[GitHub] [spark-connect-go] HyukjinKwon commented on pull request #8: [SPARK-43958] Adding support for Channel Builder

2023-06-04 Thread via GitHub
HyukjinKwon commented on PR #8: URL: https://github.com/apache/spark-connect-go/pull/8#issuecomment-1575537525 Merged to master.

[GitHub] [spark] zzzzming95 commented on a diff in pull request #41370: [SPARK-43866] Partition filter condition should pushed down to metastore query if it is equivalence Predicate

2023-06-04 Thread via GitHub
zzzzming95 commented on code in PR #41370: URL: https://github.com/apache/spark/pull/41370#discussion_r1216785699 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala: ## @@ -1041,6 +1053,9 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {

[GitHub] [spark] srowen commented on pull request #41442: [SPARK-43955][BUILD] Upgrade `scalafmt` from 3.7.3 to 3.7.4

2023-06-04 Thread via GitHub
srowen commented on PR #41442: URL: https://github.com/apache/spark/pull/41442#issuecomment-1575681053 Merged to master

[GitHub] [spark] srowen closed pull request #41442: [SPARK-43955][BUILD] Upgrade `scalafmt` from 3.7.3 to 3.7.4

2023-06-04 Thread via GitHub
srowen closed pull request #41442: [SPARK-43955][BUILD] Upgrade `scalafmt` from 3.7.3 to 3.7.4 URL: https://github.com/apache/spark/pull/41442

[GitHub] [spark] srowen commented on pull request #41423: [SPARK-43523][CORE] Fix Spark UI LiveTask memory leak

2023-06-04 Thread via GitHub
srowen commented on PR #41423: URL: https://github.com/apache/spark/pull/41423#issuecomment-1575695964 I get it, but this trades one incorrectness for another. I don't know of another good way here. Is this resolvable with simply more resources, like more cores or memory? Like, is part of the problem

[GitHub] [spark] beliefer commented on a diff in pull request #41444: [WIP][SPARK-43916][SQL][PYTHON][CONNECT] Add percentile like functions to Scala and Python API

2023-06-04 Thread via GitHub
beliefer commented on code in PR #41444: URL: https://github.com/apache/spark/pull/41444#discussion_r1217334172 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala: ## @@ -812,6 +812,69 @@ object functions { */ def min_by(e: Column, ord:

[GitHub] [spark] yaooqinn commented on pull request #41181: [SPARK-43504][K8S] Mounts the hadoop config map on the executor pod

2023-06-04 Thread via GitHub
yaooqinn commented on PR #41181: URL: https://github.com/apache/spark/pull/41181#issuecomment-1575994314 thanks, @dongjoon-hyun and @turboFei. Late +1 from my side.

[GitHub] [spark] degant commented on a diff in pull request #41428: [SPARK-41958][CORE][3.3] Disallow arbitrary custom classpath with proxy user in cluster mode

2023-06-04 Thread via GitHub
degant commented on code in PR #41428: URL: https://github.com/apache/spark/pull/41428#discussion_r1217444147 ## docs/core-migration-guide.md: ## @@ -25,6 +25,7 @@ license: | ## Upgrading from Core 3.2 to 3.3 - Since Spark 3.3, Spark migrates its log4j dependency from 1.x

[GitHub] [spark] beliefer commented on pull request #41444: [SPARK-43916][SQL][PYTHON][CONNECT] Add percentile like functions to Scala and Python API

2023-06-04 Thread via GitHub
beliefer commented on PR #41444: URL: https://github.com/apache/spark/pull/41444#issuecomment-1576020831 @zhengruifeng The two functions are used with SQL syntax like `percentile_cont(0.5) WITHIN GROUP (ORDER BY v)`.
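The semantics behind `PERCENTILE_CONT(fraction) WITHIN GROUP (ORDER BY v)` mentioned above are continuous percentile with linear interpolation between the two nearest sorted values. A plain-Python sketch of that computation for illustration (this is the standard SQL definition, not Spark's implementation):

```python
def percentile_cont(values, fraction):
    """Continuous percentile with linear interpolation.
    fraction is in [0, 1]; returns None for an empty input."""
    xs = sorted(values)
    if not xs:
        return None
    pos = fraction * (len(xs) - 1)   # fractional rank in the sorted list
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    # Interpolate between the two neighbors straddling the rank.
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])
```

For example, the median of an even-sized group falls between the two middle values, which is exactly what distinguishes `percentile_cont` from its discrete counterpart `percentile_disc` (which returns an actual group member).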

[GitHub] [spark] itholic commented on pull request #41455: [SPARK-43962][SQL] Improve error messages: `CANNOT_DECODE_URL`, `CANNOT_MERGE_INCOMPATIBLE_DATA_TYPE`, `CANNOT_PARSE_DECIMAL`, `CANNOT_READ_F

2023-06-04 Thread via GitHub
itholic commented on PR #41455: URL: https://github.com/apache/spark/pull/41455#issuecomment-1576034146 cc @MaxGekk @srielau @cloud-fan Please review this when you find some time

[GitHub] [spark] cloud-fan commented on a diff in pull request #40908: [SPARK-42750][SQL] Support Insert By Name statement

2023-06-04 Thread via GitHub
cloud-fan commented on code in PR #40908: URL: https://github.com/apache/spark/pull/40908#discussion_r1217472430 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala: ## @@ -397,7 +397,8 @@ object PreprocessTableInsertion extends

[GitHub] [spark] vinodkc commented on a diff in pull request #41144: [SPARK-43470][CORE] Add OS, Java, Python version information to application log

2023-06-04 Thread via GitHub
vinodkc commented on code in PR #41144: URL: https://github.com/apache/spark/pull/41144#discussion_r1217472619 ## core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala: ## @@ -106,6 +106,7 @@ private[spark] abstract class BasePythonRunner[IN, OUT]( protected val

[GitHub] [spark] itholic opened a new pull request, #41455: [SPARK-43962][SQL] Improve error messages: `CANNOT_DECODE_URL`, `CANNOT_MERGE_INCOMPATIBLE_DATA_TYPE`, `CANNOT_PARSE_DECIMAL`, `CANNOT_READ_

2023-06-04 Thread via GitHub
itholic opened a new pull request, #41455: URL: https://github.com/apache/spark/pull/41455 ### What changes were proposed in this pull request? This PR proposes to improve error messages for `CANNOT_DECODE_URL`, `CANNOT_MERGE_INCOMPATIBLE_DATA_TYPE`, `CANNOT_PARSE_DECIMAL`,

[GitHub] [spark] cloud-fan commented on a diff in pull request #40908: [SPARK-42750][SQL] Support Insert By Name statement

2023-06-04 Thread via GitHub
cloud-fan commented on code in PR #40908: URL: https://github.com/apache/spark/pull/40908#discussion_r1217471748 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala: ## @@ -274,15 +274,16 @@ class FindDataSourceTable(sparkSession:

[GitHub] [spark] cloud-fan commented on a diff in pull request #40908: [SPARK-42750][SQL] Support Insert By Name statement

2023-06-04 Thread via GitHub
cloud-fan commented on code in PR #40908: URL: https://github.com/apache/spark/pull/40908#discussion_r1217472203 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala: ## @@ -397,7 +397,8 @@ object PreprocessTableInsertion extends

[GitHub] [spark] pan3793 commented on pull request #41136: [SPARK-43356][K8S] Migrate deprecated createOrReplace to serverSideApply

2023-06-04 Thread via GitHub
pan3793 commented on PR #41136: URL: https://github.com/apache/spark/pull/41136#issuecomment-1576003455 rebased on the latest master branch

[GitHub] [spark] cloud-fan commented on a diff in pull request #40908: [SPARK-42750][SQL] Support Insert By Name statement

2023-06-04 Thread via GitHub
cloud-fan commented on code in PR #40908: URL: https://github.com/apache/spark/pull/40908#discussion_r1217470812 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala: ## @@ -151,8 +151,8 @@ object DataSourceAnalysis extends

[GitHub] [spark] cloud-fan commented on a diff in pull request #40908: [SPARK-42750][SQL] Support Insert By Name statement

2023-06-04 Thread via GitHub
cloud-fan commented on code in PR #40908: URL: https://github.com/apache/spark/pull/40908#discussion_r1217471514 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala: ## @@ -274,15 +274,16 @@ class FindDataSourceTable(sparkSession:

[GitHub] [spark] cloud-fan commented on a diff in pull request #40908: [SPARK-42750][SQL] Support Insert By Name statement

2023-06-04 Thread via GitHub
cloud-fan commented on code in PR #40908: URL: https://github.com/apache/spark/pull/40908#discussion_r1217471285 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala: ## @@ -274,15 +274,16 @@ class FindDataSourceTable(sparkSession:

[GitHub] [spark] cloud-fan commented on a diff in pull request #41370: [SPARK-43866] Partition filter condition should pushed down to metastore query if it is equivalence Predicate

2023-06-04 Thread via GitHub
cloud-fan commented on code in PR #41370: URL: https://github.com/apache/spark/pull/41370#discussion_r1217481906 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala: ## @@ -994,6 +994,18 @@ private[client] class Shim_v0_13 extends Shim_v0_12 { }

[GitHub] [spark] ulysses-you commented on a diff in pull request #41407: [SPARK-43900][SQL] Support optimize skewed partitions even if introduce extra shuffle

2023-06-04 Thread via GitHub
ulysses-you commented on code in PR #41407: URL: https://github.com/apache/spark/pull/41407#discussion_r1217406788 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala: ## @@ -104,7 +104,10 @@ case class AdaptiveSparkPlanExec(

[GitHub] [spark] pan3793 commented on a diff in pull request #41428: [SPARK-41958][CORE][3.3] Disallow arbitrary custom classpath with proxy user in cluster mode

2023-06-04 Thread via GitHub
pan3793 commented on code in PR #41428: URL: https://github.com/apache/spark/pull/41428#discussion_r1217437344 ## docs/core-migration-guide.md: ## @@ -25,6 +25,7 @@ license: | ## Upgrading from Core 3.2 to 3.3 - Since Spark 3.3, Spark migrates its log4j dependency from 1.x

[GitHub] [spark] aokolnychyi commented on pull request #41448: [SPARK-43885][SQL] DataSource V2: Handle MERGE commands for delta-based sources

2023-06-04 Thread via GitHub
aokolnychyi commented on PR #41448: URL: https://github.com/apache/spark/pull/41448#issuecomment-1576080545 The test failures don't seem related. I'll need to take a closer look at what happened in `sql - other tests`, though.

[GitHub] [spark] Hisoka-X commented on a diff in pull request #40908: [SPARK-42750][SQL] Support Insert By Name statement

2023-06-04 Thread via GitHub
Hisoka-X commented on code in PR #40908: URL: https://github.com/apache/spark/pull/40908#discussion_r1217529014 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala: ## @@ -397,7 +397,8 @@ object PreprocessTableInsertion extends

[GitHub] [spark] allisonwang-db commented on a diff in pull request #41316: [SPARK-43798][SQL][PYTHON] Support Python user-defined table functions

2023-06-04 Thread via GitHub
allisonwang-db commented on code in PR #41316: URL: https://github.com/apache/spark/pull/41316#discussion_r1217498437 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/pythonLogicalOperators.scala: ## @@ -171,6 +186,18 @@ case class ArrowEvalPython(

[GitHub] [spark] aokolnychyi commented on a diff in pull request #41449: [SPARK-43959][SQL] Make RowLevelOperationSuiteBase and AlignAssignmentsSuite abstract

2023-06-04 Thread via GitHub
aokolnychyi commented on code in PR #41449: URL: https://github.com/apache/spark/pull/41449#discussion_r1217400261 ## sql/core/src/test/scala/org/apache/spark/sql/execution/command/AlignAssignmentsSuite.scala: ## @@ -36,7 +36,7 @@ import

[GitHub] [spark] vinodkc commented on pull request #41144: [SPARK-43470][CORE] Add OS, Java, Python version information to application log

2023-06-04 Thread via GitHub
vinodkc commented on PR #41144: URL: https://github.com/apache/spark/pull/41144#issuecomment-1576034499 > For the Python, the information is printed at every task execution. Could you find a proper place to print that info once, @vinodkc ? > > ``` > 23/06/03 18:29:46 INFO

[GitHub] [spark] cloud-fan commented on a diff in pull request #40908: [SPARK-42750][SQL] Support Insert By Name statement

2023-06-04 Thread via GitHub
cloud-fan commented on code in PR #40908: URL: https://github.com/apache/spark/pull/40908#discussion_r1217473051 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala: ## @@ -506,7 +507,8 @@ object PreWriteCheck extends (LogicalPlan => Unit) {

[GitHub] [spark] cloud-fan commented on a diff in pull request #40908: [SPARK-42750][SQL] Support Insert By Name statement

2023-06-04 Thread via GitHub
cloud-fan commented on code in PR #40908: URL: https://github.com/apache/spark/pull/40908#discussion_r1217472863 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala: ## @@ -425,7 +426,7 @@ object PreprocessTableInsertion extends

[GitHub] [spark] cloud-fan commented on a diff in pull request #40908: [SPARK-42750][SQL] Support Insert By Name statement

2023-06-04 Thread via GitHub
cloud-fan commented on code in PR #40908: URL: https://github.com/apache/spark/pull/40908#discussion_r1217473411 ## sql/core/src/test/scala/org/apache/spark/sql/SQLInsertTestSuite.scala: ## @@ -46,21 +47,24 @@ trait SQLInsertTestSuite extends QueryTest with SQLTestUtils { }

[GitHub] [spark] zhengruifeng commented on pull request #41444: [SPARK-43916][SQL][PYTHON][CONNECT] Add percentile like functions to Scala and Python API

2023-06-04 Thread via GitHub
zhengruifeng commented on PR #41444: URL: https://github.com/apache/spark/pull/41444#issuecomment-1575995156 Where are `percentile_cont` and `percentile_disc` from? I cannot find them in https://spark.apache.org/docs/latest/api/sql/index.html or in `FunctionRegistry`.

[GitHub] [spark] amaliujia commented on pull request #41425: [SPARK-43919][SQL] Extract JSON functionality out of Row

2023-06-04 Thread via GitHub
amaliujia commented on PR #41425: URL: https://github.com/apache/spark/pull/41425#issuecomment-1576068930 @cloud-fan done. Waiting for CI to pass again.

[GitHub] [spark] cloud-fan commented on a diff in pull request #40908: [SPARK-42750][SQL] Support Insert By Name statement

2023-06-04 Thread via GitHub
cloud-fan commented on code in PR #40908: URL: https://github.com/apache/spark/pull/40908#discussion_r1217474065 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala: ## @@ -145,7 +145,7 @@ class DetermineTableStats(session: SparkSession) extends

[GitHub] [spark] allisonwang-db commented on a diff in pull request #41316: [SPARK-43798][SQL][PYTHON] Support Python user-defined table functions

2023-06-04 Thread via GitHub
allisonwang-db commented on code in PR #41316: URL: https://github.com/apache/spark/pull/41316#discussion_r1217487382 ## python/pyspark/sql/functions.py: ## @@ -10403,6 +10405,82 @@ def udf( return _create_py_udf(f=f, returnType=returnType, useArrow=useArrow) +def

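The user-defined table function (UDTF) concept discussed in PR #41316 — a function that returns zero or more output rows per input row — can be sketched in plain Python. This is illustrative only; the actual PySpark API introduced by SPARK-43798 may differ, and `SplitWords` is a hypothetical example class:

```python
# A UDTF-style class: eval() yields zero or more output rows (tuples)
# for each input value, unlike a scalar UDF which returns exactly one.
class SplitWords:
    def eval(self, text: str):
        for word in text.split():
            yield (word,)

udtf = SplitWords()
rows = [r for t in ["hello world", "spark"] for r in udtf.eval(t)]
assert rows == [("hello",), ("world",), ("spark",)]
```
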
[GitHub] [spark] cloud-fan commented on pull request #41332: [SPARK-43801][SQL] Support unwrap date type to string type in UnwrapCastInBinaryComparison

2023-06-04 Thread via GitHub
cloud-fan commented on PR #41332: URL: https://github.com/apache/spark/pull/41332#issuecomment-1575962729 My suggestion for this problem is not to abuse the string type. If the column holds timestamp values, it should be timestamp type. If you know that your string-type "timestamp" type

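The optimization under discussion can be illustrated outside Spark. This Python sketch (illustrative only, not Spark code) shows why a cast-to-date comparison can be rewritten as a plain string comparison when the strings are ISO-8601 formatted — string order then agrees with date order:

```python
from datetime import date

rows = ["2023-05-31", "2023-06-01", "2023-06-02"]
cutoff = date(2023, 6, 1)

# Predicate with a cast: date(col) >= cutoff
with_cast = [s for s in rows if date.fromisoformat(s) >= cutoff]

# Equivalent unwrapped predicate: col >= "2023-06-01" — valid only because
# ISO-8601 date strings sort in the same order as the dates they encode.
unwrapped = [s for s in rows if s >= cutoff.isoformat()]

assert with_cast == unwrapped == ["2023-06-01", "2023-06-02"]
```

This is also why the comment warns against abusing string columns for timestamps: the rewrite is only sound for formats whose lexicographic order matches temporal order.
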
[GitHub] [spark] cloud-fan commented on a diff in pull request #41449: [SPARK-43959][SQL] Make RowLevelOperationSuiteBase and AlignAssignmentsSuite abstract

2023-06-04 Thread via GitHub
cloud-fan commented on code in PR #41449: URL: https://github.com/apache/spark/pull/41449#discussion_r1217394475 ## sql/core/src/test/scala/org/apache/spark/sql/execution/command/AlignAssignmentsSuite.scala: ## @@ -36,7 +36,7 @@ import

[GitHub] [spark] cloud-fan commented on pull request #41425: [SPARK-43919][SQL] Extract JSON functionality out of Row

2023-06-04 Thread via GitHub
cloud-fan commented on PR #41425: URL: https://github.com/apache/spark/pull/41425#issuecomment-1575992736 @amaliujia can you fix conflicts? I think this PR is ready to go

[GitHub] [spark] cloud-fan commented on a diff in pull request #40908: [SPARK-42750][SQL] Support Insert By Name statement

2023-06-04 Thread via GitHub
cloud-fan commented on code in PR #40908: URL: https://github.com/apache/spark/pull/40908#discussion_r1217470181 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statements.scala: ## @@ -165,19 +165,25 @@ case class QualifiedColType( *

[GitHub] [spark] cloud-fan commented on a diff in pull request #40908: [SPARK-42750][SQL] Support Insert By Name statement

2023-06-04 Thread via GitHub
cloud-fan commented on code in PR #40908: URL: https://github.com/apache/spark/pull/40908#discussion_r1217470419 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala: ## @@ -151,8 +151,8 @@ object DataSourceAnalysis extends

[GitHub] [spark] aokolnychyi commented on a diff in pull request #41448: [SPARK-43885][SQL] DataSource V2: Handle MERGE commands for delta-based sources

2023-06-04 Thread via GitHub
aokolnychyi commented on code in PR #41448: URL: https://github.com/apache/spark/pull/41448#discussion_r1216253601 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/MergeRows.scala: ## @@ -0,0 +1,86 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] aokolnychyi commented on a diff in pull request #41448: [SPARK-43885][SQL] DataSource V2: Handle MERGE commands for delta-based sources

2023-06-04 Thread via GitHub
aokolnychyi commented on code in PR #41448: URL: https://github.com/apache/spark/pull/41448#discussion_r1216255732 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/MergeRowsExec.scala: ## @@ -0,0 +1,216 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] dongjoon-hyun closed pull request #41438: [SPARK-43953][CONNECT] Remove `pass`

2023-06-04 Thread via GitHub
dongjoon-hyun closed pull request #41438: [SPARK-43953][CONNECT] Remove `pass` URL: https://github.com/apache/spark/pull/41438

[GitHub] [spark] dongjoon-hyun closed pull request #41437: [SPARK-43917][PS][INFRA] Upgrade `pandas` to 2.0.2

2023-06-04 Thread via GitHub
dongjoon-hyun closed pull request #41437: [SPARK-43917][PS][INFRA] Upgrade `pandas` to 2.0.2 URL: https://github.com/apache/spark/pull/41437

[GitHub] [spark] dongjoon-hyun commented on pull request #41409: [SPARK-43901][SQL] Avro to Support custom decimal type backed by Long

2023-06-04 Thread via GitHub
dongjoon-hyun commented on PR #41409: URL: https://github.com/apache/spark/pull/41409#issuecomment-1575421708 Could you resolve the conflict, @siying ?

[GitHub] [spark] wangyum closed pull request #41419: [SPARK-43911] [SQL] Use toSet to deduplicate the iterator data to prevent the creation of large Array

2023-06-04 Thread via GitHub
wangyum closed pull request #41419: [SPARK-43911] [SQL] Use toSet to deduplicate the iterator data to prevent the creation of large Array URL: https://github.com/apache/spark/pull/41419

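The idea in the PR title — deduplicating iterator data with a set instead of first materializing everything into a large array — can be sketched in Python. This is an illustrative sketch of the general technique, not the Spark change itself:

```python
def dedup(it):
    """Deduplicate an iterator lazily with a set of seen elements, so memory
    grows with the number of *distinct* values rather than the total input."""
    seen = set()
    for x in it:
        if x not in seen:
            seen.add(x)
            yield x

# A generator with many duplicates; the full input is never held in memory.
values = (i % 5 for i in range(1_000))
assert sorted(dedup(values)) == [0, 1, 2, 3, 4]
```
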
[GitHub] [spark] wangyum commented on pull request #41419: [SPARK-43911] [SQL] Use toSet to deduplicate the iterator data to prevent the creation of large Array

2023-06-04 Thread via GitHub
wangyum commented on PR #41419: URL: https://github.com/apache/spark/pull/41419#issuecomment-1575436568 Merged to master and branch-3.4.