[GitHub] [spark] anchovYu commented on a diff in pull request #37887: [SPARK-40360] ALREADY_EXISTS and NOT_FOUND exceptions

2022-11-04 Thread GitBox
anchovYu commented on code in PR #37887: URL: https://github.com/apache/spark/pull/37887#discussion_r1014283621 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AlreadyExistException.scala: ## @@ -20,66 +20,112 @@ package

[GitHub] [spark] ueshin commented on a diff in pull request #38223: [SPARK-40770][PYTHON] Improved error messages for applyInPandas for schema mismatch

2022-11-04 Thread GitBox
ueshin commented on code in PR #38223: URL: https://github.com/apache/spark/pull/38223#discussion_r1014300546 ## python/pyspark/worker.py: ## @@ -159,27 +226,13 @@ def wrapped(left_key_series, left_value_series, right_key_series, right_value_se key_series =

[GitHub] [spark] MaxGekk commented on pull request #38498: [SPARK-40769][CORE][SQL] Migrate type check failures of aggregate expressions onto error classes

2022-11-04 Thread GitBox
MaxGekk commented on PR #38498: URL: https://github.com/apache/spark/pull/38498#issuecomment-1303953617 +1, LGTM. Merging to master. Thank you, @LuciferYang. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] asfgit closed pull request #38427: [SPARK-40950][CORE] Fix isRemoteAddressMaxedOut performance overhead on scala 2.13

2022-11-04 Thread GitBox
asfgit closed pull request #38427: [SPARK-40950][CORE] Fix isRemoteAddressMaxedOut performance overhead on scala 2.13 URL: https://github.com/apache/spark/pull/38427 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

[GitHub] [spark] srowen commented on a diff in pull request #38510: [MINOR][DOC] revisions for spark sql performance tuning to improve readability and grammar

2022-11-04 Thread GitBox
srowen commented on code in PR #38510: URL: https://github.com/apache/spark/pull/38510#discussion_r1014417907 ## docs/sql-performance-tuning.md: ## @@ -295,7 +294,7 @@ AQE converts sort-merge join to broadcast hash join when the runtime statistics

[GitHub] [spark] srowen commented on a diff in pull request #38510: [MINOR][DOC] revisions for spark sql performance tuning to improve readability and grammar

2022-11-04 Thread GitBox
srowen commented on code in PR #38510: URL: https://github.com/apache/spark/pull/38510#discussion_r1014421466 ## docs/sql-performance-tuning.md: ## @@ -295,7 +294,7 @@ AQE converts sort-merge join to broadcast hash join when the runtime statistics

[GitHub] [spark] leewyang commented on a diff in pull request #37734: [SPARK-40264][ML] add batch_infer_udf function to pyspark.ml.functions

2022-11-04 Thread GitBox
leewyang commented on code in PR #37734: URL: https://github.com/apache/spark/pull/37734#discussion_r1014475066 ## python/pyspark/ml/functions.py: ## @@ -106,6 +117,542 @@ def array_to_vector(col: Column) -> Column: return

[GitHub] [spark] hvanhovell commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-04 Thread GitBox
hvanhovell commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1014269647 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala: ## @@ -117,7 +126,70 @@ class

[GitHub] [spark] hvanhovell commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-04 Thread GitBox
hvanhovell commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1014279677 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala: ## @@ -117,7 +126,70 @@ class

[GitHub] [spark] hvanhovell commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-04 Thread GitBox
hvanhovell commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1014280302 ## sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala: ## @@ -128,6 +128,65 @@ private[sql] object ArrowConverters extends Logging {

[GitHub] [spark] gengliangwang closed pull request #38479: [SPARK-40697][SQL][FOLLOWUP] Read-side char padding should only be applied if necessary

2022-11-04 Thread GitBox
gengliangwang closed pull request #38479: [SPARK-40697][SQL][FOLLOWUP] Read-side char padding should only be applied if necessary URL: https://github.com/apache/spark/pull/38479 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

[GitHub] [spark] WweiL commented on a diff in pull request #38503: [SPARK-40940] Remove Multi-stateful operator checkers for streaming queries.

2022-11-04 Thread GitBox
WweiL commented on code in PR #38503: URL: https://github.com/apache/spark/pull/38503#discussion_r1014302073 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala: ## @@ -41,23 +42,49 @@ object UnsupportedOperationChecker

[GitHub] [spark] gengliangwang commented on pull request #38479: [SPARK-40697][SQL][FOLLOWUP] Read-side char padding should only be applied if necessary

2022-11-04 Thread GitBox
gengliangwang commented on PR #38479: URL: https://github.com/apache/spark/pull/38479#issuecomment-1303969098 Thanks for fixing it. Merging to master -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] gengliangwang opened a new pull request, #38513: [SPARK-40903][SQL][FOLLOWUP] Cast canonicalized Add as its original data type if necessary

2022-11-04 Thread GitBox
gengliangwang opened a new pull request, #38513: URL: https://github.com/apache/spark/pull/38513 ### What changes were proposed in this pull request? This is a follow-up of https://github.com/apache/spark/pull/38379. On second thought, if the canonicalized `Add` has a

[GitHub] [spark] gengliangwang commented on pull request #38513: [SPARK-40903][SQL][FOLLOWUP] Cast canonicalized Add as its original data type if necessary

2022-11-04 Thread GitBox
gengliangwang commented on PR #38513: URL: https://github.com/apache/spark/pull/38513#issuecomment-1304002389 cc @cloud-fan @srielau @ulysses-you @peter-toth -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] leewyang commented on a diff in pull request #37734: [SPARK-40264][ML] add batch_infer_udf function to pyspark.ml.functions

2022-11-04 Thread GitBox
leewyang commented on code in PR #37734: URL: https://github.com/apache/spark/pull/37734#discussion_r1014476080 ## python/pyspark/ml/functions.py: ## @@ -106,6 +117,542 @@ def array_to_vector(col: Column) -> Column: return

[GitHub] [spark] swamirishi commented on pull request #38377: [SPARK-40901][CORE] Unable to store Spark Driver logs with Absolute Hadoop based URI FS Path

2022-11-04 Thread GitBox
swamirishi commented on PR #38377: URL: https://github.com/apache/spark/pull/38377#issuecomment-1303865771 > Two points: > > * spark.driver.log.dfsDir is typically expected to be a path to hdfs - so resolving it relative to current working directory does not make sense > * If

[GitHub] [spark] WweiL commented on a diff in pull request #38503: [SPARK-40940] Remove Multi-stateful operator checkers for streaming queries.

2022-11-04 Thread GitBox
WweiL commented on code in PR #38503: URL: https://github.com/apache/spark/pull/38503#discussion_r1014281945 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala: ## @@ -41,23 +42,49 @@ object UnsupportedOperationChecker

[GitHub] [spark] MaxGekk commented on a diff in pull request #37887: [SPARK-40360] ALREADY_EXISTS and NOT_FOUND exceptions

2022-11-04 Thread GitBox
MaxGekk commented on code in PR #37887: URL: https://github.com/apache/spark/pull/37887#discussion_r1014297862 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AlreadyExistException.scala: ## @@ -20,66 +20,112 @@ package

[GitHub] [spark] jerrypeng commented on a diff in pull request #38430: [SPARK-40957] Add in memory cache in HDFSMetadataLog

2022-11-04 Thread GitBox
jerrypeng commented on code in PR #38430: URL: https://github.com/apache/spark/pull/38430#discussion_r1014297681 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala: ## @@ -277,10 +295,34 @@ class HDFSMetadataLog[T <: AnyRef :

[GitHub] [spark] MaxGekk closed pull request #38498: [SPARK-40769][CORE][SQL] Migrate type check failures of aggregate expressions onto error classes

2022-11-04 Thread GitBox
MaxGekk closed pull request #38498: [SPARK-40769][CORE][SQL] Migrate type check failures of aggregate expressions onto error classes URL: https://github.com/apache/spark/pull/38498 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

[GitHub] [spark] dwsmith1983 commented on pull request #38510: [MINOR][DOC] revisions for spark sql performance tuning to improve readability and grammar

2022-11-04 Thread GitBox
dwsmith1983 commented on PR #38510: URL: https://github.com/apache/spark/pull/38510#issuecomment-1304103037 > OK, any other related files you want to check while you're here? I am doing some studying so not sure what other docs I will read and when. -- This is an automated message

[GitHub] [spark] dwsmith1983 commented on a diff in pull request #38510: [MINOR][DOC] revisions for spark sql performance tuning to improve readability and grammar

2022-11-04 Thread GitBox
dwsmith1983 commented on code in PR #38510: URL: https://github.com/apache/spark/pull/38510#discussion_r1014419552 ## docs/sql-performance-tuning.md: ## @@ -295,7 +294,7 @@ AQE converts sort-merge join to broadcast hash join when the runtime statistics

[GitHub] [spark] alex-balikov commented on a diff in pull request #38503: [SPARK-40940] Remove Multi-stateful operator checkers for streaming queries.

2022-11-04 Thread GitBox
alex-balikov commented on code in PR #38503: URL: https://github.com/apache/spark/pull/38503#discussion_r1014428293 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala: ## @@ -157,10 +172,11 @@ object

[GitHub] [spark] SandishKumarHN commented on a diff in pull request #37972: [SPARK-40654][SQL] Protobuf support for Spark - from_protobuf AND to_protobuf

2022-11-04 Thread GitBox
SandishKumarHN commented on code in PR #37972: URL: https://github.com/apache/spark/pull/37972#discussion_r1014260137 ## connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufCatalystDataConversionSuite.scala: ## @@ -0,0 +1,212 @@ +/* + * Licensed to the

[GitHub] [spark] hvanhovell commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-04 Thread GitBox
hvanhovell commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1014270879 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala: ## @@ -117,7 +126,70 @@ class

[GitHub] [spark] amaliujia commented on pull request #38506: [SPARK-41010][CONNECT][PYTHON] Complete Support for Except and Intersect in Python client

2022-11-04 Thread GitBox
amaliujia commented on PR #38506: URL: https://github.com/apache/spark/pull/38506#issuecomment-1303988661 Ok added short description for the new test cases. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] amaliujia commented on pull request #38488: [SPARK-41002][CONNECT][PYTHON] Compatible `take`, `head` and `first` API in Python client

2022-11-04 Thread GitBox
amaliujia commented on PR #38488: URL: https://github.com/apache/spark/pull/38488#issuecomment-1303988901 Ok added short description for the new test cases. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] mridulm commented on a diff in pull request #36165: [SPARK-36620][SHUFFLE] Add Push Based Shuffle client side metrics

2022-11-04 Thread GitBox
mridulm commented on code in PR #36165: URL: https://github.com/apache/spark/pull/36165#discussion_r1014392620 ## core/src/test/scala/org/apache/spark/util/JsonProtocolSuite.scala: ## @@ -1780,7 +1802,19 @@ private[spark] object JsonProtocolSuite extends Assertions { |

[GitHub] [spark] WweiL commented on a diff in pull request #38503: [SPARK-40940] Remove Multi-stateful operator checkers for streaming queries.

2022-11-04 Thread GitBox
WweiL commented on code in PR #38503: URL: https://github.com/apache/spark/pull/38503#discussion_r1014391666 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationsSuite.scala: ## @@ -507,15 +507,13 @@ class UnsupportedOperationsSuite extends

[GitHub] [spark] AmplabJenkins commented on pull request #38509: [SPARK-41014][PySpark][DOC] Improve documentation and typing of groupby and cogroup applyInPandas

2022-11-04 Thread GitBox
AmplabJenkins commented on PR #38509: URL: https://github.com/apache/spark/pull/38509#issuecomment-1304060587 Can one of the admins verify this patch? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] AmplabJenkins commented on pull request #38510: [MINOR][DOC] revisions for spark sql performance tuning to improve readability and grammar

2022-11-04 Thread GitBox
AmplabJenkins commented on PR #38510: URL: https://github.com/apache/spark/pull/38510#issuecomment-1304060535 Can one of the admins verify this patch? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] mridulm commented on pull request #38427: [SPARK-40950][CORE] Fix isRemoteAddressMaxedOut performance overhead on scala 2.13

2022-11-04 Thread GitBox
mridulm commented on PR #38427: URL: https://github.com/apache/spark/pull/38427#issuecomment-1304076366 Merged to master. Thanks for working on this @eejbyfeldt ! Thanks for the reviews @srowen, @dongjoon-hyun, @LuciferYang :-) -- This is an automated message from the Apache Git

[GitHub] [spark] MaxGekk opened a new pull request, #38514: [WIP][SQL] Provide a query context to `failAnalysis()`

2022-11-04 Thread GitBox
MaxGekk opened a new pull request, #38514: URL: https://github.com/apache/spark/pull/38514 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] aokolnychyi commented on pull request #36304: [SPARK-38959][SQL] DS V2: Support runtime group filtering in row-level commands

2022-11-04 Thread GitBox
aokolnychyi commented on PR #36304: URL: https://github.com/apache/spark/pull/36304#issuecomment-1304020160 Still remember about following up on this and another PR. Slowly getting there. -- This is an automated message from the Apache Git Service. To respond to the message, please log

[GitHub] [spark] mridulm commented on a diff in pull request #38377: [SPARK-40901][CORE] Unable to store Spark Driver logs with Absolute Hadoop based URI FS Path

2022-11-04 Thread GitBox
mridulm commented on code in PR #38377: URL: https://github.com/apache/spark/pull/38377#discussion_r1014469439 ## core/src/main/scala/org/apache/spark/util/logging/DriverLogger.scala: ## @@ -142,7 +142,7 @@ private[spark] class DriverLogger(conf: SparkConf) extends Logging {

[GitHub] [spark] dwsmith1983 commented on a diff in pull request #38510: [MINOR][DOC] revisions for spark sql performance tuning to improve readability and grammar

2022-11-04 Thread GitBox
dwsmith1983 commented on code in PR #38510: URL: https://github.com/apache/spark/pull/38510#discussion_r1014424662 ## docs/sql-performance-tuning.md: ## @@ -295,7 +294,7 @@ AQE converts sort-merge join to broadcast hash join when the runtime statistics

[GitHub] [spark] alex-balikov commented on a diff in pull request #38503: [SPARK-40940] Remove Multi-stateful operator checkers for streaming queries.

2022-11-04 Thread GitBox
alex-balikov commented on code in PR #38503: URL: https://github.com/apache/spark/pull/38503#discussion_r1014425174 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala: ## @@ -157,10 +193,11 @@ object

[GitHub] [spark] AmplabJenkins commented on pull request #38505: [SPARK-40622][WIP]do not merge(try to fix build error)

2022-11-04 Thread GitBox
AmplabJenkins commented on PR #38505: URL: https://github.com/apache/spark/pull/38505#issuecomment-1304270304 Can one of the admins verify this patch? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] AmplabJenkins commented on pull request #38506: [SPARK-41010][CONNECT][PYTHON] Complete Support for Except and Intersect in Python client

2022-11-04 Thread GitBox
AmplabJenkins commented on PR #38506: URL: https://github.com/apache/spark/pull/38506#issuecomment-1304270266 Can one of the admins verify this patch? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] jzhuge commented on a diff in pull request #37556: [SPARK-39799][SQL] DataSourceV2: View catalog interface

2022-11-04 Thread GitBox
jzhuge commented on code in PR #37556: URL: https://github.com/apache/spark/pull/37556#discussion_r1014522636 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/View.java: ## @@ -0,0 +1,69 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

[GitHub] [spark] wankunde commented on pull request #38495: [SPARK-35531][SQL] Update hive table stats without unnecessary convert

2022-11-04 Thread GitBox
wankunde commented on PR #38495: URL: https://github.com/apache/spark/pull/38495#issuecomment-1304399983 Retest this please -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [spark] liuzqt commented on pull request #38064: [SPARK-40622][SQL][CORE]Result of a single task in collect() must fit in 2GB

2022-11-04 Thread GitBox
liuzqt commented on PR #38064: URL: https://github.com/apache/spark/pull/38064#issuecomment-1304306588 @mridulm I got an error when running that command locally ``` [error] /Users/ziqi.liu/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala:51:

[GitHub] [spark] jzhuge commented on a diff in pull request #37556: [SPARK-39799][SQL] DataSourceV2: View catalog interface

2022-11-04 Thread GitBox
jzhuge commented on code in PR #37556: URL: https://github.com/apache/spark/pull/37556#discussion_r1014523963 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/ViewCatalog.java: ## @@ -0,0 +1,155 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] [spark] jzhuge commented on a diff in pull request #37556: [SPARK-39799][SQL] DataSourceV2: View catalog interface

2022-11-04 Thread GitBox
jzhuge commented on code in PR #37556: URL: https://github.com/apache/spark/pull/37556#discussion_r1014532124 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/ViewCatalog.java: ## @@ -0,0 +1,155 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] [spark] github-actions[bot] commented on pull request #37315: [SPARK-39892][SQL] Use ArrowType.Decimal(precision, scale, bitWidth) instead of ArrowType.Decimal(precision, scale)

2022-11-04 Thread GitBox
github-actions[bot] commented on PR #37315: URL: https://github.com/apache/spark/pull/37315#issuecomment-1304354536 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] commented on pull request #37309: [SPARK-39871][CORE] Jmx http interface supported for SparkHistoryServer

2022-11-04 Thread GitBox
github-actions[bot] commented on PR #37309: URL: https://github.com/apache/spark/pull/37309#issuecomment-1304354542 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] closed pull request #37239: [SPARK-39825][SQL] Fix PushDownLeftSemiAntiJoin push through project

2022-11-04 Thread GitBox
github-actions[bot] closed pull request #37239: [SPARK-39825][SQL] Fix PushDownLeftSemiAntiJoin push through project URL: https://github.com/apache/spark/pull/37239 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] github-actions[bot] commented on pull request #37104: [SPARK-39698][SQL] Use `TakeOrderedAndProject` if maxRows below the `spark.sql.execution.topKSortMaxRowsThreshold`

2022-11-04 Thread GitBox
github-actions[bot] commented on PR #37104: URL: https://github.com/apache/spark/pull/37104#issuecomment-1304354561 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] closed pull request #37226: [MINOR][SQL] Simplify the description of built-in function.

2022-11-04 Thread GitBox
github-actions[bot] closed pull request #37226: [MINOR][SQL] Simplify the description of built-in function. URL: https://github.com/apache/spark/pull/37226 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] github-actions[bot] commented on pull request #37009: [SPARK-38292][PYTHON]Support na_filter for pyspark.pandas.read_csv

2022-11-04 Thread GitBox
github-actions[bot] commented on PR #37009: URL: https://github.com/apache/spark/pull/37009#issuecomment-1304354576 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] closed pull request #37083: [SPARK-39678][SQL] Improve stats estimation for v2 tables

2022-11-04 Thread GitBox
github-actions[bot] closed pull request #37083: [SPARK-39678][SQL] Improve stats estimation for v2 tables URL: https://github.com/apache/spark/pull/37083 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] github-actions[bot] commented on pull request #34637: [SPARK-37349][SQL] add SQL Rest API parsing logic

2022-11-04 Thread GitBox
github-actions[bot] commented on PR #34637: URL: https://github.com/apache/spark/pull/34637#issuecomment-1304354583 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] attilapiros opened a new pull request, #38516: Initial version

2022-11-04 Thread GitBox
attilapiros opened a new pull request, #38516: URL: https://github.com/apache/spark/pull/38516 ### What changes were proposed in this pull request? This is an update of https://github.com/apache/spark/pull/29178 which was closed because the root cause of the error was just vaguely

[GitHub] [spark] xkrogen commented on a diff in pull request #37556: [SPARK-39799][SQL] DataSourceV2: View catalog interface

2022-11-04 Thread GitBox
xkrogen commented on code in PR #37556: URL: https://github.com/apache/spark/pull/37556#discussion_r1014502881 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/View.java: ## @@ -0,0 +1,69 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

[GitHub] [spark] jzhuge commented on a diff in pull request #37556: [SPARK-39799][SQL] DataSourceV2: View catalog interface

2022-11-04 Thread GitBox
jzhuge commented on code in PR #37556: URL: https://github.com/apache/spark/pull/37556#discussion_r1014506227 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/View.java: ## @@ -0,0 +1,69 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

[GitHub] [spark] jzhuge commented on a diff in pull request #37556: [SPARK-39799][SQL] DataSourceV2: View catalog interface

2022-11-04 Thread GitBox
jzhuge commented on code in PR #37556: URL: https://github.com/apache/spark/pull/37556#discussion_r1014515719 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/View.java: ## @@ -0,0 +1,69 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

[GitHub] [spark] jzhuge commented on a diff in pull request #37556: [SPARK-39799][SQL] DataSourceV2: View catalog interface

2022-11-04 Thread GitBox
jzhuge commented on code in PR #37556: URL: https://github.com/apache/spark/pull/37556#discussion_r1014530432 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/ViewCatalog.java: ## @@ -0,0 +1,155 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] [spark] SandishKumarHN commented on pull request #38515: [SPARK-41015][SQL][PROTOBUF] UnitTest null check for data generator

2022-11-04 Thread GitBox
SandishKumarHN commented on PR #38515: URL: https://github.com/apache/spark/pull/38515#issuecomment-1304351416 @rangadi Because some random numbers do not convert to catalyst type, a null check for the data generator is required. -- This is an automated message from the Apache Git

[GitHub] [spark] SandishKumarHN opened a new pull request, #38515: [SPARK-41015][SQL][PROTOBUF] UnitTest null check for data generator

2022-11-04 Thread GitBox
SandishKumarHN opened a new pull request, #38515: URL: https://github.com/apache/spark/pull/38515 ### What changes were proposed in this pull request? null check for data generator after type conversion NA ### Why are the changes needed? NA ### Does this PR

[GitHub] [spark] mridulm commented on pull request #38064: [SPARK-40622][SQL][CORE]Result of a single task in collect() must fit in 2GB

2022-11-04 Thread GitBox
mridulm commented on PR #38064: URL: https://github.com/apache/spark/pull/38064#issuecomment-1304281819 Looks like the doc build is failing, and so the overall build fails ... Can you run `build/sbt -Phadoop-3 -Pyarn -Pdocker-integration-tests -Pspark-ganglia-lgpl -Phive -Pmesos -Phive-thriftserver

[GitHub] [spark] xkrogen commented on a diff in pull request #37556: [SPARK-39799][SQL] DataSourceV2: View catalog interface

2022-11-04 Thread GitBox
xkrogen commented on code in PR #37556: URL: https://github.com/apache/spark/pull/37556#discussion_r1014507922 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/View.java: ## @@ -0,0 +1,69 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

[GitHub] [spark] swamirishi commented on a diff in pull request #38377: [SPARK-40901][CORE] Unable to store Spark Driver logs with Absolute Hadoop based URI FS Path

2022-11-04 Thread GitBox
swamirishi commented on code in PR #38377: URL: https://github.com/apache/spark/pull/38377#discussion_r1014524002 ## core/src/main/scala/org/apache/spark/util/logging/DriverLogger.scala: ## @@ -142,7 +142,7 @@ private[spark] class DriverLogger(conf: SparkConf) extends Logging

[GitHub] [spark] ljfgem commented on a diff in pull request #37556: [SPARK-39799][SQL] DataSourceV2: View catalog interface

2022-11-04 Thread GitBox
ljfgem commented on code in PR #37556: URL: https://github.com/apache/spark/pull/37556#discussion_r1014536719 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/ViewCatalog.java: ## @@ -0,0 +1,157 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] [spark] attilapiros commented on pull request #38516: [SPARK-32380][SQL] Fixing access of HBase table via Hive

2022-11-04 Thread GitBox
attilapiros commented on PR #38516: URL: https://github.com/apache/spark/pull/38516#issuecomment-1304373701 cc @dongjoon-hyun, @HyukjinKwon -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] wankunde commented on pull request #38495: [SPARK-35531][SQL] Update hive table stats without unnecessary convert

2022-11-04 Thread GitBox
wankunde commented on PR #38495: URL: https://github.com/apache/spark/pull/38495#issuecomment-1304400795 @cloud-fan @AngersZh Could you help review this PR? Another PR, https://github.com/apache/spark/pull/38496, depends on this. -- This is an automated message from the Apache Git

[GitHub] [spark] xkrogen commented on a diff in pull request #37556: [SPARK-39799][SQL] DataSourceV2: View catalog interface

2022-11-04 Thread GitBox
xkrogen commented on code in PR #37556: URL: https://github.com/apache/spark/pull/37556#discussion_r1014526120 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/ViewCatalog.java: ## @@ -0,0 +1,155 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] [spark] srowen commented on a diff in pull request #38510: [MINOR][DOC] revisions for spark sql performance tuning to improve readability and grammar

2022-11-04 Thread GitBox
srowen commented on code in PR #38510: URL: https://github.com/apache/spark/pull/38510#discussion_r1014525886 ## docs/sql-performance-tuning.md: ## @@ -77,8 +77,8 @@ that these options will be deprecated in future release as more optimizations ar

[GitHub] [spark] ljfgem commented on a diff in pull request #37556: [SPARK-39799][SQL] DataSourceV2: View catalog interface

2022-11-04 Thread GitBox
ljfgem commented on code in PR #37556: URL: https://github.com/apache/spark/pull/37556#discussion_r1014550696 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/ViewCatalog.java: ## @@ -0,0 +1,157 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] [spark] jerrypeng opened a new pull request, #38517: [WIP][SPARK-39591][SS] Async Progress Tracking

2022-11-04 Thread GitBox
jerrypeng opened a new pull request, #38517: URL: https://github.com/apache/spark/pull/38517 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] xkrogen commented on a diff in pull request #37556: [SPARK-39799][SQL] DataSourceV2: View catalog interface

2022-11-04 Thread GitBox
xkrogen commented on code in PR #37556: URL: https://github.com/apache/spark/pull/37556#discussion_r1014522309 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/ViewCatalog.java: ## @@ -0,0 +1,155 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] [spark] xkrogen commented on a diff in pull request #37556: [SPARK-39799][SQL] DataSourceV2: View catalog interface

2022-11-04 Thread GitBox
xkrogen commented on code in PR #37556: URL: https://github.com/apache/spark/pull/37556#discussion_r1014524565 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/View.java: ## @@ -0,0 +1,69 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

[GitHub] [spark] jzhuge commented on a diff in pull request #37556: [SPARK-39799][SQL] DataSourceV2: View catalog interface

2022-11-04 Thread GitBox
jzhuge commented on code in PR #37556: URL: https://github.com/apache/spark/pull/37556#discussion_r1014549689 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/ViewCatalog.java: ## @@ -0,0 +1,157 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-04 Thread GitBox
zhengruifeng commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1014559938 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala: ## @@ -117,7 +126,70 @@ class

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-04 Thread GitBox
zhengruifeng commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1014560227 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala: ## @@ -117,7 +126,70 @@ class

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-04 Thread GitBox
zhengruifeng commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1014559977 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala: ## @@ -117,7 +126,70 @@ class

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #37734: [SPARK-40264][ML] add batch_infer_udf function to pyspark.ml.functions

2022-11-04 Thread GitBox
WeichenXu123 commented on code in PR #37734: URL: https://github.com/apache/spark/pull/37734#discussion_r1013946849 ## python/pyspark/ml/functions.py: ## @@ -106,6 +117,542 @@ def array_to_vector(col: Column) -> Column: return

[GitHub] [spark] FouadApp closed pull request #38512: WIP: [SPARK-38564] Support read hive table from subdirectory source

2022-11-04 Thread GitBox
FouadApp closed pull request #38512: WIP: [SPARK-38564] Support read hive table from subdirectory source URL: https://github.com/apache/spark/pull/38512 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38462: [SPARK-40533] [CONNECT] [PYTHON] Support most built-in literal types for Python in Spark Connect

2022-11-04 Thread GitBox
HyukjinKwon commented on code in PR #38462: URL: https://github.com/apache/spark/pull/38462#discussion_r1013933320 ## python/pyspark/sql/connect/column.py: ## @@ -99,11 +101,59 @@ def to_plan(self, session: Optional["RemoteSparkSession"]) -> "proto.Expression"

[GitHub] [spark] MaxGekk commented on a diff in pull request #37972: [SPARK-40654][SQL] Protobuf support for Spark - from_protobuf AND to_protobuf

2022-11-04 Thread GitBox
MaxGekk commented on code in PR #37972: URL: https://github.com/apache/spark/pull/37972#discussion_r1013948594 ## connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufCatalystDataConversionSuite.scala: ## @@ -0,0 +1,212 @@ +/* + * Licensed to the Apache

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #37734: [SPARK-40264][ML] add batch_infer_udf function to pyspark.ml.functions

2022-11-04 Thread GitBox
WeichenXu123 commented on code in PR #37734: URL: https://github.com/apache/spark/pull/37734#discussion_r1013951701 ## python/pyspark/ml/functions.py: ## @@ -106,6 +117,474 @@ def array_to_vector(col: Column) -> Column: return

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #37734: [SPARK-40264][ML] add batch_infer_udf function to pyspark.ml.functions

2022-11-04 Thread GitBox
WeichenXu123 commented on code in PR #37734: URL: https://github.com/apache/spark/pull/37734#discussion_r1013973661 ## python/pyspark/ml/functions.py: ## @@ -106,6 +117,542 @@ def array_to_vector(col: Column) -> Column: return

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #37734: [SPARK-40264][ML] add batch_infer_udf function to pyspark.ml.functions

2022-11-04 Thread GitBox
WeichenXu123 commented on code in PR #37734: URL: https://github.com/apache/spark/pull/37734#discussion_r1013972337 ## python/pyspark/ml/functions.py: ## @@ -106,6 +117,542 @@ def array_to_vector(col: Column) -> Column: return

[GitHub] [spark] HyukjinKwon closed pull request #38485: [SPARK-41001] [CONNECT] [PYTHON] Implementing Connection String for Python Client

2022-11-04 Thread GitBox
HyukjinKwon closed pull request #38485: [SPARK-41001] [CONNECT] [PYTHON] Implementing Connection String for Python Client URL: https://github.com/apache/spark/pull/38485 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

[GitHub] [spark] HyukjinKwon commented on pull request #38462: [SPARK-40533] [CONNECT] [PYTHON] Support most built-in literal types for Python in Spark Connect

2022-11-04 Thread GitBox
HyukjinKwon commented on PR #38462: URL: https://github.com/apache/spark/pull/38462#issuecomment-1303686102 Merged to master. Let's address complete types in a followup. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

[GitHub] [spark] FouadApp commented on pull request #32679: [SPARK-28098][SQL]Support read hive table while LeafDir had multi-level paths

2022-11-04 Thread GitBox
FouadApp commented on PR #32679: URL: https://github.com/apache/spark/pull/32679#issuecomment-1303772792 > Any chance of this getting picked up again? I saw it was merged in a fork: [lyft#40](https://github.com/lyft/spark/pull/40) but it would be great to have it upstream but, it's

[GitHub] [spark] FouadApp opened a new pull request, #38512: WIP: [SPARK-38564] Support read hive table from subdirectory source

2022-11-04 Thread GitBox
FouadApp opened a new pull request, #38512: URL: https://github.com/apache/spark/pull/38512 ### What changes were proposed in this pull request? This support could read source files of partitioned hive table with subdirectories. ### Why are the changes needed? While use

[GitHub] [spark] HyukjinKwon commented on pull request #38485: [SPARK-41001] [CONNECT] [PYTHON] Implementing Connection String for Python Client

2022-11-04 Thread GitBox
HyukjinKwon commented on PR #38485: URL: https://github.com/apache/spark/pull/38485#issuecomment-1303325206 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] MaxGekk commented on a diff in pull request #37972: [SPARK-40654][SQL] Protobuf support for Spark - from_protobuf AND to_protobuf

2022-11-04 Thread GitBox
MaxGekk commented on code in PR #37972: URL: https://github.com/apache/spark/pull/37972#discussion_r1013951505 ## connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufCatalystDataConversionSuite.scala: ## @@ -0,0 +1,212 @@ +/* + * Licensed to the Apache

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #37734: [SPARK-40264][ML] add batch_infer_udf function to pyspark.ml.functions

2022-11-04 Thread GitBox
WeichenXu123 commented on code in PR #37734: URL: https://github.com/apache/spark/pull/37734#discussion_r1013958443 ## python/pyspark/ml/functions.py: ## @@ -106,6 +117,542 @@ def array_to_vector(col: Column) -> Column: return

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38462: [SPARK-40533] [CONNECT] [PYTHON] Support most built-in literal types for Python in Spark Connect

2022-11-04 Thread GitBox
HyukjinKwon commented on code in PR #38462: URL: https://github.com/apache/spark/pull/38462#discussion_r1013930085 ## python/pyspark/sql/connect/column.py: ## @@ -99,11 +101,59 @@ def to_plan(self, session: Optional["RemoteSparkSession"]) -> "proto.Expression"

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #37734: [SPARK-40264][ML] add batch_infer_udf function to pyspark.ml.functions

2022-11-04 Thread GitBox
WeichenXu123 commented on code in PR #37734: URL: https://github.com/apache/spark/pull/37734#discussion_r1013952829 ## python/pyspark/ml/functions.py: ## @@ -106,6 +117,474 @@ def array_to_vector(col: Column) -> Column: return

[GitHub] [spark] MaxGekk commented on a diff in pull request #38447: [SPARK-40973][SQL] Rename `_LEGACY_ERROR_TEMP_0055` to `UNCLOSED_BRACKETED_COMMENT`

2022-11-04 Thread GitBox
MaxGekk commented on code in PR #38447: URL: https://github.com/apache/spark/pull/38447#discussion_r1013953353 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala: ## @@ -608,8 +608,12 @@ private[sql] object QueryParsingErrors extends
