[GitHub] [spark] itholic commented on a diff in pull request #40324: [WIP][SPARK-42496][CONNECT][DOCS] Adding Spark Connect to the Spark 3.4 documentation

2023-03-09 Thread via GitHub
itholic commented on code in PR #40324: URL: https://github.com/apache/spark/pull/40324#discussion_r1130622567 ## docs/index.md: ## @@ -49,8 +49,19 @@ For Java 11, `-Dio.netty.tryReflectionSetAccessible=true` is required additional # Running the Examples and Shell -Spark c

[GitHub] [spark] LuciferYang commented on a diff in pull request #40342: [SPARK-42721][CONNECT] RPC logging interceptor

2023-03-09 Thread via GitHub
LuciferYang commented on code in PR #40342: URL: https://github.com/apache/spark/pull/40342#discussion_r1130638704 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/LoggingInterceptor.scala: ## @@ -0,0 +1,75 @@ +/* + * Licensed to the Apache Softwar

[GitHub] [spark] zhengruifeng commented on pull request #40349: [SPARK-42725][CONNECT][PYTHON] Make LiteralExpression support array

2023-03-09 Thread via GitHub
zhengruifeng commented on PR #40349: URL: https://github.com/apache/spark/pull/40349#issuecomment-1461580489 cc @WeichenXu123 @HyukjinKwon -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the spe

[GitHub] [spark] zhengruifeng closed pull request #40329: [SPARK-42710][CONNECT][PYTHON] Rename FrameMap proto to MapPartitions

2023-03-09 Thread via GitHub
zhengruifeng closed pull request #40329: [SPARK-42710][CONNECT][PYTHON] Rename FrameMap proto to MapPartitions URL: https://github.com/apache/spark/pull/40329

[GitHub] [spark] zhengruifeng commented on pull request #40329: [SPARK-42710][CONNECT][PYTHON] Rename FrameMap proto to MapPartitions

2023-03-09 Thread via GitHub
zhengruifeng commented on PR #40329: URL: https://github.com/apache/spark/pull/40329#issuecomment-1461587410 merged to master/branch-3.4

[GitHub] [spark] AngersZhuuuu commented on pull request #40315: [SPARK-42699][CONNECT] SparkConnectServer should make client and AM same exit code

2023-03-09 Thread via GitHub
AngersZhuuuu commented on PR #40315: URL: https://github.com/apache/spark/pull/40315#issuecomment-1461590509 ping @HyukjinKwon

[GitHub] [spark] xinrong-meng commented on pull request #40329: [SPARK-42710][CONNECT][PYTHON] Rename FrameMap proto to MapPartitions

2023-03-09 Thread via GitHub
xinrong-meng commented on PR #40329: URL: https://github.com/apache/spark/pull/40329#issuecomment-1461592928 Thanks @zhengruifeng !

[GitHub] [spark] xinrong-meng opened a new pull request, #40350: [SPARK-42710][CONNECT][PYTHON] Implement `DataFrame.mapInArrow`

2023-03-09 Thread via GitHub
xinrong-meng opened a new pull request, #40350: URL: https://github.com/apache/spark/pull/40350 ### What changes were proposed in this pull request? Implement `DataFrame.mapInArrow`. ### Why are the changes needed? Parity with vanilla PySpark. ### Does this PR introduce _a

[GitHub] [spark] cloud-fan commented on a diff in pull request #40301: [SPARK-42685][CORE] Optimize Utils.bytesToString routines

2023-03-09 Thread via GitHub
cloud-fan commented on code in PR #40301: URL: https://github.com/apache/spark/pull/40301#discussion_r1130695992 ## core/src/main/scala/org/apache/spark/util/Utils.scala: ## @@ -1305,41 +1305,30 @@ private[spark] object Utils extends Logging { (JavaUtils.byteStringAsBytes(s

[GitHub] [spark] huangxiaopingRD opened a new pull request, #40351: [SPARK-42727][CORE] Support executing spark commands in the root directory when local mode is specified

2023-03-09 Thread via GitHub
huangxiaopingRD opened a new pull request, #40351: URL: https://github.com/apache/spark/pull/40351 ### What changes were proposed in this pull request? Special treatment for the root directory when parsing userClassPath ### Why are the changes needed? I found that executing t

[GitHub] [spark] wangyum commented on a diff in pull request #40190: [SPARK-42597][SQL] Support unwrap date type to timestamp type

2023-03-09 Thread via GitHub
wangyum commented on code in PR #40190: URL: https://github.com/apache/spark/pull/40190#discussion_r1130705751 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala: ## @@ -113,7 +114,7 @@ object UnwrapCastInBinaryComparison ex

[GitHub] [spark] alkis commented on pull request #40302: [SPARK-42686][CORE] Defer formatting for debug messages in TaskMemoryManager

2023-03-09 Thread via GitHub
alkis commented on PR #40302: URL: https://github.com/apache/spark/pull/40302#issuecomment-1461760841 Good idea! Done.

[GitHub] [spark] alkis commented on a diff in pull request #40301: [SPARK-42685][CORE] Optimize Utils.bytesToString routines

2023-03-09 Thread via GitHub
alkis commented on code in PR #40301: URL: https://github.com/apache/spark/pull/40301#discussion_r1130795171 ## core/src/main/scala/org/apache/spark/util/Utils.scala: ## @@ -1305,41 +1305,30 @@ private[spark] object Utils extends Logging { (JavaUtils.byteStringAsBytes(str)

[GitHub] [spark] panbingkun commented on a diff in pull request #39949: [SPARK-42386][SQL] Rewrite HiveGenericUDF with Invoke

2023-03-09 Thread via GitHub
panbingkun commented on code in PR #39949: URL: https://github.com/apache/spark/pull/39949#discussion_r1130811640 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala: ## @@ -194,47 +183,52 @@ private[hive] case class HiveGenericUDF( override protected def wi

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40349: [SPARK-42725][CONNECT][PYTHON] Make LiteralExpression support array params

2023-03-09 Thread via GitHub
HyukjinKwon commented on code in PR #40349: URL: https://github.com/apache/spark/pull/40349#discussion_r1130826498 ## python/pyspark/sql/connect/expressions.py: ## @@ -277,14 +281,26 @@ def _infer_type(cls, value: Any) -> DataType: return DateType() elif is

[GitHub] [spark] HyukjinKwon commented on pull request #40260: [SPARK-42630][CONNECT][PYTHON] Introduce UnparsedDataType and delay parsing DDL string until SparkConnectClient is available

2023-03-09 Thread via GitHub
HyukjinKwon commented on PR #40260: URL: https://github.com/apache/spark/pull/40260#issuecomment-1461828508 Merged to master and branch-3.4.

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #40349: [SPARK-42725][CONNECT][PYTHON] Make LiteralExpression support array params

2023-03-09 Thread via GitHub
WeichenXu123 commented on code in PR #40349: URL: https://github.com/apache/spark/pull/40349#discussion_r1130844165 ## python/pyspark/sql/tests/connect/test_connect_column.py: ## @@ -437,7 +436,6 @@ def test_literal_with_unsupported_type(self): (0.1, DecimalType()),

[GitHub] [spark] HyukjinKwon closed pull request #40260: [SPARK-42630][CONNECT][PYTHON] Introduce UnparsedDataType and delay parsing DDL string until SparkConnectClient is available

2023-03-09 Thread via GitHub
HyukjinKwon closed pull request #40260: [SPARK-42630][CONNECT][PYTHON] Introduce UnparsedDataType and delay parsing DDL string until SparkConnectClient is available URL: https://github.com/apache/spark/pull/40260

[GitHub] [spark] zhengruifeng commented on a diff in pull request #40349: [SPARK-42725][CONNECT][PYTHON] Make LiteralExpression support array params

2023-03-09 Thread via GitHub
zhengruifeng commented on code in PR #40349: URL: https://github.com/apache/spark/pull/40349#discussion_r1130850965 ## python/pyspark/sql/tests/connect/test_connect_plan.py: ## @@ -943,6 +945,39 @@ def test_column_expressions(self): mod_fun.unresolved_function.argu

[GitHub] [spark] zhengruifeng commented on a diff in pull request #40349: [SPARK-42725][CONNECT][PYTHON] Make LiteralExpression support array params

2023-03-09 Thread via GitHub
zhengruifeng commented on code in PR #40349: URL: https://github.com/apache/spark/pull/40349#discussion_r1130851705 ## python/pyspark/sql/tests/connect/test_connect_column.py: ## @@ -437,7 +436,6 @@ def test_literal_with_unsupported_type(self): (0.1, DecimalType()),

[GitHub] [spark] zhengruifeng commented on a diff in pull request #40349: [SPARK-42725][CONNECT][PYTHON] Make LiteralExpression support array params

2023-03-09 Thread via GitHub
zhengruifeng commented on code in PR #40349: URL: https://github.com/apache/spark/pull/40349#discussion_r1130852388 ## python/pyspark/sql/connect/expressions.py: ## @@ -277,14 +281,26 @@ def _infer_type(cls, value: Any) -> DataType: return DateType() elif i

[GitHub] [spark] wangyum commented on a diff in pull request #39691: [SPARK-31561][SQL] Add QUALIFY clause

2023-03-09 Thread via GitHub
wangyum commented on code in PR #39691: URL: https://github.com/apache/spark/pull/39691#discussion_r1130865469 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -1729,6 +1729,23 @@ class Analyzer(override val catalogManager: CatalogMana

[GitHub] [spark] LuciferYang opened a new pull request, #40352: [SPARK-42664][CONNECT] Support `bloomFilter` function for `DataFrameStatFunctions`

2023-03-09 Thread via GitHub
LuciferYang opened a new pull request, #40352: URL: https://github.com/apache/spark/pull/40352 ### What changes were proposed in this pull request? This PR uses `BloomFilterAggregate` to implement the `bloomFilter` function for `DataFrameStatFunctions`. Since `BloomFilterAggregate` is an

[GitHub] [spark] WeichenXu123 opened a new pull request, #40353: [SPARK-42732][PYSPARK][CONNECT] Support spark connect session getActiveSession method

2023-03-09 Thread via GitHub
WeichenXu123 opened a new pull request, #40353: URL: https://github.com/apache/spark/pull/40353 ### What changes were proposed in this pull request? Support spark connect session getActiveSession method. Spark connect ML needs this API to get active session in some cases

[GitHub] [spark] cloud-fan commented on a diff in pull request #39624: [SPARK-42101][SQL] Make AQE support InMemoryTableScanExec

2023-03-09 Thread via GitHub
cloud-fan commented on code in PR #39624: URL: https://github.com/apache/spark/pull/39624#discussion_r1130900032 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/QueryStageExec.scala: ## @@ -250,3 +251,48 @@ case class BroadcastQueryStageExec( override def

[GitHub] [spark] cloud-fan commented on a diff in pull request #39624: [SPARK-42101][SQL] Make AQE support InMemoryTableScanExec

2023-03-09 Thread via GitHub
cloud-fan commented on code in PR #39624: URL: https://github.com/apache/spark/pull/39624#discussion_r1130902352 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/QueryStageExec.scala: ## @@ -55,17 +50,7 @@ abstract class QueryStageExec extends LeafExecNode {

[GitHub] [spark] cloud-fan commented on a diff in pull request #39624: [SPARK-42101][SQL] Make AQE support InMemoryTableScanExec

2023-03-09 Thread via GitHub
cloud-fan commented on code in PR #39624: URL: https://github.com/apache/spark/pull/39624#discussion_r1130903095 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/QueryStageExec.scala: ## @@ -55,17 +50,7 @@ abstract class QueryStageExec extends LeafExecNode {

[GitHub] [spark] cloud-fan commented on a diff in pull request #39624: [SPARK-42101][SQL] Make AQE support InMemoryTableScanExec

2023-03-09 Thread via GitHub
cloud-fan commented on code in PR #39624: URL: https://github.com/apache/spark/pull/39624#discussion_r1130904593 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/QueryStageExec.scala: ## @@ -250,3 +251,48 @@ case class BroadcastQueryStageExec( override def

[GitHub] [spark] cloud-fan commented on a diff in pull request #39624: [SPARK-42101][SQL] Make AQE support InMemoryTableScanExec

2023-03-09 Thread via GitHub
cloud-fan commented on code in PR #39624: URL: https://github.com/apache/spark/pull/39624#discussion_r1130913005 ## sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala: ## @@ -388,6 +388,7 @@ case class InMemoryRelation( @transient val parti

[GitHub] [spark] WeichenXu123 commented on pull request #40353: [SPARK-42732][PYSPARK][CONNECT] Support spark connect session getActiveSession method

2023-03-09 Thread via GitHub
WeichenXu123 commented on PR #40353: URL: https://github.com/apache/spark/pull/40353#issuecomment-1461919214 CC @HyukjinKwon @zhengruifeng

[GitHub] [spark] cloud-fan commented on a diff in pull request #39624: [SPARK-42101][SQL] Make AQE support InMemoryTableScanExec

2023-03-09 Thread via GitHub
cloud-fan commented on code in PR #39624: URL: https://github.com/apache/spark/pull/39624#discussion_r1130918471 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala: ## @@ -220,10 +221,28 @@ case class AdaptiveSparkPlanExec( } p

[GitHub] [spark] Hisoka-X commented on a diff in pull request #40326: [SPARK-42708] [Docs] Improve doc about protobuf java file can't be indexed.

2023-03-09 Thread via GitHub
Hisoka-X commented on code in PR #40326: URL: https://github.com/apache/spark/pull/40326#discussion_r1130922230 ## connector/protobuf/README.md: ## @@ -34,3 +34,17 @@ export SPARK_PROTOC_EXEC_PATH=/path-to-protoc-exe The user-defined `protoc` binary files can be produced in t

[GitHub] [spark] cloud-fan commented on a diff in pull request #40190: [SPARK-42597][SQL] Support unwrap date type to timestamp type

2023-03-09 Thread via GitHub
cloud-fan commented on code in PR #40190: URL: https://github.com/apache/spark/pull/40190#discussion_r1130941590 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala: ## @@ -113,7 +114,7 @@ object UnwrapCastInBinaryComparison

[GitHub] [spark] ulysses-you commented on a diff in pull request #39624: [SPARK-42101][SQL] Make AQE support InMemoryTableScanExec

2023-03-09 Thread via GitHub
ulysses-you commented on code in PR #39624: URL: https://github.com/apache/spark/pull/39624#discussion_r1130945518 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/QueryStageExec.scala: ## @@ -250,3 +251,48 @@ case class BroadcastQueryStageExec( override d

[GitHub] [spark] cloud-fan commented on a diff in pull request #39949: [SPARK-42386][SQL] Rewrite HiveGenericUDF with Invoke

2023-03-09 Thread via GitHub
cloud-fan commented on code in PR #39949: URL: https://github.com/apache/spark/pull/39949#discussion_r1130947376 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala: ## @@ -194,47 +183,52 @@ private[hive] case class HiveGenericUDF( override protected def wit

[GitHub] [spark] cloud-fan commented on a diff in pull request #39990: [SPARK-42415][SQL] The built-in dialects support OFFSET and paging query.

2023-03-09 Thread via GitHub
cloud-fan commented on code in PR #39990: URL: https://github.com/apache/spark/pull/39990#discussion_r1130954327 ## connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala: ## @@ -410,6 +410,16 @@ private[v2] trait V2JDBCTest extends Share

[GitHub] [spark] cloud-fan commented on a diff in pull request #39990: [SPARK-42415][SQL] The built-in dialects support OFFSET and paging query.

2023-03-09 Thread via GitHub
cloud-fan commented on code in PR #39990: URL: https://github.com/apache/spark/pull/39990#discussion_r1130957560 ## connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala: ## @@ -410,6 +410,16 @@ private[v2] trait V2JDBCTest extends Share

[GitHub] [spark] cloud-fan commented on a diff in pull request #39990: [SPARK-42415][SQL] The built-in dialects support OFFSET and paging query.

2023-03-09 Thread via GitHub
cloud-fan commented on code in PR #39990: URL: https://github.com/apache/spark/pull/39990#discussion_r1130961053 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/OracleDialect.scala: ## @@ -181,22 +181,42 @@ private case object OracleDialect extends JdbcDialect { if (li

[GitHub] [spark] wangyum commented on a diff in pull request #39691: [SPARK-31561][SQL] Add QUALIFY clause

2023-03-09 Thread via GitHub
wangyum commented on code in PR #39691: URL: https://github.com/apache/spark/pull/39691#discussion_r1130962983 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -1729,6 +1729,23 @@ class Analyzer(override val catalogManager: CatalogMana

[GitHub] [spark] zhengruifeng commented on a diff in pull request #40353: [SPARK-42732][PYSPARK][CONNECT] Support spark connect session getActiveSession method

2023-03-09 Thread via GitHub
zhengruifeng commented on code in PR #40353: URL: https://github.com/apache/spark/pull/40353#discussion_r1130963826 ## python/pyspark/sql/connect/session.py: ## @@ -119,7 +122,11 @@ def enableHiveSupport(self) -> "SparkSession.Builder": raise NotImplementedError("en

[GitHub] [spark] ulysses-you commented on a diff in pull request #39624: [SPARK-42101][SQL] Make AQE support InMemoryTableScanExec

2023-03-09 Thread via GitHub
ulysses-you commented on code in PR #39624: URL: https://github.com/apache/spark/pull/39624#discussion_r1130968837 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala: ## @@ -220,10 +221,28 @@ case class AdaptiveSparkPlanExec( }

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #40353: [SPARK-42732][PYSPARK][CONNECT] Support spark connect session getActiveSession method

2023-03-09 Thread via GitHub
WeichenXu123 commented on code in PR #40353: URL: https://github.com/apache/spark/pull/40353#discussion_r1130972732 ## python/pyspark/sql/connect/session.py: ## @@ -119,7 +122,11 @@ def enableHiveSupport(self) -> "SparkSession.Builder": raise NotImplementedError("en

[GitHub] [spark] LuciferYang commented on a diff in pull request #40352: [SPARK-42664][CONNECT] Support `bloomFilter` function for `DataFrameStatFunctions`

2023-03-09 Thread via GitHub
LuciferYang commented on code in PR #40352: URL: https://github.com/apache/spark/pull/40352#discussion_r1131019335 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectService.scala: ## @@ -265,7 +268,23 @@ object SparkConnectService { }

[GitHub] [spark] LuciferYang commented on a diff in pull request #40352: [SPARK-42664][CONNECT] Support `bloomFilter` function for `DataFrameStatFunctions`

2023-03-09 Thread via GitHub
LuciferYang commented on code in PR #40352: URL: https://github.com/apache/spark/pull/40352#discussion_r1131051613 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectService.scala: ## @@ -265,7 +268,23 @@ object SparkConnectService { }

[GitHub] [spark] panbingkun commented on a diff in pull request #39949: [SPARK-42386][SQL] Rewrite HiveGenericUDF with Invoke

2023-03-09 Thread via GitHub
panbingkun commented on code in PR #39949: URL: https://github.com/apache/spark/pull/39949#discussion_r1131070156 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala: ## @@ -194,47 +183,52 @@ private[hive] case class HiveGenericUDF( override protected def wi

[GitHub] [spark] peter-toth commented on pull request #40266: [SPARK-42660][SQL] Infer filters for Join produced by IN and EXISTS clause (RewritePredicateSubquery rule)

2023-03-09 Thread via GitHub
peter-toth commented on PR #40266: URL: https://github.com/apache/spark/pull/40266#issuecomment-1462138771 This change makes sense to me and new plans look ok to me. However, seemingly `InferFiltersFromConstraints` has a dedicated place in the optimizer and so there are 2 special batches

[GitHub] [spark] ebarault commented on pull request #37153: [SPARK-26052] Add type comments to exposed Prometheus metrics

2023-03-09 Thread via GitHub
ebarault commented on PR #37153: URL: https://github.com/apache/spark/pull/37153#issuecomment-1462161162 @danielhaviv hi, it looks like your PR was left untouched by the spark team... 👎

[GitHub] [spark] ulysses-you commented on a diff in pull request #39624: [SPARK-42101][SQL] Make AQE support InMemoryTableScanExec

2023-03-09 Thread via GitHub
ulysses-you commented on code in PR #39624: URL: https://github.com/apache/spark/pull/39624#discussion_r1131119650 ## sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala: ## @@ -166,4 +170,32 @@ case class InMemoryTableScanExec( protect

[GitHub] [spark] yeachan153 commented on pull request #36434: [SPARK-38969][K8S] Fix Decom reporting

2023-03-09 Thread via GitHub
yeachan153 commented on PR #36434: URL: https://github.com/apache/spark/pull/36434#issuecomment-1462177964 @holdenk unfortunately the decom script still seems to be exiting with 137. Were you able to replicate that?

[GitHub] [spark] wangyum commented on a diff in pull request #39691: [SPARK-31561][SQL] Add QUALIFY clause

2023-03-09 Thread via GitHub
wangyum commented on code in PR #39691: URL: https://github.com/apache/spark/pull/39691#discussion_r1131137466 ## sql/core/src/test/resources/sql-tests/inputs/window.sql: ## @@ -465,3 +465,60 @@ SELECT SUM(salary) OVER w sum_salary FROM basic_pays; + +-- Test QU

[GitHub] [spark] wangyum commented on a diff in pull request #39691: [SPARK-31561][SQL] Add QUALIFY clause

2023-03-09 Thread via GitHub
wangyum commented on code in PR #39691: URL: https://github.com/apache/spark/pull/39691#discussion_r1131138491 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala: ## @@ -571,7 +571,7 @@ private[sql] object QueryCompilationErrors extends Quer

[GitHub] [spark] MaxGekk closed pull request #40336: [SPARK-42706][SQL][DOCS] Document the Spark SQL error classes in user-facing documentation.

2023-03-09 Thread via GitHub
MaxGekk closed pull request #40336: [SPARK-42706][SQL][DOCS] Document the Spark SQL error classes in user-facing documentation. URL: https://github.com/apache/spark/pull/40336

[GitHub] [spark] MaxGekk commented on pull request #40336: [SPARK-42706][SQL][DOCS] Document the Spark SQL error classes in user-facing documentation.

2023-03-09 Thread via GitHub
MaxGekk commented on PR #40336: URL: https://github.com/apache/spark/pull/40336#issuecomment-1462195522 +1, LGTM. Merging to master. Thank you, @itholic and @srielau for review.

[GitHub] [spark] MaxGekk commented on pull request #40236: [SPARK-38735][SQL][TESTS] Add tests for the error class: INTERNAL_ERROR

2023-03-09 Thread via GitHub
MaxGekk commented on PR #40236: URL: https://github.com/apache/spark/pull/40236#issuecomment-1462210586 @the8thC Do you have an account of OSS JIRA (https://issues.apache.org/jira/browse/SPARK-38735)?

[GitHub] [spark] tomvanbussel opened a new pull request, #40354: [SPARK-42735][CONNECT][SCALA] Allow passing extra confs to RemoteSparkSession

2023-03-09 Thread via GitHub
tomvanbussel opened a new pull request, #40354: URL: https://github.com/apache/spark/pull/40354 ### What changes were proposed in this pull request? This PR changes `RemoteSparkSession` to add an overrideable `sparkConf` field that can be used by tests to pass additional configuration

[GitHub] [spark] beliefer opened a new pull request, #40355: [SPARK-42604][CONNECT] Implement functions.typedlit

2023-03-09 Thread via GitHub
beliefer opened a new pull request, #40355: URL: https://github.com/apache/spark/pull/40355 ### What changes were proposed in this pull request? Spark Connect already supports `functions.lit`, but not `functions.typedlit`. This PR adds some new messages to the connect protocol and support `func

[GitHub] [spark] navinvishy commented on pull request #38947: [SPARK-41233][SQL][PYTHON] Add `array_prepend` function

2023-03-09 Thread via GitHub
navinvishy commented on PR #38947: URL: https://github.com/apache/spark/pull/38947#issuecomment-1462275861 @dongjoon-hyun @beliefer @LuciferYang @cloud-fan @HyukjinKwon gentle ping to please take a look. Thanks!

[GitHub] [spark] zzzzming95 commented on a diff in pull request #40341: [SPARK-42715][SQL] Tips for Optimizing NegativeArraySizeException

2023-03-09 Thread via GitHub
zzzzming95 commented on code in PR #40341: URL: https://github.com/apache/spark/pull/40341#discussion_r1131247723 ## sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcColumnarBatchReader.java: ## @@ -204,7 +204,12 @@ public void initBatch( * by copying

[GitHub] [spark] LuciferYang commented on a diff in pull request #40355: [SPARK-42604][CONNECT] Implement functions.typedlit

2023-03-09 Thread via GitHub
LuciferYang commented on code in PR #40355: URL: https://github.com/apache/spark/pull/40355#discussion_r1131290738 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/PlanGenerationTestSuite.scala: ## @@ -2065,6 +2065,43 @@ class PlanGenerationTestSuite fn

[GitHub] [spark] LuciferYang commented on a diff in pull request #40355: [SPARK-42604][CONNECT] Implement functions.typedlit

2023-03-09 Thread via GitHub
LuciferYang commented on code in PR #40355: URL: https://github.com/apache/spark/pull/40355#discussion_r1131296289 ## connector/connect/common/src/main/protobuf/spark/connect/expressions.proto: ## @@ -195,6 +197,17 @@ message Expression { DataType elementType = 1;

[GitHub] [spark] LuciferYang commented on pull request #40355: [SPARK-42604][CONNECT] Implement functions.typedlit

2023-03-09 Thread via GitHub
LuciferYang commented on PR #40355: URL: https://github.com/apache/spark/pull/40355#issuecomment-1462380542 Thanks for your work. I'll take a closer look tomorrow

[GitHub] [spark] rangadi commented on a diff in pull request #40342: [SPARK-42721][CONNECT] RPC logging interceptor

2023-03-09 Thread via GitHub
rangadi commented on code in PR #40342: URL: https://github.com/apache/spark/pull/40342#discussion_r1131303296 ## connector/connect/server/pom.xml: ## @@ -155,6 +155,12 @@ ${protobuf.version} compile + + com.google.protobuf + protobuf-java-util

[GitHub] [spark] rangadi commented on a diff in pull request #40342: [SPARK-42721][CONNECT] RPC logging interceptor

2023-03-09 Thread via GitHub
rangadi commented on code in PR #40342: URL: https://github.com/apache/spark/pull/40342#discussion_r1131312944 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/LoggingInterceptor.scala: ## @@ -0,0 +1,75 @@ +/* + * Licensed to the Apache Software Fo

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40308: [SPARK-42151][SQL] Align UPDATE assignments with table attributes

2023-03-09 Thread via GitHub
dongjoon-hyun commented on code in PR #40308: URL: https://github.com/apache/spark/pull/40308#discussion_r1131340809 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -3344,43 +3345,6 @@ class Analyzer(override val catalogManager: Catal

[GitHub] [spark] the8thC commented on pull request #40236: [SPARK-38735][SQL][TESTS] Add tests for the error class: INTERNAL_ERROR

2023-03-09 Thread via GitHub
the8thC commented on PR #40236: URL: https://github.com/apache/spark/pull/40236#issuecomment-1462470101 @MaxGekk No, I don't, but I've just requested one using "selfserve". Is that the right way?

[GitHub] [spark] mridulm closed pull request #40339: [SPARK-42719][CORE] `MapOutputTracker#getMapLocation` should respect `spark.shuffle.reduceLocality.enabled`

2023-03-09 Thread via GitHub
mridulm closed pull request #40339: [SPARK-42719][CORE] `MapOutputTracker#getMapLocation` should respect `spark.shuffle.reduceLocality.enabled` URL: https://github.com/apache/spark/pull/40339

[GitHub] [spark] holdenk commented on pull request #36434: [SPARK-38969][K8S] Fix Decom reporting

2023-03-09 Thread via GitHub
holdenk commented on PR #36434: URL: https://github.com/apache/spark/pull/36434#issuecomment-1462499267 Exit code 137 generally refers to out of memory at the container level; can you increase the overhead and see if it still occurs for you?
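
For context on the 137 figure mentioned in this comment: by common shell convention, an exit status above 128 usually means the process was terminated by signal number (status - 128), and 137 - 128 = 9 is SIGKILL, the signal typically sent when a container exceeds its memory limit. A minimal Python sketch of that arithmetic follows; the helper name `describe_exit_code` and the small signal table are illustrative assumptions, not part of Spark or of the PR under discussion.

```python
# Hypothetical helper for illustration only; not part of Spark.
# Standard Linux signal numbers for a few common signals.
KNOWN_SIGNALS = {2: "SIGINT", 9: "SIGKILL", 15: "SIGTERM"}


def describe_exit_code(code: int) -> str:
    """Interpret a shell-style exit status (128 + N means killed by signal N)."""
    if code > 128:
        signum = code - 128
        name = KNOWN_SIGNALS.get(signum, f"signal {signum}")
        return f"terminated by {name}"
    return f"exited with status {code}"


print(describe_exit_code(137))
```

A SIGKILL status alone does not prove an out-of-memory kill, since anything can send signal 9; it is only the most common cause in containerized deployments.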

[GitHub] [spark] mridulm commented on pull request #40339: [SPARK-42719][CORE] `MapOutputTracker#getMapLocation` should respect `spark.shuffle.reduceLocality.enabled`

2023-03-09 Thread via GitHub
mridulm commented on PR #40339: URL: https://github.com/apache/spark/pull/40339#issuecomment-1462498084 Merged to master. Thanks for working on this @jerqi ! Thanks for the reviews @cloud-fan, @LuciferYang, @advancedxy :-) -- This is an automated message from the Apache Git Service.

[GitHub] [spark] holdenk commented on pull request #36434: [SPARK-38969][K8S] Fix Decom reporting

2023-03-09 Thread via GitHub
holdenk commented on PR #36434: URL: https://github.com/apache/spark/pull/36434#issuecomment-1462501095 Or do you have a repro? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific commen

[GitHub] [spark] zhenlineo commented on a diff in pull request #40354: [SPARK-42735][CONNECT][SCALA] Allow passing extra confs to RemoteSparkSession

2023-03-09 Thread via GitHub
zhenlineo commented on code in PR #40354: URL: https://github.com/apache/spark/pull/40354#discussion_r1131440189 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/util/RemoteSparkSession.scala: ## @@ -114,19 +118,25 @@ object SparkConnectServerUt

[GitHub] [spark] hvanhovell commented on a diff in pull request #40352: [WIP][SPARK-42664][CONNECT] Support `bloomFilter` function for `DataFrameStatFunctions`

2023-03-09 Thread via GitHub
hvanhovell commented on code in PR #40352: URL: https://github.com/apache/spark/pull/40352#discussion_r1131445842 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -1073,6 +1074,12 @@ class SparkConnectPlanner(val se

[GitHub] [spark] hvanhovell commented on a diff in pull request #40352: [WIP][SPARK-42664][CONNECT] Support `bloomFilter` function for `DataFrameStatFunctions`

2023-03-09 Thread via GitHub
hvanhovell commented on code in PR #40352: URL: https://github.com/apache/spark/pull/40352#discussion_r1131455328 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -1073,6 +1074,12 @@ class SparkConnectPlanner(val se

[GitHub] [spark] hvanhovell commented on a diff in pull request #40352: [WIP][SPARK-42664][CONNECT] Support `bloomFilter` function for `DataFrameStatFunctions`

2023-03-09 Thread via GitHub
hvanhovell commented on code in PR #40352: URL: https://github.com/apache/spark/pull/40352#discussion_r1131458023 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala: ## @@ -584,6 +585,101 @@ final class DataFrameStatFunctions private

[GitHub] [spark] hvanhovell commented on a diff in pull request #40352: [WIP][SPARK-42664][CONNECT] Support `bloomFilter` function for `DataFrameStatFunctions`

2023-03-09 Thread via GitHub
hvanhovell commented on code in PR #40352: URL: https://github.com/apache/spark/pull/40352#discussion_r1131459693 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala: ## @@ -584,6 +585,101 @@ final class DataFrameStatFunctions private

[GitHub] [spark] hvanhovell commented on a diff in pull request #40352: [WIP][SPARK-42664][CONNECT] Support `bloomFilter` function for `DataFrameStatFunctions`

2023-03-09 Thread via GitHub
hvanhovell commented on code in PR #40352: URL: https://github.com/apache/spark/pull/40352#discussion_r1131460103 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/DataFrameStatSuite.scala: ## @@ -176,4 +176,31 @@ class DataFrameStatSuite extends RemoteSparkSes

[GitHub] [spark] rangadi commented on a diff in pull request #40342: [SPARK-42721][CONNECT] RPC logging interceptor

2023-03-09 Thread via GitHub
rangadi commented on code in PR #40342: URL: https://github.com/apache/spark/pull/40342#discussion_r1131461102 ## connector/connect/server/pom.xml: ## @@ -155,6 +155,12 @@ <version>${protobuf.version}</version> <scope>compile</scope> </dependency> + <dependency> + <groupId>com.google.protobuf</groupId> + <artifactId>protobuf-java-util</artifactId>

[GitHub] [spark] hvanhovell commented on pull request #40353: [SPARK-42732][PYSPARK][CONNECT] Support spark connect session getActiveSession method

2023-03-09 Thread via GitHub
hvanhovell commented on PR #40353: URL: https://github.com/apache/spark/pull/40353#issuecomment-1462597849 @WeichenXu123 in what case won't Spark Connect ML have access to the session? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] gpiotti commented on pull request #28946: [SPARK-32123][PYSPARK] Setting `spark.sql.session.timeZone` only partially respected

2023-03-09 Thread via GitHub
gpiotti commented on PR #28946: URL: https://github.com/apache/spark/pull/28946#issuecomment-1462628214 I'm also having this issue when converting to a pandas df with toPandas, and having a column with nested struct type with timestamps in it -- This is an automated message from the Apac

[GitHub] [spark] MaxGekk commented on pull request #40236: [SPARK-38735][SQL][TESTS] Add tests for the error class: INTERNAL_ERROR

2023-03-09 Thread via GitHub
MaxGekk commented on PR #40236: URL: https://github.com/apache/spark/pull/40236#issuecomment-1462661054 > @MaxGekk No, I don't, but I've just requested one using "selfserve". Is that the right way? Yep, thank you. -- This is an automated message from the Apache Git Service. To

[GitHub] [spark] MaxGekk commented on pull request #40236: [SPARK-38735][SQL][TESTS] Add tests for the error class: INTERNAL_ERROR

2023-03-09 Thread via GitHub
MaxGekk commented on PR #40236: URL: https://github.com/apache/spark/pull/40236#issuecomment-1462661888 +1, LGTM. Merging to master. Thank you, @the8thC and @itholic for review. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitH

[GitHub] [spark] MaxGekk closed pull request #40236: [SPARK-38735][SQL][TESTS] Add tests for the error class: INTERNAL_ERROR

2023-03-09 Thread via GitHub
MaxGekk closed pull request #40236: [SPARK-38735][SQL][TESTS] Add tests for the error class: INTERNAL_ERROR URL: https://github.com/apache/spark/pull/40236 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] MaxGekk commented on pull request #40236: [SPARK-38735][SQL][TESTS] Add tests for the error class: INTERNAL_ERROR

2023-03-09 Thread via GitHub
MaxGekk commented on PR #40236: URL: https://github.com/apache/spark/pull/40236#issuecomment-1462664558 @the8thC Congratulations on your first contribution to Apache Spark! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

[GitHub] [spark] ueshin opened a new pull request, #40356: [SPARK-42733][CONNECT][PYTHON] Fix DataFrameWriter.save to work without path parameter

2023-03-09 Thread via GitHub
ueshin opened a new pull request, #40356: URL: https://github.com/apache/spark/pull/40356 ### What changes were proposed in this pull request? Fixes `DataFrameWriter.save` to work without path parameter. ### Why are the changes needed? `DataFrameWriter.save` should work w

[GitHub] [spark] amaliujia commented on pull request #40346: [SPARK-42667][CONNECT][FOLLOW-UP] SparkSession created by newSession should not share the channel

2023-03-09 Thread via GitHub
amaliujia commented on PR #40346: URL: https://github.com/apache/spark/pull/40346#issuecomment-1462842661 @hvanhovell can you take another look? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to th

[GitHub] [spark] ueshin commented on a diff in pull request #40356: [SPARK-42733][CONNECT][PYTHON] Fix DataFrameWriter.save to work without path parameter

2023-03-09 Thread via GitHub
ueshin commented on code in PR #40356: URL: https://github.com/apache/spark/pull/40356#discussion_r1131677278 ## connector/connect/server/src/test/scala/org/apache/spark/sql/connect/planner/SparkConnectProtoSuite.scala: ## @@ -554,7 +554,8 @@ class SparkConnectProtoSuite extends

[GitHub] [spark] zhenlineo commented on a diff in pull request #40356: [SPARK-42733][CONNECT][PYTHON] Fix DataFrameWriter.save to work without path parameter

2023-03-09 Thread via GitHub
zhenlineo commented on code in PR #40356: URL: https://github.com/apache/spark/pull/40356#discussion_r1131689057 ## connector/connect/server/src/test/scala/org/apache/spark/sql/connect/planner/SparkConnectProtoSuite.scala: ## @@ -554,7 +554,8 @@ class SparkConnectProtoSuite exte

[GitHub] [spark] amaliujia commented on a diff in pull request #40354: [SPARK-42735][CONNECT][SCALA] Allow passing extra confs to RemoteSparkSession

2023-03-09 Thread via GitHub
amaliujia commented on code in PR #40354: URL: https://github.com/apache/spark/pull/40354#discussion_r1131742174 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/util/RemoteSparkSession.scala: ## @@ -114,19 +118,25 @@ object SparkConnectServerUt

[GitHub] [spark] ryan-johnson-databricks commented on a diff in pull request #40300: [SPARK-42683] Automatically rename conflicting metadata columns

2023-03-09 Thread via GitHub
ryan-johnson-databricks commented on code in PR #40300: URL: https://github.com/apache/spark/pull/40300#discussion_r1131748163 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala: ## @@ -340,13 +358,26 @@ trait ExposesMetadataColumns exte

[GitHub] [spark] huaxingao commented on a diff in pull request #40308: [SPARK-42151][SQL] Align UPDATE assignments with table attributes

2023-03-09 Thread via GitHub
huaxingao commented on code in PR #40308: URL: https://github.com/apache/spark/pull/40308#discussion_r1131767473 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -3344,43 +3345,6 @@ class Analyzer(override val catalogManager: CatalogMa

[GitHub] [spark] ryan-johnson-databricks commented on pull request #40300: [SPARK-42683] Automatically rename conflicting metadata columns

2023-03-09 Thread via GitHub
ryan-johnson-databricks commented on PR #40300: URL: https://github.com/apache/spark/pull/40300#issuecomment-1463005688 > It's a good idea to provide an API that allows people to unambiguously reference metadata columns, and I like the new `Dataset.metadataColumn` function. However, I think

[GitHub] [spark] amaliujia commented on a diff in pull request #40315: [SPARK-42699][CONNECT] SparkConnectServer should make client and AM same exit code

2023-03-09 Thread via GitHub
amaliujia commented on code in PR #40315: URL: https://github.com/apache/spark/pull/40315#discussion_r1131780390 ## sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala: ## @@ -737,12 +737,19 @@ class SparkSession private( // scalastyle:on /** - * Stop the u

[GitHub] [spark] ryan-johnson-databricks commented on a diff in pull request #40300: [SPARK-42683] Automatically rename conflicting metadata columns

2023-03-09 Thread via GitHub
ryan-johnson-databricks commented on code in PR #40300: URL: https://github.com/apache/spark/pull/40300#discussion_r1131780602 ## sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala: ## @@ -2714,6 +2726,17 @@ class Dataset[T] private[sql]( */ def withColumn(colName

[GitHub] [spark] ryan-johnson-databricks commented on a diff in pull request #40300: [SPARK-42683] Automatically rename conflicting metadata columns

2023-03-09 Thread via GitHub
ryan-johnson-databricks commented on code in PR #40300: URL: https://github.com/apache/spark/pull/40300#discussion_r1131780862 ## sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileMetadataStructSuite.scala: ## @@ -244,6 +245,89 @@ class FileMetadataStructSui

[GitHub] [spark] github-actions[bot] commented on pull request #38739: [SPARK-41207][SQL] Fix BinaryArithmetic with negative scale

2023-03-09 Thread via GitHub
github-actions[bot] commented on PR #38739: URL: https://github.com/apache/spark/pull/38739#issuecomment-1463026755 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] xinrong-meng opened a new pull request, #40357: [SPARK-42739][BUILD] Ensure release tag to be pushed to release branch

2023-03-09 Thread via GitHub
xinrong-meng opened a new pull request, #40357: URL: https://github.com/apache/spark/pull/40357 ### What changes were proposed in this pull request? In the release script, add a check to ensure that the release tag is pushed to the release branch. ### Why are the changes needed? To en
