date:20230309

[GitHub] [spark] itholic commented on a diff in pull request #40324: [WIP][SPARK-42496][CONNECT][DOCS] Adding Spark Connect to the Spark 3.4 documentation

2023-03-09 Thread via GitHub

itholic commented on code in PR #40324: URL: https://github.com/apache/spark/pull/40324#discussion_r1130622567 ## docs/index.md: ## @@ -49,8 +49,19 @@ For Java 11, `-Dio.netty.tryReflectionSetAccessible=true` is required additional # Running the Examples and Shell -Spark c

[GitHub] [spark] LuciferYang commented on a diff in pull request #40342: [SPARK-42721][CONNECT] RPC logging interceptor

2023-03-09 Thread via GitHub

LuciferYang commented on code in PR #40342: URL: https://github.com/apache/spark/pull/40342#discussion_r1130638704 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/LoggingInterceptor.scala: ## @@ -0,0 +1,75 @@ +/* + * Licensed to the Apache Softwar

[GitHub] [spark] zhengruifeng commented on pull request #40349: [SPARK-42725][CONNECT][PYTHON] Make LiteralExpression support array

2023-03-09 Thread via GitHub

zhengruifeng commented on PR #40349: URL: https://github.com/apache/spark/pull/40349#issuecomment-1461580489 cc @WeichenXu123 @HyukjinKwon -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the spe

[GitHub] [spark] zhengruifeng closed pull request #40329: [SPARK-42710][CONNECT][PYTHON] Rename FrameMap proto to MapPartitions

2023-03-09 Thread via GitHub

zhengruifeng closed pull request #40329: [SPARK-42710][CONNECT][PYTHON] Rename FrameMap proto to MapPartitions URL: https://github.com/apache/spark/pull/40329 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[GitHub] [spark] zhengruifeng commented on pull request #40329: [SPARK-42710][CONNECT][PYTHON] Rename FrameMap proto to MapPartitions

2023-03-09 Thread via GitHub

zhengruifeng commented on PR #40329: URL: https://github.com/apache/spark/pull/40329#issuecomment-1461587410 merged to master/branch-3.4 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specif

[GitHub] [spark] AngersZhuuuu commented on pull request #40315: [SPARK-42699][CONNECT] SparkConnectServer should make client and AM same exit code

2023-03-09 Thread via GitHub

AngersZhuuuu commented on PR #40315: URL: https://github.com/apache/spark/pull/40315#issuecomment-1461590509 ping @HyukjinKwon -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific commen

[GitHub] [spark] xinrong-meng commented on pull request #40329: [SPARK-42710][CONNECT][PYTHON] Rename FrameMap proto to MapPartitions

2023-03-09 Thread via GitHub

xinrong-meng commented on PR #40329: URL: https://github.com/apache/spark/pull/40329#issuecomment-1461592928 Thanks @zhengruifeng ! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific co

[GitHub] [spark] xinrong-meng opened a new pull request, #40350: [SPARK-42710][CONNECT][PYTHON] Implement `DataFrame.mapInArrow`

2023-03-09 Thread via GitHub

xinrong-meng opened a new pull request, #40350: URL: https://github.com/apache/spark/pull/40350 ### What changes were proposed in this pull request? Implement `DataFrame.mapInArrow`. ### Why are the changes needed? Parity with vanilla PySpark. ### Does this PR introduce _a

[GitHub] [spark] cloud-fan commented on a diff in pull request #40301: [SPARK-42685][CORE] Optimize Utils.bytesToString routines

2023-03-09 Thread via GitHub

cloud-fan commented on code in PR #40301: URL: https://github.com/apache/spark/pull/40301#discussion_r1130695992 ## core/src/main/scala/org/apache/spark/util/Utils.scala: ## @@ -1305,41 +1305,30 @@ private[spark] object Utils extends Logging { (JavaUtils.byteStringAsBytes(s

[GitHub] [spark] huangxiaopingRD opened a new pull request, #40351: [SPARK-42727][CORE] Support executing spark commands in the root directory when local mode is specified

2023-03-09 Thread via GitHub

huangxiaopingRD opened a new pull request, #40351: URL: https://github.com/apache/spark/pull/40351 ### What changes were proposed in this pull request? Special treatment for the root directory when parsing userClassPath ### Why are the changes needed? I found that executing t

[GitHub] [spark] wangyum commented on a diff in pull request #40190: [SPARK-42597][SQL] Support unwrap date type to timestamp type

2023-03-09 Thread via GitHub

wangyum commented on code in PR #40190: URL: https://github.com/apache/spark/pull/40190#discussion_r1130705751 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala: ## @@ -113,7 +114,7 @@ object UnwrapCastInBinaryComparison ex

[GitHub] [spark] alkis commented on pull request #40302: [SPARK-42686][CORE] Defer formatting for debug messages in TaskMemoryManager

2023-03-09 Thread via GitHub

alkis commented on PR #40302: URL: https://github.com/apache/spark/pull/40302#issuecomment-1461760841 Good idea! Done. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To un

[GitHub] [spark] alkis commented on a diff in pull request #40301: [SPARK-42685][CORE] Optimize Utils.bytesToString routines

2023-03-09 Thread via GitHub

alkis commented on code in PR #40301: URL: https://github.com/apache/spark/pull/40301#discussion_r1130795171 ## core/src/main/scala/org/apache/spark/util/Utils.scala: ## @@ -1305,41 +1305,30 @@ private[spark] object Utils extends Logging { (JavaUtils.byteStringAsBytes(str)

[GitHub] [spark] panbingkun commented on a diff in pull request #39949: [SPARK-42386][SQL] Rewrite HiveGenericUDF with Invoke

2023-03-09 Thread via GitHub

panbingkun commented on code in PR #39949: URL: https://github.com/apache/spark/pull/39949#discussion_r1130811640 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala: ## @@ -194,47 +183,52 @@ private[hive] case class HiveGenericUDF( override protected def wi

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40349: [SPARK-42725][CONNECT][PYTHON] Make LiteralExpression support array params

2023-03-09 Thread via GitHub

HyukjinKwon commented on code in PR #40349: URL: https://github.com/apache/spark/pull/40349#discussion_r1130826498 ## python/pyspark/sql/connect/expressions.py: ## @@ -277,14 +281,26 @@ def _infer_type(cls, value: Any) -> DataType: return DateType() elif is

[GitHub] [spark] HyukjinKwon commented on pull request #40260: [SPARK-42630][CONNECT][PYTHON] Introduce UnparsedDataType and delay parsing DDL string until SparkConnectClient is available

2023-03-09 Thread via GitHub

HyukjinKwon commented on PR #40260: URL: https://github.com/apache/spark/pull/40260#issuecomment-1461828508 Merged to master and branch-3.4. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the sp

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #40349: [SPARK-42725][CONNECT][PYTHON] Make LiteralExpression support array params

2023-03-09 Thread via GitHub

WeichenXu123 commented on code in PR #40349: URL: https://github.com/apache/spark/pull/40349#discussion_r1130844165 ## python/pyspark/sql/tests/connect/test_connect_column.py: ## @@ -437,7 +436,6 @@ def test_literal_with_unsupported_type(self): (0.1, DecimalType()),

[GitHub] [spark] HyukjinKwon closed pull request #40260: [SPARK-42630][CONNECT][PYTHON] Introduce UnparsedDataType and delay parsing DDL string until SparkConnectClient is available

2023-03-09 Thread via GitHub

HyukjinKwon closed pull request #40260: [SPARK-42630][CONNECT][PYTHON] Introduce UnparsedDataType and delay parsing DDL string until SparkConnectClient is available URL: https://github.com/apache/spark/pull/40260 -- This is an automated message from the Apache Git Service. To respond to the

[GitHub] [spark] zhengruifeng commented on a diff in pull request #40349: [SPARK-42725][CONNECT][PYTHON] Make LiteralExpression support array params

2023-03-09 Thread via GitHub

zhengruifeng commented on code in PR #40349: URL: https://github.com/apache/spark/pull/40349#discussion_r1130850965 ## python/pyspark/sql/tests/connect/test_connect_plan.py: ## @@ -943,6 +945,39 @@ def test_column_expressions(self): mod_fun.unresolved_function.argu

[GitHub] [spark] zhengruifeng commented on a diff in pull request #40349: [SPARK-42725][CONNECT][PYTHON] Make LiteralExpression support array params

2023-03-09 Thread via GitHub

zhengruifeng commented on code in PR #40349: URL: https://github.com/apache/spark/pull/40349#discussion_r1130851705 ## python/pyspark/sql/tests/connect/test_connect_column.py: ## @@ -437,7 +436,6 @@ def test_literal_with_unsupported_type(self): (0.1, DecimalType()),

[GitHub] [spark] zhengruifeng commented on a diff in pull request #40349: [SPARK-42725][CONNECT][PYTHON] Make LiteralExpression support array params

2023-03-09 Thread via GitHub

zhengruifeng commented on code in PR #40349: URL: https://github.com/apache/spark/pull/40349#discussion_r1130852388 ## python/pyspark/sql/connect/expressions.py: ## @@ -277,14 +281,26 @@ def _infer_type(cls, value: Any) -> DataType: return DateType() elif i

[GitHub] [spark] wangyum commented on a diff in pull request #39691: [SPARK-31561][SQL] Add QUALIFY clause

2023-03-09 Thread via GitHub

wangyum commented on code in PR #39691: URL: https://github.com/apache/spark/pull/39691#discussion_r1130865469 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -1729,6 +1729,23 @@ class Analyzer(override val catalogManager: CatalogMana

[GitHub] [spark] LuciferYang opened a new pull request, #40352: [SPARK-42664][CONNECT] Support `bloomFilter` function for `DataFrameStatFunctions`

2023-03-09 Thread via GitHub

LuciferYang opened a new pull request, #40352: URL: https://github.com/apache/spark/pull/40352 ### What changes were proposed in this pull request? This PR uses `BloomFilterAggregate` to implement the `bloomFilter` function for `DataFrameStatFunctions`. Since `BloomFilterAggregate` is an

[GitHub] [spark] WeichenXu123 opened a new pull request, #40353: [SPARK-42732][PYSPARK][CONNECT] Support spark connect session getActiveSession method

2023-03-09 Thread via GitHub

WeichenXu123 opened a new pull request, #40353: URL: https://github.com/apache/spark/pull/40353 ### What changes were proposed in this pull request? Support spark connect session getActiveSession method. Spark connect ML needs this API to get active session in some cases

[GitHub] [spark] cloud-fan commented on a diff in pull request #39624: [SPARK-42101][SQL] Make AQE support InMemoryTableScanExec

2023-03-09 Thread via GitHub

cloud-fan commented on code in PR #39624: URL: https://github.com/apache/spark/pull/39624#discussion_r1130900032 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/QueryStageExec.scala: ## @@ -250,3 +251,48 @@ case class BroadcastQueryStageExec( override def

[GitHub] [spark] cloud-fan commented on a diff in pull request #39624: [SPARK-42101][SQL] Make AQE support InMemoryTableScanExec

2023-03-09 Thread via GitHub

cloud-fan commented on code in PR #39624: URL: https://github.com/apache/spark/pull/39624#discussion_r1130902352 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/QueryStageExec.scala: ## @@ -55,17 +50,7 @@ abstract class QueryStageExec extends LeafExecNode {

[GitHub] [spark] cloud-fan commented on a diff in pull request #39624: [SPARK-42101][SQL] Make AQE support InMemoryTableScanExec

2023-03-09 Thread via GitHub

cloud-fan commented on code in PR #39624: URL: https://github.com/apache/spark/pull/39624#discussion_r1130903095 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/QueryStageExec.scala: ## @@ -55,17 +50,7 @@ abstract class QueryStageExec extends LeafExecNode {

[GitHub] [spark] cloud-fan commented on a diff in pull request #39624: [SPARK-42101][SQL] Make AQE support InMemoryTableScanExec

2023-03-09 Thread via GitHub

cloud-fan commented on code in PR #39624: URL: https://github.com/apache/spark/pull/39624#discussion_r1130904593 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/QueryStageExec.scala: ## @@ -250,3 +251,48 @@ case class BroadcastQueryStageExec( override def

[GitHub] [spark] cloud-fan commented on a diff in pull request #39624: [SPARK-42101][SQL] Make AQE support InMemoryTableScanExec

2023-03-09 Thread via GitHub

cloud-fan commented on code in PR #39624: URL: https://github.com/apache/spark/pull/39624#discussion_r1130913005 ## sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala: ## @@ -388,6 +388,7 @@ case class InMemoryRelation( @transient val parti

[GitHub] [spark] WeichenXu123 commented on pull request #40353: [SPARK-42732][PYSPARK][CONNECT] Support spark connect session getActiveSession method

2023-03-09 Thread via GitHub

WeichenXu123 commented on PR #40353: URL: https://github.com/apache/spark/pull/40353#issuecomment-1461919214 CC @HyukjinKwon @zhengruifeng -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the spe

[GitHub] [spark] cloud-fan commented on a diff in pull request #39624: [SPARK-42101][SQL] Make AQE support InMemoryTableScanExec

2023-03-09 Thread via GitHub

cloud-fan commented on code in PR #39624: URL: https://github.com/apache/spark/pull/39624#discussion_r1130918471 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala: ## @@ -220,10 +221,28 @@ case class AdaptiveSparkPlanExec( } p

[GitHub] [spark] Hisoka-X commented on a diff in pull request #40326: [SPARK-42708] [Docs] Improve doc about protobuf java file can't be indexed.

2023-03-09 Thread via GitHub

Hisoka-X commented on code in PR #40326: URL: https://github.com/apache/spark/pull/40326#discussion_r1130922230 ## connector/protobuf/README.md: ## @@ -34,3 +34,17 @@ export SPARK_PROTOC_EXEC_PATH=/path-to-protoc-exe The user-defined `protoc` binary files can be produced in t

[GitHub] [spark] cloud-fan commented on a diff in pull request #40190: [SPARK-42597][SQL] Support unwrap date type to timestamp type

2023-03-09 Thread via GitHub

cloud-fan commented on code in PR #40190: URL: https://github.com/apache/spark/pull/40190#discussion_r1130941590 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala: ## @@ -113,7 +114,7 @@ object UnwrapCastInBinaryComparison

[GitHub] [spark] ulysses-you commented on a diff in pull request #39624: [SPARK-42101][SQL] Make AQE support InMemoryTableScanExec

2023-03-09 Thread via GitHub

ulysses-you commented on code in PR #39624: URL: https://github.com/apache/spark/pull/39624#discussion_r1130945518 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/QueryStageExec.scala: ## @@ -250,3 +251,48 @@ case class BroadcastQueryStageExec( override d

[GitHub] [spark] cloud-fan commented on a diff in pull request #39949: [SPARK-42386][SQL] Rewrite HiveGenericUDF with Invoke

2023-03-09 Thread via GitHub

cloud-fan commented on code in PR #39949: URL: https://github.com/apache/spark/pull/39949#discussion_r1130947376 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala: ## @@ -194,47 +183,52 @@ private[hive] case class HiveGenericUDF( override protected def wit

[GitHub] [spark] cloud-fan commented on a diff in pull request #39990: [SPARK-42415][SQL] The built-in dialects support OFFSET and paging query.

2023-03-09 Thread via GitHub

cloud-fan commented on code in PR #39990: URL: https://github.com/apache/spark/pull/39990#discussion_r1130954327 ## connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala: ## @@ -410,6 +410,16 @@ private[v2] trait V2JDBCTest extends Share

[GitHub] [spark] cloud-fan commented on a diff in pull request #39990: [SPARK-42415][SQL] The built-in dialects support OFFSET and paging query.

2023-03-09 Thread via GitHub

cloud-fan commented on code in PR #39990: URL: https://github.com/apache/spark/pull/39990#discussion_r1130957560 ## connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala: ## @@ -410,6 +410,16 @@ private[v2] trait V2JDBCTest extends Share

[GitHub] [spark] cloud-fan commented on a diff in pull request #39990: [SPARK-42415][SQL] The built-in dialects support OFFSET and paging query.

2023-03-09 Thread via GitHub

cloud-fan commented on code in PR #39990: URL: https://github.com/apache/spark/pull/39990#discussion_r1130961053 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/OracleDialect.scala: ## @@ -181,22 +181,42 @@ private case object OracleDialect extends JdbcDialect { if (li

[GitHub] [spark] wangyum commented on a diff in pull request #39691: [SPARK-31561][SQL] Add QUALIFY clause

2023-03-09 Thread via GitHub

wangyum commented on code in PR #39691: URL: https://github.com/apache/spark/pull/39691#discussion_r1130962983 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -1729,6 +1729,23 @@ class Analyzer(override val catalogManager: CatalogMana

[GitHub] [spark] zhengruifeng commented on a diff in pull request #40353: [SPARK-42732][PYSPARK][CONNECT] Support spark connect session getActiveSession method

2023-03-09 Thread via GitHub

zhengruifeng commented on code in PR #40353: URL: https://github.com/apache/spark/pull/40353#discussion_r1130963826 ## python/pyspark/sql/connect/session.py: ## @@ -119,7 +122,11 @@ def enableHiveSupport(self) -> "SparkSession.Builder": raise NotImplementedError("en

[GitHub] [spark] ulysses-you commented on a diff in pull request #39624: [SPARK-42101][SQL] Make AQE support InMemoryTableScanExec

2023-03-09 Thread via GitHub

ulysses-you commented on code in PR #39624: URL: https://github.com/apache/spark/pull/39624#discussion_r1130968837 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala: ## @@ -220,10 +221,28 @@ case class AdaptiveSparkPlanExec( }

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #40353: [SPARK-42732][PYSPARK][CONNECT] Support spark connect session getActiveSession method

2023-03-09 Thread via GitHub

WeichenXu123 commented on code in PR #40353: URL: https://github.com/apache/spark/pull/40353#discussion_r1130972732 ## python/pyspark/sql/connect/session.py: ## @@ -119,7 +122,11 @@ def enableHiveSupport(self) -> "SparkSession.Builder": raise NotImplementedError("en

[GitHub] [spark] LuciferYang commented on a diff in pull request #40352: [SPARK-42664][CONNECT] Support `bloomFilter` function for `DataFrameStatFunctions`

2023-03-09 Thread via GitHub

LuciferYang commented on code in PR #40352: URL: https://github.com/apache/spark/pull/40352#discussion_r1131019335 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectService.scala: ## @@ -265,7 +268,23 @@ object SparkConnectService { }

[GitHub] [spark] LuciferYang commented on a diff in pull request #40352: [SPARK-42664][CONNECT] Support `bloomFilter` function for `DataFrameStatFunctions`

2023-03-09 Thread via GitHub

LuciferYang commented on code in PR #40352: URL: https://github.com/apache/spark/pull/40352#discussion_r1131051613 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectService.scala: ## @@ -265,7 +268,23 @@ object SparkConnectService { }

[GitHub] [spark] panbingkun commented on a diff in pull request #39949: [SPARK-42386][SQL] Rewrite HiveGenericUDF with Invoke

2023-03-09 Thread via GitHub

panbingkun commented on code in PR #39949: URL: https://github.com/apache/spark/pull/39949#discussion_r1131070156 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala: ## @@ -194,47 +183,52 @@ private[hive] case class HiveGenericUDF( override protected def wi

[GitHub] [spark] peter-toth commented on pull request #40266: [SPARK-42660][SQL] Infer filters for Join produced by IN and EXISTS clause (RewritePredicateSubquery rule)

2023-03-09 Thread via GitHub

peter-toth commented on PR #40266: URL: https://github.com/apache/spark/pull/40266#issuecomment-1462138771 This change makes sense to me and new plans look ok to me. However, seemingly `InferFiltersFromConstraints` has a dedicated place in the optimizer and so there are 2 special batches

[GitHub] [spark] ebarault commented on pull request #37153: [SPARK-26052] Add type comments to exposed Prometheus metrics

2023-03-09 Thread via GitHub

ebarault commented on PR #37153: URL: https://github.com/apache/spark/pull/37153#issuecomment-1462161162 @danielhaviv hi, it looks like your PR was left untouched by the spark team... 👎 -- This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [spark] ulysses-you commented on a diff in pull request #39624: [SPARK-42101][SQL] Make AQE support InMemoryTableScanExec

2023-03-09 Thread via GitHub

ulysses-you commented on code in PR #39624: URL: https://github.com/apache/spark/pull/39624#discussion_r1131119650 ## sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala: ## @@ -166,4 +170,32 @@ case class InMemoryTableScanExec( protect

[GitHub] [spark] yeachan153 commented on pull request #36434: [SPARK-38969][K8S] Fix Decom reporting

2023-03-09 Thread via GitHub

yeachan153 commented on PR #36434: URL: https://github.com/apache/spark/pull/36434#issuecomment-1462177964 @holdenk unfortunately the decom script still seems to be exiting with 137. Were you able to replicate that? -- This is an automated message from the Apache Git Service. To respond t

[GitHub] [spark] wangyum commented on a diff in pull request #39691: [SPARK-31561][SQL] Add QUALIFY clause

2023-03-09 Thread via GitHub

wangyum commented on code in PR #39691: URL: https://github.com/apache/spark/pull/39691#discussion_r1131137466 ## sql/core/src/test/resources/sql-tests/inputs/window.sql: ## @@ -465,3 +465,60 @@ SELECT SUM(salary) OVER w sum_salary FROM basic_pays; + +-- Test QU

[GitHub] [spark] wangyum commented on a diff in pull request #39691: [SPARK-31561][SQL] Add QUALIFY clause

2023-03-09 Thread via GitHub

wangyum commented on code in PR #39691: URL: https://github.com/apache/spark/pull/39691#discussion_r1131138491 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala: ## @@ -571,7 +571,7 @@ private[sql] object QueryCompilationErrors extends Quer

[GitHub] [spark] MaxGekk closed pull request #40336: [SPARK-42706][SQL][DOCS] Document the Spark SQL error classes in user-facing documentation.

2023-03-09 Thread via GitHub

MaxGekk closed pull request #40336: [SPARK-42706][SQL][DOCS] Document the Spark SQL error classes in user-facing documentation. URL: https://github.com/apache/spark/pull/40336 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and u

[GitHub] [spark] MaxGekk commented on pull request #40336: [SPARK-42706][SQL][DOCS] Document the Spark SQL error classes in user-facing documentation.

2023-03-09 Thread via GitHub

MaxGekk commented on PR #40336: URL: https://github.com/apache/spark/pull/40336#issuecomment-1462195522 +1, LGTM. Merging to master. Thank you, @itholic and @srielau for review. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitH

[GitHub] [spark] MaxGekk commented on pull request #40236: [SPARK-38735][SQL][TESTS] Add tests for the error class: INTERNAL_ERROR

2023-03-09 Thread via GitHub

MaxGekk commented on PR #40236: URL: https://github.com/apache/spark/pull/40236#issuecomment-1462210586 @the8thC Do you have an account of OSS JIRA (https://issues.apache.org/jira/browse/SPARK-38735)? -- This is an automated message from the Apache Git Service. To respond to the message,

[GitHub] [spark] tomvanbussel opened a new pull request, #40354: [SPARK-42735][CONNECT][SCALA] Allow passing extra confs to RemoteSparkSession

2023-03-09 Thread via GitHub

tomvanbussel opened a new pull request, #40354: URL: https://github.com/apache/spark/pull/40354 ### What changes were proposed in this pull request? This PR changes `RemoteSparkSession` to add an overrideable `sparkConf` field that can be used by tests to pass additional configuration

[GitHub] [spark] beliefer opened a new pull request, #40355: [SPARK-42604][CONNECT] Implement functions.typedlit

2023-03-09 Thread via GitHub

beliefer opened a new pull request, #40355: URL: https://github.com/apache/spark/pull/40355 ### What changes were proposed in this pull request? Spark Connect already supports `functions.lit`, but not `functions.typedlit`. This PR adds some new messages to the Connect protocol and supports `func

[GitHub] [spark] navinvishy commented on pull request #38947: [SPARK-41233][SQL][PYTHON] Add `array_prepend` function

2023-03-09 Thread via GitHub

navinvishy commented on PR #38947: URL: https://github.com/apache/spark/pull/38947#issuecomment-1462275861 @dongjoon-hyun @beliefer @LuciferYang @cloud-fan @HyukjinKwon gentle ping to please take a look. Thanks! -- This is an automated message from the Apache Git Service. To respond to th

[GitHub] [spark] zzzzming95 commented on a diff in pull request #40341: [SPARK-42715][SQL] Tips for Optimizing NegativeArraySizeException

2023-03-09 Thread via GitHub

zzzzming95 commented on code in PR #40341: URL: https://github.com/apache/spark/pull/40341#discussion_r1131247723 ## sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcColumnarBatchReader.java: ## @@ -204,7 +204,12 @@ public void initBatch( * by copying

[GitHub] [spark] LuciferYang commented on a diff in pull request #40355: [SPARK-42604][CONNECT] Implement functions.typedlit

2023-03-09 Thread via GitHub

LuciferYang commented on code in PR #40355: URL: https://github.com/apache/spark/pull/40355#discussion_r1131290738 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/PlanGenerationTestSuite.scala: ## @@ -2065,6 +2065,43 @@ class PlanGenerationTestSuite fn

[GitHub] [spark] LuciferYang commented on a diff in pull request #40355: [SPARK-42604][CONNECT] Implement functions.typedlit

2023-03-09 Thread via GitHub

LuciferYang commented on code in PR #40355: URL: https://github.com/apache/spark/pull/40355#discussion_r1131296289 ## connector/connect/common/src/main/protobuf/spark/connect/expressions.proto: ## @@ -195,6 +197,17 @@ message Expression { DataType elementType = 1;

[GitHub] [spark] LuciferYang commented on pull request #40355: [SPARK-42604][CONNECT] Implement functions.typedlit

2023-03-09 Thread via GitHub

LuciferYang commented on PR #40355: URL: https://github.com/apache/spark/pull/40355#issuecomment-1462380542 Thanks for your work. I'll take a closer look tomorrow -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

[GitHub] [spark] rangadi commented on a diff in pull request #40342: [SPARK-42721][CONNECT] RPC logging interceptor

2023-03-09 Thread via GitHub

rangadi commented on code in PR #40342: URL: https://github.com/apache/spark/pull/40342#discussion_r1131303296 ## connector/connect/server/pom.xml: ## @@ -155,6 +155,12 @@ ${protobuf.version} compile + + com.google.protobuf + protobuf-java-util

[GitHub] [spark] rangadi commented on a diff in pull request #40342: [SPARK-42721][CONNECT] RPC logging interceptor

2023-03-09 Thread via GitHub

rangadi commented on code in PR #40342: URL: https://github.com/apache/spark/pull/40342#discussion_r1131312944 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/LoggingInterceptor.scala: ## @@ -0,0 +1,75 @@ +/* + * Licensed to the Apache Software Fo

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40308: [SPARK-42151][SQL] Align UPDATE assignments with table attributes

2023-03-09 Thread via GitHub

dongjoon-hyun commented on code in PR #40308: URL: https://github.com/apache/spark/pull/40308#discussion_r1131340809 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -3344,43 +3345,6 @@ class Analyzer(override val catalogManager: Catal

[GitHub] [spark] the8thC commented on pull request #40236: [SPARK-38735][SQL][TESTS] Add tests for the error class: INTERNAL_ERROR

2023-03-09 Thread via GitHub

the8thC commented on PR #40236: URL: https://github.com/apache/spark/pull/40236#issuecomment-1462470101 @MaxGekk No, I don't, but I've just requested one using "selfserve". Is that the right way? -- This is an automated message from the Apache Git Service. To respond to the message, pleas

[GitHub] [spark] mridulm closed pull request #40339: [SPARK-42719][CORE] `MapOutputTracker#getMapLocation` should respect `spark.shuffle.reduceLocality.enabled`

2023-03-09 Thread via GitHub

mridulm closed pull request #40339: [SPARK-42719][CORE] `MapOutputTracker#getMapLocation` should respect `spark.shuffle.reduceLocality.enabled` URL: https://github.com/apache/spark/pull/40339 -- This is an automated message from the Apache Git Service. To respond to the message, please log o

[GitHub] [spark] holdenk commented on pull request #36434: [SPARK-38969][K8S] Fix Decom reporting

2023-03-09 Thread via GitHub

holdenk commented on PR #36434: URL: https://github.com/apache/spark/pull/36434#issuecomment-1462499267 Exit code 137 generally refers to out of memory at the container level, can you increase the overhead and see if it still occurs for you? -- This is an automated message from the Apache
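
As a side note on the exit code mentioned above: by the usual shell convention, a status above 128 means the process was terminated by signal number (status - 128), so 137 corresponds to signal 9 (SIGKILL), which is what a container-level out-of-memory kill typically sends. The sketch below only decodes that convention; `describe_exit_code` is a hypothetical helper written for illustration and is not part of Spark or of the decommissioning script discussed in the PR.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Decode a process exit status using the shell's 128 + N signal convention.

    Hypothetical helper for illustration; assumes a POSIX platform.
    """
    if code > 128:
        signum = code - 128
        try:
            # Look up the symbolic name of the signal, e.g. 9 -> SIGKILL.
            name = signal.Signals(signum).name
        except ValueError:
            # The signal number is not defined on this platform.
            name = f"signal {signum}"
        return f"terminated by {name}"
    return f"exited with status {code}"


print(describe_exit_code(137))
```

A SIGKILL alone does not prove an out-of-memory condition, since any sender can deliver it, which is why the reply asks for a repro or a run with increased memory overhead.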

[GitHub] [spark] mridulm commented on pull request #40339: [SPARK-42719][CORE] `MapOutputTracker#getMapLocation` should respect `spark.shuffle.reduceLocality.enabled`

2023-03-09 Thread via GitHub

mridulm commented on PR #40339: URL: https://github.com/apache/spark/pull/40339#issuecomment-1462498084 Merged to master. Thanks for working on this @jerqi ! Thanks for the reviews @cloud-fan, @LuciferYang, @advancedxy :-) -- This is an automated message from the Apache Git Service.

[GitHub] [spark] holdenk commented on pull request #36434: [SPARK-38969][K8S] Fix Decom reporting

2023-03-09 Thread via GitHub

holdenk commented on PR #36434: URL: https://github.com/apache/spark/pull/36434#issuecomment-1462501095 Or do you have a repro? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific commen

[GitHub] [spark] zhenlineo commented on a diff in pull request #40354: [SPARK-42735][CONNECT][SCALA] Allow passing extra confs to RemoteSparkSession

2023-03-09 Thread via GitHub

zhenlineo commented on code in PR #40354: URL: https://github.com/apache/spark/pull/40354#discussion_r1131440189 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/util/RemoteSparkSession.scala: ## @@ -114,19 +118,25 @@ object SparkConnectServerUt

[GitHub] [spark] hvanhovell commented on a diff in pull request #40352: [WIP][SPARK-42664][CONNECT] Support `bloomFilter` function for `DataFrameStatFunctions`

2023-03-09 Thread via GitHub

hvanhovell commented on code in PR #40352: URL: https://github.com/apache/spark/pull/40352#discussion_r1131445842 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -1073,6 +1074,12 @@ class SparkConnectPlanner(val se

[GitHub] [spark] hvanhovell commented on a diff in pull request #40352: [WIP][SPARK-42664][CONNECT] Support `bloomFilter` function for `DataFrameStatFunctions`

2023-03-09 Thread via GitHub

hvanhovell commented on code in PR #40352: URL: https://github.com/apache/spark/pull/40352#discussion_r1131455328 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -1073,6 +1074,12 @@ class SparkConnectPlanner(val se

[GitHub] [spark] hvanhovell commented on a diff in pull request #40352: [WIP][SPARK-42664][CONNECT] Support `bloomFilter` function for `DataFrameStatFunctions`

2023-03-09 Thread via GitHub

hvanhovell commented on code in PR #40352: URL: https://github.com/apache/spark/pull/40352#discussion_r1131458023 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala: ## @@ -584,6 +585,101 @@ final class DataFrameStatFunctions private

[GitHub] [spark] hvanhovell commented on a diff in pull request #40352: [WIP][SPARK-42664][CONNECT] Support `bloomFilter` function for `DataFrameStatFunctions`

2023-03-09 Thread via GitHub

hvanhovell commented on code in PR #40352: URL: https://github.com/apache/spark/pull/40352#discussion_r1131459693 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala: ## @@ -584,6 +585,101 @@ final class DataFrameStatFunctions private

[GitHub] [spark] hvanhovell commented on a diff in pull request #40352: [WIP][SPARK-42664][CONNECT] Support `bloomFilter` function for `DataFrameStatFunctions`

2023-03-09 Thread via GitHub

hvanhovell commented on code in PR #40352: URL: https://github.com/apache/spark/pull/40352#discussion_r1131460103 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/DataFrameStatSuite.scala: ## @@ -176,4 +176,31 @@ class DataFrameStatSuite extends RemoteSparkSes

[GitHub] [spark] rangadi commented on a diff in pull request #40342: [SPARK-42721][CONNECT] RPC logging interceptor

2023-03-09 Thread via GitHub

rangadi commented on code in PR #40342: URL: https://github.com/apache/spark/pull/40342#discussion_r1131461102 ## connector/connect/server/pom.xml: ## @@ -155,6 +155,12 @@ ${protobuf.version} compile + + com.google.protobuf + protobuf-java-util

[GitHub] [spark] hvanhovell commented on pull request #40353: [SPARK-42732][PYSPARK][CONNECT] Support spark connect session getActiveSession method

2023-03-09 Thread via GitHub

hvanhovell commented on PR #40353: URL: https://github.com/apache/spark/pull/40353#issuecomment-1462597849 @WeichenXu123 in what case won't Spark Connect ML have access to the session? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] gpiotti commented on pull request #28946: [SPARK-32123][PYSPARK] Setting `spark.sql.session.timeZone` only partially respected

2023-03-09 Thread via GitHub

gpiotti commented on PR #28946: URL: https://github.com/apache/spark/pull/28946#issuecomment-1462628214 I'm also having this issue when converting to a pandas df with toPandas, and having a column with nested struct type with timestamps in it -- This is an automated message from the Apac

[GitHub] [spark] MaxGekk commented on pull request #40236: [SPARK-38735][SQL][TESTS] Add tests for the error class: INTERNAL_ERROR

2023-03-09 Thread via GitHub

MaxGekk commented on PR #40236: URL: https://github.com/apache/spark/pull/40236#issuecomment-1462661054 > @MaxGekk No, I don't, but I've just requested one using "selfserve". Is that the right way? Yep, thank you. -- This is an automated message from the Apache Git Service. To

[GitHub] [spark] MaxGekk commented on pull request #40236: [SPARK-38735][SQL][TESTS] Add tests for the error class: INTERNAL_ERROR

2023-03-09 Thread via GitHub

MaxGekk commented on PR #40236: URL: https://github.com/apache/spark/pull/40236#issuecomment-1462661888 +1, LGTM. Merging to master. Thank you, @the8thC and @itholic for review. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitH

[GitHub] [spark] MaxGekk closed pull request #40236: [SPARK-38735][SQL][TESTS] Add tests for the error class: INTERNAL_ERROR

2023-03-09 Thread via GitHub

MaxGekk closed pull request #40236: [SPARK-38735][SQL][TESTS] Add tests for the error class: INTERNAL_ERROR URL: https://github.com/apache/spark/pull/40236 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] MaxGekk commented on pull request #40236: [SPARK-38735][SQL][TESTS] Add tests for the error class: INTERNAL_ERROR

2023-03-09 Thread via GitHub

MaxGekk commented on PR #40236: URL: https://github.com/apache/spark/pull/40236#issuecomment-1462664558 @the8thC Congratulations with your first contribution to Apache Spark! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

[GitHub] [spark] ueshin opened a new pull request, #40356: [SPARK-42733][CONNECT][PYTHON] Fix DataFrameWriter.save to work without path parameter

2023-03-09 Thread via GitHub

ueshin opened a new pull request, #40356: URL: https://github.com/apache/spark/pull/40356 ### What changes were proposed in this pull request? Fixes `DataFrameWriter.save` to work without path parameter. ### Why are the changes needed? `DataFrameWriter.save` should work w

[GitHub] [spark] amaliujia commented on pull request #40346: [SPARK-42667][CONNECT][FOLLOW-UP] SparkSession created by newSession should not share the channel

2023-03-09 Thread via GitHub

amaliujia commented on PR #40346: URL: https://github.com/apache/spark/pull/40346#issuecomment-1462842661 @hvanhovell can you take another look? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to th

[GitHub] [spark] ueshin commented on a diff in pull request #40356: [SPARK-42733][CONNECT][PYTHON] Fix DataFrameWriter.save to work without path parameter

2023-03-09 Thread via GitHub

ueshin commented on code in PR #40356: URL: https://github.com/apache/spark/pull/40356#discussion_r1131677278 ## connector/connect/server/src/test/scala/org/apache/spark/sql/connect/planner/SparkConnectProtoSuite.scala: ## @@ -554,7 +554,8 @@ class SparkConnectProtoSuite extends

[GitHub] [spark] zhenlineo commented on a diff in pull request #40356: [SPARK-42733][CONNECT][PYTHON] Fix DataFrameWriter.save to work without path parameter

2023-03-09 Thread via GitHub

zhenlineo commented on code in PR #40356: URL: https://github.com/apache/spark/pull/40356#discussion_r1131689057 ## connector/connect/server/src/test/scala/org/apache/spark/sql/connect/planner/SparkConnectProtoSuite.scala: ## @@ -554,7 +554,8 @@ class SparkConnectProtoSuite exte

[GitHub] [spark] amaliujia commented on a diff in pull request #40354: [SPARK-42735][CONNECT][SCALA] Allow passing extra confs to RemoteSparkSession

2023-03-09 Thread via GitHub

amaliujia commented on code in PR #40354: URL: https://github.com/apache/spark/pull/40354#discussion_r1131742174 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/util/RemoteSparkSession.scala: ## @@ -114,19 +118,25 @@ object SparkConnectServerUt

[GitHub] [spark] ryan-johnson-databricks commented on a diff in pull request #40300: [SPARK-42683] Automatically rename conflicting metadata columns

2023-03-09 Thread via GitHub

ryan-johnson-databricks commented on code in PR #40300: URL: https://github.com/apache/spark/pull/40300#discussion_r1131748163 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala: ## @@ -340,13 +358,26 @@ trait ExposesMetadataColumns exte

[GitHub] [spark] huaxingao commented on a diff in pull request #40308: [SPARK-42151][SQL] Align UPDATE assignments with table attributes

2023-03-09 Thread via GitHub

huaxingao commented on code in PR #40308: URL: https://github.com/apache/spark/pull/40308#discussion_r1131767473 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -3344,43 +3345,6 @@ class Analyzer(override val catalogManager: CatalogMa

[GitHub] [spark] ryan-johnson-databricks commented on pull request #40300: [SPARK-42683] Automatically rename conflicting metadata columns

2023-03-09 Thread via GitHub

ryan-johnson-databricks commented on PR #40300: URL: https://github.com/apache/spark/pull/40300#issuecomment-1463005688 > It's a good idea to provide an API that allows people to unambiguously reference metadata columns, and I like the new `Dataset.metadataColumn` function. However, I think

[GitHub] [spark] amaliujia commented on a diff in pull request #40315: [SPARK-42699][CONNECT] SparkConnectServer should make client and AM same exit code

2023-03-09 Thread via GitHub

amaliujia commented on code in PR #40315: URL: https://github.com/apache/spark/pull/40315#discussion_r1131780390 ## sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala: ## @@ -737,12 +737,19 @@ class SparkSession private( // scalastyle:on /** - * Stop the u

[GitHub] [spark] ryan-johnson-databricks commented on a diff in pull request #40300: [SPARK-42683] Automatically rename conflicting metadata columns

2023-03-09 Thread via GitHub

ryan-johnson-databricks commented on code in PR #40300: URL: https://github.com/apache/spark/pull/40300#discussion_r1131780602 ## sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala: ## @@ -2714,6 +2726,17 @@ class Dataset[T] private[sql]( */ def withColumn(colName

[GitHub] [spark] ryan-johnson-databricks commented on a diff in pull request #40300: [SPARK-42683] Automatically rename conflicting metadata columns

2023-03-09 Thread via GitHub

ryan-johnson-databricks commented on code in PR #40300: URL: https://github.com/apache/spark/pull/40300#discussion_r1131780862 ## sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileMetadataStructSuite.scala: ## @@ -244,6 +245,89 @@ class FileMetadataStructSui

[GitHub] [spark] github-actions[bot] commented on pull request #38739: [SPARK-41207][SQL] Fix BinaryArithmetic with negative scale

2023-03-09 Thread via GitHub

github-actions[bot] commented on PR #38739: URL: https://github.com/apache/spark/pull/38739#issuecomment-1463026755 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] xinrong-meng opened a new pull request, #40357: [SPARK-42739][BUILD] Ensure release tag to be pushed to release branch

2023-03-09 Thread via GitHub

xinrong-meng opened a new pull request, #40357: URL: https://github.com/apache/spark/pull/40357 ### What changes were proposed in this pull request? In the release script, add a check to ensure release tag to be pushed to release branch. ### Why are the changes needed? To en

