[GitHub] [spark] cloud-fan commented on a diff in pull request #38497: [SPARK-40999] Hint propagation to subqueries

2022-11-09 Thread GitBox
cloud-fan commented on code in PR #38497: URL: https://github.com/apache/spark/pull/38497#discussion_r1018755179 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/EliminateResolvedHint.scala: ## @@ -31,20 +34,35 @@ object EliminateResolvedHint extends

[GitHub] [spark] zhengruifeng commented on pull request #38594: [SPARK-40852][CONNECT][PYTHON][FOLLOWUP] Make `Summary` a separate proto plan

2022-11-09 Thread GitBox
zhengruifeng commented on PR #38594: URL: https://github.com/apache/spark/pull/38594#issuecomment-1309898700 thank you @cloud-fan for the reviews -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] cloud-fan closed pull request #38594: [SPARK-40852][CONNECT][PYTHON][FOLLOWUP] Make `Summary` a separate proto plan

2022-11-09 Thread GitBox
cloud-fan closed pull request #38594: [SPARK-40852][CONNECT][PYTHON][FOLLOWUP] Make `Summary` a separate proto plan URL: https://github.com/apache/spark/pull/38594

[GitHub] [spark] cloud-fan commented on pull request #38594: [SPARK-40852][CONNECT][PYTHON][FOLLOWUP] Make `Summary` a separate proto plan

2022-11-09 Thread GitBox
cloud-fan commented on PR #38594: URL: https://github.com/apache/spark/pull/38594#issuecomment-1309893300 thanks, merging to master!

[GitHub] [spark] cloud-fan commented on a diff in pull request #38583: [SPARK-41092][SQL] Do not use identifier to match interval units

2022-11-09 Thread GitBox
cloud-fan commented on code in PR #38583: URL: https://github.com/apache/spark/pull/38583#discussion_r1018745916 ## docs/sql-ref-ansi-compliance.md: ## @@ -407,6 +407,7 @@ Below is a list of all the keywords in Spark SQL. |DATEADD|non-reserved|non-reserved|non-reserved|

[GitHub] [spark] zhengruifeng commented on pull request #38579: [SPARK-40877][DOC][FOLLOW-UP] Update the doc of `DataFrame.stat.crosstab `

2022-11-09 Thread GitBox
zhengruifeng commented on PR #38579: URL: https://github.com/apache/spark/pull/38579#issuecomment-1309892018 merged into master

[GitHub] [spark] zhengruifeng closed pull request #38579: [SPARK-40877][DOC][FOLLOW-UP] Update the doc of `DataFrame.stat.crosstab `

2022-11-09 Thread GitBox
zhengruifeng closed pull request #38579: [SPARK-40877][DOC][FOLLOW-UP] Update the doc of `DataFrame.stat.crosstab ` URL: https://github.com/apache/spark/pull/38579

[GitHub] [spark] cloud-fan commented on pull request #38583: [SPARK-41092][SQL] Do not use identifier to match interval units

2022-11-09 Thread GitBox
cloud-fan commented on PR #38583: URL: https://github.com/apache/spark/pull/38583#issuecomment-130989 > But seems it doesn't add test for select b + interval '1 month' from values (1, 1)? This is an existing test and it fails in https://github.com/apache/spark/pull/38404 . After

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38579: [SPARK-40877][DOC][FOLLOW-UP] Update the doc of `DataFrame.stat.crosstab `

2022-11-09 Thread GitBox
zhengruifeng commented on code in PR #38579: URL: https://github.com/apache/spark/pull/38579#discussion_r1018743612 ## python/pyspark/sql/dataframe.py: ## @@ -4217,15 +4217,18 @@ def cov(self, col1: str, col2: str) -> float: def crosstab(self, col1: str, col2: str) ->

[GitHub] [spark] zhengruifeng commented on pull request #38594: [SPARK-40852][CONNECT][PYTHON][FOLLOWUP] Make `Summary` a separate proto plan

2022-11-09 Thread GitBox
zhengruifeng commented on PR #38594: URL: https://github.com/apache/spark/pull/38594#issuecomment-1309888458 cc @cloud-fan @HyukjinKwon @amaliujia

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-09 Thread GitBox
zhengruifeng commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1018741978 ## sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala: ## @@ -128,6 +128,97 @@ private[sql] object ArrowConverters extends Logging

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-09 Thread GitBox
zhengruifeng commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1018741578 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala: ## @@ -114,10 +123,97 @@ class

[GitHub] [spark] viirya commented on a diff in pull request #38558: [SPARK-41048][SQL] Improve output partitioning and ordering with AQE cache

2022-11-09 Thread GitBox
viirya commented on code in PR #38558: URL: https://github.com/apache/spark/pull/38558#discussion_r1018741023 ## sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala: ## @@ -111,10 +112,15 @@ case class InMemoryTableScanExec( override

[GitHub] [spark] cloud-fan commented on pull request #38586: [SPARK-41077][CONNECT][PYTHON][REFACTORING] Rename `ColumnRef` to `Column` in Python client implementation

2022-11-09 Thread GitBox
cloud-fan commented on PR #38586: URL: https://github.com/apache/spark/pull/38586#issuecomment-1309885760 In the Scala DataFrame, `Column` is the entry API for users to build an expression tree. For example, in `df.select($"a".cast("int"))`, `$"a"` is syntactic sugar to create a `Column` with

[GitHub] [spark] amaliujia commented on a diff in pull request #38546: [SPARK-41036][CONNECT][PYTHON] `columns` API should use `schema` API to avoid data fetching

2022-11-09 Thread GitBox
amaliujia commented on code in PR #38546: URL: https://github.com/apache/spark/pull/38546#discussion_r1018712010 ## python/pyspark/sql/connect/dataframe.py: ## @@ -139,11 +139,9 @@ def columns(self) -> List[str]: if self._plan is None: return []

[GitHub] [spark] amaliujia commented on a diff in pull request #38595: [SPARK-41090][SQL] Enhance Dataset.createTempView testing coverage for `db_name.view_name`

2022-11-09 Thread GitBox
amaliujia commented on code in PR #38595: URL: https://github.com/apache/spark/pull/38595#discussion_r1018736363 ## sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala: ## @@ -1135,21 +1135,27 @@ class DatasetSuite extends QueryTest } test("createTempView")

[GitHub] [spark] cloud-fan commented on a diff in pull request #38595: [SPARK-41090][SQL] Enhance Dataset.createTempView testing coverage for `db_name.view_name`

2022-11-09 Thread GitBox
cloud-fan commented on code in PR #38595: URL: https://github.com/apache/spark/pull/38595#discussion_r1018735442 ## sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala: ## @@ -1135,21 +1135,27 @@ class DatasetSuite extends QueryTest } test("createTempView")

[GitHub] [spark] cloud-fan closed pull request #38544: [SPARK-40815][SQL][FOLLOW-UP] Fix record reader in DelegateSymlinkTextInputFormat to avoid Hive ExecMapper.getDone() check

2022-11-09 Thread GitBox
cloud-fan closed pull request #38544: [SPARK-40815][SQL][FOLLOW-UP] Fix record reader in DelegateSymlinkTextInputFormat to avoid Hive ExecMapper.getDone() check URL: https://github.com/apache/spark/pull/38544

[GitHub] [spark] cloud-fan commented on pull request #38544: [SPARK-40815][SQL][FOLLOW-UP] Fix record reader in DelegateSymlinkTextInputFormat to avoid Hive ExecMapper.getDone() check

2022-11-09 Thread GitBox
cloud-fan commented on PR #38544: URL: https://github.com/apache/spark/pull/38544#issuecomment-1309878130 thanks, merging to master!

[GitHub] [spark] cloud-fan commented on a diff in pull request #38558: [SPARK-41048][SQL] Improve output partitioning and ordering with AQE cache

2022-11-09 Thread GitBox
cloud-fan commented on code in PR #38558: URL: https://github.com/apache/spark/pull/38558#discussion_r1018732458 ## sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala: ## @@ -111,10 +112,15 @@ case class InMemoryTableScanExec(

[GitHub] [spark] viirya commented on a diff in pull request #38558: [SPARK-41048][SQL] Improve output partitioning and ordering with AQE cache

2022-11-09 Thread GitBox
viirya commented on code in PR #38558: URL: https://github.com/apache/spark/pull/38558#discussion_r1018726898 ## sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala: ## @@ -111,10 +112,15 @@ case class InMemoryTableScanExec( override

[GitHub] [spark] MaxGekk commented on a diff in pull request #38588: [SPARK-41086][SQL] Consolidate SecondArgumentXXX error to INVALID_PARAMETER_VALUE

2022-11-09 Thread GitBox
MaxGekk commented on code in PR #38588: URL: https://github.com/apache/spark/pull/38588#discussion_r1018726135 ## core/src/main/resources/error/error-classes.json: ## @@ -2109,11 +2103,6 @@ "Unsupported component type in arrays." ] }, -

[GitHub] [spark] viirya commented on a diff in pull request #38558: [SPARK-41048][SQL] Improve output partitioning and ordering with AQE cache

2022-11-09 Thread GitBox
viirya commented on code in PR #38558: URL: https://github.com/apache/spark/pull/38558#discussion_r1018721730 ## sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala: ## @@ -111,10 +112,15 @@ case class InMemoryTableScanExec( override

[GitHub] [spark] MaxGekk commented on a diff in pull request #38572: [SPARK-41059][SQL] Rename `_LEGACY_ERROR_TEMP_2420` to `NESTED_AGGREGATE_FUNCTION`

2022-11-09 Thread GitBox
MaxGekk commented on code in PR #38572: URL: https://github.com/apache/spark/pull/38572#discussion_r1018721421 ## core/src/main/resources/error/error-classes.json: ## @@ -690,6 +690,16 @@ "Not allowed to implement multiple UDF interfaces, UDF class " ] }, +

[GitHub] [spark] MaxGekk commented on a diff in pull request #38575: [WIP][SPARK-40948][SQL][FOLLOWUP] Restore PATH_NOT_FOUND

2022-11-09 Thread GitBox
MaxGekk commented on code in PR #38575: URL: https://github.com/apache/spark/pull/38575#discussion_r1018719912 ## sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala: ## @@ -2330,7 +2330,7 @@ class DataFrameSuite extends QueryTest new File(uuid,

[GitHub] [spark] amaliujia commented on a diff in pull request #38588: [SPARK-41086][SQL] Consolidate SecondArgumentXXX error to INVALID_PARAMETER_VALUE

2022-11-09 Thread GitBox
amaliujia commented on code in PR #38588: URL: https://github.com/apache/spark/pull/38588#discussion_r1018717992 ## core/src/main/resources/error/error-classes.json: ## @@ -2109,11 +2103,6 @@ "Unsupported component type in arrays." ] }, -

[GitHub] [spark] MaxGekk commented on a diff in pull request #38576: [SPARK-41062][SQL] Rename `UNSUPPORTED_CORRELATED_REFERENCE` to `CORRELATED_REFERENCE`

2022-11-09 Thread GitBox
MaxGekk commented on code in PR #38576: URL: https://github.com/apache/spark/pull/38576#discussion_r1018717817 ## core/src/main/resources/error/error-classes.json: ## @@ -1277,6 +1277,11 @@ "A correlated outer name reference within a subquery expression body was not

[GitHub] [spark] MaxGekk commented on a diff in pull request #38588: [SPARK-41086][SQL] Consolidate SecondArgumentXXX error to INVALID_PARAMETER_VALUE

2022-11-09 Thread GitBox
MaxGekk commented on code in PR #38588: URL: https://github.com/apache/spark/pull/38588#discussion_r1018715875 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala: ## @@ -2082,10 +2084,12 @@ private[sql] object QueryCompilationErrors extends

[GitHub] [spark] amaliujia commented on a diff in pull request #38506: [SPARK-41010][CONNECT][PYTHON] Complete Support for Except and Intersect in Python client

2022-11-09 Thread GitBox
amaliujia commented on code in PR #38506: URL: https://github.com/apache/spark/pull/38506#discussion_r1018716212 ## python/pyspark/sql/connect/plan.py: ## @@ -642,7 +664,7 @@ def _repr_html_(self) -> str: return f""" -Union +

[GitHub] [spark] amaliujia commented on pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-09 Thread GitBox
amaliujia commented on PR #38468: URL: https://github.com/apache/spark/pull/38468#issuecomment-1309851369 Thanks for the most recent updates. I find it will help with other problems. For example, we now send at least one partition with a schema even if all partitions are empty. By doing so, clients

[GitHub] [spark] amaliujia commented on a diff in pull request #38546: [SPARK-41036][CONNECT][PYTHON] `columns` API should use `schema` API to avoid data fetching

2022-11-09 Thread GitBox
amaliujia commented on code in PR #38546: URL: https://github.com/apache/spark/pull/38546#discussion_r1018710135 ## python/pyspark/sql/connect/dataframe.py: ## @@ -139,11 +139,9 @@ def columns(self) -> List[str]: if self._plan is None: return []

[GitHub] [spark] LuciferYang commented on pull request #38589: [SPARK-41087][BUILD] Remove duplicate `-Xmx4g` from `dev/make-distribution.sh` and make `build/mvn` use the same JAVA_OPTS

2022-11-09 Thread GitBox
LuciferYang commented on PR #38589: URL: https://github.com/apache/spark/pull/38589#issuecomment-1309849459 done

[GitHub] [spark] amaliujia commented on a diff in pull request #38586: [SPARK-41077][CONNECT][PYTHON][REFACTORING] Rename `ColumnRef` to `Column` in Python client implementation

2022-11-09 Thread GitBox
amaliujia commented on code in PR #38586: URL: https://github.com/apache/spark/pull/38586#discussion_r1018709403 ## python/pyspark/sql/connect/column.py: ## @@ -30,8 +30,8 @@ def _bin_op( name: str, doc: str = "binary function", reverse: bool = False -) ->

[GitHub] [spark] LuciferYang commented on a diff in pull request #38589: [SPARK-41087][BUILD] Make `build/mvn` use the same JAVA_OPTS as `dev/make-distribution.sh`

2022-11-09 Thread GitBox
LuciferYang commented on code in PR #38589: URL: https://github.com/apache/spark/pull/38589#discussion_r1018700848 ## build/mvn: ## @@ -36,7 +36,7 @@ _DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )" # Preserve the calling directory _CALLING_DIR="$(pwd)" # Options

[GitHub] [spark] HyukjinKwon closed pull request #38541: [SPARK-41034][CONNECT][PYTHON] Connect DataFrame should require a RemoteSparkSession

2022-11-09 Thread GitBox
HyukjinKwon closed pull request #38541: [SPARK-41034][CONNECT][PYTHON] Connect DataFrame should require a RemoteSparkSession URL: https://github.com/apache/spark/pull/38541

[GitHub] [spark] dongjoon-hyun commented on pull request #38589: [SPARK-41087][BUILD] Make `build/mvn` use the same JAVA_OPTS as `dev/make-distribution.sh`

2022-11-09 Thread GitBox
dongjoon-hyun commented on PR #38589: URL: https://github.com/apache/spark/pull/38589#issuecomment-1309846227 Could you revise the PR title according to the new change?

[GitHub] [spark] HyukjinKwon commented on pull request #38541: [SPARK-41034][CONNECT][PYTHON] Connect DataFrame should require a RemoteSparkSession

2022-11-09 Thread GitBox
HyukjinKwon commented on PR #38541: URL: https://github.com/apache/spark/pull/38541#issuecomment-1309846031 Merged to master.

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38546: [SPARK-41036][CONNECT][PYTHON] `columns` API should use `schema` API to avoid data fetching

2022-11-09 Thread GitBox
HyukjinKwon commented on code in PR #38546: URL: https://github.com/apache/spark/pull/38546#discussion_r1018707453 ## python/pyspark/sql/connect/dataframe.py: ## @@ -139,11 +139,9 @@ def columns(self) -> List[str]: if self._plan is None: return []

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38546: [SPARK-41036][CONNECT][PYTHON] `columns` API should use `schema` API to avoid data fetching

2022-11-09 Thread GitBox
HyukjinKwon commented on code in PR #38546: URL: https://github.com/apache/spark/pull/38546#discussion_r1018707212 ## python/pyspark/sql/connect/dataframe.py: ## @@ -139,11 +139,9 @@ def columns(self) -> List[str]: if self._plan is None: return []

[GitHub] [spark] dongjoon-hyun closed pull request #38585: [SPARK-41076][BUILD][CONNECT] Upgrade `protobuf` to 3.21.9

2022-11-09 Thread GitBox
dongjoon-hyun closed pull request #38585: [SPARK-41076][BUILD][CONNECT] Upgrade `protobuf` to 3.21.9 URL: https://github.com/apache/spark/pull/38585

[GitHub] [spark] amaliujia commented on a diff in pull request #38586: [SPARK-41077][CONNECT][PYTHON][REFACTORING] Rename `ColumnRef` to `Column` in Python client implementation

2022-11-09 Thread GitBox
amaliujia commented on code in PR #38586: URL: https://github.com/apache/spark/pull/38586#discussion_r1018685631 ## python/pyspark/sql/connect/column.py: ## @@ -30,8 +30,8 @@ def _bin_op( name: str, doc: str = "binary function", reverse: bool = False -) ->

[GitHub] [spark] amaliujia commented on a diff in pull request #38586: [SPARK-41077][CONNECT][PYTHON][REFACTORING] Rename `ColumnRef` to `Column` in Python client implementation

2022-11-09 Thread GitBox
amaliujia commented on code in PR #38586: URL: https://github.com/apache/spark/pull/38586#discussion_r1018706220 ## python/pyspark/sql/connect/column.py: ## @@ -30,8 +30,8 @@ def _bin_op( name: str, doc: str = "binary function", reverse: bool = False -) ->

[GitHub] [spark] amaliujia commented on a diff in pull request #38586: [SPARK-41077][CONNECT][PYTHON][REFACTORING] Rename `ColumnRef` to `Column` in Python client implementation

2022-11-09 Thread GitBox
amaliujia commented on code in PR #38586: URL: https://github.com/apache/spark/pull/38586#discussion_r1018704727 ## python/pyspark/sql/connect/column.py: ## @@ -30,8 +30,8 @@ def _bin_op( name: str, doc: str = "binary function", reverse: bool = False -) ->

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38579: [SPARK-40877][DOC][FOLLOW-UP] Update the doc of `DataFrame.stat.crosstab `

2022-11-09 Thread GitBox
HyukjinKwon commented on code in PR #38579: URL: https://github.com/apache/spark/pull/38579#discussion_r1018705370 ## python/pyspark/sql/dataframe.py: ## @@ -4217,15 +4217,18 @@ def cov(self, col1: str, col2: str) -> float: def crosstab(self, col1: str, col2: str) ->

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38586: [SPARK-41077][CONNECT][PYTHON][REFACTORING] Rename `ColumnRef` to `Column` in Python client implementation

2022-11-09 Thread GitBox
HyukjinKwon commented on code in PR #38586: URL: https://github.com/apache/spark/pull/38586#discussion_r1018703831 ## python/pyspark/sql/connect/column.py: ## @@ -30,8 +30,8 @@ def _bin_op( name: str, doc: str = "binary function", reverse: bool = False -) ->

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38586: [SPARK-41077][CONNECT][PYTHON][REFACTORING] Rename `ColumnRef` to `Column` in Python client implementation

2022-11-09 Thread GitBox
HyukjinKwon commented on code in PR #38586: URL: https://github.com/apache/spark/pull/38586#discussion_r1018703689 ## python/pyspark/sql/connect/column.py: ## @@ -30,8 +30,8 @@ def _bin_op( name: str, doc: str = "binary function", reverse: bool = False -) ->

[GitHub] [spark] amaliujia commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-09 Thread GitBox
amaliujia commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1018700467 ## python/pyspark/sql/tests/connect/test_connect_basic.py: ## @@ -197,6 +197,17 @@ def test_range(self): .equals(self.spark.range(start=0, end=10,

[GitHub] [spark] amaliujia commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-09 Thread GitBox
amaliujia commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r101873 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala: ## @@ -114,10 +123,97 @@ class

[GitHub] [spark] amaliujia commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-09 Thread GitBox
amaliujia commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1018699466 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala: ## @@ -114,10 +123,97 @@ class

[GitHub] [spark] grundprinzip commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-09 Thread GitBox
grundprinzip commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1018699081 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala: ## @@ -114,10 +123,97 @@ class

[GitHub] [spark] amaliujia commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-09 Thread GitBox
amaliujia commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1018699060 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala: ## @@ -114,10 +123,97 @@ class

[GitHub] [spark] amaliujia commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-09 Thread GitBox
amaliujia commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1018697387 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala: ## @@ -114,10 +123,97 @@ class

[GitHub] [spark-docker] Yikun commented on pull request #23: [SPARK-40519] Add "Publish" workflow to help release apache/spark image

2022-11-09 Thread GitBox
Yikun commented on PR #23: URL: https://github.com/apache/spark-docker/pull/23#issuecomment-1309832054 After some consideration, we'd better expose only the `Spark version` and `Publish repo`, and move inputs.scala/java into the matrix, because: 1. It's simpler: the release manager just

[GitHub] [spark] pan3793 commented on a diff in pull request #38596: [SPARK-41093][DEPS] Remove netty-tcnative-classes from Spark dependencyList

2022-11-09 Thread GitBox
pan3793 commented on code in PR #38596: URL: https://github.com/apache/spark/pull/38596#discussion_r1018684836 ## pom.xml: ## @@ -928,16 +924,6 @@ ${netty.version} osx-x86_64 -

[GitHub] [spark] LuciferYang commented on a diff in pull request #38589: [SPARK-41087][BUILD] Make `build/mvn` use the same JAVA_OPTS as `dev/make-distribution.sh`

2022-11-09 Thread GitBox
LuciferYang commented on code in PR #38589: URL: https://github.com/apache/spark/pull/38589#discussion_r1018691938 ## build/mvn: ## @@ -36,7 +36,7 @@ _DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )" # Preserve the calling directory _CALLING_DIR="$(pwd)" # Options

[GitHub] [spark] HyukjinKwon commented on pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-09 Thread GitBox
HyukjinKwon commented on PR #38468: URL: https://github.com/apache/spark/pull/38468#issuecomment-1309825071 Logic-wise, makes sense.

[GitHub] [spark] LuciferYang commented on a diff in pull request #38589: [SPARK-41087][BUILD] Make `build/mvn` use the same JAVA_OPTS as `dev/make-distribution.sh`

2022-11-09 Thread GitBox
LuciferYang commented on code in PR #38589: URL: https://github.com/apache/spark/pull/38589#discussion_r1018690310 ## build/mvn: ## @@ -36,7 +36,7 @@ _DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )" # Preserve the calling directory _CALLING_DIR="$(pwd)" # Options

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-09 Thread GitBox
zhengruifeng commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1018690469 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala: ## @@ -114,10 +123,97 @@ class

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-09 Thread GitBox
HyukjinKwon commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1018689309 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala: ## @@ -114,10 +123,97 @@ class

[GitHub] [spark] LuciferYang commented on pull request #38589: [SPARK-41087][BUILD] Make `build/mvn` use the same JAVA_OPTS as `dev/make-distribution.sh`

2022-11-09 Thread GitBox
LuciferYang commented on PR #38589: URL: https://github.com/apache/spark/pull/38589#issuecomment-1309822675 Wait, let me check the Maven test of all modules

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38589: [SPARK-41087][BUILD] Make `build/mvn` use the same JAVA_OPTS as `dev/make-distribution.sh`

2022-11-09 Thread GitBox
dongjoon-hyun commented on code in PR #38589: URL: https://github.com/apache/spark/pull/38589#discussion_r1018688674 ## build/mvn: ## @@ -36,7 +36,7 @@ _DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )" # Preserve the calling directory _CALLING_DIR="$(pwd)" # Options

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-09 Thread GitBox
HyukjinKwon commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1018688539 ## python/pyspark/sql/connect/client.py: ## @@ -400,6 +400,14 @@ def _execute_and_fetch(self, req: pb2.Request) -> typing.Optional[pandas.DataFra if

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-09 Thread GitBox
HyukjinKwon commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1018688345 ## python/pyspark/sql/tests/connect/test_connect_basic.py: ## @@ -197,6 +197,17 @@ def test_range(self): .equals(self.spark.range(start=0, end=10,

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-09 Thread GitBox
HyukjinKwon commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1018687996 ## sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala: ## @@ -128,6 +128,97 @@ private[sql] object ArrowConverters extends Logging

[GitHub] [spark] LuciferYang commented on pull request #38589: [SPARK-41087][BUILD] Make `build/mvn` use the same JAVA_OPTS as `dev/make-distribution.sh`

2022-11-09 Thread GitBox
LuciferYang commented on PR #38589: URL: https://github.com/apache/spark/pull/38589#issuecomment-1309821310 cc @dongjoon-hyun is this one ok?

[GitHub] [spark] amaliujia commented on a diff in pull request #38586: [SPARK-41077][CONNECT][PYTHON][REFACTORING] Rename `ColumnRef` to `Column` in Python client implementation

2022-11-09 Thread GitBox
amaliujia commented on code in PR #38586: URL: https://github.com/apache/spark/pull/38586#discussion_r1018686188 ## python/pyspark/sql/connect/column.py: ## @@ -30,8 +30,8 @@ def _bin_op( name: str, doc: str = "binary function", reverse: bool = False -) ->

[GitHub] [spark] amaliujia commented on a diff in pull request #38586: [SPARK-41077][CONNECT][PYTHON][REFACTORING] Rename `ColumnRef` to `Column` in Python client implementation

2022-11-09 Thread GitBox
amaliujia commented on code in PR #38586: URL: https://github.com/apache/spark/pull/38586#discussion_r1018685631 ## python/pyspark/sql/connect/column.py: ## @@ -30,8 +30,8 @@ def _bin_op( name: str, doc: str = "binary function", reverse: bool = False -) ->

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-09 Thread GitBox
HyukjinKwon commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1018685534 ## connector/connect/src/main/protobuf/spark/connect/base.proto: ## @@ -83,7 +83,6 @@ message Response { int64 uncompressed_bytes = 2; Review Comment:

[GitHub] [spark] AngersZhuuuu commented on pull request #38571: [SPARK-37555][TEST][FOLLOWUP] Increase timeout of CLI test `spark-sql should pass last unclosed comment to backend`

2022-11-09 Thread GitBox
AngersZhuuuu commented on PR #38571: URL: https://github.com/apache/spark/pull/38571#issuecomment-1309816631 > flaky test? yea

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-09 Thread GitBox
zhengruifeng commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1018684111 ## python/pyspark/sql/connect/client.py: ## @@ -400,6 +400,14 @@ def _execute_and_fetch(self, req: pb2.Request) -> typing.Optional[pandas.DataFra if

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-09 Thread GitBox
zhengruifeng commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1018683751 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala: ## @@ -114,10 +123,97 @@ class

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-09 Thread GitBox
zhengruifeng commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1018682801 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala: ## @@ -114,10 +123,97 @@ class

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-09 Thread GitBox
zhengruifeng commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1018682607 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala: ## @@ -114,10 +123,97 @@ class

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-09 Thread GitBox
HyukjinKwon commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1018682109 ## python/pyspark/sql/connect/client.py: ## @@ -400,6 +400,14 @@ def _execute_and_fetch(self, req: pb2.Request) -> typing.Optional[pandas.DataFra if

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-09 Thread GitBox
HyukjinKwon commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1018681767 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala: ## @@ -114,10 +123,97 @@ class

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-09 Thread GitBox
zhengruifeng commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1018681437 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala: ## @@ -48,19 +51,25 @@ class

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-09 Thread GitBox
HyukjinKwon commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1018681018 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala: ## @@ -114,10 +123,97 @@ class

[GitHub] [spark] pan3793 opened a new pull request, #38596: [SPARK-41093][DEPS] Remove netty-tcnative-classes from Spark dependencyList

2022-11-09 Thread GitBox
pan3793 opened a new pull request, #38596: URL: https://github.com/apache/spark/pull/38596 ### What changes were proposed in this pull request? Remove `netty-tcnative-classes` from Spark dependencyList. ### Why are the changes needed?

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-09 Thread GitBox
HyukjinKwon commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1018680299 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala: ## @@ -114,10 +123,97 @@ class

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-09 Thread GitBox
HyukjinKwon commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1018679792 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala: ## @@ -114,10 +123,97 @@ class

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-09 Thread GitBox
HyukjinKwon commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1018678853 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala: ## @@ -48,19 +51,25 @@ class

[GitHub] [spark] LuciferYang closed pull request #38590: [SPARK-40767][BUILD][3.3] Fix Java opts to to improve maven compilation speed

2022-11-09 Thread GitBox
LuciferYang closed pull request #38590: [SPARK-40767][BUILD][3.3] Fix Java opts to to improve maven compilation speed URL: https://github.com/apache/spark/pull/38590

[GitHub] [spark] cloud-fan closed pull request #38557: [SPARK-38959][SQL][FOLLOWUP] Optimizer batch `PartitionPruning` should optimize subqueries

2022-11-09 Thread GitBox
cloud-fan closed pull request #38557: [SPARK-38959][SQL][FOLLOWUP] Optimizer batch `PartitionPruning` should optimize subqueries URL: https://github.com/apache/spark/pull/38557

[GitHub] [spark] cloud-fan commented on pull request #38557: [SPARK-38959][SQL][FOLLOWUP] Optimizer batch `PartitionPruning` should optimize subqueries

2022-11-09 Thread GitBox
cloud-fan commented on PR #38557: URL: https://github.com/apache/spark/pull/38557#issuecomment-1309806471 thanks for review, merging to master!

[GitHub] [spark] MaxGekk commented on a diff in pull request #38583: [SPARK-41092][SQL] Do not use identifier to match interval units

2022-11-09 Thread GitBox
MaxGekk commented on code in PR #38583: URL: https://github.com/apache/spark/pull/38583#discussion_r1018674809 ## docs/sql-ref-ansi-compliance.md: ## @@ -407,6 +407,7 @@ Below is a list of all the keywords in Spark SQL. |DATEADD|non-reserved|non-reserved|non-reserved|

[GitHub] [spark] LuciferYang commented on pull request #38593: [SPARK-41089][YARN][SHUFFLE] Relocate Netty native arm64 libs

2022-11-09 Thread GitBox
LuciferYang commented on PR #38593: URL: https://github.com/apache/spark/pull/38593#issuecomment-1309805706 > `libnetty_transport_native_epoll_aarch_64.so` is for linux OK

[GitHub] [spark] pan3793 commented on pull request #38593: [SPARK-41089][YARN][SHUFFLE] Relocate Netty native arm64 libs

2022-11-09 Thread GitBox
pan3793 commented on PR #38593: URL: https://github.com/apache/spark/pull/38593#issuecomment-1309804304 `libnetty_transport_native_epoll_aarch_64.so` is for linux

[GitHub] [spark] LuciferYang commented on pull request #38593: [SPARK-41089][YARN][SHUFFLE] Relocate Netty native arm64 libs

2022-11-09 Thread GitBox
LuciferYang commented on PR #38593: URL: https://github.com/apache/spark/pull/38593#issuecomment-1309803806 I think 3.3 is enough, [support of Apple Silicon](https://issues.apache.org/jira/browse/SPARK-35781) was announced in release notes in 3.3:

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38586: [SPARK-41077][CONNECT][PYTHON][REFACTORING] Rename `ColumnRef` to `Column` in Python client implementation

2022-11-09 Thread GitBox
HyukjinKwon commented on code in PR #38586: URL: https://github.com/apache/spark/pull/38586#discussion_r1018671588 ## python/pyspark/sql/connect/column.py: ## @@ -30,8 +30,8 @@ def _bin_op( name: str, doc: str = "binary function", reverse: bool = False -) ->

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #37734: [SPARK-40264][ML] add batch_infer_udf function to pyspark.ml.functions

2022-11-09 Thread GitBox
WeichenXu123 commented on code in PR #37734: URL: https://github.com/apache/spark/pull/37734#discussion_r1018671432 ## python/pyspark/ml/functions.py: ## @@ -106,6 +117,602 @@ def array_to_vector(col: Column) -> Column: return
