[GitHub] [spark] jzhuge commented on pull request #38699: [SPARK-41188][CORE][ML] Set executorEnv OMP_NUM_THREADS to be spark.task.cpus by default for spark executor JVM processes

2023-02-26 Thread via GitHub
jzhuge commented on PR #38699: URL: https://github.com/apache/spark/pull/38699#issuecomment-1445858855 > > If we are setting it in `SparkContext`, do we want to get rid of this from other places like `PythonRunner.compute` ? > > I think we can remove code in PythonRunner.compute

[GitHub] [spark] WweiL opened a new pull request, #40187: [SPARK-42572] [SQL] [SS] Fix behavior for StateStoreProvider.validateStateRowFormat

2023-02-26 Thread via GitHub
WweiL opened a new pull request, #40187: URL: https://github.com/apache/spark/pull/40187 ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/40073 accidentally changed the relationship of the two `if` statements in

[GitHub] [spark] cloud-fan closed pull request #40121: [SPARK-42528][CORE] Optimize PercentileHeap

2023-02-26 Thread via GitHub
cloud-fan closed pull request #40121: [SPARK-42528][CORE] Optimize PercentileHeap URL: https://github.com/apache/spark/pull/40121 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] cloud-fan commented on pull request #40121: [SPARK-42528][CORE] Optimize PercentileHeap

2023-02-26 Thread via GitHub
cloud-fan commented on PR #40121: URL: https://github.com/apache/spark/pull/40121#issuecomment-1445779613 The failed HealthTrackerIntegrationSuite is definitely unrelated, I'm merging it to master, thanks!

[GitHub] [spark] cloud-fan commented on pull request #40115: [SPARK-42525][SQL] Collapse two adjacent windows with the same partition/order in subquery

2023-02-26 Thread via GitHub
cloud-fan commented on PR #40115: URL: https://github.com/apache/spark/pull/40115#issuecomment-1445778387 the change LGTM but the PR title is a bit confusing. How is it related to subquery?

[GitHub] [spark] dongjoon-hyun commented on pull request #40183: [SPARK-42587][CONNECT][TESTS][FOLLOWUP] Fix `scalafmt` failure

2023-02-26 Thread via GitHub
dongjoon-hyun commented on PR #40183: URL: https://github.com/apache/spark/pull/40183#issuecomment-1445745535 Thank you, @viirya . Sorry for missing it in the first PR.

[GitHub] [spark] viirya commented on pull request #40183: [SPARK-42587][CONNECT][TESTS][FOLLOWUP] Fix `scalafmt` failure

2023-02-26 Thread via GitHub
viirya commented on PR #40183: URL: https://github.com/apache/spark/pull/40183#issuecomment-1445724214 Looks good.

[GitHub] [spark] allisonwang-db closed pull request #40146: [SPARK-42120][SQL] Add built-in table-valued function json_tuple

2023-02-26 Thread via GitHub
allisonwang-db closed pull request #40146: [SPARK-42120][SQL] Add built-in table-valued function json_tuple URL: https://github.com/apache/spark/pull/40146

[GitHub] [spark] allisonwang-db commented on pull request #40146: [SPARK-42120][SQL] Add built-in table-valued function json_tuple

2023-02-26 Thread via GitHub
allisonwang-db commented on PR #40146: URL: https://github.com/apache/spark/pull/40146#issuecomment-1445719261 Combined in https://github.com/apache/spark/pull/40151

[GitHub] [spark] allisonwang-db closed pull request #40149: [SPARK-42122][SQL] Add built-in table-valued function stack

2023-02-26 Thread via GitHub
allisonwang-db closed pull request #40149: [SPARK-42122][SQL] Add built-in table-valued function stack URL: https://github.com/apache/spark/pull/40149

[GitHub] [spark] allisonwang-db commented on pull request #40149: [SPARK-42122][SQL] Add built-in table-valued function stack

2023-02-26 Thread via GitHub
allisonwang-db commented on PR #40149: URL: https://github.com/apache/spark/pull/40149#issuecomment-1445718970 Merged in https://github.com/apache/spark/pull/40151

[GitHub] [spark] navinvishy commented on a diff in pull request #38947: [SPARK-41233][SQL] Add `array_prepend` function

2023-02-26 Thread via GitHub
navinvishy commented on code in PR #38947: URL: https://github.com/apache/spark/pull/38947#discussion_r1118279709 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -1399,6 +1399,119 @@ case class ArrayContains(left:

[GitHub] [spark] navinvishy commented on a diff in pull request #38947: [SPARK-41233][SQL] Add `array_prepend` function

2023-02-26 Thread via GitHub
navinvishy commented on code in PR #38947: URL: https://github.com/apache/spark/pull/38947#discussion_r1118278674 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -1399,6 +1399,149 @@ case class ArrayContains(left:

[GitHub] [spark] hvanhovell opened a new pull request, #40186: [SPARK-42581][CONNECT] Add SQLImplicits.

2023-02-26 Thread via GitHub
hvanhovell opened a new pull request, #40186: URL: https://github.com/apache/spark/pull/40186 ### What changes were proposed in this pull request? This PR adds the `SQLImplicits` class to Spark Connect. This makes it easier for end users to work with Connect Datasets. The current

[GitHub] [spark] navinvishy commented on a diff in pull request #38947: [SPARK-41233][SQL] Add `array_prepend` function

2023-02-26 Thread via GitHub
navinvishy commented on code in PR #38947: URL: https://github.com/apache/spark/pull/38947#discussion_r1118278902 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -1399,6 +1399,149 @@ case class ArrayContains(left:

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40179: [SPARK-42560][CONNECT] Add ColumnName class

2023-02-26 Thread via GitHub
dongjoon-hyun commented on code in PR #40179: URL: https://github.com/apache/spark/pull/40179#discussion_r1118276085 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/CompatibilitySuite.scala: ## @@ -155,6 +156,7 @@ class CompatibilitySuite

[GitHub] [spark] hvanhovell commented on pull request #40184: [SPARK-42569][CONNECT] Throw exceptions for unsupported session API

2023-02-26 Thread via GitHub
hvanhovell commented on PR #40184: URL: https://github.com/apache/spark/pull/40184#issuecomment-1445682796 @amaliujia can you please update the compatibility test for these?

[GitHub] [spark] hvanhovell commented on a diff in pull request #40184: [SPARK-42569][CONNECT] Throw exceptions for unsupported session API

2023-02-26 Thread via GitHub
hvanhovell commented on code in PR #40184: URL: https://github.com/apache/spark/pull/40184#discussion_r1118271294 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala: ## @@ -190,6 +190,22 @@ class SparkSession( range(start, end, step,

[GitHub] [spark] hvanhovell commented on a diff in pull request #40184: [SPARK-42569][CONNECT] Throw exceptions for unsupported session API

2023-02-26 Thread via GitHub
hvanhovell commented on code in PR #40184: URL: https://github.com/apache/spark/pull/40184#discussion_r1118271144 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala: ## @@ -190,6 +190,22 @@ class SparkSession( range(start, end, step,

[GitHub] [spark] hvanhovell opened a new pull request, #40185: [SPARK-42586][CONNECT] Add RuntimeConfig for Scala Client

2023-02-26 Thread via GitHub
hvanhovell opened a new pull request, #40185: URL: https://github.com/apache/spark/pull/40185 ### What changes were proposed in this pull request? This PR adds the RuntimeConfig class for the Spark Connect Scala Client. ### Why are the changes needed? API Parity. ### Does

[GitHub] [spark] gatorsmile commented on pull request #39558: [SPARK-41982][SQL] Partitions of type string should not be treated as numeric types

2023-02-26 Thread via GitHub
gatorsmile commented on PR #39558: URL: https://github.com/apache/spark/pull/39558#issuecomment-1445674657 @smallzhongfeng Could you help add it to the migration guide?

[GitHub] [spark] beliefer commented on pull request #39990: [SPARK-42415][SQL] The built-in dialects support OFFSET and paging query.

2023-02-26 Thread via GitHub
beliefer commented on PR #39990: URL: https://github.com/apache/spark/pull/39990#issuecomment-1445670336 ping @huaxingao cc @cloud-fan @sadikovi

[GitHub] [spark] amaliujia commented on pull request #40184: [SPARK-42569][CONNECT] Throw exceptions for unsupported session API

2023-02-26 Thread via GitHub
amaliujia commented on PR #40184: URL: https://github.com/apache/spark/pull/40184#issuecomment-1445656241 @hvanhovell

[GitHub] [spark] amaliujia opened a new pull request, #40184: [SPARK-42569][CONNECT] Throw exceptions for unsupported session API

2023-02-26 Thread via GitHub
amaliujia opened a new pull request, #40184: URL: https://github.com/apache/spark/pull/40184 ### What changes were proposed in this pull request? Throw exceptions for unsupported session API: 1. newSession 2. getActiveSession 3. getDefaultSession 4. active

[GitHub] [spark] dongjoon-hyun closed pull request #40183: [SPARK-42587][CONNECT][TESTS][FOLLOWUP] Fix `scalafmt` failure

2023-02-26 Thread via GitHub
dongjoon-hyun closed pull request #40183: [SPARK-42587][CONNECT][TESTS][FOLLOWUP] Fix `scalafmt` failure URL: https://github.com/apache/spark/pull/40183

[GitHub] [spark] dongjoon-hyun commented on pull request #40183: [SPARK-42587][CONNECT][TESTS][FOLLOWUP] Fix `scalafmt` failure

2023-02-26 Thread via GitHub
dongjoon-hyun commented on PR #40183: URL: https://github.com/apache/spark/pull/40183#issuecomment-1445652117 Merged to master/3.4.

[GitHub] [spark] dongjoon-hyun commented on pull request #40183: [SPARK-42587][CONNECT][TESTS][FOLLOWUP] Fix `scalafmt` failure

2023-02-26 Thread via GitHub
dongjoon-hyun commented on PR #40183: URL: https://github.com/apache/spark/pull/40183#issuecomment-1445651565 Thank you so much, @hvanhovell . Sorry for the troubles.

[GitHub] [spark] dongjoon-hyun commented on pull request #40183: [SPARK-42587][CONNECT][TESTS][FOLLOWUP] Fix `scalafmt` failure

2023-02-26 Thread via GitHub
dongjoon-hyun commented on PR #40183: URL: https://github.com/apache/spark/pull/40183#issuecomment-1445636922 Now, it passed. ![Screenshot 2023-02-26 at 7 25 33 PM](https://user-images.githubusercontent.com/9700541/221465853-833cb047-d751-43f1-a341-7a6b01f5ce21.png)

[GitHub] [spark] dongjoon-hyun commented on pull request #40183: [SPARK-42587][CONNECT][TESTS][FOLLOWUP] Fix `scalafmt` failure

2023-02-26 Thread via GitHub
dongjoon-hyun commented on PR #40183: URL: https://github.com/apache/spark/pull/40183#issuecomment-1445628852 cc @viirya

[GitHub] [spark] dongjoon-hyun opened a new pull request, #40183: [SPARK-42587][CONNECT][TESTS][FOLLOWUP] Fix `scalafmt` failure

2023-02-26 Thread via GitHub
dongjoon-hyun opened a new pull request, #40183: URL: https://github.com/apache/spark/pull/40183 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [spark] dongjoon-hyun commented on pull request #40181: [SPARK-42589][CONNECT][TESTS] Exclude `RelationalGroupedDataset.apply` from `CompatibilitySuite`

2023-02-26 Thread via GitHub
dongjoon-hyun commented on PR #40181: URL: https://github.com/apache/spark/pull/40181#issuecomment-1445625931 Let me close this and fix the branch first.

[GitHub] [spark] dongjoon-hyun closed pull request #40181: [SPARK-42589][CONNECT][TESTS] Exclude `RelationalGroupedDataset.apply` from `CompatibilitySuite`

2023-02-26 Thread via GitHub
dongjoon-hyun closed pull request #40181: [SPARK-42589][CONNECT][TESTS] Exclude `RelationalGroupedDataset.apply` from `CompatibilitySuite` URL: https://github.com/apache/spark/pull/40181

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40181: [SPARK-42589][CONNECT][TESTS] Exclude `RelationalGroupedDataset.apply` from `CompatibilitySuite`

2023-02-26 Thread via GitHub
dongjoon-hyun commented on code in PR #40181: URL: https://github.com/apache/spark/pull/40181#discussion_r1118237094 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/CompatibilitySuite.scala: ## @@ -39,8 +39,8 @@ import

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40181: [SPARK-42589][CONNECT][TESTS] Exclude `RelationalGroupedDataset.apply` from `CompatibilitySuite`

2023-02-26 Thread via GitHub
dongjoon-hyun commented on code in PR #40181: URL: https://github.com/apache/spark/pull/40181#discussion_r1118236801 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/CompatibilitySuite.scala: ## @@ -159,6 +159,7 @@ class CompatibilitySuite

[GitHub] [spark] dongjoon-hyun commented on pull request #40181: [SPARK-42589][CONNECT][TESTS] Exclude `RelationalGroupedDataset.apply` from `CompatibilitySuite`

2023-02-26 Thread via GitHub
dongjoon-hyun commented on PR #40181: URL: https://github.com/apache/spark/pull/40181#issuecomment-1445615890 Hi, @viirya . Could you review this PR too?

[GitHub] [spark] zml1206 opened a new pull request, #40182: [SPARK-42588][SQL] Collapse two adjacent windows with the equivalent partition/order expressions

2023-02-26 Thread via GitHub
zml1206 opened a new pull request, #40182: URL: https://github.com/apache/spark/pull/40182 ### What changes were proposed in this pull request? Extend the CollapseWindow rule to collapse Window nodes with the equivalent partition/order expressions ### Why are the changes

[GitHub] [spark] dongjoon-hyun opened a new pull request, #40181: [SPARK-42589][CONNECT][TESTS] Exclude `RelationalGroupedDataset.apply` from `CompatibilitySuite`

2023-02-26 Thread via GitHub
dongjoon-hyun opened a new pull request, #40181: URL: https://github.com/apache/spark/pull/40181 … ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

[GitHub] [spark] hvanhovell closed pull request #40179: [SPARK-42560][CONNECT] Add ColumnName class

2023-02-26 Thread via GitHub
hvanhovell closed pull request #40179: [SPARK-42560][CONNECT] Add ColumnName class URL: https://github.com/apache/spark/pull/40179

[GitHub] [spark] hvanhovell commented on pull request #40179: [SPARK-42560][CONNECT] Add ColumnName class

2023-02-26 Thread via GitHub
hvanhovell commented on PR #40179: URL: https://github.com/apache/spark/pull/40179#issuecomment-1445591135 Merging.

[GitHub] [spark] dongjoon-hyun closed pull request #40180: [SPARK-42587][CONNECT][TESTS] Use wrapper versions for SBT and Maven in `connect` module tests

2023-02-26 Thread via GitHub
dongjoon-hyun closed pull request #40180: [SPARK-42587][CONNECT][TESTS] Use wrapper versions for SBT and Maven in `connect` module tests URL: https://github.com/apache/spark/pull/40180

[GitHub] [spark] dongjoon-hyun commented on pull request #40180: [SPARK-42587][CONNECT][TESTS] Use wrapper versions for SBT and Maven in `connect` module tests

2023-02-26 Thread via GitHub
dongjoon-hyun commented on PR #40180: URL: https://github.com/apache/spark/pull/40180#issuecomment-1445587270 Thank you! Merged to master/3.4.

[GitHub] [spark] dongjoon-hyun commented on pull request #40176: [SPARK-42564][CONNECT] Implement SparkSession.version and SparkSession.time

2023-02-26 Thread via GitHub
dongjoon-hyun commented on PR #40176: URL: https://github.com/apache/spark/pull/40176#issuecomment-1445586703 Thank you for doing this too!

[GitHub] [spark] wankunde commented on pull request #40157: [SPARK-42551][SQL] Support subexpression elimination in FilterExec

2023-02-26 Thread via GitHub
wankunde commented on PR #40157: URL: https://github.com/apache/spark/pull/40157#issuecomment-1445583743 cc @cloud-fan Could you help review this PR? Thanks

[GitHub] [spark] LuciferYang commented on a diff in pull request #40136: [SPARK-42515][BUILD][CONNECT][TESTS] Make `write table` in `ClientE2ETestSuite` sbt local test pass

2023-02-26 Thread via GitHub
LuciferYang commented on code in PR #40136: URL: https://github.com/apache/spark/pull/40136#discussion_r1118213407 ## connector/connect/client/jvm/pom.xml: ## @@ -125,6 +125,11 @@ ${mima.version} test + Review Comment: > On the other side, maven is

[GitHub] [spark] LuciferYang commented on a diff in pull request #40136: [SPARK-42515][BUILD][CONNECT][TESTS] Make `write table` in `ClientE2ETestSuite` sbt local test pass

2023-02-26 Thread via GitHub
LuciferYang commented on code in PR #40136: URL: https://github.com/apache/spark/pull/40136#discussion_r1118209034 ## connector/connect/client/jvm/pom.xml: ## @@ -125,6 +125,11 @@ ${mima.version} test + Review Comment: > Do we still need rules for

[GitHub] [spark] LuciferYang commented on a diff in pull request #40136: [SPARK-42515][BUILD][CONNECT][TESTS] Make `write table` in `ClientE2ETestSuite` sbt local test pass

2023-02-26 Thread via GitHub
LuciferYang commented on code in PR #40136: URL: https://github.com/apache/spark/pull/40136#discussion_r1118212051 ## connector/connect/client/jvm/pom.xml: ## @@ -125,6 +125,11 @@ ${mima.version} test + Review Comment: Update comments

[GitHub] [spark] zhengruifeng commented on a diff in pull request #40013: [SPARK-42367][CONNECT][PYTHON] `DataFrame.drop` should handle duplicated columns properly

2023-02-26 Thread via GitHub
zhengruifeng commented on code in PR #40013: URL: https://github.com/apache/spark/pull/40013#discussion_r1118211205 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -1346,16 +1346,16 @@ class

[GitHub] [spark] dongjoon-hyun commented on pull request #40180: [SPARK-42587][CONNECT][TESTS] Use wrapper versions for SBT and Maven in `connect` module tests

2023-02-26 Thread via GitHub
dongjoon-hyun commented on PR #40180: URL: https://github.com/apache/spark/pull/40180#issuecomment-1445559302 Could you review this editorial patch, @HyukjinKwon and @viirya ?

[GitHub] [spark] dongjoon-hyun opened a new pull request, #40180: [SPARK-42587][CONNECT][TESTS] Use wrapper versions for SBT and Maven in `connect` module tests

2023-02-26 Thread via GitHub
dongjoon-hyun opened a new pull request, #40180: URL: https://github.com/apache/spark/pull/40180 … ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

[GitHub] [spark] ulysses-you commented on a diff in pull request #40177: [SPARK-42583][SQL] Remove the outer join if they are all distinct aggregate functions

2023-02-26 Thread via GitHub
ulysses-you commented on code in PR #40177: URL: https://github.com/apache/spark/pull/40177#discussion_r1118205294 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala: ## @@ -194,12 +194,12 @@ object EliminateOuterJoin extends Rule[LogicalPlan]

[GitHub] [spark] viirya commented on a diff in pull request #40178: [MINOR][DOCS][FOLLOWUP] Remove `Jenkins` from web page.

2023-02-26 Thread via GitHub
viirya commented on code in PR #40178: URL: https://github.com/apache/spark/pull/40178#discussion_r1118205272 ## docs/building-spark.md: ## @@ -276,34 +276,6 @@ Enable the profile (e.g. 2.13): # For sbt ./build/sbt -Pscala-2.13 compile -## Running Jenkins tests with

[GitHub] [spark] zhengruifeng commented on a diff in pull request #40013: [SPARK-42367][CONNECT][PYTHON] `DataFrame.drop` should handle duplicated columns properly

2023-02-26 Thread via GitHub
zhengruifeng commented on code in PR #40013: URL: https://github.com/apache/spark/pull/40013#discussion_r1118194094 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -1346,16 +1346,16 @@ class

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40178: [MINOR][DOCS][FOLLOWUP] Remove `Jenkins` from web page.

2023-02-26 Thread via GitHub
HyukjinKwon commented on code in PR #40178: URL: https://github.com/apache/spark/pull/40178#discussion_r1118182170 ## docs/building-spark.md: ## @@ -276,34 +276,6 @@ Enable the profile (e.g. 2.13): # For sbt ./build/sbt -Pscala-2.13 compile -## Running Jenkins tests

[GitHub] [spark] zhengruifeng commented on a diff in pull request #40013: [SPARK-42367][CONNECT][PYTHON] `DataFrame.drop` should handle duplicated columns properly

2023-02-26 Thread via GitHub
zhengruifeng commented on code in PR #40013: URL: https://github.com/apache/spark/pull/40013#discussion_r1118191339 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/dsl/package.scala: ## @@ -703,21 +703,13 @@ package object dsl { def drop(columns:

[GitHub] [spark] zhengruifeng commented on pull request #40170: [SPARK-42574][CONNECT][PYTHON] Fix toPandas to handle duplicated column names

2023-02-26 Thread via GitHub
zhengruifeng commented on PR #40170: URL: https://github.com/apache/spark/pull/40170#issuecomment-1445528865 late LGTM, thanks!

[GitHub] [spark] zhengruifeng closed pull request #39995: [WIP][CONNECT] Initial runtime SQL configuration implementation

2023-02-26 Thread via GitHub
zhengruifeng closed pull request #39995: [WIP][CONNECT] Initial runtime SQL configuration implementation URL: https://github.com/apache/spark/pull/39995

[GitHub] [spark] hvanhovell commented on pull request #40176: [SPARK-42564][CONNECT] Implement SparkSession.version and SparkSession.time

2023-02-26 Thread via GitHub
hvanhovell commented on PR #40176: URL: https://github.com/apache/spark/pull/40176#issuecomment-1445521369 Thanks for doing this!

[GitHub] [spark] hvanhovell closed pull request #40176: [SPARK-42564][CONNECT] Implement SparkSession.version and SparkSession.time

2023-02-26 Thread via GitHub
hvanhovell closed pull request #40176: [SPARK-42564][CONNECT] Implement SparkSession.version and SparkSession.time URL: https://github.com/apache/spark/pull/40176

[GitHub] [spark] hvanhovell commented on pull request #40176: [SPARK-42564][CONNECT] Implement SparkSession.version and SparkSession.time

2023-02-26 Thread via GitHub
hvanhovell commented on PR #40176: URL: https://github.com/apache/spark/pull/40176#issuecomment-1445521025 Merging

[GitHub] [spark] hvanhovell commented on a diff in pull request #40179: [SPARK-42560][CONNECT] Add ColumnName class

2023-02-26 Thread via GitHub
hvanhovell commented on code in PR #40179: URL: https://github.com/apache/spark/pull/40179#discussion_r1118185169 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/CompatibilitySuite.scala: ## @@ -155,6 +156,7 @@ class CompatibilitySuite extends

[GitHub] [spark] HyukjinKwon closed pull request #39991: [SPARK-42419][CONNECT][PYTHON] Migrate into error framework for Spark Connect Column API.

2023-02-26 Thread via GitHub
HyukjinKwon closed pull request #39991: [SPARK-42419][CONNECT][PYTHON] Migrate into error framework for Spark Connect Column API. URL: https://github.com/apache/spark/pull/39991

[GitHub] [spark] HyukjinKwon commented on pull request #39991: [SPARK-42419][CONNECT][PYTHON] Migrate into error framework for Spark Connect Column API.

2023-02-26 Thread via GitHub
HyukjinKwon commented on PR #39991: URL: https://github.com/apache/spark/pull/39991#issuecomment-1445518870 Merged to master and branch-3.4.

[GitHub] [spark] HyukjinKwon closed pull request #40172: [SPARK-42569][CONNECT][FOLLOW-UP] Throw unsupported exceptions for persist

2023-02-26 Thread via GitHub
HyukjinKwon closed pull request #40172: [SPARK-42569][CONNECT][FOLLOW-UP] Throw unsupported exceptions for persist URL: https://github.com/apache/spark/pull/40172

[GitHub] [spark] HyukjinKwon commented on pull request #40172: [SPARK-42569][CONNECT][FOLLOW-UP] Throw unsupported exceptions for persist

2023-02-26 Thread via GitHub
HyukjinKwon commented on PR #40172: URL: https://github.com/apache/spark/pull/40172#issuecomment-1445518268 Merged to master and branch-3.4.

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40179: [SPARK-42560][CONNECT] Add ColumnName class

2023-02-26 Thread via GitHub
HyukjinKwon commented on code in PR #40179: URL: https://github.com/apache/spark/pull/40179#discussion_r1118183276 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Column.scala: ## @@ -26,7 +26,7 @@ import

[GitHub] [spark] HyukjinKwon closed pull request #40170: [SPARK-42574][CONNECT][PYTHON] Fix toPandas to handle duplicated column names

2023-02-26 Thread via GitHub
HyukjinKwon closed pull request #40170: [SPARK-42574][CONNECT][PYTHON] Fix toPandas to handle duplicated column names URL: https://github.com/apache/spark/pull/40170

[GitHub] [spark] HyukjinKwon commented on pull request #40170: [SPARK-42574][CONNECT][PYTHON] Fix toPandas to handle duplicated column names

2023-02-26 Thread via GitHub
HyukjinKwon commented on PR #40170: URL: https://github.com/apache/spark/pull/40170#issuecomment-1445516304 Merged to master and branch-3.4.

[GitHub] [spark] hvanhovell opened a new pull request, #40179: [SPARK-42560][CONNECT] Add ColumnName class

2023-02-26 Thread via GitHub
hvanhovell opened a new pull request, #40179: URL: https://github.com/apache/spark/pull/40179 ### What changes were proposed in this pull request? This PR adds the ColumnName for the Spark Connect Scala Client. This is a stepping stone to implement the SQLImplicits. ### Why are

[GitHub] [spark] dongjoon-hyun commented on pull request #40178: [MINOR][DOCS][FOLLOWUP] Remove `Jenkins` from web page.

2023-02-26 Thread via GitHub
dongjoon-hyun commented on PR #40178: URL: https://github.com/apache/spark/pull/40178#issuecomment-1445501633 Lastly, are you claiming a followup across `spark-website` and `spark` repositories? To me, `[FOLLOWUP]` doesn't make sense at all in that case, @bjornjorgensen .

[GitHub] [spark] dongjoon-hyun commented on pull request #40178: [MINOR][DOCS][FOLLOWUP] Remove `Jenkins` from web page.

2023-02-26 Thread via GitHub
dongjoon-hyun commented on PR #40178: URL: https://github.com/apache/spark/pull/40178#issuecomment-1445501723 Also, cc @HyukjinKwon .

[GitHub] [spark] dongjoon-hyun commented on pull request #40178: [MINOR][DOCS][FOLLOWUP] Remove `Jenkins` from web page.

2023-02-26 Thread via GitHub
dongjoon-hyun commented on PR #40178: URL: https://github.com/apache/spark/pull/40178#issuecomment-1445501146 To be clear, the code change itself looks okay, @bjornjorgensen .

[GitHub] [spark] wangyum commented on pull request #40177: [SPARK-42583][SQL] Remove the outer join if they are all distinct aggregate functions

2023-02-26 Thread via GitHub
wangyum commented on PR #40177: URL: https://github.com/apache/spark/pull/40177#issuecomment-1445494765 cc @cloud-fan @ulysses-you

[GitHub] [spark] NarekDW commented on pull request #40040: [SPARK-42399] [SQL] Support big numbers for conv function (get rid of overflow)

2023-02-26 Thread via GitHub
NarekDW commented on PR #40040: URL: https://github.com/apache/spark/pull/40040#issuecomment-1445466115 Also, I'd like to share some performance measurements from my local machine, using JMH: code example: ```java ... @Benchmark public void

[GitHub] [spark] hvanhovell commented on pull request #40175: [SPARK-42580][CONNECT] Scala client add client side typed APIs

2023-02-26 Thread via GitHub
hvanhovell commented on PR #40175: URL: https://github.com/apache/spark/pull/40175#issuecomment-1445462573 @cloud-fan can you take a look?

[GitHub] [spark] bjornjorgensen commented on pull request #40178: [MINOR][DOCS][FOLLOWUP] Remove `Jenkins` from web page.

2023-02-26 Thread via GitHub
bjornjorgensen commented on PR #40178: URL: https://github.com/apache/spark/pull/40178#issuecomment-1445419415 And CC @xinrong-meng. This is for updating the documentation for the Spark 3.4 release.

[GitHub] [spark] bjornjorgensen commented on pull request #40178: [MINOR][DOCS][FOLLOWUP] Remove `Jenkins` from web page.

2023-02-26 Thread via GitHub
bjornjorgensen commented on PR #40178: URL: https://github.com/apache/spark/pull/40178#issuecomment-1445418994 @srowen @dongjoon-hyun @viirya

[GitHub] [spark] bjornjorgensen opened a new pull request, #40178: [MINOR][FOLLOWUP] Remove Jenkins from web page.

2023-02-26 Thread via GitHub
bjornjorgensen opened a new pull request, #40178: URL: https://github.com/apache/spark/pull/40178 ### What changes were proposed in this pull request? Remove Jenkins from web page. This is a followup on https://github.com/apache/spark-website/pull/442 ### Why are the changes

[GitHub] [spark] dtenedor commented on pull request #39678: [SPARK-16484][SQL] Add HyperLogLogPlusPlus sketch generator/evaluator/aggregator

2023-02-26 Thread via GitHub
dtenedor commented on PR #39678: URL: https://github.com/apache/spark/pull/39678#issuecomment-1445414348 Hi @RyanBerti just checking up on this :) are you back from PTO and still interested in this work?

[GitHub] [spark] NarekDW commented on pull request #40040: [SPARK-42399] [SQL] Support big numbers for conv function (get rid of overflow)

2023-02-26 Thread via GitHub
NarekDW commented on PR #40040: URL: https://github.com/apache/spark/pull/40040#issuecomment-1445410662 @srielau could you take a look, pls?

[GitHub] [spark] srowen commented on pull request #40116: SPARK-41391[SQL][WIP]

2023-02-26 Thread via GitHub
srowen commented on PR #40116: URL: https://github.com/apache/spark/pull/40116#issuecomment-1445388789 Looks better. Title should start with `[SPARK-41391]` to link it. Please include the description in the title; there is nothing there now

[GitHub] [spark] wangyum opened a new pull request, #40177: [SPARK-42583][SQL] Remove the outer join if they are all distinct aggregate functions

2023-02-26 Thread via GitHub
wangyum opened a new pull request, #40177: URL: https://github.com/apache/spark/pull/40177 ### What changes were proposed in this pull request? Enhance `EliminateOuterJoin` by removing the outer join if they are all distinct aggregate functions. For example: ```sql SELECT

[GitHub] [spark] panbingkun opened a new pull request, #40176: [SPARK-42564][CONNECT] Implement SparkSession.version and SparkSession.time

2023-02-26 Thread via GitHub
panbingkun opened a new pull request, #40176: URL: https://github.com/apache/spark/pull/40176 ### What changes were proposed in this pull request? This PR aims to implement SparkSession.version and SparkSession.time. ### Why are the changes needed? API coverage. ### Does

[GitHub] [spark] ritikam2 commented on pull request #40116: SPARK-41391[SQL][WIP]

2023-02-26 Thread via GitHub
ritikam2 commented on PR #40116: URL: https://github.com/apache/spark/pull/40116#issuecomment-1445296889 Sean, I tried to correct the two things you pointed out. Let me know if that works.