[GitHub] [spark] aokolnychyi commented on pull request #40655: [SPARK-42855][SQL] Use runtime null checks in TableOutputResolver

2023-04-04 Thread via GitHub
aokolnychyi commented on PR #40655: URL: https://github.com/apache/spark/pull/40655#issuecomment-1496932417 @gengliangwang, got it. I was initially concerned as well but I believe this is the right thing to do after we discussed it.

[GitHub] [spark] wankunde commented on a diff in pull request #40523: [SPARK-42897][SQL] Avoid evaluate more than once for the variables from the left side in the FullOuter SMJ condition

2023-04-04 Thread via GitHub
wankunde commented on code in PR #40523: URL: https://github.com/apache/spark/pull/40523#discussion_r1158021949 ## sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala: ## @@ -1036,8 +1036,17 @@ case class SortMergeJoinExec( val

[GitHub] [spark] gengliangwang commented on a diff in pull request #40652: [SPARK-43018][SQL] Fix bug for INSERT commands with timestamp literals

2023-04-04 Thread via GitHub
gengliangwang commented on code in PR #40652: URL: https://github.com/apache/spark/pull/40652#discussion_r1157988309 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveDefaultColumns.scala: ## @@ -271,32 +271,37 @@ case class

[GitHub] [spark] HyukjinKwon closed pull request #40669: [SPARK-42983][CONNECT][PYTHON] Fix createDataFrame to handle 0-dim numpy array properly

2023-04-04 Thread via GitHub
HyukjinKwon closed pull request #40669: [SPARK-42983][CONNECT][PYTHON] Fix createDataFrame to handle 0-dim numpy array properly URL: https://github.com/apache/spark/pull/40669

[GitHub] [spark] HyukjinKwon commented on pull request #40669: [SPARK-42983][CONNECT][PYTHON] Fix createDataFrame to handle 0-dim numpy array properly

2023-04-04 Thread via GitHub
HyukjinKwon commented on PR #40669: URL: https://github.com/apache/spark/pull/40669#issuecomment-1496856632 Merged to master and branch-3.4.

[GitHub] [spark] cloud-fan commented on a diff in pull request #40662: [SPARK-43030][SQL] Deduplicate relations with metadata columns

2023-04-04 Thread via GitHub
cloud-fan commented on code in PR #40662: URL: https://github.com/apache/spark/pull/40662#discussion_r1157966718 ## sql/core/src/test/resources/tpcds-plan-stability/approved-plans-modified/q27.sf100/explain.txt: ## @@ -209,208 +209,208 @@ Aggregate Attributes [4]:

[GitHub] [spark] cloud-fan commented on a diff in pull request #40662: [SPARK-43030][SQL] Deduplicate relations with metadata columns

2023-04-04 Thread via GitHub
cloud-fan commented on code in PR #40662: URL: https://github.com/apache/spark/pull/40662#discussion_r1157966389 ## sql/core/src/test/resources/tpcds-plan-stability/approved-plans-modified/q27.sf100/explain.txt: ## @@ -209,208 +209,208 @@ Aggregate Attributes [4]:

[GitHub] [spark] cloud-fan commented on a diff in pull request #40662: [SPARK-43030][SQL] Deduplicate relations with metadata columns

2023-04-04 Thread via GitHub
cloud-fan commented on code in PR #40662: URL: https://github.com/apache/spark/pull/40662#discussion_r1157965959 ## sql/core/src/test/resources/sql-tests/analyzer-results/subquery/in-subquery/in-with-cte.sql.out: ## @@ -198,23 +198,20 @@ WithCTE :: : :

[GitHub] [spark] gengliangwang commented on pull request #40655: [SPARK-42855][SQL] Use runtime null checks in TableOutputResolver

2023-04-04 Thread via GitHub
gengliangwang commented on PR #40655: URL: https://github.com/apache/spark/pull/40655#issuecomment-1496846899 @aokolnychyi Yes I got it. My concern was around the behavior change. I am OK with the idea and merging this one.

[GitHub] [spark] HyukjinKwon commented on pull request #40671: [MINOR][CONNECT][DOCS] Clarify Spark Connect option in Spark scripts

2023-04-04 Thread via GitHub
HyukjinKwon commented on PR #40671: URL: https://github.com/apache/spark/pull/40671#issuecomment-1496816167 cc @allanf-db @zhengruifeng @ueshin FYI

[GitHub] [spark] cloud-fan commented on pull request #40124: [SPARK-37980][SQL] Access row_index via _metadata if possible in tests

2023-04-04 Thread via GitHub
cloud-fan commented on PR #40124: URL: https://github.com/apache/spark/pull/40124#issuecomment-1496815727 thanks, merging to master!

[GitHub] [spark] HyukjinKwon opened a new pull request, #40671: [MINOR][CONNECT][DOCS] Clarify Spark Connect option in Spark scripts

2023-04-04 Thread via GitHub
HyukjinKwon opened a new pull request, #40671: URL: https://github.com/apache/spark/pull/40671 ### What changes were proposed in this pull request? This PR clarifies Spark Connect option to be consistent with other sections. ### Why are the changes needed? To be

[GitHub] [spark] cloud-fan closed pull request #40124: [SPARK-37980][SQL] Access row_index via _metadata if possible in tests

2023-04-04 Thread via GitHub
cloud-fan closed pull request #40124: [SPARK-37980][SQL] Access row_index via _metadata if possible in tests URL: https://github.com/apache/spark/pull/40124

[GitHub] [spark] cloud-fan commented on a diff in pull request #40651: [SPARK-43019][SQL] Move Ordering to PhysicalDataType

2023-04-04 Thread via GitHub
cloud-fan commented on code in PR #40651: URL: https://github.com/apache/spark/pull/40651#discussion_r1157929474 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/percentiles.scala: ## @@ -155,7 +156,7 @@ abstract class PercentileBase }

[GitHub] [spark] HyukjinKwon closed pull request #40670: [MINOR][PYTHON][CONNECT][DOCS] Deduplicate versionchanged directive in Catalog

2023-04-04 Thread via GitHub
HyukjinKwon closed pull request #40670: [MINOR][PYTHON][CONNECT][DOCS] Deduplicate versionchanged directive in Catalog URL: https://github.com/apache/spark/pull/40670

[GitHub] [spark] HyukjinKwon commented on pull request #40670: [MINOR][PYTHON][CONNECT][DOCS] Deduplicate versionchanged directive in Catalog

2023-04-04 Thread via GitHub
HyukjinKwon commented on PR #40670: URL: https://github.com/apache/spark/pull/40670#issuecomment-1496811319 Merged to master and branch-3.4.

[GitHub] [spark] hvanhovell commented on a diff in pull request #40651: [SPARK-43019][SQL] Move Ordering to PhysicalDataType

2023-04-04 Thread via GitHub
hvanhovell commented on code in PR #40651: URL: https://github.com/apache/spark/pull/40651#discussion_r1157925662 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/percentiles.scala: ## @@ -155,7 +156,7 @@ abstract class PercentileBase }

[GitHub] [spark] cloud-fan commented on a diff in pull request #40651: [SPARK-43019][SQL] Move Ordering to PhysicalDataType

2023-04-04 Thread via GitHub
cloud-fan commented on code in PR #40651: URL: https://github.com/apache/spark/pull/40651#discussion_r1157924616 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/percentiles.scala: ## @@ -155,7 +156,7 @@ abstract class PercentileBase }

[GitHub] [spark] hvanhovell commented on a diff in pull request #40651: [SPARK-43019][SQL] Move Ordering to PhysicalDataType

2023-04-04 Thread via GitHub
hvanhovell commented on code in PR #40651: URL: https://github.com/apache/spark/pull/40651#discussion_r1157922948 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/percentiles.scala: ## @@ -155,7 +156,7 @@ abstract class PercentileBase }

[GitHub] [spark] cloud-fan commented on a diff in pull request #40651: [SPARK-43019][SQL] Move Ordering to PhysicalDataType

2023-04-04 Thread via GitHub
cloud-fan commented on code in PR #40651: URL: https://github.com/apache/spark/pull/40651#discussion_r1157921993 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/percentiles.scala: ## @@ -155,7 +156,7 @@ abstract class PercentileBase }

[GitHub] [spark] cloud-fan commented on a diff in pull request #40651: [SPARK-43019][SQL] Move Ordering to PhysicalDataType

2023-04-04 Thread via GitHub
cloud-fan commented on code in PR #40651: URL: https://github.com/apache/spark/pull/40651#discussion_r1157911591 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/PhysicalDataType.scala: ## @@ -17,53 +17,204 @@ package org.apache.spark.sql.catalyst.types

[GitHub] [spark] amaliujia commented on a diff in pull request #40611: [SPARK-42981][CONNECT] Add direct arrow serialization

2023-04-04 Thread via GitHub
amaliujia commented on code in PR #40611: URL: https://github.com/apache/spark/pull/40611#discussion_r1157907417 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/arrow/ArrowSerializer.scala: ## @@ -0,0 +1,529 @@ +/* + * Licensed to the Apache

[GitHub] [spark] HyukjinKwon commented on pull request #40670: [MINOR][PYTHON][CONNECT][DOCS] Deduplicate versionchanged directive in Catalog

2023-04-04 Thread via GitHub
HyukjinKwon commented on PR #40670: URL: https://github.com/apache/spark/pull/40670#issuecomment-1496775494 cc @zhengruifeng @ueshin FYI

[GitHub] [spark] HyukjinKwon opened a new pull request, #40670: [MINOR][PYTHON][CONNECT][DOCS] Deduplicate versionchanged directive in Catalog

2023-04-04 Thread via GitHub
HyukjinKwon opened a new pull request, #40670: URL: https://github.com/apache/spark/pull/40670 ### What changes were proposed in this pull request? This PR proposes to deduplicate versionchanged directive in Catalog. ### Why are the changes needed? All API is implemented

[GitHub] [spark] aokolnychyi commented on pull request #40655: [SPARK-42855][SQL] Use runtime null checks in TableOutputResolver

2023-04-04 Thread via GitHub
aokolnychyi commented on PR #40655: URL: https://github.com/apache/spark/pull/40655#issuecomment-1496774141 @gengliangwang, this PR is based on the consensus we reached in [this](https://github.com/apache/spark/pull/40308#discussion_r1127081206) thread. Each approach has its own pros/cons.

[GitHub] [spark] HyukjinKwon closed pull request #40666: [SPARK-43009][SQL][3.4] Parameterized `sql()` with `Any` constants

2023-04-04 Thread via GitHub
HyukjinKwon closed pull request #40666: [SPARK-43009][SQL][3.4] Parameterized `sql()` with `Any` constants URL: https://github.com/apache/spark/pull/40666

[GitHub] [spark] HyukjinKwon commented on pull request #40666: [SPARK-43009][SQL][3.4] Parameterized `sql()` with `Any` constants

2023-04-04 Thread via GitHub
HyukjinKwon commented on PR #40666: URL: https://github.com/apache/spark/pull/40666#issuecomment-1496753586 Merged to branch-3.4.

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40664: [SPARK-43024][PS][INFRA] Upgrade pandas to 2.0.0

2023-04-04 Thread via GitHub
HyukjinKwon commented on code in PR #40664: URL: https://github.com/apache/spark/pull/40664#discussion_r1157883873 ## dev/infra/Dockerfile: ## @@ -64,8 +64,8 @@ RUN Rscript -e "devtools::install_version('roxygen2', version='7.2.0', repos='ht # See more in SPARK-39735 ENV

[GitHub] [spark] HyukjinKwon commented on pull request #40665: [SPARK-42621][PS] Add inclusive parameter for pd.date_range

2023-04-04 Thread via GitHub
HyukjinKwon commented on PR #40665: URL: https://github.com/apache/spark/pull/40665#issuecomment-1496750694 cc @itholic @zhengruifeng @xinrong-meng @Yikun if you find some time to review.

[GitHub] [spark] hvanhovell commented on pull request #40649: [SPARK-41628][CONNECT][SERVER] The Design for support async query execution

2023-04-04 Thread via GitHub
hvanhovell commented on PR #40649: URL: https://github.com/apache/spark/pull/40649#issuecomment-1496739368 @Hisoka-X thanks for the write up. We should be able to support most of this at the moment. GRPC supports this type of execution out of the box. The reason we did not really go for

[GitHub] [spark] zhengruifeng commented on pull request #40607: [SPARK-42993][ML][CONNECT] Make PyTorch Distributor compatible with Spark Connect

2023-04-04 Thread via GitHub
zhengruifeng commented on PR #40607: URL: https://github.com/apache/spark/pull/40607#issuecomment-1496734980 In my local env, the failed test can pass with an even bigger model size. But let me try to reduce the model size for GA to see what will happen.

[GitHub] [spark] gengliangwang commented on pull request #40655: [SPARK-42855][SQL] Use runtime null checks in TableOutputResolver

2023-04-04 Thread via GitHub
gengliangwang commented on PR #40655: URL: https://github.com/apache/spark/pull/40655#issuecomment-1496724854 @aokolnychyi @cloud-fan I am +0 for changing the behavior since I haven't heard complaints about this from end-users. Instead, relaxing the strict compiler check can bring
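
For readers following this thread, here is a rough, hypothetical PySpark sketch of the distinction being debated: the strict approach rejects a write at analysis time merely because the input column is declared nullable, while a runtime null check lets the plan through and fails only if a NULL value actually appears. This illustrates the idea only; in the PR the check is wired into TableOutputResolver's write path, not written by users.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# A column that is declared nullable but happens to contain no NULLs.
src = spark.createDataFrame([(1,), (2,)], "id INT")

# Strict analysis-time check: reject the write up front because
# src.schema["id"].nullable is True, regardless of the actual data.

# Runtime check: keep the plan and evaluate a per-row assertion, so the job
# fails only if a NULL would actually reach the non-nullable target column.
checked = src.withColumn(
    "id",
    F.when(
        F.col("id").isNull(),
        F.raise_error("NULL value written to non-nullable column 'id'"),
    ).otherwise(F.col("id")),
)
checked.show()  # succeeds: the data has no NULLs
```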

[GitHub] [spark] gengliangwang commented on a diff in pull request #40655: [SPARK-42855][SQL] Use runtime null checks in TableOutputResolver

2023-04-04 Thread via GitHub
gengliangwang commented on code in PR #40655: URL: https://github.com/apache/spark/pull/40655#discussion_r1157855435 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala: ## @@ -130,38 +128,93 @@ object TableOutputResolver { }

[GitHub] [spark] ueshin opened a new pull request, #40669: [SPARK-42983][CONNECT][PYTHON] Fix createDataFrame to handle 0-dim numpy array properly

2023-04-04 Thread via GitHub
ueshin opened a new pull request, #40669: URL: https://github.com/apache/spark/pull/40669 ### What changes were proposed in this pull request? Fix `createDataFrame` to handle 0-dim numpy array properly. ### Why are the changes needed? When 0-dim numpy array is passed to
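
For context, a 0-dimensional numpy array is a single scalar wrapped in an ndarray (shape `()`), which is easy to conflate with a length-1 array. A plain-numpy sketch of the input shape in question (the Connect-side handling is what the PR itself fixes):

```python
import numpy as np

arr = np.array(5)            # 0-dim array
print(arr.ndim, arr.shape)   # 0 ()
print(arr.item())            # 5 -- the single scalar value it carries

# Not the same thing as a 1-dim array with one element:
print(np.array([5]).ndim, np.array([5]).shape)  # 1 (1,)
```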

[GitHub] [spark] WweiL commented on a diff in pull request #40586: [SPARK-42939][SS][CONNECT] Core streaming Python API for Spark Connect

2023-04-04 Thread via GitHub
WweiL commented on code in PR #40586: URL: https://github.com/apache/spark/pull/40586#discussion_r1157836737 ## python/pyspark/sql/connect/streaming/readwriter.py: ## @@ -0,0 +1,484 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor

[GitHub] [spark] aokolnychyi commented on pull request #40655: [SPARK-42855][SQL] Use runtime null checks in TableOutputResolver

2023-04-04 Thread via GitHub
aokolnychyi commented on PR #40655: URL: https://github.com/apache/spark/pull/40655#issuecomment-1496675818 Ok, all tests have been adapted. This PR is ready for a detailed review.

[GitHub] [spark] pengzhon-db commented on a diff in pull request #40586: [SPARK-42939][SS][CONNECT] Core streaming Python API for Spark Connect

2023-04-04 Thread via GitHub
pengzhon-db commented on code in PR #40586: URL: https://github.com/apache/spark/pull/40586#discussion_r1157815748 ## python/pyspark/sql/connect/streaming/query.py: ## @@ -0,0 +1,161 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor

[GitHub] [spark] dtenedor commented on a diff in pull request #40652: [SPARK-43018][SQL] Fix bug for INSERT commands with timestamp literals

2023-04-04 Thread via GitHub
dtenedor commented on code in PR #40652: URL: https://github.com/apache/spark/pull/40652#discussion_r1157804913 ## sql/core/src/test/scala/org/apache/spark/sql/catalyst/analysis/ResolveDefaultColumnsSuite.scala: ## @@ -0,0 +1,65 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] shardulm94 commented on pull request #40637: [SPARK-43002][YARN] Modify yarn client application report logging frequency to reduce noise

2023-04-04 Thread via GitHub
shardulm94 commented on PR #40637: URL: https://github.com/apache/spark/pull/40637#issuecomment-1496640034 Thanks @ShreyeshArangath for this! I think it helps clear a lot of unnecessary noise from user logs and keeps the logs manageable. One thing I noticed is that we set

[GitHub] [spark] gengliangwang commented on a diff in pull request #40652: [SPARK-43018][SQL] Fix bug for INSERT commands with timestamp literals

2023-04-04 Thread via GitHub
gengliangwang commented on code in PR #40652: URL: https://github.com/apache/spark/pull/40652#discussion_r115619 ## sql/core/src/test/scala/org/apache/spark/sql/catalyst/analysis/ResolveDefaultColumnsSuite.scala: ## @@ -0,0 +1,65 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] gengliangwang commented on a diff in pull request #40652: [SPARK-43018][SQL] Fix bug for INSERT commands with timestamp literals

2023-04-04 Thread via GitHub
gengliangwang commented on code in PR #40652: URL: https://github.com/apache/spark/pull/40652#discussion_r115170 ## sql/core/src/test/scala/org/apache/spark/sql/catalyst/analysis/ResolveDefaultColumnsSuite.scala: ## @@ -0,0 +1,65 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] justaparth closed pull request #40668: spark protobuf: add materializeDefaults option to spark-protobuf

2023-04-04 Thread via GitHub
justaparth closed pull request #40668: spark protobuf: add materializeDefaults option to spark-protobuf URL: https://github.com/apache/spark/pull/40668

[GitHub] [spark] justaparth opened a new pull request, #40668: spark protobuf: add materializeDefaults option to spark-protobuf

2023-04-04 Thread via GitHub
justaparth opened a new pull request, #40668: URL: https://github.com/apache/spark/pull/40668 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] dtenedor commented on a diff in pull request #40652: [SPARK-43018][SQL] Fix bug for INSERT commands with timestamp literals

2023-04-04 Thread via GitHub
dtenedor commented on code in PR #40652: URL: https://github.com/apache/spark/pull/40652#discussion_r1157765220 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveDefaultColumns.scala: ## @@ -271,32 +271,33 @@ case class ResolveDefaultColumns(catalog:

[GitHub] [spark] gengliangwang commented on a diff in pull request #40652: [SPARK-43018][SQL] Fix bug for INSERT commands with timestamp literals

2023-04-04 Thread via GitHub
gengliangwang commented on code in PR #40652: URL: https://github.com/apache/spark/pull/40652#discussion_r1157763648 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveDefaultColumns.scala: ## @@ -271,33 +271,45 @@ case class

[GitHub] [spark] gengliangwang commented on a diff in pull request #40652: [SPARK-43018][SQL] Fix bug for INSERT commands with timestamp literals

2023-04-04 Thread via GitHub
gengliangwang commented on code in PR #40652: URL: https://github.com/apache/spark/pull/40652#discussion_r1157763473 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveDefaultColumns.scala: ## @@ -271,32 +271,33 @@ case class

[GitHub] [spark] dtenedor commented on a diff in pull request #40652: [SPARK-43018][SQL] Fix bug for INSERT commands with timestamp literals

2023-04-04 Thread via GitHub
dtenedor commented on code in PR #40652: URL: https://github.com/apache/spark/pull/40652#discussion_r1157749910 ## sql/core/src/test/scala/org/apache/spark/sql/catalyst/analysis/ResolveDefaultColumnsSuite.scala: ## @@ -0,0 +1,53 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] dtenedor commented on a diff in pull request #40652: [SPARK-43018][SQL] Fix bug for INSERT commands with timestamp literals

2023-04-04 Thread via GitHub
dtenedor commented on code in PR #40652: URL: https://github.com/apache/spark/pull/40652#discussion_r1157749445 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveDefaultColumns.scala: ## @@ -271,32 +271,33 @@ case class ResolveDefaultColumns(catalog:

[GitHub] [spark] dtenedor commented on a diff in pull request #40652: [SPARK-43018][SQL] Fix bug for INSERT commands with timestamp literals

2023-04-04 Thread via GitHub
dtenedor commented on code in PR #40652: URL: https://github.com/apache/spark/pull/40652#discussion_r1157747931 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveDefaultColumns.scala: ## @@ -271,32 +271,33 @@ case class ResolveDefaultColumns(catalog:

[GitHub] [spark] aokolnychyi commented on a diff in pull request #40655: [SPARK-42855][SQL] Use runtime null checks in TableOutputResolver

2023-04-04 Thread via GitHub
aokolnychyi commented on code in PR #40655: URL: https://github.com/apache/spark/pull/40655#discussion_r1157744682 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala: ## @@ -130,38 +128,93 @@ object TableOutputResolver { } }

[GitHub] [spark] Kimahriman commented on pull request #34558: [SPARK-37019][SQL] Add codegen support to array higher-order functions

2023-04-04 Thread via GitHub
Kimahriman commented on PR #34558: URL: https://github.com/apache/spark/pull/34558#issuecomment-1496583248 > There seems to be a lot of repetition. Wish it could be avoided somehow but can't help though (beside nit-picking). Thanks for the review! I tried to get as much common code

[GitHub] [spark] Kimahriman commented on a diff in pull request #34558: [SPARK-37019][SQL] Add codegen support to array higher-order functions

2023-04-04 Thread via GitHub
Kimahriman commented on code in PR #34558: URL: https://github.com/apache/spark/pull/34558#discussion_r1157743929 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala: ## @@ -101,6 +101,14 @@ case class NamedLambdaVariable(

[GitHub] [spark] Kimahriman commented on a diff in pull request #34558: [SPARK-37019][SQL] Add codegen support to array higher-order functions

2023-04-04 Thread via GitHub
Kimahriman commented on code in PR #34558: URL: https://github.com/apache/spark/pull/34558#discussion_r1157743684 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala: ## @@ -172,6 +172,40 @@ class CodegenContext extends Logging {

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40655: [SPARK-42855][SQL] Use runtime null checks in TableOutputResolver

2023-04-04 Thread via GitHub
dongjoon-hyun commented on code in PR #40655: URL: https://github.com/apache/spark/pull/40655#discussion_r1157717949 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala: ## @@ -130,38 +128,93 @@ object TableOutputResolver { }

[GitHub] [spark] dongjoon-hyun commented on pull request #40645: [SPARK-43014] Do not overwrite `spark.app.submitTime` in k8s cluster mode driver

2023-04-04 Thread via GitHub
dongjoon-hyun commented on PR #40645: URL: https://github.com/apache/spark/pull/40645#issuecomment-1496541323 Sorry for misleading you. You are right about timezone. What I imagined was more like the following case. ``` $ docker run -it --rm --cap-add SYS_TIME openjdk:latest bash

[GitHub] [spark] ksumit opened a new pull request, #40667: Improve IDE build experience against jdk11

2023-04-04 Thread via GitHub
ksumit opened a new pull request, #40667: URL: https://github.com/apache/spark/pull/40667 ### What changes were proposed in this pull request? Building the project against jdk11 on IDE shows errors because `Platform.java` depends on `sun.misc` which is in `jdk.unsupported` module in

[GitHub] [spark] amaliujia commented on pull request #40586: [SPARK-42939][SS][CONNECT] Core streaming Python API for Spark Connect

2023-04-04 Thread via GitHub
amaliujia commented on PR #40586: URL: https://github.com/apache/spark/pull/40586#issuecomment-1496448770 The proto side overall looks good.

[GitHub] [spark] amaliujia commented on a diff in pull request #40586: [SPARK-42939][SS][CONNECT] Core streaming Python API for Spark Connect

2023-04-04 Thread via GitHub
amaliujia commented on code in PR #40586: URL: https://github.com/apache/spark/pull/40586#discussion_r1157644345 ## connector/connect/common/src/main/protobuf/spark/connect/commands.proto: ## @@ -177,3 +179,126 @@ message WriteOperationV2 { // (Optional) A condition for

[GitHub] [spark] amaliujia commented on a diff in pull request #40651: [SPARK-43019][SQL] Move Ordering to PhysicalDataType

2023-04-04 Thread via GitHub
amaliujia commented on code in PR #40651: URL: https://github.com/apache/spark/pull/40651#discussion_r1157641944 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/PhysicalDataType.scala: ## @@ -17,53 +17,234 @@ package org.apache.spark.sql.catalyst.types

[GitHub] [spark] gengliangwang commented on a diff in pull request #40652: [SPARK-43018][SQL] Fix bug for INSERT commands with timestamp literals

2023-04-04 Thread via GitHub
gengliangwang commented on code in PR #40652: URL: https://github.com/apache/spark/pull/40652#discussion_r1157637643 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveDefaultColumns.scala: ## @@ -271,32 +271,33 @@ case class

[GitHub] [spark] hvanhovell commented on a diff in pull request #40651: [SPARK-43019][SQL] Move Ordering to PhysicalDataType

2023-04-04 Thread via GitHub
hvanhovell commented on code in PR #40651: URL: https://github.com/apache/spark/pull/40651#discussion_r1157632963 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/PhysicalDataType.scala: ## @@ -17,53 +17,234 @@ package org.apache.spark.sql.catalyst.types

[GitHub] [spark] gengliangwang commented on a diff in pull request #40652: [SPARK-43018][SQL] Fix bug for INSERT commands with timestamp literals

2023-04-04 Thread via GitHub
gengliangwang commented on code in PR #40652: URL: https://github.com/apache/spark/pull/40652#discussion_r1157631790 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveDefaultColumns.scala: ## @@ -271,32 +271,33 @@ case class

[GitHub] [spark] amaliujia commented on a diff in pull request #40651: [SPARK-43019][SQL] Move Ordering to PhysicalDataType

2023-04-04 Thread via GitHub
amaliujia commented on code in PR #40651: URL: https://github.com/apache/spark/pull/40651#discussion_r1157631385 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/PhysicalDataType.scala: ## @@ -17,53 +17,234 @@ package org.apache.spark.sql.catalyst.types

[GitHub] [spark] amaliujia commented on a diff in pull request #40651: [SPARK-43019][SQL] Move Ordering to PhysicalDataType

2023-04-04 Thread via GitHub
amaliujia commented on code in PR #40651: URL: https://github.com/apache/spark/pull/40651#discussion_r1157629600 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/PhysicalDataType.scala: ## @@ -17,53 +17,234 @@ package org.apache.spark.sql.catalyst.types

[GitHub] [spark] gengliangwang commented on a diff in pull request #40652: [SPARK-43018][SQL] Fix bug for INSERT commands with timestamp literals

2023-04-04 Thread via GitHub
gengliangwang commented on code in PR #40652: URL: https://github.com/apache/spark/pull/40652#discussion_r1157627009 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveDefaultColumns.scala: ## @@ -271,32 +271,33 @@ case class

[GitHub] [spark] gengliangwang commented on a diff in pull request #40652: [SPARK-43018][SQL] Fix bug for INSERT commands with timestamp literals

2023-04-04 Thread via GitHub
gengliangwang commented on code in PR #40652: URL: https://github.com/apache/spark/pull/40652#discussion_r1157622700 ## sql/core/src/test/scala/org/apache/spark/sql/catalyst/analysis/ResolveDefaultColumnsSuite.scala: ## @@ -0,0 +1,53 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] dtenedor commented on a diff in pull request #40652: [SPARK-43018][SQL] Fix bug for INSERT commands with timestamp literals

2023-04-04 Thread via GitHub
dtenedor commented on code in PR #40652: URL: https://github.com/apache/spark/pull/40652#discussion_r1157620766 ## sql/core/src/test/scala/org/apache/spark/sql/ResolveDefaultColumnsSuite.scala: ## Review Comment: NP, moved to this package instead.

[GitHub] [spark] gengliangwang commented on a diff in pull request #40652: [SPARK-43018][SQL] Fix bug for INSERT commands with timestamp literals

2023-04-04 Thread via GitHub
gengliangwang commented on code in PR #40652: URL: https://github.com/apache/spark/pull/40652#discussion_r1157616471 ## sql/core/src/test/scala/org/apache/spark/sql/ResolveDefaultColumnsSuite.scala: ## Review Comment: I meant ```

[GitHub] [spark] gengliangwang commented on a diff in pull request #40652: [SPARK-43018][SQL] Fix bug for INSERT commands with timestamp literals

2023-04-04 Thread via GitHub
gengliangwang commented on code in PR #40652: URL: https://github.com/apache/spark/pull/40652#discussion_r1157615804 ## sql/core/src/test/scala/org/apache/spark/sql/ResolveDefaultColumnsSuite.scala: ## @@ -0,0 +1,54 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] [spark] dtenedor commented on a diff in pull request #40652: [SPARK-43018][SQL] Fix bug for INSERT commands with timestamp literals

2023-04-04 Thread via GitHub
dtenedor commented on code in PR #40652: URL: https://github.com/apache/spark/pull/40652#discussion_r1157610703 ## sql/core/src/test/scala/org/apache/spark/sql/ResolveDefaultColumnsSuite.scala: ## Review Comment: Yes, good point! Moved. ##

[GitHub] [spark] Kimahriman commented on pull request #32987: [SPARK-35564][SQL] Support subexpression elimination for conditionally evaluated expressions

2023-04-04 Thread via GitHub
Kimahriman commented on PR #32987: URL: https://github.com/apache/spark/pull/32987#issuecomment-1496399849 Threw together a quick script to get some rough numbers. Did two types of queries, one doing a `sqrt` and one doing a `regexp_extract` to test a simple numeric thing and a more
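
The script itself is not in the archive; below is a rough reconstruction (assumed names, sizes, and query shape) of the kind of measurement described: the same potentially expensive expression repeated across conditional branches, which is exactly the pattern that subexpression elimination for conditionally evaluated expressions targets.

```python
import time
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.range(10_000_000).withColumn("s", F.col("id").cast("string"))

# The same expression repeated inside conditional branches. Swap in
# F.sqrt(F.col("id")) for the cheap-numeric variant of the benchmark.
expensive = F.regexp_extract("s", r"(\d+)", 1).cast("long")
query = df.select(
    F.when(expensive > 100, expensive + 1)
     .when(expensive > 10, expensive + 2)
     .otherwise(expensive)
     .alias("out")
)

start = time.time()
query.agg(F.sum("out")).collect()
print(f"elapsed: {time.time() - start:.2f}s")
```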

[GitHub] [spark] gengliangwang commented on a diff in pull request #40652: [SPARK-43018][SQL] Fix bug for INSERT commands with timestamp literals

2023-04-04 Thread via GitHub
gengliangwang commented on code in PR #40652: URL: https://github.com/apache/spark/pull/40652#discussion_r1157602703 ## sql/core/src/test/scala/org/apache/spark/sql/ResolveDefaultColumnsSuite.scala: ## Review Comment: This should be under `sql/catalyst`, right?

[GitHub] [spark] gengliangwang commented on a diff in pull request #40652: [SPARK-43018][SQL] Fix bug for INSERT commands with timestamp literals

2023-04-04 Thread via GitHub
gengliangwang commented on code in PR #40652: URL: https://github.com/apache/spark/pull/40652#discussion_r1157603360 ## sql/core/src/test/scala/org/apache/spark/sql/ResolveDefaultColumnsSuite.scala: ## @@ -0,0 +1,54 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] [spark] hvanhovell commented on a diff in pull request #40651: [SPARK-43019][SQL] Move Ordering to PhysicalDataType

2023-04-04 Thread via GitHub
hvanhovell commented on code in PR #40651: URL: https://github.com/apache/spark/pull/40651#discussion_r1157602019 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/PhysicalDataType.scala: ## @@ -17,53 +17,234 @@ package org.apache.spark.sql.catalyst.types

[GitHub] [spark] hvanhovell commented on a diff in pull request #40651: [SPARK-43019][SQL] Move Ordering to PhysicalDataType

2023-04-04 Thread via GitHub
hvanhovell commented on code in PR #40651: URL: https://github.com/apache/spark/pull/40651#discussion_r1157601361 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/PhysicalDataType.scala: ## @@ -17,53 +17,234 @@ package org.apache.spark.sql.catalyst.types

[GitHub] [spark] amaliujia commented on a diff in pull request #40651: [SPARK-43019][SQL] Move Ordering to PhysicalDataType

2023-04-04 Thread via GitHub
amaliujia commented on code in PR #40651: URL: https://github.com/apache/spark/pull/40651#discussion_r1157600655 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/PhysicalDataType.scala: ## @@ -17,53 +17,234 @@ package org.apache.spark.sql.catalyst.types

[GitHub] [spark] amaliujia commented on a diff in pull request #40651: [SPARK-43019][SQL] Move Ordering to PhysicalDataType

2023-04-04 Thread via GitHub
amaliujia commented on code in PR #40651: URL: https://github.com/apache/spark/pull/40651#discussion_r1157598354 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/PhysicalDataType.scala: ## @@ -17,53 +17,234 @@ package org.apache.spark.sql.catalyst.types

[GitHub] [spark] rangadi commented on a diff in pull request #40586: [SPARK-42939][SS][CONNECT] Core streaming Python API for Spark Connect

2023-04-04 Thread via GitHub
rangadi commented on code in PR #40586: URL: https://github.com/apache/spark/pull/40586#discussion_r1157597665 ## connector/connect/common/src/main/protobuf/spark/connect/commands.proto: ## @@ -177,3 +179,126 @@ message WriteOperationV2 { // (Optional) A condition for

[GitHub] [spark] rangadi commented on a diff in pull request #40586: [SPARK-42939][SS][CONNECT] Core streaming Python API for Spark Connect

2023-04-04 Thread via GitHub
rangadi commented on code in PR #40586: URL: https://github.com/apache/spark/pull/40586#discussion_r1157385141 ## connector/connect/common/src/main/protobuf/spark/connect/commands.proto: ## @@ -177,3 +179,118 @@ message WriteOperationV2 { // (Optional) A condition for

[GitHub] [spark] tgravescs commented on pull request #40622: [SPARK-43004][CORE] Fix typo in ResourceRequest.equals()

2023-04-04 Thread via GitHub
tgravescs commented on PR #40622: URL: https://github.com/apache/spark/pull/40622#issuecomment-1496383011 definitely looks like a typo, thanks for catching and fixing

[GitHub] [spark] MaxGekk commented on pull request #40623: [SPARK-43009][SQL] Parameterized `sql()` with `Any` constants

2023-04-04 Thread via GitHub
MaxGekk commented on PR #40623: URL: https://github.com/apache/spark/pull/40623#issuecomment-1496368421 Here is the backport to `branch-3.4`: https://github.com/apache/spark/pull/40666

[GitHub] [spark] yliou commented on pull request #35939: [SPARK-38617][SQL][WEBUI] Show Spark rule and phase timings in SQL UI and REST API

2023-04-04 Thread via GitHub
yliou commented on PR #35939: URL: https://github.com/apache/spark/pull/35939#issuecomment-1496353590 @dependabot reopen

[GitHub] [spark] MaxGekk opened a new pull request, #40666: [SPARK-43009][SQL][3.4] Parameterized `sql()` with `Any` constants

2023-04-04 Thread via GitHub
MaxGekk opened a new pull request, #40666: URL: https://github.com/apache/spark/pull/40666 ### What changes were proposed in this pull request? In the PR, I propose to change API of parameterized SQL, and replace type of argument values from `string` to `Any` in Scala/Java/Python and
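
A minimal sketch of what `Any`-typed arguments allow from Python, assuming the named-parameter form of the 3.4 parameterized SQL API: with string-typed args every value had to be spelled as SQL literal text (e.g. `"'abc'"` for a string), whereas with `Any` a plain Python value is bound directly.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Named parameter markers (:id, :name) bound to plain Python values.
spark.sql(
    "SELECT * FROM VALUES (1, 'a'), (2, 'b') AS t(id, name) "
    "WHERE id = :id AND name = :name",
    args={"id": 1, "name": "a"},
).show()
```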

[GitHub] [spark] dtenedor commented on pull request #40652: [SPARK-43018][SQL] Fix bug for INSERT commands with timestamp literals

2023-04-04 Thread via GitHub
dtenedor commented on PR #40652: URL: https://github.com/apache/spark/pull/40652#issuecomment-1496340472 Hi @gengliangwang here is the correctness bug fix

[GitHub] [spark] shrprasa commented on pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

2023-04-04 Thread via GitHub
shrprasa commented on PR #40258: URL: https://github.com/apache/spark/pull/40258#issuecomment-1496311839 Thanks a lot @cloud-fan for the guidance and support in getting this issue fixed.

[GitHub] [spark] aokolnychyi commented on pull request #40655: [SPARK-42855][SQL] Use runtime null checks in TableOutputResolver

2023-04-04 Thread via GitHub
aokolnychyi commented on PR #40655: URL: https://github.com/apache/spark/pull/40655#issuecomment-1496292594 @dongjoon-hyun, let me look into test failures.

[GitHub] [spark] dzhigimont opened a new pull request, #40665: [SPARK-42621][PS] Add inclusive parameter for pd.date_range

2023-04-04 Thread via GitHub
dzhigimont opened a new pull request, #40665: URL: https://github.com/apache/spark/pull/40665 ### What changes were proposed in this pull request? Add inclusive parameter for pd.date_range to support the pandas 2.0.0 ### Why are the changes needed? When pandas 2.0.0 is released,
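
For background on the parameter: pandas replaced `date_range`'s deprecated `closed=` argument with `inclusive=`, and pandas 2.0 drops the old name, which is what the pandas-on-Spark API needs to follow. A plain-pandas illustration of the semantics (the exact `ps.date_range` signature is whatever the PR implements):

```python
import pandas as pd

days = ("2023-04-01", "2023-04-04")

# `inclusive` controls whether the boundary dates are part of the result.
print(pd.date_range(*days, inclusive="both").strftime("%d").tolist())     # ['01', '02', '03', '04']
print(pd.date_range(*days, inclusive="left").strftime("%d").tolist())     # ['01', '02', '03']
print(pd.date_range(*days, inclusive="neither").strftime("%d").tolist())  # ['02', '03']
```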

[GitHub] [spark] aokolnychyi commented on a diff in pull request #40655: [SPARK-42855][SQL] Use runtime null checks in TableOutputResolver

2023-04-04 Thread via GitHub
aokolnychyi commented on code in PR #40655: URL: https://github.com/apache/spark/pull/40655#discussion_r1157516765 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala: ## @@ -130,38 +128,93 @@ object TableOutputResolver { } }

[GitHub] [spark] srielau commented on a diff in pull request #40641: [SPARK-43011][SQL] `array_insert` should fail with 0 index

2023-04-04 Thread via GitHub
srielau commented on code in PR #40641: URL: https://github.com/apache/spark/pull/40641#discussion_r1157492500 ## core/src/main/resources/error/error-classes.json: ## @@ -542,6 +542,12 @@ ], "sqlState" : "22003" }, + "ARRAY_INSERT_BY_INDEX_ZERO" : { +"message"
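
Context for the error class: Spark's SQL array functions use 1-based positions, so position 0 has no well-defined meaning for `array_insert`; the change reports an error instead of picking an interpretation. A hedged sketch of the intended behavior (the exact error class and message are what the PR is settling):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Positions are 1-based: 2 places the new element in the second slot.
spark.sql("SELECT array_insert(array(1, 2, 3), 2, 9)").show()  # [1, 9, 2, 3]

# With SPARK-43011, position 0 should raise an error rather than be guessed at.
try:
    spark.sql("SELECT array_insert(array(1, 2, 3), 0, 9)").show()
except Exception as e:
    print(type(e).__name__, e)
```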

[GitHub] [spark] dzhigimont opened a new pull request, #40664: [SPARK-43024][PS][INFRA] Upgrade pandas to 2.0.0

2023-04-04 Thread via GitHub
dzhigimont opened a new pull request, #40664: URL: https://github.com/apache/spark/pull/40664 ### What changes were proposed in this pull request? The PR proposes to upgrade pandas to 2.0.0 ### Why are the changes needed? Support latest pandas for pandas API on Spark

[GitHub] [spark] tanvn commented on pull request #38053: [SPARK-40600] Support recursiveFileLookup for partitioned datasource

2023-04-04 Thread via GitHub
tanvn commented on PR #38053: URL: https://github.com/apache/spark/pull/38053#issuecomment-1496253469 @HyukjinKwon @wForget Hi, may I know the status of this PR? Would like to take part in this issue as we are facing this while reading data from an orc partitioned table and do not

[GitHub] [spark] Hisoka-X commented on a diff in pull request #40609: [SPARK-42316][SQL] Assign name to _LEGACY_ERROR_TEMP_2044

2023-04-04 Thread via GitHub
Hisoka-X commented on code in PR #40609: URL: https://github.com/apache/spark/pull/40609#discussion_r1157482628 ## sql/core/src/test/scala/org/apache/spark/sql/errors/QueryExecutionErrorsSuite.scala: ## @@ -625,6 +625,21 @@ class QueryExecutionErrorsSuite } } +

[GitHub] [spark] Hisoka-X commented on a diff in pull request #40609: [SPARK-42316][SQL] Assign name to _LEGACY_ERROR_TEMP_2044

2023-04-04 Thread via GitHub
Hisoka-X commented on code in PR #40609: URL: https://github.com/apache/spark/pull/40609#discussion_r1157481984 ## sql/core/src/test/scala/org/apache/spark/sql/errors/QueryExecutionErrorsSuite.scala: ## @@ -625,6 +625,21 @@ class QueryExecutionErrorsSuite } } +

[GitHub] [spark] MaxGekk commented on pull request #40623: [SPARK-43009][SQL] Parameterized `sql()` with `Any` constants

2023-04-04 Thread via GitHub
MaxGekk commented on PR #40623: URL: https://github.com/apache/spark/pull/40623#issuecomment-1496220564 @cloud-fan I am working on the backport ...

[GitHub] [spark] Hisoka-X commented on a diff in pull request #40632: [SPARK-42298][SQL] Assign name to _LEGACY_ERROR_TEMP_2132

2023-04-04 Thread via GitHub
Hisoka-X commented on code in PR #40632: URL: https://github.com/apache/spark/pull/40632#discussion_r1157457567 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala: ## @@ -1404,8 +1404,8 @@ private[sql] object QueryExecutionErrors extends

[GitHub] [spark] srielau commented on a diff in pull request #38867: [SPARK-41234][SQL][PYTHON] Add `array_insert` function

2023-04-04 Thread via GitHub
srielau commented on code in PR #38867: URL: https://github.com/apache/spark/pull/38867#discussion_r1157449697 ## sql/core/src/test/resources/sql-tests/results/array.sql.out: ## @@ -431,6 +431,104 @@ struct NULL +-- !query +select array_insert(array(1, 2, 3), 3, 4) +--

[GitHub] [spark] srielau commented on a diff in pull request #38867: [SPARK-41234][SQL][PYTHON] Add `array_insert` function

2023-04-04 Thread via GitHub
srielau commented on code in PR #38867: URL: https://github.com/apache/spark/pull/38867#discussion_r1157447487 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -4600,3 +4600,155 @@ case class ArrayExcept(left:

[GitHub] [spark] cloud-fan commented on pull request #40623: [SPARK-43009][SQL] Parameterized `sql()` with `Any` constants

2023-04-04 Thread via GitHub
cloud-fan commented on PR #40623: URL: https://github.com/apache/spark/pull/40623#issuecomment-1496177514 It has conflicts with 3.4, @MaxGekk can you create a backport PR? Thanks!

[GitHub] [spark] cloud-fan closed pull request #40623: [SPARK-43009][SQL] Parameterized `sql()` with `Any` constants

2023-04-04 Thread via GitHub
cloud-fan closed pull request #40623: [SPARK-43009][SQL] Parameterized `sql()` with `Any` constants URL: https://github.com/apache/spark/pull/40623
