[GitHub] [spark] AngersZhuuuu commented on a change in pull request #32365: [SPARK-35228][SQL] Add expression ToPrettyString for keep consistent between hive/spark format in df.show and transform

2021-06-27 Thread GitBox
AngersZh commented on a change in pull request #32365: URL: https://github.com/apache/spark/pull/32365#discussion_r659522065 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala ## @@ -2649,3 +2652,155 @@ case class Se

[GitHub] [spark] maropu commented on a change in pull request #33108: [SPARK-35898][SQL] Fix arrays and maps in RowToColumnConverter

2021-06-27 Thread GitBox
maropu commented on a change in pull request #33108: URL: https://github.com/apache/spark/pull/33108#discussion_r659512018 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/Columnar.scala ## @@ -264,12 +264,12 @@ private object RowToColumnConverter { c

[GitHub] [spark] dgd-contributor commented on pull request #32916: [SPARK-35064][SQL] Group error in spark-catalyst

2021-06-27 Thread GitBox
dgd-contributor commented on pull request #32916: URL: https://github.com/apache/spark/pull/32916#issuecomment-869408354 @beliefer since cloud-fan maybe a little bit busy, who else can we request a review? Thanks -- This is an automated message from the Apache Git Service. To respond to

[GitHub] [spark] Peng-Lei commented on a change in pull request #32949: [SPARK-35749][SPARK-35773][SQL] Parse unit list interval literals as tightest year-month/day-time interval types

2021-06-27 Thread GitBox
Peng-Lei commented on a change in pull request #32949: URL: https://github.com/apache/spark/pull/32949#discussion_r659518960 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ## @@ -2358,20 +2358,56 @@ class AstBuilder extends SqlB

[GitHub] [spark] cloud-fan commented on a change in pull request #33106: [SPARK-35876][SQL] ArraysZip should retain field names to avoid being re-written by analyzer/optimizer

2021-06-27 Thread GitBox
cloud-fan commented on a change in pull request #33106: URL: https://github.com/apache/spark/pull/33106#discussion_r659519859 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ## @@ -181,16 +181,31 @@ case class MapK

[GitHub] [spark] sarutak commented on pull request #32949: [SPARK-35749][SPARK-35773][SQL] Parse unit list interval literals as tightest year-month/day-time interval types

2021-06-27 Thread GitBox
sarutak commented on pull request #32949: URL: https://github.com/apache/spark/pull/32949#issuecomment-869397727 @MaxGekk If you feel this change complicated, how about the following change in this PR? * Leave the behavior for multi units interval literals as it is like `1 year 2 mo

[GitHub] [spark] ben-manes commented on a change in pull request #31517: [SPARK-34309][BUILD][CORE][SQL][K8S]Use Caffeine instead of Guava Cache

2021-06-27 Thread GitBox
ben-manes commented on a change in pull request #31517: URL: https://github.com/apache/spark/pull/31517#discussion_r659500929 ## File path: common/network-shuffle/pom.xml ## @@ -58,6 +58,14 @@ slf4j-api provided + + com.github.ben-manes.caffeine +

[GitHub] [spark] cloud-fan commented on a change in pull request #33113: [SPARK-34302][SQL] Migrate ALTER TABLE ... CHANGE COLUMN command to use UnresolvedTable to resolve the identifier

2021-06-27 Thread GitBox
cloud-fan commented on a change in pull request #33113: URL: https://github.com/apache/spark/pull/33113#discussion_r659498507 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala ## @@ -1142,3 +1142,42 @@ case class AlterTableR

[GitHub] [spark] cloud-fan commented on a change in pull request #33113: [SPARK-34302][SQL] Migrate ALTER TABLE ... CHANGE COLUMN command to use UnresolvedTable to resolve the identifier

2021-06-27 Thread GitBox
cloud-fan commented on a change in pull request #33113: URL: https://github.com/apache/spark/pull/33113#discussion_r659497730 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/v2ResolutionPlans.scala ## @@ -150,7 +165,12 @@ case class ResolvedPar

[GitHub] [spark] LuciferYang commented on a change in pull request #31517: [SPARK-34309][BUILD][CORE][SQL][K8S]Use Caffeine instead of Guava Cache

2021-06-27 Thread GitBox
LuciferYang commented on a change in pull request #31517: URL: https://github.com/apache/spark/pull/31517#discussion_r659495725 ## File path: common/network-shuffle/pom.xml ## @@ -58,6 +58,14 @@ slf4j-api provided + + com.github.ben-manes.caffeine +

[GitHub] [spark] LuciferYang commented on pull request #31517: [SPARK-34309][BUILD][CORE][SQL][K8S]Use Caffeine instead of Guava Cache

2021-06-27 Thread GitBox
LuciferYang commented on pull request #31517: URL: https://github.com/apache/spark/pull/31517#issuecomment-869374332 @holdenk I've updated this pr to master

[GitHub] [spark] AmplabJenkins commented on pull request #33114: [SPARK-35913][SQL] Create hive permanent function with owner name

2021-06-27 Thread GitBox
AmplabJenkins commented on pull request #33114: URL: https://github.com/apache/spark/pull/33114#issuecomment-869372268 Can one of the admins verify this patch?

[GitHub] [spark] cloud-fan commented on pull request #33103: [SPARK-35886][SQL] PromotePrecision should not overwrite genCode

2021-06-27 Thread GitBox
cloud-fan commented on pull request #33103: URL: https://github.com/apache/spark/pull/33103#issuecomment-869368101 late LGTM

[GitHub] [spark] cloud-fan commented on a change in pull request #33099: [SPARK-35904][SQL] Collapse above RebalancePartitions

2021-06-27 Thread GitBox
cloud-fan commented on a change in pull request #33099: URL: https://github.com/apache/spark/pull/33099#discussion_r659486835 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ## @@ -911,6 +911,11 @@ object CollapseRepartition ex

[GitHub] [spark] cloud-fan commented on a change in pull request #33099: [SPARK-35904][SQL] Collapse above RebalancePartitions

2021-06-27 Thread GitBox
cloud-fan commented on a change in pull request #33099: URL: https://github.com/apache/spark/pull/33099#discussion_r659486480 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ## @@ -911,6 +911,11 @@ object CollapseRepartition ex

[GitHub] [spark] cloud-fan commented on pull request #33108: [SPARK-35898][SQL] Fix arrays and maps in RowToColumnConverter

2021-06-27 Thread GitBox
cloud-fan commented on pull request #33108: URL: https://github.com/apache/spark/pull/33108#issuecomment-869366230 @viirya @maropu @revans2

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #33101: [SPARK-35907][CORE] Instead of File#mkdirs, Files#createDirectories is expected.

2021-06-27 Thread GitBox
dongjoon-hyun commented on a change in pull request #33101: URL: https://github.com/apache/spark/pull/33101#discussion_r659484186 ## File path: core/src/main/scala/org/apache/spark/util/Utils.scala ## @@ -310,15 +306,17 @@ private[spark] object Utils extends Logging { whil

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #33109: [SPARK-35910][CORE][SHUFFLE] Update remoteBlockBytes based on merged block info to reduce task time

2021-06-27 Thread GitBox
dongjoon-hyun commented on a change in pull request #33109: URL: https://github.com/apache/spark/pull/33109#discussion_r659483474 ## File path: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala ## @@ -379,13 +378,13 @@ final class ShuffleBlockFetch

[GitHub] [spark] imback82 commented on a change in pull request #33113: [SPARK-34302][SQL] Migrate ALTER TABLE ... CHANGE COLUMN command to use UnresolvedTable to resolve the identifier

2021-06-27 Thread GitBox
imback82 commented on a change in pull request #33113: URL: https://github.com/apache/spark/pull/33113#discussion_r659462276 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ## @@ -522,66 +552,6 @@ trait CheckAnalysis extends

[GitHub] [spark] cxzl25 commented on pull request #33114: [SPARK-35913][SQL] Create hive permanent function with owner name

2021-06-27 Thread GitBox
cxzl25 commented on pull request #33114: URL: https://github.com/apache/spark/pull/33114#issuecomment-869361485 ## Spark https://github.com/apache/spark/blob/0da463e59304954515f003f98574c740b47b89fb/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala#L557-L571

[GitHub] [spark] imback82 commented on pull request #33113: [SPARK-34302][SQL] Migrate ALTER TABLE ... CHANGE COLUMN command to use UnresolvedTable to resolve the identifier

2021-06-27 Thread GitBox
imback82 commented on pull request #33113: URL: https://github.com/apache/spark/pull/33113#issuecomment-869360867 cc @cloud-fan

[GitHub] [spark] dongjoon-hyun commented on pull request #32286: [SPARK-35181][CORE] Use zstd for spark.io.compression.codec by default

2021-06-27 Thread GitBox
dongjoon-hyun commented on pull request #32286: URL: https://github.com/apache/spark/pull/32286#issuecomment-869357134 Thank you for your comments, @mridulm . I'm looking at the stability of GitHub Action. As you know, recently, ZStandard 1.5.0 landed at `master` branch and it seems to

[GitHub] [spark] cxzl25 opened a new pull request #33114: [SPARK-35913][SQL] Create hive permanent function with owner name

2021-06-27 Thread GitBox
cxzl25 opened a new pull request #33114: URL: https://github.com/apache/spark/pull/33114 ### What changes were proposed in this pull request? Create hive permanent function with owner name ### Why are the changes needed? The hive permanent function created by spark does not

[GitHub] [spark] maropu commented on a change in pull request #32787: [SPARK-35618][SQL] Resolve star expressions in subqueries using outer query plans

2021-06-27 Thread GitBox
maropu commented on a change in pull request #32787: URL: https://github.com/apache/spark/pull/32787#discussion_r659464716 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ## @@ -1571,6 +1582,30 @@ class Analyzer(override val cata

[GitHub] [spark] HeartSaVioR commented on pull request #33065: [SPARK-35880][SS] Track the duplicates dropped count in dedupe operator

2021-06-27 Thread GitBox
HeartSaVioR commented on pull request #33065: URL: https://github.com/apache/spark/pull/33065#issuecomment-869342054 Thanks @vkorukanti for your contribution! I merged this into master.

[GitHub] [spark] HeartSaVioR closed pull request #33065: [SPARK-35880][SS] Track the duplicates dropped count in dedupe operator

2021-06-27 Thread GitBox
HeartSaVioR closed pull request #33065: URL: https://github.com/apache/spark/pull/33065

[GitHub] [spark] HeartSaVioR commented on pull request #33065: [SPARK-35880][SS] Track the duplicates dropped count in dedupe operator

2021-06-27 Thread GitBox
HeartSaVioR commented on pull request #33065: URL: https://github.com/apache/spark/pull/33065#issuecomment-869341637 https://github.com/HeartSaVioR/spark/runs/2927695669 <= this build passed. Thanks! Merging to master!

[GitHub] [spark] imback82 commented on a change in pull request #33113: [SPARK-34302][SQL] Migrate ALTER TABLE ... CHANGE COLUMN command to use UnresolvedTable to resolve the identifier

2021-06-27 Thread GitBox
imback82 commented on a change in pull request #33113: URL: https://github.com/apache/spark/pull/33113#discussion_r659464527 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ## @@ -444,12 +444,42 @@ trait CheckAnalysis extend

[GitHub] [spark] imback82 opened a new pull request #33113: [SPARK-34302][SQL] Migrate ALTER TABLE ... CHANGE COLUMN command to use UnresolvedTable to resolve the identifier

2021-06-27 Thread GitBox
imback82 opened a new pull request #33113: URL: https://github.com/apache/spark/pull/33113 ### What changes were proposed in this pull request? This PR proposes to migrate the following `ALTER TABLE ... CHANGE COLUMN` command to use `UnresolvedTable` as a `child` to resolve t

[GitHub] [spark] maropu commented on pull request #33109: [SPARK-35910][Core][Shuffle] Update remoteBlockBytes based on merged block info to reduce task time

2021-06-27 Thread GitBox
maropu commented on pull request #33109: URL: https://github.com/apache/spark/pull/33109#issuecomment-869331427 I'll leave this to those who are familiar with this part. @cloud-fan @Ngone51

[GitHub] [spark] yaooqinn commented on pull request #33109: [SPARK-35910][Core][Shuffle] Update remoteBlockBytes based on merged block info to reduce task time

2021-06-27 Thread GitBox
yaooqinn commented on pull request #33109: URL: https://github.com/apache/spark/pull/33109#issuecomment-869330993 thanks @maropu

[GitHub] [spark] maropu commented on a change in pull request #33109: [SPARK-35910][Core][Shuffle] Update remoteBlockBytes based on merged block info to reduce task time

2021-06-27 Thread GitBox
maropu commented on a change in pull request #33109: URL: https://github.com/apache/spark/pull/33109#discussion_r659460952 ## File path: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala ## @@ -379,13 +378,14 @@ final class ShuffleBlockFetcherItera

[GitHub] [spark] maropu commented on pull request #33109: [SPARK-35910][Core][Shuffle] Update remoteBlockBytes based on merged block info to reduce task time

2021-06-27 Thread GitBox
maropu commented on pull request #33109: URL: https://github.com/apache/spark/pull/33109#issuecomment-869329033 I've checked the code and the change looks reasonable to me.

[GitHub] [spark] HyukjinKwon commented on pull request #33110: [SPARK-35911] DPP: Update exprId for IN subquery

2021-06-27 Thread GitBox
HyukjinKwon commented on pull request #33110: URL: https://github.com/apache/spark/pull/33110#issuecomment-869326534 @Swinky can you take a look https://github.com/apache/spark/pull/33110/checks?check_run_id=2926460223 and enable Github Actions in your fork repository?

[GitHub] [spark] HyukjinKwon commented on pull request #33106: [SPARK-35876][SQL] ArraysZip should retain field names to avoid being re-written by analyzer/optimizer

2021-06-27 Thread GitBox
HyukjinKwon commented on pull request #33106: URL: https://github.com/apache/spark/pull/33106#issuecomment-869326312 cc @cloud-fan and @ueshin FYI

[GitHub] [spark] HyukjinKwon commented on pull request #33104: [SPARK-35902][Core] spark.driver.log.dfsDir with hdfs scheme failed

2021-06-27 Thread GitBox
HyukjinKwon commented on pull request #33104: URL: https://github.com/apache/spark/pull/33104#issuecomment-869324375 @fhygh > Does this PR introduce any user-facing change? > How was this patch tested? Please describe both properly.

[GitHub] [spark] HyukjinKwon commented on a change in pull request #33100: [SPARK-35906][SQL] Remove order by if the maximum number of rows less than or equal to 1

2021-06-27 Thread GitBox
HyukjinKwon commented on a change in pull request #33100: URL: https://github.com/apache/spark/pull/33100#discussion_r659455547 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ## @@ -1218,6 +1219,7 @@ object EliminateSorts exte

[GitHub] [spark] HyukjinKwon closed pull request #33097: [SPARK-35901][PYTHON] Refine type hints in pyspark.pandas.window

2021-06-27 Thread GitBox
HyukjinKwon closed pull request #33097: URL: https://github.com/apache/spark/pull/33097

[GitHub] [spark] HyukjinKwon commented on pull request #33097: [SPARK-35901][PYTHON] Refine type hints in pyspark.pandas.window

2021-06-27 Thread GitBox
HyukjinKwon commented on pull request #33097: URL: https://github.com/apache/spark/pull/33097#issuecomment-869321504 Merged to master

[GitHub] [spark] HyukjinKwon commented on pull request #33093: [SPARK-35897][SS][WIP] Support user defined initial state with flatMapGroupsWithState in Structured Streaming

2021-06-27 Thread GitBox
HyukjinKwon commented on pull request #33093: URL: https://github.com/apache/spark/pull/33093#issuecomment-869314518 @rahulsmahadev, mind keeping the PR description template (https://github.com/apache/spark/blob/master/.github/PULL_REQUEST_TEMPLATE)?

[GitHub] [spark] yaooqinn commented on pull request #33109: [SPARK-35910][Core][Shuffle] Update remoteBlockBytes based on merged block info to reduce task time

2021-06-27 Thread GitBox
yaooqinn commented on pull request #33109: URL: https://github.com/apache/spark/pull/33109#issuecomment-869306270 also cc @cloud-fan @maropu @Ngone51 thanks

[GitHub] [spark] yaooqinn commented on pull request #33109: [SPARK-35910][Core][Shuffle] Update remoteBlockBytes based on merged block info to reduce task time

2021-06-27 Thread GitBox
yaooqinn commented on pull request #33109: URL: https://github.com/apache/spark/pull/33109#issuecomment-869305089 > BTW, is there no statistic change in terms of remoteBlockBytes value? Hi @dongjoon-hyun, I add a new test to improve the coverage.

[GitHub] [spark] HyukjinKwon commented on pull request #33083: Allow sequences (tuples and lists) as pivot values argument in PySpark.

2021-06-27 Thread GitBox
HyukjinKwon commented on pull request #33083: URL: https://github.com/apache/spark/pull/33083#issuecomment-869303149 Otherwise, looks fine to me too. I'll leave it to him.

[GitHub] [spark] HyukjinKwon commented on pull request #33083: Allow sequences (tuples and lists) as pivot values argument in PySpark.

2021-06-27 Thread GitBox
HyukjinKwon commented on pull request #33083: URL: https://github.com/apache/spark/pull/33083#issuecomment-869302812 @wrobell, can you file a JIRA (see https://spark.apache.org/contributing.html), and enable GitHub Actions in your fork repo (see https://github.com/apache/spark/pull/33083/

[GitHub] [spark] HyukjinKwon commented on a change in pull request #33012: [SPARK-33298][CORE] Introduce new API to FileCommitProtocol allow flexible file naming

2021-06-27 Thread GitBox
HyukjinKwon commented on a change in pull request #33012: URL: https://github.com/apache/spark/pull/33012#discussion_r659441743 ## File path: core/src/main/scala/org/apache/spark/internal/io/FileCommitProtocol.scala ## @@ -92,6 +92,35 @@ abstract class FileCommitProtocol exten

[GitHub] [spark] HyukjinKwon commented on a change in pull request #33012: [SPARK-33298][CORE] Introduce new API to FileCommitProtocol allow flexible file naming

2021-06-27 Thread GitBox
HyukjinKwon commented on a change in pull request #33012: URL: https://github.com/apache/spark/pull/33012#discussion_r659440833 ## File path: core/src/main/scala/org/apache/spark/internal/io/FileCommitProtocol.scala ## @@ -92,6 +92,35 @@ abstract class FileCommitProtocol exten

[GitHub] [spark] mridulm commented on a change in pull request #33034: WIP: [SPARK-32923][CORE][SHUFFLE] Handle indeterminate stage retries for push-based shuffle

2021-06-27 Thread GitBox
mridulm commented on a change in pull request #33034: URL: https://github.com/apache/spark/pull/33034#discussion_r659436631 ## File path: core/src/main/scala/org/apache/spark/Dependency.scala ## @@ -148,6 +153,18 @@ class ShuffleDependency[K: ClassTag, V: ClassTag, C: ClassTag

[GitHub] [spark] viirya commented on a change in pull request #32980: [SPARK-35829][SQL] Clean up evaluates subexpressions and add more flexibility to evaluate particular subexpressoin

2021-06-27 Thread GitBox
viirya commented on a change in pull request #32980: URL: https://github.com/apache/spark/pull/32980#discussion_r659431764 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ## @@ -1049,17 +1095,25 @@ class CodegenCo

[GitHub] [spark] HyukjinKwon closed pull request #33054: [SPARK-35605][PYTHON] Move to_pandas_on_spark to the Spark DataFrame

2021-06-27 Thread GitBox
HyukjinKwon closed pull request #33054: URL: https://github.com/apache/spark/pull/33054

[GitHub] [spark] mridulm commented on pull request #32286: [SPARK-35181][CORE] Use zstd for spark.io.compression.codec by default

2021-06-27 Thread GitBox
mridulm commented on pull request #32286: URL: https://github.com/apache/spark/pull/32286#issuecomment-869297415 Looks good to me (pending other reviews comments ofcourse). Why is this still draft btw ? Are we still testing this or waiting for other feedback/eval ?

[GitHub] [spark] HyukjinKwon commented on pull request #33054: [SPARK-35605][PYTHON] Move to_pandas_on_spark to the Spark DataFrame

2021-06-27 Thread GitBox
HyukjinKwon commented on pull request #33054: URL: https://github.com/apache/spark/pull/33054#issuecomment-869297320 Merged to master.

[GitHub] [spark] itholic commented on a change in pull request #33054: [SPARK-35605][PYTHON] Move to_pandas_on_spark to the Spark DataFrame

2021-06-27 Thread GitBox
itholic commented on a change in pull request #33054: URL: https://github.com/apache/spark/pull/33054#discussion_r659426716 ## File path: python/pyspark/pandas/plot/core.py ## @@ -20,7 +20,7 @@ import pandas as pd import numpy as np from pyspark.ml.feature import Bucketizer

[GitHub] [spark] mridulm commented on a change in pull request #33078: [SPARK-35546][Shuffle] Enable push-based shuffle when multiple app attempts are enabled and manage concurrent access to the state

2021-06-27 Thread GitBox
mridulm commented on a change in pull request #33078: URL: https://github.com/apache/spark/pull/33078#discussion_r659421677 ## File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java ## @@ -112,34 +116,48 @@ public ShuffleI

[GitHub] [spark] Shockang commented on a change in pull request #33101: [SPARK-35907][CORE] Instead of File#mkdirs, Files#createDirectories is expected.

2021-06-27 Thread GitBox
Shockang commented on a change in pull request #33101: URL: https://github.com/apache/spark/pull/33101#discussion_r659423768 ## File path: core/src/main/scala/org/apache/spark/util/Utils.scala ## @@ -310,15 +306,17 @@ private[spark] object Utils extends Logging { while (di

[GitHub] [spark] viirya commented on a change in pull request #32980: [SPARK-35829][SQL] Clean up evaluates subexpressions and add more flexibility to evaluate particular subexpressoin

2021-06-27 Thread GitBox
viirya commented on a change in pull request #32980: URL: https://github.com/apache/spark/pull/32980#discussion_r659425518 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ## @@ -1030,11 +1032,55 @@ class CodegenCo

[GitHub] [spark] HyukjinKwon commented on pull request #32867: [SPARK-35721][PYTHON] Path level discover for python unittests

2021-06-27 Thread GitBox
HyukjinKwon commented on pull request #32867: URL: https://github.com/apache/spark/pull/32867#issuecomment-869288341 BTW, this change will likely affect many other vendors who maintain their forks so I will take a close look few more times. Thanks for bearing with me in advance .. ;-).

[GitHub] [spark] HyukjinKwon commented on a change in pull request #32867: [SPARK-35721][PYTHON] Path level discover for python unittests

2021-06-27 Thread GitBox
HyukjinKwon commented on a change in pull request #32867: URL: https://github.com/apache/spark/pull/32867#discussion_r659423301 ## File path: python/pyspark/pandas/tests/test_stats.py ## @@ -31,6 +31,11 @@ from pyspark.testing.sqlutils import SQLTestUtils +# This is used i

[GitHub] [spark] HyukjinKwon commented on a change in pull request #32867: [SPARK-35721][PYTHON] Path level discover for python unittests

2021-06-27 Thread GitBox
HyukjinKwon commented on a change in pull request #32867: URL: https://github.com/apache/spark/pull/32867#discussion_r659423134 ## File path: dev/sparktestsupport/modules.py ## @@ -608,58 +594,19 @@ def __hash__(self): "pyspark.pandas.spark.accessors", "pyspar

[GitHub] [spark] HyukjinKwon commented on a change in pull request #32867: [SPARK-35721][PYTHON] Path level discover for python unittests

2021-06-27 Thread GitBox
HyukjinKwon commented on a change in pull request #32867: URL: https://github.com/apache/spark/pull/32867#discussion_r659422872 ## File path: dev/sparktestsupport/modules.py ## @@ -19,10 +19,67 @@ import itertools import os import re +import unittest +import sys + +from spar

[GitHub] [spark] HyukjinKwon commented on a change in pull request #32867: [SPARK-35721][PYTHON] Path level discover for python unittests

2021-06-27 Thread GitBox
HyukjinKwon commented on a change in pull request #32867: URL: https://github.com/apache/spark/pull/32867#discussion_r659422752 ## File path: dev/sparktestsupport/modules.py ## @@ -19,10 +19,67 @@ import itertools import os import re +import unittest +import sys + +from spar

[GitHub] [spark] HyukjinKwon commented on a change in pull request #32867: [SPARK-35721][PYTHON] Path level discover for python unittests

2021-06-27 Thread GitBox
HyukjinKwon commented on a change in pull request #32867: URL: https://github.com/apache/spark/pull/32867#discussion_r659422686 ## File path: dev/sparktestsupport/modules.py ## @@ -19,10 +19,67 @@ import itertools import os import re +import unittest +import sys + +from spar

[GitHub] [spark] HyukjinKwon commented on a change in pull request #32867: [SPARK-35721][PYTHON] Path level discover for python unittests

2021-06-27 Thread GitBox
HyukjinKwon commented on a change in pull request #32867: URL: https://github.com/apache/spark/pull/32867#discussion_r659422401 ## File path: dev/sparktestsupport/modules.py ## @@ -19,10 +19,67 @@ import itertools import os import re +import unittest +import sys + +from spar

[GitHub] [spark] cfmcgrady commented on a change in pull request #33106: [SPARK-35876][SQL] ArraysZip should retain field names to avoid being re-written by analyzer/optimizer

2021-06-27 Thread GitBox
cfmcgrady commented on a change in pull request #33106: URL: https://github.com/apache/spark/pull/33106#discussion_r659422135 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ## @@ -181,16 +181,31 @@ case class MapK

[GitHub] [spark] Shockang commented on pull request #33101: [SPARK-35907][CORE] Instead of File#mkdirs, Files#createDirectories is expected.

2021-06-27 Thread GitBox
Shockang commented on pull request #33101: URL: https://github.com/apache/spark/pull/33101#issuecomment-869284074 > Thank you for your first contribution, @Shockang . I left a few comments. Thanks a lot.I will revise it according to your opinion.

[GitHub] [spark] sarutak commented on a change in pull request #32801: [SPARK-12567][SQL] Add aes_encrypt and aes_decrypt builtin functions

2021-06-27 Thread GitBox
sarutak commented on a change in pull request #32801: URL: https://github.com/apache/spark/pull/32801#discussion_r659413986 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala ## @@ -622,6 +622,8 @@ object FunctionRegistry {

[GitHub] [spark] sarutak commented on pull request #32801: [SPARK-12567][SQL] Add aes_encrypt and aes_decrypt builtin functions

2021-06-27 Thread GitBox
sarutak commented on pull request #32801: URL: https://github.com/apache/spark/pull/32801#issuecomment-869274420 @dongjoon-hyun Thank you for letting me know. I didn't notice the previous PR. I'll check it.

[GitHub] [spark] sarutak commented on a change in pull request #33106: [SPARK-35876][SQL] ArraysZip should retain field names to avoid being re-written by analyzer/optimizer

2021-06-27 Thread GitBox
sarutak commented on a change in pull request #33106: URL: https://github.com/apache/spark/pull/33106#discussion_r659413386 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ## @@ -181,16 +181,31 @@ case class MapKey

[GitHub] [spark] dgd-contributor commented on pull request #32916: [SPARK-35064][SQL] Group error in spark-catalyst

2021-06-27 Thread GitBox
dgd-contributor commented on pull request #32916: URL: https://github.com/apache/spark/pull/32916#issuecomment-869273136 @cloud-fan hi, can you check this

[GitHub] [spark] HeartSaVioR commented on pull request #33065: [SPARK-35880][SS] Track the duplicates dropped count in dedupe operator

2021-06-27 Thread GitBox
HeartSaVioR commented on pull request #33065: URL: https://github.com/apache/spark/pull/33065#issuecomment-869271919 Just pushed this PR branch to my fork. https://github.com/HeartSaVioR/spark/tree/pr/33065 GA will run here - https://github.com/HeartSaVioR/spark/runs/2927695669 I'

[GitHub] [spark] maropu commented on pull request #32973: Add missing GraphX classes to registerKryoClasses util method

2021-06-27 Thread GitBox
maropu commented on pull request #32973: URL: https://github.com/apache/spark/pull/32973#issuecomment-869268927 @matthewrj kindly ping; +1 for the @srowen comment. Could you file jira for this improvement?

[GitHub] [spark] HeartSaVioR commented on pull request #33065: [SPARK-35880][SS] Track the duplicates dropped count in dedupe operator

2021-06-27 Thread GitBox
HeartSaVioR commented on pull request #33065: URL: https://github.com/apache/spark/pull/33065#issuecomment-869268505

[GitHub] [spark] maropu commented on pull request #32365: [SPARK-35228][SQL] Add expression ToPrettyString for keep consistent between hive/spark format in df.show and transform

2021-06-27 Thread GitBox
maropu commented on pull request #32365: URL: https://github.com/apache/spark/pull/32365#issuecomment-869266175 Yea, the consistent format looks fine to me. But, I think we need to keep the current format as we do via `spark.sql.legacy.castComplexTypesToString.enabled`.

[GitHub] [spark] HyukjinKwon commented on pull request #33107: [SPARK-35909][DOCS] Fix broken Python Links in docs/sql-getting-started.md

2021-06-27 Thread GitBox
HyukjinKwon commented on pull request #33107: URL: https://github.com/apache/spark/pull/33107#issuecomment-869262748 Thanks @dhruvildave!

[GitHub] [spark] HyukjinKwon removed a comment on pull request #33096: [SPARK-35899][SQL] Utility to convert connector expressions to Catalyst

2021-06-27 Thread GitBox
HyukjinKwon removed a comment on pull request #33096: URL: https://github.com/apache/spark/pull/33096#issuecomment-869258964 Thanks @aokolnychyi. I left few comments during the post-hoc review but moving to catalyst seems a good idea.

[GitHub] [spark] HyukjinKwon commented on pull request #33096: [SPARK-35899][SQL] Utility to convert connector expressions to Catalyst

2021-06-27 Thread GitBox
HyukjinKwon commented on pull request #33096: URL: https://github.com/apache/spark/pull/33096#issuecomment-869260409 Thanks @aokolnychyi. I left few comments during the post-hoc review but moving to catalyst seems a good idea.

[GitHub] [spark] HyukjinKwon commented on a change in pull request #33096: [SPARK-35899][SQL] Utility to convert connector expressions to Catalyst

2021-06-27 Thread GitBox
HyukjinKwon commented on a change in pull request #33096: URL: https://github.com/apache/spark/pull/33096#discussion_r659404980 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/V2ExpressionUtils.scala ## @@ -0,0 +1,80 @@ +/* + * Licensed to t

[GitHub] [spark] HyukjinKwon commented on a change in pull request #33096: [SPARK-35899][SQL] Utility to convert connector expressions to Catalyst

2021-06-27 Thread GitBox
HyukjinKwon commented on a change in pull request #33096: URL: https://github.com/apache/spark/pull/33096#discussion_r659404656 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/V2ExpressionUtils.scala ## @@ -0,0 +1,80 @@ +/* + * Licensed to t

[GitHub] [spark] HyukjinKwon commented on pull request #33096: [SPARK-35899][SQL] Utility to convert connector expressions to Catalyst

2021-06-27 Thread GitBox
HyukjinKwon commented on pull request #33096: URL: https://github.com/apache/spark/pull/33096#issuecomment-869258964 Thanks @aokolnychyi. I left few comments during the post-hoc review but moving to catalyst seems a good idea.

[GitHub] [spark] HyukjinKwon commented on a change in pull request #33096: [SPARK-35899][SQL] Utility to convert connector expressions to Catalyst

2021-06-27 Thread GitBox
HyukjinKwon commented on a change in pull request #33096: URL: https://github.com/apache/spark/pull/33096#discussion_r659404093 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/V2ExpressionUtils.scala ## @@ -0,0 +1,80 @@ +/* + * Licensed to t

[GitHub] [spark] HyukjinKwon commented on a change in pull request #33096: [SPARK-35899][SQL] Utility to convert connector expressions to Catalyst

2021-06-27 Thread GitBox
HyukjinKwon commented on a change in pull request #33096: URL: https://github.com/apache/spark/pull/33096#discussion_r659403923 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/V2ExpressionUtils.scala ## @@ -0,0 +1,80 @@ +/* + * Licensed to t

[GitHub] [spark] maropu commented on a change in pull request #32980: [SPARK-35829][SQL] Clean up evaluates subexpressions and add more flexibility to evaluate particular subexpressoin

2021-06-27 Thread GitBox
maropu commented on a change in pull request #32980: URL: https://github.com/apache/spark/pull/32980#discussion_r659400977 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ## @@ -1049,17 +1095,25 @@ class CodegenCo

[GitHub] [spark] github-actions[bot] commented on pull request #31774: [SPARK-34659] Fix that Web UI always correctly get appId

2021-06-27 Thread GitBox
github-actions[bot] commented on pull request #31774: URL: https://github.com/apache/spark/pull/31774#issuecomment-869246328 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue ma

[GitHub] [spark] github-actions[bot] closed pull request #31601: [SPARK-34484][SQL] Rename `map` to `mapAttr` in Catalyst DSL

2021-06-27 Thread GitBox
github-actions[bot] closed pull request #31601: URL: https://github.com/apache/spark/pull/31601

[GitHub] [spark] viirya commented on pull request #33112: [SPARK-35886][SQL][3.0] PromotePrecision should not overwrite genCodePromotePrecision should not overwrite genCode

2021-06-27 Thread GitBox
viirya commented on pull request #33112: URL: https://github.com/apache/spark/pull/33112#issuecomment-869243363 Thank you @dongjoon-hyun!

[GitHub] [spark] viirya commented on pull request #33111: [SPARK-35886][SQL][3.1] PromotePrecision should not overwrite genCodePromotePrecision should not overwrite genCode

2021-06-27 Thread GitBox
viirya commented on pull request #33111: URL: https://github.com/apache/spark/pull/33111#issuecomment-869243309 Thank you @dongjoon-hyun!

[GitHub] [spark] dongjoon-hyun closed pull request #33112: [SPARK-35886][SQL][3.0] PromotePrecision should not overwrite genCodePromotePrecision should not overwrite genCode

2021-06-27 Thread GitBox
dongjoon-hyun closed pull request #33112: URL: https://github.com/apache/spark/pull/33112

[GitHub] [spark] dongjoon-hyun commented on pull request #33112: [SPARK-35886][SQL][3.0] PromotePrecision should not overwrite genCodePromotePrecision should not overwrite genCode

2021-06-27 Thread GitBox
dongjoon-hyun commented on pull request #33112: URL: https://github.com/apache/spark/pull/33112#issuecomment-869234033 Merged to branch-3.0.

[GitHub] [spark] dongjoon-hyun commented on pull request #33111: [SPARK-35886][SQL][3.1] PromotePrecision should not overwrite genCodePromotePrecision should not overwrite genCode

2021-06-27 Thread GitBox
dongjoon-hyun commented on pull request #33111: URL: https://github.com/apache/spark/pull/33111#issuecomment-869233883 Merged to branch-3.1.

[GitHub] [spark] dongjoon-hyun closed pull request #33111: [SPARK-35886][SQL][3.1] PromotePrecision should not overwrite genCodePromotePrecision should not overwrite genCode

2021-06-27 Thread GitBox
dongjoon-hyun closed pull request #33111: URL: https://github.com/apache/spark/pull/33111

[GitHub] [spark] AmplabJenkins commented on pull request #33110: [SPARK-35911] DPP: Update exprId for IN subquery

2021-06-27 Thread GitBox
AmplabJenkins commented on pull request #33110: URL: https://github.com/apache/spark/pull/33110#issuecomment-869218757 Can one of the admins verify this patch?

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #33101: [SPARK-35907][CORE] Instead of File#mkdirs, Files#createDirectories is expected.

2021-06-27 Thread GitBox
dongjoon-hyun commented on a change in pull request #33101: URL: https://github.com/apache/spark/pull/33101#discussion_r659365987 ## File path: core/src/test/scala/org/apache/spark/util/UtilsSuite.scala ## @@ -477,6 +477,68 @@ class UtilsSuite extends SparkFunSuite with ResetS

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #33101: [SPARK-35907][CORE] Instead of File#mkdirs, Files#createDirectories is expected.

2021-06-27 Thread GitBox
dongjoon-hyun commented on a change in pull request #33101: URL: https://github.com/apache/spark/pull/33101#discussion_r659365668 ## File path: core/src/main/scala/org/apache/spark/util/Utils.scala ## @@ -285,16 +285,12 @@ private[spark] object Utils extends Logging { */

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #33101: [SPARK-35907][CORE] Instead of File#mkdirs, Files#createDirectories is expected.

2021-06-27 Thread GitBox
dongjoon-hyun commented on a change in pull request #33101: URL: https://github.com/apache/spark/pull/33101#discussion_r659365514 ## File path: core/src/main/scala/org/apache/spark/util/Utils.scala ## @@ -310,15 +306,17 @@ private[spark] object Utils extends Logging { whil

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #33101: [SPARK-35907][CORE] Instead of File#mkdirs, Files#createDirectories is expected.

2021-06-27 Thread GitBox
dongjoon-hyun commented on a change in pull request #33101: URL: https://github.com/apache/spark/pull/33101#discussion_r659365220 ## File path: core/src/main/scala/org/apache/spark/util/Utils.scala ## @@ -310,15 +306,17 @@ private[spark] object Utils extends Logging { whil
