[GitHub] spark pull request #22399: [SPARK-25408] Move to more idiomatic Java8

2018-09-12 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22399#discussion_r216928860 --- Diff: common/network-shuffle/src/test/java/org/apache/spark/network/shuffle/ExternalShuffleSecuritySuite.java --- @@ -96,14 +96,14 @@ private void

[GitHub] spark issue #22400: [SPARK-25238][PYTHON] lint-python: Fix W605 warnings for...

2018-09-12 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22400 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark pull request #22400: [SPARK-25238][PYTHON] lint-python: Fix W605 warni...

2018-09-12 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22400#discussion_r216920296 --- Diff: python/pyspark/sql/streaming.py --- @@ -565,7 +565,7 @@ def csv(self, path, schema=None, sep=None, encoding=None, quote=None, escape=Non

[GitHub] spark pull request #22400: [SPARK-25238][PYTHON] lint-python: Fix W605 warni...

2018-09-12 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22400#discussion_r216918858 --- Diff: dev/run-tests-jenkins.py --- @@ -115,7 +115,7 @@ def run_tests(tests_timeout): os.path.join

[GitHub] spark issue #22394: [SPARK-25406][SQL] For ParquetSchemaPruningSuite.scala, ...

2018-09-11 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22394 Hey @mallman, let's just target to fix the problem in the JIRA without other refactorings.

[GitHub] spark pull request #22394: [SPARK-25406][SQL] For ParquetSchemaPruningSuite....

2018-09-11 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22394#discussion_r216903678 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruningSuite.scala --- @@ -245,28 +249,32 @@ class

[GitHub] spark pull request #22394: [SPARK-25406][SQL] For ParquetSchemaPruningSuite....

2018-09-11 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22394#discussion_r216903560 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruningSuite.scala --- @@ -245,28 +249,32 @@ class

[GitHub] spark issue #22357: [SPARK-25363][SQL] Fix schema pruning in where clause by...

2018-09-11 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22357 LGTM from me too.

[GitHub] spark pull request #22358: [SPARK-25366][SQL]Zstd and brotli CompressionCode...

2018-09-11 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22358#discussion_r216881788 --- Diff: docs/sql-programming-guide.md --- @@ -965,6 +965,8 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession

[GitHub] spark issue #22358: [SPARK-25366][SQL]Zstd and brotli CompressionCodec are n...

2018-09-11 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22358 I'm okay but I would close this if no committer agrees with (approves) this for a long time.

[GitHub] spark pull request #22358: [SPARK-25366][SQL]Zstd and brotli CompressionCode...

2018-09-11 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22358#discussion_r216657064 --- Diff: docs/sql-programming-guide.md --- @@ -965,6 +965,7 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession

[GitHub] spark issue #22389: [SPARK-17916][SPARK-25241][SQL][FOLLOW-UP] Fix empty str...

2018-09-11 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22389 Merged to master and branch-2.4.

[GitHub] spark issue #22213: [SPARK-25221][DEPLOY] Consistent trailing whitespace tre...

2018-09-11 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22213 retest this please

[GitHub] spark issue #22385: [SPARK-25400][CORE] Increase test timeouts

2018-09-11 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22385 retest this please

[GitHub] spark issue #22357: [SPARK-25363][SQL] Fix schema pruning in where clause by...

2018-09-11 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22357 Can anyone point out whether there are any unaddressed comments or problems here? Looks pretty good to me. I think this is rather a band-aid, a small and safe fix to get into branch-2.4

[GitHub] spark issue #22367: [SPARK-17916][SPARK-25241][SQL][FOLLOWUP] Fix empty stri...

2018-09-11 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22367 Usually we merge into master and backport to other branches when it's needed. https://spark.apache.org/contributing.html > 5. Open a pull request against the master b

[GitHub] spark issue #22367: [SPARK-17916][SPARK-25241][SQL][FOLLOWUP] Fix empty stri...

2018-09-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22367 Oh, wait. Why does this target branch-2.4? You can open this against master and backport if it's needed

[GitHub] spark issue #22234: [SPARK-25241][SQL] Configurable empty values when readin...

2018-09-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22234 @mmolimar, let's leave this closed since the newer one is open BTW. You will be credited as a primary author of #22367 a

[GitHub] spark pull request #21654: [SPARK-24671][PySpark] DataFrame length using a d...

2018-09-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21654#discussion_r216554397 --- Diff: python/pyspark/sql/dataframe.py --- @@ -375,6 +375,9 @@ def _truncate(self): return int(self.sql_ctx.getConf

[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...

2018-09-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22295#discussion_r216553283 --- Diff: python/pyspark/sql/session.py --- @@ -252,6 +253,22 @@ def newSession(self): """ return self.__c

[GitHub] spark issue #22227: [SPARK-25202] [SQL] Implements split with limit sql func...

2018-09-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22227 retest this please

[GitHub] spark issue #21596: [SPARK-24601] Bump Jackson version

2018-09-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21596 retest this please

[GitHub] spark pull request #22379: [SPARK-25393][SQL] Adding new function from_csv()

2018-09-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22379#discussion_r216510114 --- Diff: sql/catalyst/pom.xml --- @@ -103,6 +103,12 @@ commons-codec commons-codec + + com.univocity

[GitHub] spark pull request #22379: [SPARK-25393][SQL] Adding new function from_csv()

2018-09-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22379#discussion_r216509108 --- Diff: R/pkg/R/functions.R --- @@ -3720,3 +3720,22 @@ setMethod("current_timestamp", jc <

[GitHub] spark issue #22227: [SPARK-25202] [SQL] Implements split with limit sql func...

2018-09-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22227 retest this please

[GitHub] spark pull request #21654: [SPARK-24671][PySpark] DataFrame length using a d...

2018-09-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21654#discussion_r216385714 --- Diff: python/pyspark/sql/dataframe.py --- @@ -375,6 +375,9 @@ def _truncate(self): return int(self.sql_ctx.getConf

[GitHub] spark issue #22357: [SPARK-25363][SQL] Fix schema pruning in where clause by...

2018-09-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22357 retest this please

[GitHub] spark issue #22237: [SPARK-25243][SQL] Use FailureSafeParser in from_json

2018-09-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22237 retest this please

[GitHub] spark issue #22237: [SPARK-25243][SQL] Use FailureSafeParser in from_json

2018-09-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22237 Will take a look soon.

[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...

2018-09-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22316 Seems fine to me.

[GitHub] spark pull request #22365: [SPARK-25381][SQL] Stratified sampling by Column ...

2018-09-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22365#discussion_r216233575 --- Diff: python/pyspark/sql/dataframe.py --- @@ -880,18 +880,23 @@ def sampleBy(self, col, fractions, seed=None): | 0|5

[GitHub] spark pull request #22365: [SPARK-25381][SQL] Stratified sampling by Column ...

2018-09-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22365#discussion_r216233066 --- Diff: python/pyspark/sql/dataframe.py --- @@ -880,18 +880,23 @@ def sampleBy(self, col, fractions, seed=None): | 0|5

[GitHub] spark issue #22373: [SPARK-25371][ML] VectorAssembler should not fail with e...

2018-09-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22373 @mgaido91, BTW are you sure SPARK-21281 introduced that behaviour change? Before: ``` scala> import org.apache.spark.sql.functions.struct imp

[GitHub] spark pull request #22370: don't link to deprecated function

2018-09-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22370#discussion_r216229836 --- Diff: R/pkg/R/catalog.R --- @@ -69,7 +69,6 @@ createExternalTable <- function(x, ...) { #' @param ... additional named parameters as

[GitHub] spark issue #22378: [SPARK-25389][SQL] INSERT OVERWRITE DIRECTORY STORED AS ...

2018-09-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22378 retest this please

[GitHub] spark pull request #22357: [SPARK-25363][SQL] Fix schema pruning in where cl...

2018-09-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22357#discussion_r216218951 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruning.scala --- @@ -156,7 +161,7 @@ private[sql

[GitHub] spark pull request #22357: [SPARK-25363][SQL] Fix schema pruning in where cl...

2018-09-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22357#discussion_r216218409 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruning.scala --- @@ -196,6 +201,9 @@ private[sql

[GitHub] spark pull request #22370: don't link to deprecated function

2018-09-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22370#discussion_r216217235 --- Diff: R/pkg/R/catalog.R --- @@ -69,7 +69,6 @@ createExternalTable <- function(x, ...) { #' @param ... additional named parameters as

[GitHub] spark issue #22377: [SPARK-24849][SPARK-24911][SQL][FOLLOW-UP] Converting a ...

2018-09-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22377 retest this please

[GitHub] spark pull request #22343: [SPARK-25391][SQL] Make behaviors consistent when...

2018-09-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22343#discussion_r216216422 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetOptions.scala --- @@ -69,12 +69,25 @@ class ParquetOptions

[GitHub] spark issue #18142: [SPARK-20918] [SQL] Use FunctionIdentifier as function i...

2018-09-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18142 Yea, I didn't mean it super seriously @cloud-fan - I just left a comment in case for a better documentation since I see many users go from Hive to

[GitHub] spark issue #18142: [SPARK-20918] [SQL] Use FunctionIdentifier as function i...

2018-09-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18142 I mean https://spark.apache.org/docs/latest/sql-programming-guide.html#supported-hive-features and https://spark.apache.org/docs/latest/sql-programming-guide.html#unsupported-hive

[GitHub] spark pull request #22372: [SPARK-25385][BUILD] Upgrade Hadoop 3.1 jackson v...

2018-09-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22372#discussion_r216203140 --- Diff: pom.xml --- @@ -2694,6 +2694,8 @@ 3.1.0 2.12.0 3.4.9 +2.7.8 + 2.7.8 --- End

[GitHub] spark issue #22372: [SPARK-25385][BUILD] Upgrade Hadoop 3.1 jackson version ...

2018-09-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22372 Also, I think we should fix https://github.com/apache/spark/pull/21588 first.

[GitHub] spark issue #22372: [SPARK-25385][BUILD] Upgrade Hadoop 3.1 jackson version ...

2018-09-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22372 Also, IIRC, the change in https://github.com/apache/spark/pull/21596 is needed for the Jackson upgrade.

[GitHub] spark issue #22372: [SPARK-25385][BUILD] Upgrade Hadoop 3.1 jackson version ...

2018-09-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22372 @wangyum, are you doubly sure that SBT still uses the Jackson? I roughly tried this a while ago and found SBT doesn't pick up Ma

[GitHub] spark issue #22366: [SPARK-25384][SQL] Removing of spark.sql.fromJsonForceNu...

2018-09-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22366 Yea, let's note so that we can track what we change. cc @gatorsmile as well

[GitHub] spark pull request #22357: [SPARK-25363][SQL] Fix schema pruning in where cl...

2018-09-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22357#discussion_r216200358 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruning.scala --- @@ -17,7 +17,7

[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...

2018-09-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22270#discussion_r216199952 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala --- @@ -1729,10 +1730,8 @@ class DataFrameSuite extends QueryTest with

[GitHub] spark issue #22347: [SPARK-25353][SQL] executeTake in SparkPlan is modified ...

2018-09-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22347 ok to test

[GitHub] spark issue #22347: [SPARK-25353][SQL] executeTake in SparkPlan is modified ...

2018-09-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22347 Let me leave this as ok to test, since there looks to be progress here anyway.

[GitHub] spark pull request #22343: [SPARK-25132][SQL][FOLLOW-UP] The behavior must b...

2018-09-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22343#discussion_r216198315 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetOptions.scala --- @@ -69,12 +69,25 @@ class ParquetOptions

[GitHub] spark issue #22343: [SPARK-25132][SQL][FOLLOW-UP] The behavior must be consi...

2018-09-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22343 @seancxmao, mind fixing the PR title BTW? For instance, it looks unclear which behaviour you mean in the PR title.

[GitHub] spark pull request #22377: [SPARK-24849][SPARK-24911][SQL][FOLLOW-UP] Conver...

2018-09-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22377#discussion_r216197639 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala --- @@ -441,6 +443,8 @@ object StructType extends AbstractDataType

[GitHub] spark issue #18142: [SPARK-20918] [SQL] Use FunctionIdentifier as function i...

2018-09-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18142 One explicit problem here is, we claim Hive compatibility in Spark. The difference should be explained when we are clear on this

[GitHub] spark issue #18142: [SPARK-20918] [SQL] Use FunctionIdentifier as function i...

2018-09-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18142 > This clearly violates the SQL semantic: the string inside backticks should be treated as a string literal. BTW, I believe there's no particular standard for backticks th

[GitHub] spark pull request #22367: [SPARK-17916][SPARK-25241][SQL][FOLLOWUP] Fix emp...

2018-09-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22367#discussion_r216196589 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala --- @@ -79,7 +79,8 @@ private[csv] object

[GitHub] spark pull request #22367: [SPARK-17916][SPARK-25241][SQL][FOLLOWUP] Fix emp...

2018-09-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22367#discussion_r216196505 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVDataSource.scala --- @@ -91,9 +91,10 @@ abstract class

[GitHub] spark issue #18142: [SPARK-20918] [SQL] Use FunctionIdentifier as function i...

2018-09-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18142 Yea that was my impression as well. Let me bring this back when we're clear if this is a bug or not.

[GitHub] spark pull request #22367: [SPARK-17916][SPARK-25241][SQL][FOLLOWUP] Fix emp...

2018-09-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22367#discussion_r216186604 --- Diff: docs/sql-programming-guide.md --- @@ -1897,6 +1897,7 @@ working with timestamps in `pandas_udf`s to get the best performance, see - In

[GitHub] spark issue #18142: [SPARK-20918] [SQL] Use FunctionIdentifier as function i...

2018-09-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18142 @cloud-fan, should we update the migration guide as well?

[GitHub] spark issue #22367: [SPARK-17916][SPARK-25241][SQL][FOLLOWUP] Fix empty stri...

2018-09-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22367 @MaxGekk, mind adding `Closes #22234` at the end of the PR description so that we can automatically close that one?

[GitHub] spark pull request #22367: [SPARK-17916][SPARK-25241][SQL][FOLLOWUP] Fix emp...

2018-09-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22367#discussion_r216185993 --- Diff: docs/sql-programming-guide.md --- @@ -1897,6 +1897,7 @@ working with timestamps in `pandas_udf`s to get the best performance, see - In

[GitHub] spark pull request #22227: [SPARK-25202] [SQL] Implements split with limit s...

2018-09-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22227#discussion_r216185688 --- Diff: R/pkg/R/functions.R --- @@ -3404,19 +3404,24 @@ setMethod("collect_set", #' Equivalent to \code{spli

[GitHub] spark pull request #22227: [SPARK-25202] [SQL] Implements split with limit s...

2018-09-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22227#discussion_r216185526 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -2546,15 +2546,39 @@ object functions { def soundex(e: Column

[GitHub] spark pull request #22227: [SPARK-25202] [SQL] Implements split with limit s...

2018-09-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22227#discussion_r216185520 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -2546,15 +2546,39 @@ object functions { def soundex(e: Column

[GitHub] spark pull request #22227: [SPARK-25202] [SQL] Implements split with limit s...

2018-09-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22227#discussion_r216185422 --- Diff: python/pyspark/sql/functions.py --- @@ -1671,18 +1671,32 @@ def repeat(col, n): @since(1.5) @ignore_unicode_prefix -def

[GitHub] spark pull request #21649: [SPARK-23648][R][SQL]Adds more types for hint in ...

2018-09-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21649#discussion_r216185288 --- Diff: R/pkg/R/DataFrame.R --- @@ -3905,6 +3905,16 @@ setMethod("rollup", group

[GitHub] spark pull request #22358: [SPARK-25366][SQL]Zstd and brotli CompressionCode...

2018-09-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22358#discussion_r216185045 --- Diff: docs/sql-programming-guide.md --- @@ -964,7 +964,8 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession

[GitHub] spark issue #22358: [SPARK-25366][SQL]Zstd and brotli CompressionCodec are n...

2018-09-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22358 I am 0 on this since it is worth `Class org.apache.hadoop.io.compress.XXXCodec was not found` error message vs `need install ...` message

[GitHub] spark issue #22369: [SPARK-25072][DOC] Update migration guide for behavior c...

2018-09-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22369 @xuanyuanking, no need to rush. Let's wait and discuss a bit more before proposing a change.

[GitHub] spark issue #22140: [SPARK-25072][PySpark] Forbid extra value for custom Row

2018-09-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22140 Yea, actually at least I wouldn't backport this to branch-2.3 since the release is very close. Looks like a bug to me as well. One nitpick is the case with RDD oper

[GitHub] spark issue #22213: [SPARK-25221][DEPLOY] Consistent trailing whitespace tre...

2018-09-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22213 Seems fine to me too.

[GitHub] spark pull request #22213: [SPARK-25221][DEPLOY] Consistent trailing whitesp...

2018-09-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22213#discussion_r216180051 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -2052,6 +2051,30 @@ private[spark] object Utils extends Logging

[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...

2018-09-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22270#discussion_r216179901 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala --- @@ -1729,10 +1730,8 @@ class DataFrameSuite extends QueryTest with

[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...

2018-09-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22270#discussion_r216179499 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala --- @@ -85,14 +85,16 @@ class DataFrameFunctionsSuite extends

[GitHub] spark issue #21180: [SPARK-22674][PYTHON] Disabled _hack_namedtuple for pick...

2018-09-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21180 Master please

[GitHub] spark issue #22213: [SPARK-25221][DEPLOY] Consistent trailing whitespace tre...

2018-09-08 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22213 retest this please

[GitHub] spark pull request #22316: [SPARK-25048][SQL] Pivoting by multiple columns i...

2018-09-08 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22316#discussion_r216122957 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala --- @@ -330,6 +331,15 @@ class RelationalGroupedDataset protected

[GitHub] spark pull request #21654: [SPARK-24671][PySpark] DataFrame length using a d...

2018-09-07 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21654#discussion_r216121454 --- Diff: python/pyspark/sql/dataframe.py --- @@ -375,6 +375,9 @@ def _truncate(self): return int(self.sql_ctx.getConf

[GitHub] spark issue #21654: [SPARK-24671][PySpark] DataFrame length using a dunder/m...

2018-09-07 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21654 ok to test

[GitHub] spark issue #21180: [SPARK-22674][PYTHON] Disabled _hack_namedtuple for pick...

2018-09-07 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21180 @superbobry, sorry for the back and forth. branch-2.4 has been cut. Can we open another PR that removes the hack? Let's make sure to leave the potential impact, workaround and strong justific

[GitHub] spark issue #22351: [MINOR][SQL] Add a debug log when a SQL text is used for...

2018-09-07 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22351 Merged to master.

[GitHub] spark pull request #22349: [SPARK-25345][ML] Deprecate public APIs from Imag...

2018-09-07 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22349#discussion_r216120943 --- Diff: python/pyspark/ml/image.py --- @@ -30,6 +30,7 @@ from pyspark import SparkContext from pyspark.sql.types import Row, _create_row

[GitHub] spark pull request #22349: [SPARK-25345][ML] Deprecate public APIs from Imag...

2018-09-07 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22349#discussion_r216120953 --- Diff: python/pyspark/ml/image.py --- @@ -222,7 +226,8 @@ def readImages(self, path, recursive=False, numPartitions=-1

[GitHub] spark issue #22358: [SPARK-25366][SQL]Zstd and brotli CompressionCodec are n...

2018-09-07 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22358 If the codecs are found, then we support them. One thing we could do is document that the codec must be provided explicitly, but I am not sure how many users are confused about it

[GitHub] spark pull request #22358: [SPARK-25366][SQL]Zstd and brotli CompressionCode...

2018-09-07 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22358#discussion_r216120854 --- Diff: docs/sql-programming-guide.md --- @@ -964,7 +964,7 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession

[GitHub] spark issue #22234: [SPARK-25241][SQL] Configurable empty values when readin...

2018-09-07 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22234 Oh no I mean we fixed a bug..

[GitHub] spark issue #22358: [SPARK-25366][SQL]Zstd and brotli CompressionCodec are n...

2018-09-07 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22358 That's probably something we should document, or improve the error message. Ideally, we should fix the error message from Parquet. Don't

[GitHub] spark pull request #22357: [SPARK-25363][SQL] Fix schema pruning in where cl...

2018-09-07 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22357#discussion_r215877729 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruning.scala --- @@ -196,6 +196,7 @@ private[sql

[GitHub] spark issue #22358: [SPARK-25366][SQL]Zstd and brotil CompressionCodec are n...

2018-09-07 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22358 But if the codecs are found, we support those compressions, no?

[GitHub] spark issue #22357: [SPARK-25363][SQL] Fix schema pruning in where clause by...

2018-09-07 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22357 retest this please

[GitHub] spark pull request #22357: [SPARK-25363][SQL] Fix schema pruning in where cl...

2018-09-07 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22357#discussion_r215866501 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruning.scala --- @@ -196,6 +196,7 @@ private[sql

[GitHub] spark pull request #22326: [SPARK-25314][SQL] Fix Python UDF accessing attib...

2018-09-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22326#discussion_r215862215 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1202,15 +1222,50 @@ object

[GitHub] spark pull request #22326: [SPARK-25314][SQL] Fix Python UDF accessing attib...

2018-09-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22326#discussion_r215861685 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1149,6 +1149,26 @@ object

[GitHub] spark issue #22213: [SPARK-25221][DEPLOY] Consistent trailing whitespace tre...

2018-09-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22213 adding @vanzin as well.

[GitHub] spark issue #22213: [SPARK-25221][DEPLOY] Consistent trailing whitespace tre...

2018-09-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22213 ok to test

[GitHub] spark issue #22351: [MINOR][SQL] Add a debug log when a SQL text is used for...

2018-09-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22351 Done, thanks @dongjoon-hyun

[GitHub] spark pull request #22357: [SPARK-25363][SQL] Fix schema pruning in where cl...

2018-09-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22357#discussion_r215857463 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruning.scala --- @@ -196,6 +196,7 @@ private[sql

[GitHub] spark pull request #17899: [SPARK-20636] Add new optimization rule to transp...

2018-09-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17899#discussion_r215843736 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -726,14 +726,36 @@ object CollapseRepartition
