[GitHub] spark pull request #22078: [SPARK-25085][SQL] Table subdirectories should in...

2018-10-29 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22078#discussion_r228881996 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelationCommand.scala --- @@ -70,7 +76,6 @@ case class

[GitHub] spark pull request #22078: [SPARK-25085][SQL] Table subdirectories should in...

2018-10-29 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22078#discussion_r228881824 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelationCommand.scala --- @@ -261,4 +272,69 @@ case

[GitHub] spark issue #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow send o...

2018-10-29 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22275 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark issue #22870: [SPARK-25862][SQL] Remove rangeBetween APIs introduced i...

2018-10-28 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22870 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark issue #22871: [SPARK-25179][PYTHON][DOCS] Document BinaryType support ...

2018-10-28 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22871 Thanks, @dongjoon-hyun and @gatorsmile --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #22530: [SPARK-24869][SQL] Fix SaveIntoDataSourceCommand's input...

2018-10-28 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22530 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark issue #22326: [SPARK-25314][SQL] Fix Python UDF accessing attributes f...

2018-10-28 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22326 late LGTM --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark pull request #22847: [SPARK-25850][SQL] Make the split threshold for t...

2018-10-28 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22847#discussion_r228789484 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -812,6 +812,17 @@ object SQLConf { .intConf

[GitHub] spark pull request #22666: [SPARK-25672][SQL] schema_of_csv() - schema infer...

2018-10-28 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22666#discussion_r228787126 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExprUtils.scala --- @@ -19,14 +19,39 @@ package

[GitHub] spark issue #22871: [SPARK-25179][PYTHON][DOCS] Document BinaryType support ...

2018-10-28 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22871 cc @BryanCutler and @gatorsmile. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands

[GitHub] spark pull request #22871: [SPARK-25179][PYTHON][DOCS] Document BinaryType s...

2018-10-28 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/22871 [SPARK-25179][PYTHON][DOCS] Document BinaryType support in Arrow conversion ## What changes were proposed in this pull request? This PR targets to document binary type in "Apache

[GitHub] spark issue #21588: [SPARK-24590][BUILD] Make Jenkins tests passed with hado...

2018-10-28 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21588 @dongjoon-hyun and @wangyum, please fix my comment if I am wrong at any point - I believe you guys took a look for this part more then I did

[GitHub] spark issue #21588: [SPARK-24590][BUILD] Make Jenkins tests passed with hado...

2018-10-28 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21588 > Does this upgrade Hive for execution or also for metastore? Spark supports virtually all Hive metastore versions out there, and a lot of deployments do run different versions of Spark agai

[GitHub] spark pull request #22868: [SPARK-25833][SQL][DOCS] Update migration guide f...

2018-10-28 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22868#discussion_r228776349 --- Diff: docs/sql-migration-guide-hive-compatibility.md --- @@ -51,6 +51,9 @@ Spark SQL supports the vast majority of Hive features, such as

[GitHub] spark pull request #22865: [DOC] Fix doc for spark.sql.parquet.recordLevelFi...

2018-10-28 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22865#discussion_r228776300 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -462,7 +462,7 @@ object SQLConf { val

[GitHub] spark issue #22858: [SPARK-24709][SQL][2.4] use str instead of basestring in...

2018-10-27 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22858 Oops, mind fixing PR title too? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands

[GitHub] spark issue #22858: [SPARK-24709][SQL][2.4] use str instead of basestring in...

2018-10-27 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22858 @cloud-fan, thanks for doing this backport! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For

[GitHub] spark issue #22858: [SPARK-24709][SQL][2.4] use str instead of basestring in...

2018-10-27 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22858 Merged to branch-2.4. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark pull request #22865: [DOC] Fix doc for spark.sql.parquet.recordLevelFi...

2018-10-27 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22865#discussion_r228731568 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -462,7 +462,7 @@ object SQLConf { val

[GitHub] spark pull request #22865: [DOC] Fix doc for spark.sql.parquet.recordLevelFi...

2018-10-27 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22865#discussion_r228731385 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -462,7 +462,7 @@ object SQLConf { val

[GitHub] spark issue #22858: [SPARK-24709][SQL][2.4] use str instead of basestring in...

2018-10-27 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22858 Yup, I think strictly we should change. Looks there are two occurrences at `udf` and `pands_udf` `isinstance(..., str)`. Another problem at PySpark is, inconsistent type comparison like

[GitHub] spark pull request #22858: [SPARK-24709][SQL][2.4] use str instead of basest...

2018-10-27 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22858#discussion_r228731178 --- Diff: python/pyspark/sql/functions.py --- @@ -2326,7 +2326,7 @@ def schema_of_json(json): >>> df.select(schema_of_json('{&

[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-10-27 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21157 I meant to use https://github.com/apache/spark/blob/a97001d21757ae214c86371141bd78a376200f66/python/pyspark/serializers.py#L583 Instead of https://github.com/apache

[GitHub] spark pull request #22858: [SPARK-24709][SQL][2.4] use str instead of basest...

2018-10-27 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22858#discussion_r228713086 --- Diff: python/pyspark/sql/functions.py --- @@ -2326,7 +2326,7 @@ def schema_of_json(json): >>> df.select(schema_of_json('{&

[GitHub] spark issue #22858: [SPARK-24709][SQL][2.4] use str instead of basestring in...

2018-10-27 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22858 Wenchen, this is because ```python if sys.version >= '3': basestring = str ``` Is missing. Python 3 does not hav

[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-10-27 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21157 Adding @gatorsmile and @cloud-fan as well since this might be potentially breaking changes for 3.0 release (it affects RDD operation only with namedtuple in certain case tho

[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-10-27 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21157 And you can also run profiler to show the performance effect. See https://github.com/apache/spark/pull/19246#discussion_r139874732 to run the profile

[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-10-27 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21157 You can just replace it to CloudPickler, remove changes at tests, and push that commit here to show no case is broken

[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...

2018-10-27 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22666 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...

2018-10-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22666 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark issue #20503: [SPARK-23299][SQL][PYSPARK] Fix __repr__ behaviour for R...

2018-10-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20503 ok to test --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #22775: [SPARK-24709][SQL][FOLLOW-UP] Make schema_of_json's inpu...

2018-10-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22775 Oh you mean the conflict fixing is not that hard. Thanks for doing this @cloud-fan. I planned to do this today

[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-10-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21157 Yea, so to avoid to break, we could change the default pickler to CloudPickler or document this workaround. @superbobry, can you check if the case can be preserved if we use CloudPickler

[GitHub] spark issue #22775: [SPARK-24709][SQL][FOLLOW-UP] Make schema_of_json's inpu...

2018-10-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22775 Yea, but I meant a bit complicated but I'm okay in that way @cloud-fan. Thanks for doing that. I planed to do it today

[GitHub] spark issue #21588: [SPARK-24590][BUILD] Make Jenkins tests passed with hado...

2018-10-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21588 > Hive 2.3 works with Hadoop 2.x (Hive 3.x works with Hadoop 3.x). This is essentially what we need for Hadoop 3 support [release-2.3.2|https://github.com/apache/hive/blob/rel/rele

[GitHub] spark issue #22775: [SPARK-24709][SQL][FOLLOW-UP] Make schema_of_json's inpu...

2018-10-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22775 Sure! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #22850: [MINOR][DOC] Fix comment error of HiveUtils

2018-10-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22850 Yea, I was aware of it. I think there are some more old comments in this file if I remember this correctly. Can you double check and fix them while we are here

[GitHub] spark issue #22850: [MINOR][DOC] Fix comment error of HiveUtils

2018-10-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22850 ok to test --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark pull request #22775: [SPARK-24709][SQL][FOLLOW-UP] Make schema_of_json...

2018-10-26 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22775#discussion_r228520891 --- Diff: python/pyspark/sql/functions.py --- @@ -2365,30 +2365,32 @@ def to_json(col, options={}): @ignore_unicode_prefix @since(2.4

[GitHub] spark pull request #22775: [SPARK-24709][SQL][FOLLOW-UP] Make schema_of_json...

2018-10-26 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22775#discussion_r228504453 --- Diff: python/pyspark/sql/functions.py --- @@ -2365,30 +2365,32 @@ def to_json(col, options={}): @ignore_unicode_prefix @since(2.4

[GitHub] spark issue #22771: [SPARK-25773][Core]Cancel zombie tasks in a result stage...

2018-10-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22771 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark issue #22775: [SPARK-24709][SQL][FOLLOW-UP] Make schema_of_json's inpu...

2018-10-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22775 Yup, yup .. I should sync the tests --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #22814: [SPARK-25819][SQL] Support parse mode option for the fun...

2018-10-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22814 Merged to master --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews

[GitHub] spark issue #21588: [SPARK-24590][BUILD] Make Jenkins tests passed with hado...

2018-10-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21588 Yup, it supports Hadoop 3, and other fixes what @wangyum mentioned. --- - To unsubscribe, e-mail: reviews-unsubscr

[GitHub] spark issue #22814: [SPARK-25819][SQL] Support parse mode option for the fun...

2018-10-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22814 LGTM too --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark pull request #22814: [SPARK-25819][SQL] Support parse mode option for ...

2018-10-25 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22814#discussion_r228381742 --- Diff: docs/sql-data-sources-avro.md --- @@ -177,6 +180,19 @@ Data source options of Avro can be set using the `.option` method on `DataFrameR

[GitHub] spark pull request #22814: [SPARK-25819][SQL] Support parse mode option for ...

2018-10-25 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22814#discussion_r228380951 --- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/package.scala --- @@ -31,10 +32,32 @@ package object avro { * @since 2.4.0

[GitHub] spark pull request #22814: [SPARK-25819][SQL] Support parse mode option for ...

2018-10-25 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22814#discussion_r228380639 --- Diff: external/avro/src/test/scala/org/apache/spark/sql/avro/AvroFunctionsSuite.scala --- @@ -61,6 +59,24 @@ class AvroFunctionsSuite extends

[GitHub] spark issue #22827: [SPARK-25832][SQL][BRANCH-2.4] Revert newly added map re...

2018-10-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22827 LGTM too --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #22841: [SPARK-25842][SQL] Deprecate rangeBetween APIs introduce...

2018-10-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22841 Looks good to me. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark pull request #22841: [SPARK-25842][SQL] Deprecate rangeBetween APIs in...

2018-10-25 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22841#discussion_r228376996 --- Diff: python/pyspark/sql/window.py --- @@ -239,34 +212,27 @@ def rangeBetween(self, start, end): and "5" means the five off

[GitHub] spark pull request #22815: [SPARK-25821][SQL] Remove SQLContext methods depr...

2018-10-25 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22815#discussion_r228376272 --- Diff: R/pkg/R/SQLContext.R --- @@ -434,6 +388,7 @@ read.orc <- function(path, ...) { #' Loads a Parquet file, returning the res

[GitHub] spark pull request #22841: [SPARK-25842][SQL] Deprecate rangeBetween APIs in...

2018-10-25 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22841#discussion_r228376015 --- Diff: python/pyspark/sql/window.py --- @@ -239,34 +212,27 @@ def rangeBetween(self, start, end): and "5" means the five off

[GitHub] spark issue #16812: [SPARK-19465][SQL] Added options for custom boolean valu...

2018-10-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16812 This can be easily worked around, no? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #22841: [SPARK-25842][SQL] Deprecate rangeBetween APIs introduce...

2018-10-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22841 Yup, I also agree with this revert. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #22775: [SPARK-24709][SQL][FOLLOW-UP] Make schema_of_json's inpu...

2018-10-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22775 Maybe I am too much careful about it but I am kind of nervous about this column case. I don't intend to disallow it entirely but only for Spark 2.4. We might have to find a way to use c

[GitHub] spark issue #22775: [SPARK-24709][SQL][FOLLOW-UP] Make schema_of_json's inpu...

2018-10-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22775 Actually, that usecase can more easily accomplished by simply inferring schema by JSON datasource. Yea, I indeed suggested that as workaround for this issue before. Let's say, `spark.read

[GitHub] spark issue #22621: [SPARK-25602][SQL] SparkPlan.getByteArrayRdd should not ...

2018-10-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22621 That's my point. Why do we have to document for fixing unexpected results fixed --- - To unsubscribe, e-mail: re

[GitHub] spark issue #22747: [SPARK-25760][SQL] Set AddJarCommand return empty

2018-10-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22747 Yup, that's similar argument I had in https://github.com/apache/spark/pull/22773#issuecomment-432923361 I think we should clarify what to document

[GitHub] spark pull request #22814: [SPARK-25819][SQL] Support parse mode option for ...

2018-10-25 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22814#discussion_r228113346 --- Diff: docs/sql-migration-guide-upgrade.md --- @@ -10,6 +10,9 @@ displayTitle: Spark SQL Upgrading Guide ## Upgrading From Spark SQL 2.4 to 3.0

[GitHub] spark pull request #22814: [SPARK-25819][SQL] Support parse mode option for ...

2018-10-25 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22814#discussion_r228115771 --- Diff: external/avro/src/test/scala/org/apache/spark/sql/avro/AvroFunctionsSuite.scala --- @@ -61,6 +59,24 @@ class AvroFunctionsSuite extends

[GitHub] spark issue #22775: [SPARK-24709][SQL][FOLLOW-UP] Make schema_of_json's inpu...

2018-10-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22775 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark pull request #22814: [SPARK-25819][SQL] Support parse mode option for ...

2018-10-25 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22814#discussion_r228065259 --- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroDataToCatalyst.scala --- @@ -21,16 +21,31 @@ import org.apache.avro.Schema

[GitHub] spark issue #22621: [SPARK-25602][SQL] SparkPlan.getByteArrayRdd should not ...

2018-10-24 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22621 Let's say, this can be behaivour changes too since metrics are now changed. Should we update migration guide for s

[GitHub] spark issue #22690: [SPARK-19287][CORE][STREAMING] JavaPairRDD flatMapValues...

2018-10-24 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22690 cc @cloud-fan and @gatorsmile Should we update migration guide as well? --- - To unsubscribe, e-mail: reviews

[GitHub] spark issue #22503: [SPARK-25493][SQL] Use auto-detection for CRLF in CSV da...

2018-10-24 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22503 @justinuang, this might affect existing users application. Although this matches the behaviour to non-miltiline mode, can we explicitly mention it in migration guide? cc @cloud-fan and

[GitHub] spark issue #22747: [SPARK-25760][SQL] Set AddJarCommand return empty

2018-10-24 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22747 This looks also external changes to existing application users. Shall we update migration guide? --- - To unsubscribe, e

[GitHub] spark issue #22773: [SPARK-25785][SQL] Add prettyNames for from_json, to_jso...

2018-10-24 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22773 Yup, will encourage to update the migration guide in that way. --- - To unsubscribe, e-mail: reviews-unsubscr

[GitHub] spark issue #22728: [SPARK-25736][SQL][TEST] add tests to verify the behavio...

2018-10-24 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22728 (From https://github.com/apache/spark/pull/22773#issuecomment-432917994) @gatorsmile and @cloud-fan, let's say this will break `DESCRIBE FUNCTION EXTENDED`. Should we update migration gui

[GitHub] spark issue #22815: [SPARK-25821][SQL] Remove SQLContext methods deprecated ...

2018-10-24 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22815 BTW, should we update migration guide too? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #22773: [SPARK-25785][SQL] Add prettyNames for from_json, to_jso...

2018-10-24 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22773 Sure, so for clarification, we will document everything that affects to external users application, right? --- - To

[GitHub] spark issue #22775: [SPARK-24709][SQL][FOLLOW-UP] Make schema_of_json's inpu...

2018-10-24 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22775 @cloud-fan, looks we are going to start another RC. Would you mind if I ask to take a quick look before the new RC

[GitHub] spark issue #22775: [SPARK-24709][SQL][FOLLOW-UP] Make schema_of_json's inpu...

2018-10-24 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22775 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark issue #22773: [SPARK-25785][SQL] Add prettyNames for from_json, to_jso...

2018-10-24 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22773 My impression so far was that we note things at migration notes when they are improvements (not bugs), and non-trivial and related to backward compatibility. Shall we clarify what to

[GitHub] spark issue #22773: [SPARK-25785][SQL] Add prettyNames for from_json, to_jso...

2018-10-24 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22773 BTW, it's closer to bug rather then improvement tho. `from_json` should have default name `from_json` rather then `jsontostructs` - end users would have no idea why it's called `jso

[GitHub] spark issue #22773: [SPARK-25785][SQL] Add prettyNames for from_json, to_jso...

2018-10-24 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22773 That's the exact issue I raised before and we ended up with not keeping the compatibility in column names. @cloud-fan and @hvanh

[GitHub] spark issue #22795: [SPARK-25798][PYTHON] Internally document type conversio...

2018-10-24 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22795 Thanks @viirya!! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews

[GitHub] spark pull request #22816: [SPARK-25822][PySpark]Fix a race condition when r...

2018-10-24 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22816#discussion_r228012237 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala --- @@ -114,7 +114,7 @@ private[spark] abstract class BasePythonRunner[IN

[GitHub] spark pull request #22816: [SPARK-25822][PySpark]Fix a race condition when r...

2018-10-24 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22816#discussion_r228012035 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala --- @@ -114,7 +114,7 @@ private[spark] abstract class BasePythonRunner[IN

[GitHub] spark issue #22237: [SPARK-25243][SQL] Use FailureSafeParser in from_json

2018-10-24 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22237 Thanks all!! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #22795: [SPARK-25798][PYTHON] Internally document type conversio...

2018-10-24 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22795 Thanks, @BryanCutler. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark issue #22144: [SPARK-24935][SQL] : Problem with Executing Hive UDF's f...

2018-10-24 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22144 Adding it as a known issue sounds reasonable to me as well. --- - To unsubscribe, e-mail: reviews-unsubscr

[GitHub] spark pull request #22807: [WIP][SPARK-25811][PySpark] Raise a proper error ...

2018-10-24 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22807#discussion_r227784077 --- Diff: python/pyspark/sql/tests.py --- @@ -4961,6 +4961,31 @@ def foofoo(x, y): ).collect ) +def

[GitHub] spark issue #22730: [SPARK-16775][CORE] Remove deprecated accumulator v1 API...

2018-10-24 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22730 adding @cloud-fan since accumulator version 2 was added by you. --- - To unsubscribe, e-mail: reviews-unsubscr

[GitHub] spark pull request #22795: [SPARK-25798][PYTHON] Internally document type co...

2018-10-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22795#discussion_r227634476 --- Diff: python/pyspark/sql/functions.py --- @@ -3023,6 +3023,42 @@ def pandas_udf(f=None, returnType=None, functionType=None

[GitHub] spark issue #22237: [SPARK-25243][SQL] Use FailureSafeParser in from_json

2018-10-23 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22237 https://github.com/apache/spark/pull/22237/files#r223707899 makes sense to me. Addressed. LGTM from my side as well

[GitHub] spark issue #22144: [SPARK-24935][SQL] : Problem with Executing Hive UDF's f...

2018-10-23 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22144 > According to the policy, we don't have to block the current release because of i @cloud-fan, BTW, would you mind if I ask to share what you read? I want to be aware of the p

[GitHub] spark issue #22514: [SPARK-25271][SQL] Hive ctas commands should use data so...

2018-10-23 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22514 @cloud-fan, is this a performance regression that affects users that use Hive serde tables as well? --- - To unsubscribe, e

[GitHub] spark issue #22144: [SPARK-24935][SQL] : Problem with Executing Hive UDF's f...

2018-10-23 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22144 Wait wait .. do we only care about regressions as blockers for the last release (2.3)? I'm asking this because I really don't know. If so

[GitHub] spark issue #22237: [SPARK-25243][SQL] Use FailureSafeParser in from_json

2018-10-23 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22237 Ah, yea I have a direct access to this branch. Let me just rebase/address the comment tomorrow. --- - To unsubscribe, e

[GitHub] spark issue #22144: [SPARK-24935][SQL] : Problem with Executing Hive UDF's f...

2018-10-23 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22144 For instance, https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/sketches-user/GmH4-OlHP9g/MW-J7Hg4BwAJ this discussion thread was started almost one year

[GitHub] spark issue #22144: [SPARK-24935][SQL] : Problem with Executing Hive UDF's f...

2018-10-23 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22144 It does try to fix a backward compatibility issue. It is found later now but still it is true we found a breaking issue to fix. Broken backward compatibility that potentially affects a set of

[GitHub] spark issue #22237: [SPARK-25243][SQL] Use FailureSafeParser in from_json

2018-10-23 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22237 Oh wait you left a sign-off. Let me rebase it within tomorrow - wouldn't be a big job. --- - To unsubscribe, e

[GitHub] spark issue #22237: [SPARK-25243][SQL] Use FailureSafeParser in from_json

2018-10-23 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22237 If @gengliangwang find some time to work on this, yea please go ahead. --- - To unsubscribe, e-mail: reviews-unsubscr

[GitHub] spark pull request #22047: [SPARK-19851] Add support for EVERY and ANY (SOME...

2018-10-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22047#discussion_r227367500 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/AnyAgg.scala --- @@ -0,0 +1,64 @@ +/* + * Licensed

[GitHub] spark issue #22803: [SPARK-25808][BUILD] Upgrade jsr305 version from 1.3.9 t...

2018-10-23 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22803 Yea, similar opinion. If it does not fix an actual problem, I wouldn't encourage to fix too .. --- - To unsubscri

[GitHub] spark issue #22144: [SPARK-24935][SQL] : Problem with Executing Hive UDF's f...

2018-10-23 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22144 If we were going for 3.0, then I would definitely leave +1 and I agree that we should rather focus on Spark itself as a higher priority - we should do that when we go 3.0 and rather drop such

[GitHub] spark pull request #22782: [HOTFIX] Fix PySpark pip packaging tests by non-a...

2018-10-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22782#discussion_r227336895 --- Diff: bin/docker-image-tool.sh --- @@ -79,7 +79,7 @@ function build { fi # Verify that Spark has actually been built/is a

[GitHub] spark pull request #22782: [HOTFIX] Fix PySpark pip packaging tests by non-a...

2018-10-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22782#discussion_r227336318 --- Diff: bin/docker-image-tool.sh --- @@ -79,7 +79,7 @@ function build { fi # Verify that Spark has actually been built/is a

[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...

2018-10-23 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22666 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

<    3   4   5   6   7   8   9   10   11   12   >