[GitHub] spark pull request #23152: [SPARK-26181][SQL] the `hasMinMaxStats` method of...

2018-11-29 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/23152#discussion_r237719791 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala --- @@ -879,13 +879,13 @@ case

[GitHub] spark issue #21879: [SPARK-24927][BUILD][BRANCH-2.3] The scope of snappy-jav...

2018-07-26 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/21879 @cloud-fan I didn't try to reproduce this issue in branches other than branch-2.3, but judging from the POM files, this issue has existed since at least 1.6

[GitHub] spark issue #21879: [SPARK-24927][BUILD][BRANCH-2.3] The scope of snappy-jav...

2018-07-26 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/21879 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews

[GitHub] spark issue #21879: [SPARK-24927][BUILD][BRANCH-2.3] The scope of snappy-jav...

2018-07-26 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/21879 retest this please

[GitHub] spark issue #21879: [SPARK-24927][BUILD][BRANCH-2.3] The scope of snappy-jav...

2018-07-26 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/21879 The Kafka DStream test failures seem to be flaky and irrelevant.

[GitHub] spark pull request #21879: [SPARK-24927] The scope of snappy-java cannot be ...

2018-07-26 Thread liancheng
GitHub user liancheng opened a pull request: https://github.com/apache/spark/pull/21879 [SPARK-24927] The scope of snappy-java cannot be "provided" Please see [SPARK-24927][1] for more details. [1]: https://issues.apache.org/jira/browse/SPARK-24927

[GitHub] spark issue #21865: [SPARK-24895] Remove spotbugs plugin

2018-07-24 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/21865 test this please

[GitHub] spark issue #21865: [SPARK-24895] Remove spotbugs plugin

2018-07-24 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/21865 add to whitelist

[GitHub] spark issue #20174: [SPARK-22951][SQL] fix aggregation after dropDuplicates ...

2018-01-10 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/20174 LGTM, merging to master and branch-2.3. Thanks!

[GitHub] spark pull request #20174: [SPARK-22951][SQL] fix aggregation after dropDupl...

2018-01-10 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/20174#discussion_r160770992 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala --- @@ -666,4 +665,16 @@ class DataFrameAggregateSuite extends

[GitHub] spark issue #20174: [SPARK-22951][SQL] fix aggregation after dropDuplicates ...

2018-01-10 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/20174 @mgaido91 We can't because we do not know whether there are any input rows or not. For example: ```scala val df1 = spark.range(10).select() val df2 = spark.range(10).filter($"

[GitHub] spark pull request #20174: [SPARK-22951][SQL] fix aggregation after dropDupl...

2018-01-10 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/20174#discussion_r160612592 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1221,7 +1221,12 @@ object

[GitHub] spark pull request #20174: [SPARK-22951][SQL] fix aggregation after dropDupl...

2018-01-10 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/20174#discussion_r160611832 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1221,7 +1221,12 @@ object

[GitHub] spark pull request #20174: [SPARK-22951][SQL] fix aggregation after dropDupl...

2018-01-09 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/20174#discussion_r160602821 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala --- @@ -19,6 +19,8 @@ package org.apache.spark.sql import

[GitHub] spark issue #20174: [SPARK-22951][SQL] fix aggregation after dropDuplicates ...

2018-01-09 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/20174 @liufengdb I wrote a summary according to our offline discussion to explain the subtle change made in this PR. Please feel free to use it in the PR description if it looks good to you

[GitHub] spark issue #19439: [SPARK-21866][ML][PySpark] Adding spark image reader

2017-11-13 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/19439 @jkbradley I'm not confident enough about this part but a quick check suggested that typically `PathFilter`s are used in `FileInputFormat.listStatus()`, which is usually called

[GitHub] spark issue #19386: [SPARK-22161] [SQL] Add Impala-modified TPC-DS queries

2017-09-29 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/19386 LGTM

[GitHub] spark pull request #19361: [SPARK-22140] Add TPCDSQuerySuite

2017-09-27 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/19361#discussion_r141501351 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/TPCDSQuerySuite.scala --- @@ -0,0 +1,348 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #19080: [SPARK-21865][SQL] simplify the distribution sema...

2017-08-30 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/19080#discussion_r136243745 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala --- @@ -30,18 +30,43 @@ import

[GitHub] spark pull request #19080: [SPARK-21865][SQL] simplify the distribution sema...

2017-08-30 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/19080#discussion_r136211685 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala --- @@ -30,18 +30,43 @@ import

[GitHub] spark issue #18712: [SPARK-17528][SQL][followup] remove unnecessary data cop...

2017-07-24 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/18712 Nice, didn't know that the copy issue has already been fixed. LGTM, merging to master. --- If your project is set up for it, you can reply to this email and have your reply appear

[GitHub] spark issue #17419: [SPARK-19634][ML] Multivariate summarizer - dataframes A...

2017-07-20 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/17419 @WeichenXu123 and I did some profiling using `jvisualvm` and found that 40% of the time is spent in the copy performed by [this `safeProjection`][1]. This is a known issue used to fight against

[GitHub] spark issue #18470: [SPARK-21258][SQL] Fix WindowExec complex object aggrega...

2017-06-29 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/18470 LGTM pending Jenkins.

[GitHub] spark pull request #18412: [SPARK-21203] [SQL] Fix wrong results of insertio...

2017-06-26 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/18412#discussion_r124062223 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala --- @@ -482,15 +482,15 @@ case class Cast(child: Expression

[GitHub] spark pull request #18181: [SPARK-20958][SQL] Roll back parquet-mr 1.8.2 to ...

2017-06-02 Thread liancheng
Github user liancheng closed the pull request at: https://github.com/apache/spark/pull/18181

[GitHub] spark issue #18181: [SPARK-20958][SQL] Roll back parquet-mr 1.8.2 to 1.8.1

2017-06-02 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/18181 @rdblue @mallman Thanks for the comments! As mentioned in the JIRA ticket, we've decided to preserve parquet-mr 1.8.2. Instead, we'll add a release notes entry to suggest using parquet-avro 1.8.1

[GitHub] spark issue #18181: [SPARK-20958][SQL] Roll back parquet-mr 1.8.2 to 1.8.1

2017-06-02 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/18181 retest this please

[GitHub] spark issue #18181: [SPARK-20958][SQL] Roll back parquet-mr 1.8.2 to 1.8.1

2017-06-02 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/18181 Unfortunately, rolling back parquet-mr to 1.8.1 brings back [PARQUET-389][1], which breaks multiple test cases involving schema evolution (add a new column to a Parquet table and filter

[GitHub] spark issue #18181: [SPARK-20958][SQL] Roll back parquet-mr 1.8.2 to 1.8.1

2017-06-02 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/18181 @viirya Thanks for the reminder! I'm reverting that one.

[GitHub] spark issue #18181: [SPARK-20958][SQL] Roll back parquet-mr 1.8.2 to 1.8.1

2017-06-02 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/18181 @dongjoon-hyun I already reverted PR #16751 manually but forgot to mention it in the PR description.

[GitHub] spark pull request #18181: [SPARK-20958][SQL] Roll back parquet-mr 1.8.2 to ...

2017-06-01 Thread liancheng
GitHub user liancheng opened a pull request: https://github.com/apache/spark/pull/18181 [SPARK-20958][SQL] Roll back parquet-mr 1.8.2 to 1.8.1 ## What changes were proposed in this pull request? This PR reverts PR #16791, #16817, and part of #16795 to roll back parquet-mr

[GitHub] spark issue #17469: [SPARK-20132][Docs] Add documentation for column string ...

2017-05-05 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/17469 Also cherry-picked this to branch-2.2. cc @gatorsmile

[GitHub] spark issue #17693: [SPARK-20314][SQL] Inconsistent error handling in JSON p...

2017-04-20 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/17693 Not suggesting doing it in this PR, but adding a SQL option that lets users choose the error handling strategy for all the JSON functions probably makes more sense here? The Spark JSON data

[GitHub] spark issue #17398: [SPARK-19716][SQL] support by-name resolution for struct...

2017-04-04 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/17398 LGTM. Merging to master. Thanks!

[GitHub] spark issue #17454: [SPARK-20125][SQL] Dataset of type option of map does no...

2017-03-28 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/17454 OK, resolved the conflict manually and merged to branch-2.1.

[GitHub] spark issue #17454: [SPARK-20125][SQL] Dataset of type option of map does no...

2017-03-28 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/17454 This PR conflicts with branch-2.1; trying to resolve manually while merging.

[GitHub] spark issue #17454: [SPARK-20125][SQL] Dataset of type option of map does no...

2017-03-28 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/17454 LGTM, merging to master and branch-2.1. Thanks!

[GitHub] spark pull request #16917: [SPARK-19529][BRANCH-1.6] Backport PR #16866 to b...

2017-03-13 Thread liancheng
Github user liancheng closed the pull request at: https://github.com/apache/spark/pull/16917

[GitHub] spark pull request #17247: [SPARK-19905][SQL] Bring back Dataset.inputFiles ...

2017-03-10 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/17247#discussion_r105489225 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala --- @@ -1865,4 +1865,12 @@ class HiveDDLSuite

[GitHub] spark pull request #17247: [SPARK-19905][SQL] Bring back Dataset.inputFiles ...

2017-03-10 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/17247#discussion_r105475257 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -2734,6 +2735,8 @@ class Dataset[T] private[sql

[GitHub] spark issue #17247: [SPARK-19905][SQL] Bring back Dataset.inputFiles for Hiv...

2017-03-10 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/17247 cc @cloud-fan

[GitHub] spark pull request #17247: [SPARK-19905][SQL] Bring back Dataset.inputFiles ...

2017-03-10 Thread liancheng
GitHub user liancheng opened a pull request: https://github.com/apache/spark/pull/17247 [SPARK-19905][SQL] Bring back Dataset.inputFiles for Hive SerDe tables ## What changes were proposed in this pull request? `Dataset.inputFiles` works by matching `FileRelation`s

[GitHub] spark issue #17168: [SPARK-19737][SQL] New analysis rule for reporting unreg...

2017-03-06 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/17168 Thanks for the review! Merging to master.

[GitHub] spark issue #17168: [SPARK-19737][SQL] New analysis rule for reporting unreg...

2017-03-06 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/17168 @cloud-fan I'm afraid that's even more complicated an approach since you'll need to make function builders, which are now simply regular Scala functions, into unresolved expressions

[GitHub] spark issue #17168: [SPARK-19737][SQL] New analysis rule for reporting unreg...

2017-03-05 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/17168 cc @cloud-fan @rxin

[GitHub] spark pull request #17168: [SPARK-19737][SQL] New analysis rule for reportin...

2017-03-05 Thread liancheng
GitHub user liancheng opened a pull request: https://github.com/apache/spark/pull/17168 [SPARK-19737][SQL] New analysis rule for reporting unregistered functions without relying on relation resolution ## What changes were proposed in this pull request? This PR adds a new

[GitHub] spark issue #16935: [SPARK-19604] [TESTS] Log the start of every Python test

2017-02-15 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/16935 Merging to master and branch-2.1.

[GitHub] spark issue #16917: [SPARK-19529][BRANCH-1.6] Backport PR #16866 to branch-1...

2017-02-14 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/16917 cc @JoshRosen

[GitHub] spark issue #16866: [SPARK-19529] TransportClientFactory.createClient() shou...

2017-02-13 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/16866 Merged to master, branch-2.1, and branch-2.0. Files involved in branch-1.6 were moved to new directories, which made it hard to cherry-pick. Created PR #16917 to backport this one to 1.6

[GitHub] spark pull request #16917: [SPARK-19529][BRANCH-1.6] Backport PR #16866 to b...

2017-02-13 Thread liancheng
GitHub user liancheng opened a pull request: https://github.com/apache/spark/pull/16917 [SPARK-19529][BRANCH-1.6] Backport PR #16866 to branch-1.6 ## What changes were proposed in this pull request? This PR backports PR #16866 to branch-1.6 ## How was this patch

[GitHub] spark issue #16866: [SPARK-19529] TransportClientFactory.createClient() shou...

2017-02-13 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/16866 Branch-2.1 test compilation happens to be broken right now. Trying to fix the compilation failure first.

[GitHub] spark issue #16866: [SPARK-19529] TransportClientFactory.createClient() shou...

2017-02-13 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/16866 Merging to master, branch-2.1, branch-2.0, and branch-1.6. Conflicts occurred; I'm fixing them manually.

[GitHub] spark issue #16161: [SPARK-18717][SQL] Make code generation for Scala Map wo...

2017-02-10 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/16161 Thanks. Backported to branch-2.1.

[GitHub] spark issue #16161: [SPARK-18717][SQL] Make code generation for Scala Map wo...

2017-02-10 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/16161 Shall we backport this to branch-2.1? I'd consider this a bug because the following snippet fails in Spark 2.1: ```scala case class Wrapper1(value: Option[Map[String, String

[GitHub] spark issue #16795: [SPARK-19409][BUILD] Fix ParquetAvroCompatibilitySuite f...

2017-02-06 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/16795 @dongjoon-hyun, you may add `[TEST-MAVEN]` in the PR title to ask Jenkins to test this PR using Maven.

[GitHub] spark issue #16791: [SPARK-19409][SPARK-17213] Cleanup Parquet workarounds/h...

2017-02-06 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/16791 @HyukjinKwon Sorry that I didn't see your comment before this PR got merged. I believe PARQUET-686 had already been fixed by apache/parquet-mr#367 but wasn't marked as resolved in JIRA. Thanks

[GitHub] spark issue #16791: [SPARK-19409][SPARK-17213] Cleanup Parquet workarounds/h...

2017-02-03 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/16791 I see, thanks for the context. But I'd like to keep this Maven build failure fix in a separate PR so that people can easily cherry-pick the fix. Also, it helps to keep this PR easier to follow

[GitHub] spark issue #16791: [SPARK-19409][SPARK-17213] Cleanup Parquet workarounds/h...

2017-02-03 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/16791 Hope we finally have proper Parquet filter push-down for string/binary columns (fingers crossed)!

[GitHub] spark issue #16791: [SPARK-19409][SPARK-17213] Cleanup Parquet workarounds/h...

2017-02-03 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/16791 @dongjoon-hyun Actually, could you please point me to the Maven build failure? I don't think this failure is caused by this PR, is it? Are you referring to some existing PR introduced by some earlier

[GitHub] spark issue #16791: [SPARK-19409][SPARK-17213] Cleanup Parquet workarounds/h...

2017-02-03 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/16791 @dongjoon-hyun Ah, thanks!

[GitHub] spark issue #16791: [SPARK-19409][SPARK-17213] Cleanup Parquet workarounds/h...

2017-02-03 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/16791 cc @cloud-fan @rxin @HyukjinKwon

[GitHub] spark pull request #16791: [SPARK-19409][SPARK-17213] Cleanup Parquet workar...

2017-02-03 Thread liancheng
GitHub user liancheng opened a pull request: https://github.com/apache/spark/pull/16791 [SPARK-19409][SPARK-17213] Cleanup Parquet workarounds/hacks due to bugs of old Parquet versions ## What changes were proposed in this pull request? We've already upgraded parquet-mr

[GitHub] spark issue #16649: [SPARK-19295] [SQL] IsolatedClientLoader's downloadVersi...

2017-01-19 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/16649 LGTM pending Jenkins.

[GitHub] spark issue #16564: [SPARK-19065][SS]Rewrite Alias in StreamExecution if nec...

2017-01-12 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/16564 LGTM as long as we decide to preserve #15427. The root cause of this issue is still the `df("col")` syntax, which is the motivation behind #15427. We decided not to deprec

[GitHub] spark issue #16424: [SPARK-19016][SQL][DOC] Document scalable partition hand...

2016-12-30 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/16424 OK, I'm merging this to master and branch-2.1. Thanks for the review!

[GitHub] spark issue #16424: [SPARK-19016][SQL][DOC] Document scalable partition hand...

2016-12-29 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/16424 @ericl @CodingCat Thanks for the review! Fixed per your comments.

[GitHub] spark pull request #16424: [SPARK-19016][SQL][DOC] Document scalable partiti...

2016-12-29 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/16424#discussion_r94181109 --- Diff: docs/sql-programming-guide.md --- @@ -526,11 +526,18 @@ By default `saveAsTable` will create a "managed table", meaning that t

[GitHub] spark pull request #16424: [SPARK-19016][SQL][DOC] Document scalable partiti...

2016-12-28 Thread liancheng
GitHub user liancheng opened a pull request: https://github.com/apache/spark/pull/16424 [SPARK-19016][SQL][DOC] Document scalable partition handling ## What changes were proposed in this pull request? This PR documents the scalable partition handling feature in the body

[GitHub] spark issue #16184: [SPARK-18753][SQL] Keep pushed-down null literal as a fi...

2016-12-14 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/16184 LGTM, thanks for working on this! I think the problem is that we generate a foldable predicate (`lit(null)`) during optimization phase but fail to fold it. Ideally, the optimizer should
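
As a rough illustration of what folding a foldable predicate means, here is a minimal, self-contained sketch. The `Expr`, `Lit`, `Attr`, `And`, and `ToyFolding` names are hypothetical; they are not Catalyst's classes and not the change made in this PR. `Lit(None)` stands in for a null literal.

```scala
// Toy expression tree for illustration only; these are NOT Catalyst classes.
sealed trait Expr
// A boolean literal; None models SQL NULL (comparable to `lit(null)`).
case class Lit(value: Option[Boolean]) extends Expr
// A column reference whose value is unknown until execution.
case class Attr(name: String) extends Expr
case class And(left: Expr, right: Expr) extends Expr

object ToyFolding {
  // An expression is foldable when it contains no column references.
  def foldable(e: Expr): Boolean = e match {
    case Lit(_)    => true
    case Attr(_)   => false
    case And(l, r) => foldable(l) && foldable(r)
  }

  // Evaluates a foldable expression using three-valued AND.
  def eval(e: Expr): Option[Boolean] = e match {
    case Lit(v) => v
    case And(l, r) =>
      (eval(l), eval(r)) match {
        case (Some(false), _) | (_, Some(false)) => Some(false)
        case (Some(true), Some(true))            => Some(true)
        case _                                   => None
      }
    case Attr(n) =>
      throw new IllegalArgumentException(s"cannot evaluate column $n")
  }

  // Bottom-up rewrite: replace every foldable subtree with a literal.
  def fold(e: Expr): Expr = e match {
    case And(l, r) =>
      val rewritten = And(fold(l), fold(r))
      if (foldable(rewritten)) Lit(eval(rewritten)) else rewritten
    case other => other
  }
}
```

With this sketch, `ToyFolding.fold(And(Attr("a"), And(Lit(Some(true)), Lit(None))))` yields `And(Attr("a"), Lit(None))`: the subtree made only of literals is folded, while the column reference is left alone.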

[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...

2016-12-14 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/16030 Another thing that @cloud-fan and I agreed upon is that our current `DataFrameReader.schema()` interface method is insufficient. We may want to add - `DataFrameReader.dataSchema

[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...

2016-12-13 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/16030 @maropu @brkyvz Sorry for the delay, I was blocked by some other tasks during the last few days. @cloud-fan and I just did some investigation and we think we came up with a minimal fix

[GitHub] spark issue #16163: [SPARK-18730] Post Jenkins test report page instead of t...

2016-12-05 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/16163 @srowen Thanks. I sent this one because the `consoleFull` page frequently freezes my browser these days, not to mention viewing Jenkins build results via mobile phone...

[GitHub] spark pull request #16163: [SPARK-18730] Post Jenkins test report page inste...

2016-12-05 Thread liancheng
GitHub user liancheng opened a pull request: https://github.com/apache/spark/pull/16163 [SPARK-18730] Post Jenkins test report page instead of the full console output page to GitHub ## What changes were proposed in this pull request? Currently, the full console output page

[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...

2016-12-05 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/16030 I'm checking whether the original behavior is consistent (do we always respect data schema column order when partition columns are included in data schema?). If not, I call it a bug and we just

[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...

2016-12-05 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/16030 @brkyvz I agree that always moving all partitioned columns to the end of the schema is more consistent and intuitive. However, users may have ordinal-dependent code like this: ```scala

[GitHub] spark issue #16156: [SPARK-18539][SQL]: tolerate pushed-down filter on non-e...

2016-12-05 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/16156 Hey @xwu0226 @gatorsmile, did some investigation, and I don't think this is a bug now. Please refer to [my JIRA comment][1] for more details. [1]: https://issues.apache.org/jira/browse

[GitHub] spark issue #16156: [SPARK-18539][SQL]: tolerate pushed-down filter on non-e...

2016-12-05 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/16156 @xwu0226 Just tested that this issue also affects the normal Parquet reader (by setting `spark.sql.parquet.enableVectorizedReader` to `false`). That's also why #9940 couldn't take a similar
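
For anyone wanting to repeat this kind of check, the sketch below shows one way to switch a local session to the non-vectorized Parquet reader. Only the configuration key comes from the comment above; the object name, app name, temporary path, and filter are assumptions made for illustration, and the code needs Spark SQL on the classpath.

```scala
import java.nio.file.Files

import org.apache.spark.sql.SparkSession

object NonVectorizedParquetCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("non-vectorized-parquet-check") // hypothetical name
      .master("local[*]")
      .getOrCreate()

    // Turn off the vectorized Parquet reader for this session.
    spark.conf.set("spark.sql.parquet.enableVectorizedReader", "false")

    // Write a small Parquet dataset into a fresh temporary location.
    val path = Files.createTempDirectory("parquet-check").resolve("data").toString
    spark.range(10).write.parquet(path)

    // Read it back with a filter so the non-vectorized reader is exercised.
    val matched = spark.read.parquet(path).filter("id > 5").count()
    println(s"rows matching filter: $matched")

    spark.stop()
  }
}
```

The same key can also be passed at launch time with `--conf spark.sql.parquet.enableVectorizedReader=false`, which avoids changing application code.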

[GitHub] spark pull request #16156: [SPARK-18539][SQL]: tolerate pushed-down filter o...

2016-12-05 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/16156#discussion_r90973620 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java --- @@ -107,7 +107,16 @@ public

[GitHub] spark pull request #16156: [SPARK-18539][SQL]: tolerate pushed-down filter o...

2016-12-05 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/16156#discussion_r90972713 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala --- @@ -578,4 +578,66 @@ class

[GitHub] spark pull request #16156: [SPARK-18539][SQL]: tolerate pushed-down filter o...

2016-12-05 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/16156#discussion_r90972121 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java --- @@ -107,7 +107,16 @@ public

[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...

2016-12-05 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/16030 @brkyvz I also worry about the behavior change. Let me check whether the original behavior is by design or by accident. If it is a bug from the very beginning, then we should just fix

[GitHub] spark issue #16156: [SPARK-18539][SQL]: tolerate pushed-down filter on non-e...

2016-12-05 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/16156 BTW, I think this PR is a cleaner fix than #9940, which introduces temporary metadata while merging two `StructType`s and erases it in a later phase. We may want to remove the hack done

[GitHub] spark issue #16156: [SPARK-18539][SQL]: tolerate pushed-down filter on non-e...

2016-12-05 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/16156 Actually, PR #9940 should have already fixed this issue. I'm checking why it doesn't work under 2.0.1 or 2.0.2.

[GitHub] spark pull request #16156: [SPARK-18539][SQL]: tolerate pushed-down filter o...

2016-12-05 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/16156#discussion_r90969603 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala --- @@ -578,4 +578,66 @@ class

[GitHub] spark pull request #16156: [SPARK-18539][SQL]: tolerate pushed-down filter o...

2016-12-05 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/16156#discussion_r90968974 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala --- @@ -578,4 +578,66 @@ class

[GitHub] spark pull request #16156: [SPARK-18539][SQL]: tolerate pushed-down filter o...

2016-12-05 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/16156#discussion_r90968113 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java --- @@ -107,7 +107,16 @@ public

[GitHub] spark issue #16106: [SPARK-17213][SQL] Disable Parquet filter push-down for ...

2016-12-01 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/16106 The last build failure doesn't seem to be relevant.

[GitHub] spark issue #16106: [SPARK-17213][SQL] Disable Parquet filter push-down for ...

2016-12-01 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/16106 retest this please

[GitHub] spark pull request #16106: [SPARK-17213][SQL] Disable Parquet filter push-do...

2016-12-01 Thread liancheng
GitHub user liancheng opened a pull request: https://github.com/apache/spark/pull/16106 [SPARK-17213][SQL] Disable Parquet filter push-down for string and binary columns due to PARQUET-686 ## What changes were proposed in this pull request? Due to PARQUET-686, Parquet

[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...

2016-11-30 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/16030 My hunch is that we somehow passed a wrong requested schema containing the partition column down to the vectorized Parquet reader. IIRC, we prune partition columns from the data schema when

[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...

2016-11-30 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/16030 @maropu I tried your snippet (with minor modifications). It works for 1.6.0 but not for 2.0.2: ```scala case class A(a: Long, b: Int) val as = Seq(A(1, 2)) val path = "

[GitHub] spark issue #15979: [SPARK-18251][SQL] the type of Dataset can't be Option o...

2016-11-30 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/15979 Also backported to branch-2.1.

[GitHub] spark issue #15979: [SPARK-18251][SQL] the type of Dataset can't be Option o...

2016-11-30 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/15979 @rxin Shall we backport this to branch-2.1? I think it's relatively safe.

[GitHub] spark issue #15979: [SPARK-18251][SQL] the type of Dataset can't be Option o...

2016-11-30 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/15979 Merging to master. Thanks!

[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...

2016-11-30 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/16030 @brkyvz @maropu Actually, we do allow users to create partitioned tables whose data schema contains (part of) the partition columns, and there are [test][1] [cases][2] for this use case

[GitHub] spark issue #15979: [SPARK-18251][SQL] the type of Dataset can't be Option o...

2016-11-30 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/15979 Good to merge pending Jenkins. Thanks!

[GitHub] spark issue #15976: [SPARK-18403][SQL] Fix unsafe data false sharing issue i...

2016-11-29 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/15976 @cloud-fan @dongjoon-hyun Thanks for the review!

[GitHub] spark issue #16061: [SPARK-18278] [Scheduler] Support native submission of s...

2016-11-29 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/16061 @erikerlandson For the RAT failure, you may either add the Apache license header to the newly added files or add those files to `dev/.rat-excludes`.

[GitHub] spark issue #15979: [SPARK-18251][SQL] the type of Dataset can't be Option o...

2016-11-29 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/15979 My only concern is that "non-flat" type is neither intuitive nor a well-known term. In fact, this PR only prevents `Option[T <: Product]` from being a top-level Dataset type. How about j

[GitHub] spark issue #15979: [SPARK-18251][SQL] the type of Dataset can't be Option o...

2016-11-29 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/15979 LGTM, merging to master. Thanks!
