[GitHub] spark issue #13872: [SPARK-16164][SQL] Update `CombineFilters` to try to con...

2016-06-23 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/13872 My major concern is that even if the reported case is "fixed" by this PR, in general user applications shouldn't rely on predicate evaluation order. Other than that, the

[GitHub] spark issue #13872: [SPARK-16164][SQL] Filter pushdown should keep the order...

2016-06-23 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/13872 @dongjoon-hyun Thanks for the work! However, I think the optimizer should have the freedom to reorder predicate evaluation. For example, we may evaluate cheap predicates first in order to
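
To illustrate why user code should not depend on the order in which conjuncts are evaluated, here is a minimal plain-Scala sketch. It is not Spark's optimizer: the `Predicate` type, its hand-assigned `cost`, and the "cheap predicates first" rule are all hypothetical. A predicate that silently relies on an earlier guard breaks once the conjuncts are reordered.

```scala
object PredicateOrderSketch {
  // Hypothetical predicate with a hand-assigned evaluation cost.
  final case class Predicate(name: String, cost: Int, eval: Int => Boolean)

  // Evaluate conjuncts left to right with short-circuiting, like `&&`.
  def evalAll(preds: Seq[Predicate], x: Int): Boolean = preds.forall(_.eval(x))

  // Hypothetical "cheap predicates first" reordering an optimizer is free to apply.
  def reorderByCost(preds: Seq[Predicate]): Seq[Predicate] = preds.sortBy(_.cost)

  // The guard is deliberately marked as expensive, so reordering moves it last.
  val nonZero = Predicate("x != 0", cost = 10, eval = x => x != 0)
  val ratio = Predicate("100 / x > 1", cost = 1, eval = x => 100 / x > 1)

  def main(args: Array[String]): Unit = {
    val userOrder = Seq(nonZero, ratio)
    println(evalAll(userOrder, 0)) // false: the guard short-circuits
    val reordered = reorderByCost(userOrder)
    println(reordered.map(_.name)) // List(100 / x > 1, x != 0)
    // evalAll(reordered, 0) would throw ArithmeticException (/ by zero).
  }
}
```

The query only "worked" because of the written order; nothing in the conjunction itself guarantees that the guard runs first.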

[GitHub] spark pull request #13865: [SPARK-13709][SQL] Initialize deserializer with b...

2016-06-22 Thread liancheng
GitHub user liancheng opened a pull request: https://github.com/apache/spark/pull/13865 [SPARK-13709][SQL] Initialize deserializer with both table and partition properties when reading partitioned tables ## What changes were proposed in this pull request? When reading

[GitHub] spark pull request #13864: [SQL][MINOR] Fix minor formatting issues in SHOW ...

2016-06-22 Thread liancheng
GitHub user liancheng opened a pull request: https://github.com/apache/spark/pull/13864 [SQL][MINOR] Fix minor formatting issues in SHOW CREATE TABLE output ## What changes were proposed in this pull request? This PR fixes two minor formatting issues appearing in `SHOW

[GitHub] spark issue #13807: [SPARK-16097][SQL] Encoders.tuple should handle null obj...

2016-06-22 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/13807 Merging to master and branch-2.0. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #13830: [SPARK-16121] ListingFileCatalog does not list in parall...

2016-06-22 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/13830 LGTM, merging to master and branch-2.0.

[GitHub] spark issue #13827: [SQL][DOC] SQL programming guide add deprecated methods ...

2016-06-21 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/13827 LGTM. Thanks! Merged to master and branch-2.0.

[GitHub] spark issue #13807: [SPARK-16097][SQL] Encoders.tuple should handle null obj...

2016-06-21 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/13807 LGTM except for minor issues.

[GitHub] spark pull request #13807: [SPARK-16097][SQL] Encoders.tuple should handle n...

2016-06-21 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/13807#discussion_r67880034 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala --- @@ -830,6 +830,13 @@ class DatasetSuite extends QueryTest with

[GitHub] spark pull request #13807: [SPARK-16097][SQL] Encoders.tuple should handle n...

2016-06-21 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/13807#discussion_r67879774 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala --- @@ -110,16 +110,25 @@ object ExpressionEncoder

[GitHub] spark pull request #13810: [SPARK-16037][SQL] Follow-up: add DataFrameWriter...

2016-06-21 Thread liancheng
GitHub user liancheng opened a pull request: https://github.com/apache/spark/pull/13810 [SPARK-16037][SQL] Follow-up: add DataFrameWriter.insertInto() test cases for by position resolution ## What changes were proposed in this pull request? This PR migrates some test cases
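
`DataFrameWriter.insertInto()` ignores column names and resolves columns by position. The plain-Scala sketch below contrasts by-position and by-name resolution; the `Row` alias (a sequence of name/value pairs) and both helper functions are hypothetical illustrations, not Spark APIs.

```scala
object ResolutionSketch {
  // Hypothetical row: column name paired with its value, in DataFrame order.
  type Row = Seq[(String, Any)]

  // By-position: the i-th incoming value goes to the i-th table column; names are ignored.
  def byPosition(tableColumns: Seq[String], row: Row): Map[String, Any] = {
    require(tableColumns.size == row.size, "column count mismatch")
    tableColumns.zip(row.map(_._2)).toMap
  }

  // By-name: each table column looks up the incoming value with the same name
  // (throws NoSuchElementException if a column is missing).
  def byName(tableColumns: Seq[String], row: Row): Map[String, Any] = {
    val values = row.toMap
    tableColumns.map(c => c -> values(c)).toMap
  }

  def main(args: Array[String]): Unit = {
    val table = Seq("id", "name")
    val row: Row = Seq("name" -> "a", "id" -> 1) // same columns, different order
    println(byPosition(table, row)) // Map(id -> a, name -> 1)
    println(byName(table, row))     // Map(id -> 1, name -> a)
  }
}
```

When the DataFrame's column order differs from the table's, by-position resolution silently swaps values, which is the behavior such test cases pin down.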

[GitHub] spark issue #13753: [SPARK-16029][SPARKR] SparkR add dropTempView and deprec...

2016-06-21 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/13753 @shivaram I think we've covered all of them.

[GitHub] spark pull request #13751: [SPARK-15159][SPARKR] SparkSession roxygen2 doc, ...

2016-06-21 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/13751#discussion_r67819858 --- Diff: docs/sparkr.md --- @@ -158,20 +152,19 @@ write.df(people, path="people.parquet", source="parquet", mode="overwrite&q

[GitHub] spark issue #13797: [SPARK-15894][SQL][DOC] Update docs for controlling #par...

2016-06-20 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/13797 LGTM, merged to master and branch-2.0. Thanks!

[GitHub] spark issue #13799: [SPARK-15863][SQL][DOC][SPARKR] sql programming guide up...

2016-06-20 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/13799 LGTM. Thanks! Merging to master and branch-2.0.

[GitHub] spark issue #13789: [SPARK-16074] [MLLIB] expose VectorUDT/MatrixUDT in a pu...

2016-06-20 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/13789 @rxin Thanks, failed to merge it due to a network issue...

[GitHub] spark issue #13789: [SPARK-16074] [MLLIB] expose VectorUDT/MatrixUDT in a pu...

2016-06-20 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/13789 Merging to master and branch-2.0 on behalf of @mengxr.

[GitHub] spark issue #13789: [SPARK-16074] [MLLIB] expose VectorUDT/MatrixUDT in a pu...

2016-06-20 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/13789 LGTM

[GitHub] spark issue #13592: [SPARK-15863][SQL][DOC] Initial SQL programming guide up...

2016-06-20 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/13592 @maropu Sorry for the late reply. Yea, adding descriptions for these two options makes sense. Would you like to open a PR for this? Thanks!

[GitHub] spark issue #13592: [SPARK-15863][SQL][DOC] Initial SQL programming guide up...

2016-06-20 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/13592 @felixcheung Thanks for the review and your work on PR #13751! Was traveling during the weekend. Let's address these comments in follow-up PRs.

[GitHub] spark issue #13772: [SPARK-16049][SQL] Make InsertIntoTable's expectedColumn...

2016-06-20 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/13772 Yea, but partitioning is only available for `HadoopFsRelation`.

[GitHub] spark pull request #13778: [SPARK-16062][SPARK-15989][SQL] Fix two bugs of P...

2016-06-20 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/13778#discussion_r67719911 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/RowEncoder.scala --- @@ -220,9 +220,15 @@ object RowEncoder

[GitHub] spark pull request #13778: [SPARK-16062][SPARK-15989][SQL] Fix two bugs of P...

2016-06-20 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/13778#discussion_r67698506 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/RowEncoder.scala --- @@ -220,9 +220,15 @@ object RowEncoder

[GitHub] spark issue #13772: [SPARK-16049][SQL] Make InsertIntoTable's expectedColumn...

2016-06-20 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/13772 Oh, #13769 only solves case-insensitive resolution for partition columns.

[GitHub] spark issue #13772: [SPARK-16049][SQL] Make InsertIntoTable's expectedColumn...

2016-06-20 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/13772 @viirya Could you please help verify whether #13769 already fixed this issue?

[GitHub] spark issue #13756: [SPARK-16041][SQL] Disallow Duplicate Columns in partiti...

2016-06-20 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/13756 I think it would be better to move these checks to the analyzer, so that the SQL equivalents of those structures (partitioning and bucketing) can also benefit from them.
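
A check that lives in one shared place can be reused by both the `DataFrameWriter` and SQL DDL paths. The following is a minimal sketch of such a duplicate-column check; `DuplicateColumnCheck`, its methods, and the error message are hypothetical, and case-insensitive comparison is assumed to be controlled by a flag.

```scala
import java.util.Locale

object DuplicateColumnCheck {
  // Returns the (normalized) names that occur more than once, sorted for stable output.
  def duplicates(columns: Seq[String], caseSensitive: Boolean): Seq[String] = {
    val normalized =
      if (caseSensitive) columns else columns.map(_.toLowerCase(Locale.ROOT))
    normalized
      .groupBy(identity)
      .collect { case (name, occurrences) if occurrences.size > 1 => name }
      .toSeq
      .sorted
  }

  // Hypothetical analyzer-style check, e.g. for partitioning or bucketing columns.
  def checkNoDuplicates(kind: String, columns: Seq[String], caseSensitive: Boolean): Unit = {
    val dups = duplicates(columns, caseSensitive)
    if (dups.nonEmpty) {
      throw new IllegalArgumentException(
        s"Found duplicate column(s) in $kind: ${dups.mkString(", ")}")
    }
  }
}
```

For example, `DuplicateColumnCheck.duplicates(Seq("a", "A", "b"), caseSensitive = false)` reports `a`, while the case-sensitive variant reports nothing.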

[GitHub] spark issue #13756: [SPARK-16041][SQL] Disallow Duplicate Columns in partiti...

2016-06-20 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/13756 Do you mean `bucketBy` instead of `blockBy` in the PR title?

[GitHub] spark issue #13769: [SPARK-16030] [SQL] Allow specifying static partitions w...

2016-06-20 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/13769 LGTM except for minor issues. I'm merging this to master and branch-2.0 first since this is a critical issue. We can address the comments later.

[GitHub] spark pull request #13769: [SPARK-16030] [SQL] Allow specifying static parti...

2016-06-20 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/13769#discussion_r67643557 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/DataSourceAnalysisSuite.scala --- @@ -0,0 +1,202 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #13769: [SPARK-16030] [SQL] Allow specifying static parti...

2016-06-19 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/13769#discussion_r67637303 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala --- @@ -43,8 +43,128 @@ import

[GitHub] spark pull request #13769: [SPARK-16030] [SQL] Allow specifying static parti...

2016-06-19 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/13769#discussion_r67637019 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala --- @@ -43,8 +43,128 @@ import

[GitHub] spark issue #13753: [SPARK-16029][SPARKR] SparkR add dropTempView and deprec...

2016-06-17 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/13753 LGTM pending Jenkins. Thanks!

[GitHub] spark pull request #13746: [SPARK-16030] [SQL] Allow specifying static parti...

2016-06-17 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/13746#discussion_r67592333 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala --- @@ -320,6 +320,19 @@ trait CheckAnalysis extends

[GitHub] spark issue #13747: [SPARK-16033][SQL] insertInto() can't be used together w...

2016-06-17 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/13747 cc @cloud-fan @clockfly @yhuai

[GitHub] spark pull request #13747: [SPARK-16033][SQL] insertInto() can't be used tog...

2016-06-17 Thread liancheng
GitHub user liancheng opened a pull request: https://github.com/apache/spark/pull/13747 [SPARK-16033][SQL] insertInto() can't be used together with partitionBy() ## What changes were proposed in this pull request? When inserting into an existing partitioned

[GitHub] spark issue #13743: [SPARK-15916][SQL] JDBC filter push down should respect ...

2016-06-17 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/13743 Merging to master and branch-2.0.

[GitHub] spark issue #13743: [SPARK-15916][SQL] JDBC filter push down should respect ...

2016-06-17 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/13743 LGTM

[GitHub] spark pull request #13635: [SPARK-15159][SPARKR] SparkR SparkSession API

2016-06-17 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/13635#discussion_r67584118 --- Diff: R/pkg/R/SQLContext.R --- @@ -615,11 +619,12 @@ clearCache <- function() { #' @method dropTempTable default dropTempTable

[GitHub] spark issue #13732: [SPARK-16014][SQL] Rename optimizer rules to be more con...

2016-06-17 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/13732 LGTM. Renaming `DistinctAggregateRewriter` sounds reasonable, but it is an analysis rule rather than an optimization rule.

[GitHub] spark issue #13592: [SPARK-15863][SQL][DOC] Initial SQL programming guide up...

2016-06-17 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/13592 Thanks everyone for the review!

[GitHub] spark pull request #13592: [SPARK-15863][SQL][DOC] Initial SQL programming g...

2016-06-17 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/13592#discussion_r67543099 --- Diff: docs/sql-programming-guide.md --- @@ -517,24 +517,26 @@ types such as Sequences or Arrays. This RDD can be implicitly converted to a Dat

[GitHub] spark issue #13722: [SPARK-15925][SPARKR] R DataFrame add back registerTempT...

2016-06-16 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/13722 LGTM, but I'd like @shivaram to sign off. Thanks!

[GitHub] spark pull request #13714: [SPARK-15996][R] Fix R examples by removing depre...

2016-06-16 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/13714#discussion_r67446548 --- Diff: examples/src/main/r/data-manipulation.R --- @@ -75,8 +75,8 @@ destDF <- select(flightsDF, "dest", "cancelled")

[GitHub] spark pull request #13714: [SPARK-15996][R] Fix R examples by removing depre...

2016-06-16 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/13714#discussion_r67439698 --- Diff: examples/src/main/r/data-manipulation.R --- @@ -75,8 +75,8 @@ destDF <- select(flightsDF, "dest", "cancelled")

[GitHub] spark issue #13371: [SPARK-15639][SQL] Try to push down filter at RowGroups ...

2016-06-16 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/13371 @viirya One problem in your new benchmark code is that `1 << 50` is actually very small since it's an `Int`:
```
scala> 1 << 50
res0: Int = 262144
```
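
The reason is that, as in Java, Scala uses only the low 5 bits of the shift distance for an `Int` (so `1 << 50` is `1 << 18`), while a `Long` uses the low 6 bits. A `Long` literal gives the intended value, as this small sketch shows:

```scala
object ShiftWidth {
  // For Int, only the low 5 bits of the shift distance are used: 50 & 31 == 18.
  val intShift: Int = 1 << 50
  // For Long, the low 6 bits are used, so the full distance of 50 applies.
  val longShift: Long = 1L << 50

  def main(args: Array[String]): Unit = {
    println(intShift)  // 262144
    println(longShift) // 1125899906842624
  }
}
```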

[GitHub] spark issue #13710: [SQL] Minor HashAggregateExec string output fixes

2016-06-16 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/13710 cc @clockfly

[GitHub] spark pull request #13710: [SQL] Minor HashAggregateExec string output fixes

2016-06-16 Thread liancheng
GitHub user liancheng opened a pull request: https://github.com/apache/spark/pull/13710 [SQL] Minor HashAggregateExec string output fixes ## What changes were proposed in this pull request? This PR fixes some minor `.toString` format issues for `HashAggregateExec

[GitHub] spark issue #13572: [SPARK-15862] [SQL] Better Error Message When Having Dat...

2016-06-16 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/13572 Thanks, merging to master and branch-2.0.

[GitHub] spark pull request #13698: [SQL] Removes FileFormat.prepareRead

2016-06-15 Thread liancheng
GitHub user liancheng opened a pull request: https://github.com/apache/spark/pull/13698 [SQL] Removes FileFormat.prepareRead ## What changes were proposed in this pull request? Interface method `FileFormat.prepareRead()` was added in #12088 to handle a special case in the

[GitHub] spark issue #13698: [SQL] Removes FileFormat.prepareRead

2016-06-15 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/13698 JIRA is down. Will create a ticket for it later. cc @rxin @cloud-fan

[GitHub] spark issue #13696: [SQL] Rename various Parquet support classes.

2016-06-15 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/13696 LGTM pending Jenkins.

[GitHub] spark pull request #13622: [SPARK-15901] [SQL] [TEST] Verification of CONVER...

2016-06-15 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/13622#discussion_r67249172 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/parquetSuites.scala --- @@ -676,6 +676,46 @@ class ParquetSourceSuite extends

[GitHub] spark issue #13622: [SPARK-15901] [SQL] [TEST] Verification of CONVERT_METAS...

2016-06-15 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/13622 LGTM, merging to master and branch-2.0. Thanks!

[GitHub] spark issue #13585: [SPARK-15859][SQL] Optimize the partition pruning within...

2016-06-15 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/13585 @lianhuiwang @chenghao-intel Thanks for working on this! As you already know, we are currently trying to get Spark 2.0 RC1 ASAP, please allow me to revisit both of your branches later. Sorry for

[GitHub] spark pull request #13572: [SPARK-15862] [SQL] Better Error Message When Hav...

2016-06-15 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/13572#discussion_r67224128 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/cache.scala --- @@ -17,30 +17,30 @@ package

[GitHub] spark issue #13636: [SPARK-15637][SPARK-15931][SPARKR] Fix R masked function...

2016-06-15 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/13636 Works for me too, passes tests on R 3.3.0 on OS X 10.10. Thanks!

[GitHub] spark pull request #13623: [SPARK-15895][SQL] Filters out metadata files whi...

2016-06-14 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/13623#discussion_r67014620 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ListingFileCatalog.scala --- @@ -83,8 +83,9 @@ class ListingFileCatalog

[GitHub] spark pull request #13623: [SPARK-15895][SQL] Filters out metadata files whi...

2016-06-14 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/13623#discussion_r67011918 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningAwareFileCatalog.scala --- @@ -197,4 +201,9 @@ abstract class

[GitHub] spark issue #13649: [SPARK-15929] Fix portability of DataFrameSuite path glo...

2016-06-13 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/13649 Merging to master and branch-2.0.

[GitHub] spark issue #13636: [SPARK-15637][SPARKR] Remove R version check since maske...

2016-06-13 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/13636 @shivaram True. @felixcheung Could you please also add SPARK-15931 to the PR title if this PR also targets that one? Thanks.

[GitHub] spark pull request #13592: [SPARK-15863][SQL][DOC] Initial SQL programming g...

2016-06-13 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/13592#discussion_r66886012 --- Diff: docs/sql-programming-guide.md --- @@ -587,7 +590,7 @@ for the JavaBean. {% highlight java %} // sc is an existing

[GitHub] spark pull request #13592: [SPARK-15863][SQL][DOC] Initial SQL programming g...

2016-06-13 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/13592#discussion_r66884213 --- Diff: docs/sql-programming-guide.md --- @@ -12,130 +12,130 @@ title: Spark SQL and DataFrames Spark SQL is a Spark module for structured data

[GitHub] spark pull request #13592: [SPARK-15863][SQL][DOC] Initial SQL programming g...

2016-06-13 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/13592#discussion_r66884198 --- Diff: docs/sql-programming-guide.md --- @@ -12,130 +12,130 @@ title: Spark SQL and DataFrames Spark SQL is a Spark module for structured data

[GitHub] spark pull request #13592: [SPARK-15863][SQL][DOC] Initial SQL programming g...

2016-06-13 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/13592#discussion_r66884076 --- Diff: docs/sql-programming-guide.md --- @@ -517,24 +517,26 @@ types such as Sequences or Arrays. This RDD can be implicitly converted to a Dat

[GitHub] spark issue #13649: [SPARK-15929] Fix portability of DataFrameSuite path glo...

2016-06-13 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/13649 LGTM pending Jenkins.

[GitHub] spark pull request #13651: [SPARK-15776][SQL] Divide Expression inside Aggre...

2016-06-13 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/13651#discussion_r66881454 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala --- @@ -213,7 +213,7 @@ case class Multiply(left

[GitHub] spark pull request #13651: [SPARK-15776][SQL] Divide Expression inside Aggre...

2016-06-13 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/13651#discussion_r66881195 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala --- @@ -213,7 +213,7 @@ case class Multiply(left

[GitHub] spark issue #13623: [SPARK-15895][SQL] Filters out metadata files while doin...

2016-06-13 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/13623 @rxin Thanks. Consolidated all the underscore- and dot-files filtering logic.

[GitHub] spark issue #13648: [SQL][DOC][minor] document the contract of encoder seria...

2016-06-13 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/13648 LGTM

[GitHub] spark issue #13137: [SPARK-15247][SQL] Set the default number of partitions ...

2016-06-13 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/13137 @maropu Just had an offline discussion with @yhuai. So this case is a little bit different from #13444. In #13444, the number of leaf files is unknown before issuing the job, and each task may

[GitHub] spark issue #13644: [SPARK-15925][SQL][SPARKR] Replaces registerTempTable wi...

2016-06-13 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/13644 @shivaram Thanks for the review. I'm updating the SQL programming guide in #13592 and will document it there. (Actually, that's how I found that this API hadn't been updated.)

[GitHub] spark pull request #13644: [SPARK-15925][SQL][SPARKR] Replaces registerTempT...

2016-06-13 Thread liancheng
GitHub user liancheng opened a pull request: https://github.com/apache/spark/pull/13644 [SPARK-15925][SQL][SPARKR] Replaces registerTempTable with createOrReplaceTempView ## What changes were proposed in this pull request? This PR replaces `registerTempTable` with

[GitHub] spark issue #13585: [SPARK-15859][SQL] Optimize the partition pruning within...

2016-06-13 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/13585 One problem in the tests is that other optimization rules may optimize the filter predicates before the newly added rule, and hide bugs in the new rule. The one @clockfly pointed out is one

[GitHub] spark pull request #13585: [SPARK-15859][SQL] Optimize the partition pruning...

2016-06-13 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/13585#discussion_r66831261 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala --- @@ -92,6 +92,36 @@ object PhysicalOperation extends

[GitHub] spark pull request #13623: [SPARK-15895][SQL] Filters out metadata files whi...

2016-06-12 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/13623#discussion_r66725019 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningAwareFileCatalog.scala --- @@ -96,7 +96,10 @@ abstract class

[GitHub] spark issue #13623: [SPARK-15895][SQL] Filters out metadata files while doin...

2016-06-12 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/13623 @yhuai Unlike what we discussed before, we can't filter out metadata files [here][1]. Otherwise, Parquet read path may also skip all Parquet summary files, which isn't expecte

[GitHub] spark pull request #13623: [SPARK-15895][SQL] Filters out metadata files whi...

2016-06-12 Thread liancheng
GitHub user liancheng opened a pull request: https://github.com/apache/spark/pull/13623 [SPARK-15895][SQL] Filters out metadata files while doing partition discovery ## What changes were proposed in this pull request? Take the following directory layout as an example
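
As a hedged illustration of the problem, assume a hypothetical layout (not necessarily the one in the PR) where a table directory `t` holds a Parquet summary file `t/_metadata` next to partition directories `t/a=1` and `t/a=2`. If the summary file takes part in partition discovery, the table root shows up as one more leaf directory, so the inferred layout is inconsistent. The plain-Scala sketch below uses hypothetical helpers, not Spark's `PartitioningAwareFileCatalog`. It drops underscore- and dot-prefixed files only for the discovery step and leaves the full listing untouched, so the summary files remain visible to the Parquet read path.

```scala
object PartitionDiscoverySketch {
  // Assumes every path contains at least one '/'.
  private def fileName(path: String): String = path.substring(path.lastIndexOf('/') + 1)
  private def parentDir(path: String): String = path.substring(0, path.lastIndexOf('/'))

  // Hypothetical rule: underscore- and dot-prefixed files are metadata, not data.
  def isMetadataFile(path: String): Boolean = {
    val name = fileName(path)
    name.startsWith("_") || name.startsWith(".")
  }

  // Directories that directly contain the given files; partition values are parsed from these.
  def leafDirs(files: Seq[String]): Set[String] = files.map(parentDir).toSet

  // Partition discovery ignores metadata files; the listing itself is not filtered.
  def leafDirsForDiscovery(files: Seq[String]): Set[String] =
    leafDirs(files.filterNot(isMetadataFile))

  def main(args: Array[String]): Unit = {
    val listing = Seq("t/_metadata", "t/a=1/part-0.parquet", "t/a=2/part-0.parquet")
    println(leafDirs(listing))             // includes the root "t" as well as both partition dirs
    println(leafDirsForDiscovery(listing)) // only the two partition dirs
  }
}
```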

[GitHub] spark issue #13546: [SPARK-15808] [SQL] File Format Checking When Appending ...

2016-06-10 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/13546 @cloud-fan @gatorsmile Yea, seems that appending with a different format doesn't make any sense. @yhuai any ideas? Are there any situations that we may want to append using a different f

[GitHub] spark issue #13137: [SPARK-15247][SQL] Set the default number of partitions ...

2016-06-10 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/13137 Sorry for the late reply. 1 seems to be a pretty random number; is default parallelism a better choice?
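
One possible reading of the suggestion, offered only as a sketch: instead of a fixed constant, bound the number of partitions by both the amount of work and the default parallelism. `PartitionCount.numPartitions` is hypothetical; `numFiles` and `defaultParallelism` are plain parameters here (in Spark, the latter would come from `SparkContext.defaultParallelism`).

```scala
object PartitionCount {
  // Hypothetical sizing rule: at least 1 partition, at most one per file,
  // and never more than the default parallelism.
  def numPartitions(numFiles: Int, defaultParallelism: Int): Int = {
    require(defaultParallelism > 0, "defaultParallelism must be positive")
    math.min(math.max(numFiles, 1), defaultParallelism)
  }
}
```

Capping at one partition per file avoids scheduling empty tasks when only a few files need to be read, while the upper bound keeps large inputs within the cluster's usual parallelism.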

[GitHub] spark pull request #13592: [SPARK-15863][SQL][DOC] Initial SQL programming g...

2016-06-10 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/13592#discussion_r66700277 --- Diff: docs/sql-programming-guide.md --- @@ -1607,13 +1600,13 @@ a regular multi-line JSON file will most often fail. {% highlight r

[GitHub] spark issue #13371: [SPARK-15639][SQL] Try to push down filter at RowGroups ...

2016-06-10 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/13371 Reverted from master and branch-2.0. @viirya For the benchmark, there are two things: 1. The benchmark also includes the time spent writing Parquet files, so the real number should be much

[GitHub] spark issue #13544: [SPARK-15805][SQL][Documents] update sql programming gui...

2016-06-10 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/13544 Ah, too bad... I wasn't aware of this PR when I was doing #13592. Will review this one to see whether I missed something in #13592. Thanks for working on this!

[GitHub] spark issue #13371: [SPARK-15639][SQL] Try to push down filter at RowGroups ...

2016-06-10 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/13371 @yhuai We used to support row group level filter push-down before refactoring `HadoopFsRelation` into `FileFormat`, but lost it (by accident I guess) after the refactoring. So now we only have
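Row-group-level filter push-down means skipping entire Parquet row groups whose column statistics prove that no row can match the predicate. The sketch below illustrates the idea for a single `Long` column under simple comparison predicates; `RowGroupStats`, `Pred` and `RowGroupPruning` are hypothetical names, and actual Parquet readers do this through parquet-mr's own `filter2` predicate machinery rather than hand-written code like this.

```scala
// Hypothetical per-row-group min/max statistics for one Long column.
final case class RowGroupStats(min: Long, max: Long, rowCount: Long)

// Hypothetical predicate ADT covering a few comparison operators.
sealed trait Pred
final case class GreaterThan(v: Long) extends Pred
final case class LessThan(v: Long) extends Pred
final case class EqualTo(v: Long) extends Pred

object RowGroupPruning {
  // True if the statistics prove that no row in the group satisfies the predicate.
  // Null handling is ignored in this sketch.
  def canSkip(stats: RowGroupStats, p: Pred): Boolean = p match {
    case GreaterThan(v) => stats.max <= v
    case LessThan(v)    => stats.min >= v
    case EqualTo(v)     => v < stats.min || v > stats.max
  }

  // Keeps only the row groups that might contain matching rows.
  def prune(groups: Seq[RowGroupStats], p: Pred): Seq[RowGroupStats] =
    groups.filterNot(canSkip(_, p))
}
```

Pruning is conservative by design: a group is dropped only when its statistics rule out every row, so a kept group may still contain no matches and the predicate must be re-evaluated on the rows that are read.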

[GitHub] spark issue #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on DataFra...

2016-06-10 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/12836 The SQL part of the changes looks generally good except for a few style issues.

[GitHub] spark pull request #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on ...

2016-06-10 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/12836#discussion_r66697390 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/objects.scala --- @@ -325,6 +330,71 @@ case class MapGroupsExec

[GitHub] spark pull request #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on ...

2016-06-10 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/12836#discussion_r66697263 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala --- @@ -243,6 +243,55 @@ case class MapGroups

[GitHub] spark issue #13610: [SPARK-15884][SparkR][SQL] Overriding stringArgs in MapP...

2016-06-10 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/13610 Merging to master and branch-2.0. Thanks!

[GitHub] spark pull request #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on ...

2016-06-10 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/12836#discussion_r66691737 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala --- @@ -243,6 +243,55 @@ case class MapGroups

[GitHub] spark pull request #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on ...

2016-06-10 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/12836#discussion_r66690948 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala --- @@ -243,6 +243,55 @@ case class MapGroups

[GitHub] spark pull request #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on ...

2016-06-10 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/12836#discussion_r66688912 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala --- @@ -381,6 +385,50 @@ class RelationalGroupedDataset protected

[GitHub] spark pull request #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on ...

2016-06-10 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/12836#discussion_r66688844 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala --- @@ -381,6 +385,50 @@ class RelationalGroupedDataset protected

[GitHub] spark issue #13585: [SPARK-15859][SQL] Optimize the partition pruning within...

2016-06-10 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/13585 You probably meant "conjunction" (aka "logical and") instead of "disjunction" (aka "logical or") in the PR title and comments. As @clockfly had
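The distinction matters for partition pruning because a conjunction can be split. From `part = 1 AND value > 10`, the conjunct `part = 1` on its own is a safe pruning filter, since every matching row also satisfies it. From `part = 1 OR value > 10`, no partition-only filter can be extracted this way without losing rows. A minimal Scala sketch with a hypothetical expression tree (Catalyst has its own `splitConjunctivePredicates` helper, which this does not reproduce):

```scala
// Hypothetical, minimal expression tree (not Catalyst's).
sealed trait Expr
final case class Attr(name: String) extends Expr
final case class Lit(value: Int) extends Expr
final case class Eq(left: Expr, right: Expr) extends Expr
final case class Gt(left: Expr, right: Expr) extends Expr
final case class And(left: Expr, right: Expr) extends Expr
final case class Or(left: Expr, right: Expr) extends Expr

object PartitionPruningSketch {
  // Flattens nested ANDs into a list of conjuncts; any other node is a single conjunct.
  def splitConjuncts(e: Expr): Seq[Expr] = e match {
    case And(l, r) => splitConjuncts(l) ++ splitConjuncts(r)
    case other     => Seq(other)
  }

  // Column names referenced anywhere in the expression.
  def references(e: Expr): Set[String] = e match {
    case Attr(n)   => Set(n)
    case Lit(_)    => Set.empty
    case Eq(l, r)  => references(l) ++ references(r)
    case Gt(l, r)  => references(l) ++ references(r)
    case And(l, r) => references(l) ++ references(r)
    case Or(l, r)  => references(l) ++ references(r)
  }

  // Conjuncts that reference only partition columns can be used to prune partitions,
  // because every row satisfying the whole predicate also satisfies each conjunct.
  def partitionFilters(e: Expr, partitionCols: Set[String]): Seq[Expr] =
    splitConjuncts(e).filter(c => references(c).subsetOf(partitionCols))
}
```

For the `OR` case, the whole predicate is a single conjunct that also references `value`, so `partitionFilters` returns nothing.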

[GitHub] spark issue #13610: Overriding stringArgs in MapPartitionsInR

2016-06-10 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/13610 LGTM pending Jenkins, and the PR title change. Thanks!

[GitHub] spark issue #13572: [SPARK-15862] [SQL] Better Error Message When Having Dat...

2016-06-10 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/13572 Mostly LGTM except for one minor comment.

[GitHub] spark pull request #13572: [SPARK-15862] [SQL] Better Error Message When Hav...

2016-06-10 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/13572#discussion_r66684973 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/cache.scala --- @@ -17,30 +17,30 @@ package

[GitHub] spark issue #13593: [SPARK-15864] [SQL] Fix Inconsistent Behaviors when Unca...

2016-06-10 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/13593 We do have a private method `tryUncacheQuery` in `CacheManager`, which exposes exactly the same semantics as `UNCACHE TABLE ... IF EXISTS`. But `Catalog` doesn't have an equivalent public m

[GitHub] spark issue #13593: [SPARK-15864] [SQL] Fix Inconsistent Behaviors when Unca...

2016-06-10 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/13593 From the perspective of better consistency, I'd prefer the current fix in the PR plus a new `UNCACHE TABLE ... IF EXISTS` syntax. However, this PR does introduce behavior breaking c
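The difference between a strict uncache and the lenient `... IF EXISTS` behavior can be illustrated with a toy registry. This is a hypothetical sketch: `TableCacheRegistry`, `uncache` and the Boolean-returning `tryUncache` only mimic the semantics discussed in these comments and are not Spark's `CacheManager` or `Catalog` APIs.

```scala
import scala.collection.mutable

// Hypothetical toy registry of cached tables (not Spark's CacheManager).
final class TableCacheRegistry {
  private val cached = mutable.Set.empty[String]

  def cache(table: String): Unit = cached += table

  def isCached(table: String): Boolean = cached.contains(table)

  // Lenient form: reports whether anything was actually uncached and never throws.
  def tryUncache(table: String): Boolean = cached.remove(table)

  // Strict form unless ifExists is set: uncaching a table that is not cached is an error.
  def uncache(table: String, ifExists: Boolean = false): Unit =
    if (!tryUncache(table) && !ifExists)
      throw new NoSuchElementException(s"Table or view not cached: $table")
}
```

Building the strict form on top of the lenient one keeps a single code path for the actual removal; only the error policy differs.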

[GitHub] spark pull request #13592: [SPARK-15863][SQL][DOC] Initial SQL programming g...

2016-06-10 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/13592#discussion_r3884 --- Diff: docs/sql-programming-guide.md --- @@ -12,130 +12,121 @@ title: Spark SQL and DataFrames Spark SQL is a Spark module for structured data

[GitHub] spark issue #13605: [SPARK-15856][SQL] Revert API breaking changes made in S...

2016-06-10 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/13605 Do we use `SQLContext.range` in any test/example code?

[GitHub] spark pull request #13592: [SPARK-15863][SQL][DOC] Initial SQL programming g...

2016-06-10 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/13592#discussion_r3638 --- Diff: docs/sql-programming-guide.md --- @@ -184,20 +175,20 @@ showDF(df) -## DataFrame Operations +## Untyped Dataset
