[GitHub] spark issue #13701: [SPARK-15639][SPARK-16321][SQL] Push down filter at RowG...

2016-08-10 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13701 Merging this into master and 2.0, thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark pull request #14500: [SPARK-16905] SQL DDL: MSCK REPAIR TABLE

2016-08-09 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/14500#discussion_r74132414 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -425,6 +430,111 @@ case class AlterTableDropPartitionCommand

[GitHub] spark pull request #14500: [SPARK-16905] SQL DDL: MSCK REPAIR TABLE

2016-08-09 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/14500#discussion_r74132132 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -425,6 +430,111 @@ case class AlterTableDropPartitionCommand

[GitHub] spark issue #13701: [SPARK-15639][SPARK-16321][SQL] Push down filter at RowG...

2016-08-09 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13701 LGTM, could you fix the conflict (should be trivial)?

[GitHub] spark issue #14500: [SPARK-16905] SQL DDL: MSCK REPAIR TABLE

2016-08-09 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/14500 Merging into master, thanks!

[GitHub] spark pull request #14500: [SPARK-16905] SQL DDL: MSCK REPAIR TABLE

2016-08-09 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/14500#discussion_r74100170 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -425,6 +430,110 @@ case class AlterTableDropPartitionCommand

[GitHub] spark pull request #14500: [SPARK-16905] SQL DDL: MSCK REPAIR TABLE

2016-08-09 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/14500#discussion_r74099592 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -425,6 +430,111 @@ case class AlterTableDropPartitionCommand

[GitHub] spark issue #14540: [SPARK-16950] [PySpark] fromOffsets parameter support in...

2016-08-09 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/14540 LGTM, merging this into master, thanks!

[GitHub] spark pull request #14500: [SPARK-16905] SQL DDL: MSCK REPAIR TABLE

2016-08-09 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/14500#discussion_r74094542 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -425,6 +430,110 @@ case class AlterTableDropPartitionCommand

[GitHub] spark pull request #14500: [SPARK-16905] SQL DDL: MSCK REPAIR TABLE

2016-08-09 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/14500#discussion_r74094235 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -425,6 +430,111 @@ case class AlterTableDropPartitionCommand

[GitHub] spark pull request #14548: [SPARK-16958] [SQL] Reuse subqueries within the s...

2016-08-08 Thread davies
GitHub user davies opened a pull request: https://github.com/apache/spark/pull/14548 [SPARK-16958] [SQL] Reuse subqueries within the same query ## What changes were proposed in this pull request? There could be multiple subqueries that generate the same results; we could re

[GitHub] spark pull request #14545: [SPARK-11150] [SQL] Dynamic Partition Pruning

2016-08-08 Thread davies
Github user davies closed the pull request at: https://github.com/apache/spark/pull/14545

[GitHub] spark pull request #14469: [SPARK-16700] [PYSPARK] [SQL] create DataFrame fr...

2016-08-08 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/14469#discussion_r73960965 --- Diff: python/pyspark/sql/session.py --- @@ -384,17 +384,15 @@ def _createFromLocal(self, data, schema): if schema is None or isinstance

[GitHub] spark pull request #14545: [SPARK-11150] [SQL] Dynamic Partition Pruning

2016-08-08 Thread davies
GitHub user davies opened a pull request: https://github.com/apache/spark/pull/14545 [SPARK-11150] [SQL] Dynamic Partition Pruning ## What changes were proposed in this pull request? This PR introduces a new feature for Spark SQL: dynamic partition pruning, which could

[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...

2016-08-08 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13680#discussion_r73951262 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/UnsafeArraySuite.scala --- @@ -18,27 +18,131 @@ package

[GitHub] spark pull request #14500: [SPARK-16905] SQL DDL: MSCK REPAIR TABLE

2016-08-08 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/14500#discussion_r73945744 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -425,6 +430,110 @@ case class AlterTableDropPartitionCommand

[GitHub] spark pull request #14500: [SPARK-16905] SQL DDL: MSCK REPAIR TABLE

2016-08-08 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/14500#discussion_r73945656 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -425,6 +430,110 @@ case class AlterTableDropPartitionCommand

[GitHub] spark pull request #13701: [SPARK-15639][SPARK-16321][SQL] Push down filter ...

2016-08-08 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13701#discussion_r73934292 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala --- @@ -357,10 +358,27 @@ private[sql] class

[GitHub] spark pull request #13701: [SPARK-15639][SPARK-16321][SQL] Push down filter ...

2016-08-08 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13701#discussion_r73934243 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala --- @@ -213,7 +213,9 @@ private[sql] case class FileSourceScanExec

[GitHub] spark pull request #14469: [SPARK-16700] [PYSPARK] [SQL] create DataFrame fr...

2016-08-08 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/14469#discussion_r73932092 --- Diff: python/pyspark/sql/session.py --- @@ -432,14 +430,9 @@ def createDataFrame(self, data, schema=None, samplingRatio=None): ``byte

[GitHub] spark issue #14454: [Minor] [ML] Rename TreeEnsembleModels to TreeEnsembleMo...

2016-08-08 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/14454 LGTM

[GitHub] spark pull request #14266: [SPARK-16526][SQL] Benchmarking Performance for F...

2016-08-08 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/14266#discussion_r73928683 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/AggregateBenchmark.scala --- @@ -1078,6 +1078,146 @@ class AggregateBenchmark

[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...

2016-08-08 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13680#discussion_r73926983 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeArrayData.java --- @@ -25,55 +25,57 @@ import

[GitHub] spark issue #14513: [SPARK-16928][SQL] Recursive call of ColumnVector::getIn...

2016-08-05 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/14513 It's not necessary, but it is clearer (more consistent).

[GitHub] spark issue #14513: [SPARK-16928][SQL] Recursive call of ColumnVector::getIn...

2016-08-05 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/14513 We should also update VectorizedColumnReader.decodeDictionaryIds() to use the new method.

[GitHub] spark issue #14176: [SPARK-16525][SQL] Enable Row Based HashMap in HashAggre...

2016-08-05 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/14176 Let's hold off on this. If we are going to have a single implementation for the fast hash map (based on the benchmark result in another PR), we don't need to merge this fancy implementation choosing. cc @rxin

[GitHub] spark pull request #13701: [SPARK-15639][SPARK-16321][SQL] Push down filter ...

2016-08-05 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13701#discussion_r73756701 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala --- @@ -199,6 +209,19 @@ private[sql] case class

[GitHub] spark pull request #14500: [SPARK-16905] SQL DDL: MSCK REPAIR TABLE

2016-08-05 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/14500#discussion_r73753834 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala --- @@ -827,6 +827,45 @@ class DDLSuite extends QueryTest

[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...

2016-08-05 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13680#discussion_r73753275 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeArrayData.java --- @@ -25,30 +25,36 @@ import

[GitHub] spark pull request #14500: [SPARK-] SQL DDL: MSCK REPAIR TABLE

2016-08-05 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/14500#discussion_r73729864 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala --- @@ -409,6 +409,18 @@ class SparkSqlAstBuilder(conf: SQLConf

[GitHub] spark pull request #13701: [SPARK-15639][SPARK-16321][SQL] Push down filter ...

2016-08-05 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13701#discussion_r73728732 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala --- @@ -368,73 +378,75 @@ class

[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...

2016-08-05 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13680#discussion_r73727756 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeArrayData.java --- @@ -25,30 +25,36 @@ import

[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...

2016-08-05 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13680#discussion_r73727463 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeArrayData.java --- @@ -25,30 +25,36 @@ import

[GitHub] spark issue #14500: [SPARK-] SQL DDL: MSCK REPAIR TABLE

2016-08-04 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/14500 @yhuai I just checked repair.q; it's kind of useless and already covered by our unit tests, so we could just ignore it.

[GitHub] spark pull request #14266: [SPARK-16526][SQL] Benchmarking Performance for F...

2016-08-04 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/14266#discussion_r73616369 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/AggregateBenchmark.scala --- @@ -576,4 +576,605 @@ class AggregateBenchmark

[GitHub] spark pull request #14500: [SPARK-] SQL DDL: MSCK REPAIR TABLE

2016-08-04 Thread davies
GitHub user davies opened a pull request: https://github.com/apache/spark/pull/14500 [SPARK-] SQL DDL: MSCK REPAIR TABLE ## What changes were proposed in this pull request? MSCK REPAIR TABLE could be used to recover the partitions in the external catalog based on partitions

[GitHub] spark issue #14500: [SPARK-] SQL DDL: MSCK REPAIR TABLE

2016-08-04 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/14500 @yhuai Could you help to generate the golden result for this suite?

[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...

2016-08-04 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13680#discussion_r73594310 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeArrayData.java --- @@ -25,30 +25,36 @@ import

[GitHub] spark issue #14487: [SPARK-16884] Move DataSourceScanExec out of ExistingRDD...

2016-08-04 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/14487 LGTM, merging into master, thanks!

[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...

2016-08-04 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13680#discussion_r73569234 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeArrayData.java --- @@ -25,30 +25,36 @@ import

[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...

2016-08-04 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13680#discussion_r73569132 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeArrayData.java --- @@ -25,30 +25,36 @@ import

[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...

2016-08-04 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13680#discussion_r73568470 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeArrayData.java --- @@ -25,30 +25,36 @@ import

[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...

2016-08-04 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13680#discussion_r73567644 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeArrayData.java --- @@ -59,21 +65,16 @@ // The 4-bytes header

[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...

2016-08-04 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13680#discussion_r73567215 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeArrayData.java --- @@ -25,30 +25,36 @@ import

[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...

2016-08-04 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13680#discussion_r73566559 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeArrayData.java --- @@ -25,30 +25,36 @@ import

[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...

2016-08-04 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13680#discussion_r73565969 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeArrayData.java --- @@ -25,30 +25,36 @@ import

[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...

2016-08-04 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13680#discussion_r73565692 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/UnsafeArraySuite.scala --- @@ -18,27 +18,131 @@ package

[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...

2016-08-04 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13680#discussion_r73565092 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/columnar/ColumnTypeSuite.scala --- @@ -73,8 +73,8 @@ class ColumnTypeSuite extends

[GitHub] spark issue #14241: [SPARK-16596] [SQL] Refactor DataSourceScanExec to do pa...

2016-08-03 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/14241 Merging this into master, thanks!

[GitHub] spark issue #12983: [SPARK-15213][PySpark] Unify 'range' usages

2016-08-03 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/12983 If the number of iterations is not huge, it does not matter whether you use range() or xrange() in Python 2 (especially when you use it together with `for`). That said, I'm not a fan of this change

[GitHub] spark pull request #13701: [SPARK-15639][SQL] Try to push down filter at Row...

2016-08-03 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13701#discussion_r73383148 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala --- @@ -527,4 +538,54 @@ class

[GitHub] spark issue #14241: [SPARK-16596] [SQL] Refactor DataSourceScanExec to do pa...

2016-08-03 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/14241 @hvanhovell Have you finished your round of review?

[GitHub] spark pull request #14469: [SPARK-16700] [PYSPARK] [SQL] create DataFrame fr...

2016-08-02 Thread davies
GitHub user davies opened a pull request: https://github.com/apache/spark/pull/14469 [SPARK-16700] [PYSPARK] [SQL] create DataFrame from dict/Row with schema ## What changes were proposed in this pull request? In 2.0, we verify the data type against the schema for every row

[GitHub] spark pull request #13701: [SPARK-15639][SQL] Try to push down filter at Row...

2016-08-02 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13701#discussion_r73258703 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala --- @@ -581,62 +586,6 @@ private[sql] object

[GitHub] spark issue #14241: [SPARK-16596] [SQL] Refactor DataSourceScanExec to do pa...

2016-08-02 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/14241 LGTM

[GitHub] spark pull request #13701: [SPARK-15639][SQL] Try to push down filter at Row...

2016-08-02 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13701#discussion_r73247832 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala --- @@ -581,62 +586,6 @@ private[sql] object

[GitHub] spark pull request #13701: [SPARK-15639][SQL] Try to push down filter at Row...

2016-08-02 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13701#discussion_r73247716 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala --- @@ -85,8 +85,15 @@ private[sql] object

[GitHub] spark issue #13701: [SPARK-15639][SQL] Try to push down filter at RowGroups ...

2016-08-02 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13701 @gatorsmile In order to merge this patch sooner, it's better to only have related changes to fix the regression. We can clean up the dead code later.

[GitHub] spark pull request #14241: [SPARK-16596] [SQL] Refactor DataSourceScanExec t...

2016-08-02 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/14241#discussion_r73231816 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala --- @@ -275,62 +272,161 @@ private[sql] case class RowDataSourceScanExec

[GitHub] spark pull request #14464: [SPARK-16802] [SQL] fix overflow in LongToUnsafeR...

2016-08-02 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/14464#discussion_r73231143 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala --- @@ -459,8 +459,8 @@ private[execution] final class

[GitHub] spark pull request #14241: [SPARK-16596] [SQL] Refactor DataSourceScanExec t...

2016-08-02 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/14241#discussion_r73228065 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala --- @@ -275,62 +272,161 @@ private[sql] case class RowDataSourceScanExec

[GitHub] spark issue #14465: [SPARK-16320][SPARK-16321] Fixing performance regression...

2016-08-02 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/14465 @maver1ck Thanks for sending this out. I'd prefer to merge #13701; there is already a lot of discussion there.

[GitHub] spark pull request #14241: [SPARK-16596] [SQL] Refactor DataSourceScanExec t...

2016-08-02 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/14241#discussion_r73226713 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala --- @@ -275,62 +272,161 @@ private[sql] case class RowDataSourceScanExec

[GitHub] spark pull request #14464: [SPARK-16802] [SQL] fix overflow in LongToUnsafeR...

2016-08-02 Thread davies
GitHub user davies opened a pull request: https://github.com/apache/spark/pull/14464 [SPARK-16802] [SQL] fix overflow in LongToUnsafeRowMap ## What changes were proposed in this pull request? This patch fixes the overflow in LongToUnsafeRowMap when the range of keys is very

[GitHub] spark pull request #13701: [SPARK-15639][SQL] Try to push down filter at Row...

2016-08-02 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13701#discussion_r73207167 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala --- @@ -85,8 +85,15 @@ private[sql] object

[GitHub] spark issue #13701: [SPARK-15639][SQL] Try to push down filter at RowGroups ...

2016-08-02 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13701 @viirya In order to have a unit test for this (otherwise it will be broken again in the future), we could add a counter for row groups in the vectorized Parquet reader for testing purposes, then use

[GitHub] spark pull request #13701: [SPARK-15639][SQL] Try to push down filter at Row...

2016-08-02 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13701#discussion_r73206857 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala --- @@ -357,6 +357,11 @@ private[sql] class

[GitHub] spark issue #14446: [SPARK-16841][SQL] Improves the row level metrics perfor...

2016-08-02 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/14446 The changes look good to me. Could you post the benchmark numbers in the PR description?

[GitHub] spark issue #13778: [SPARK-16062][SPARK-15989][SQL] Fix two bugs of Python-o...

2016-08-02 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13778 LGTM, merging this into master and 2.0 branch, thanks!

[GitHub] spark issue #14442: [SPARK-16836][SQL] Add support for CURRENT_DATE/CURRENT_...

2016-08-01 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/14442 LGTM

[GitHub] spark issue #13107: [SPARK-13850] Force the sorter to Spill when number of e...

2016-06-30 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13107 LGTM, Merging this into master and 2.0, thanks!

[GitHub] spark issue #13977: [SPARK-16301] [SQL] The analyzer rule for resolving usin...

2016-06-29 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13977 LGTM, Merging this into master and 2.0, thanks!

[GitHub] spark issue #13963: [TRIVIAL][PYSPARK] Clean up orc compression option as we...

2016-06-29 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13963 Merging this into master and 2.0, thanks!

[GitHub] spark issue #13878: [SPARK-16175] [PYSPARK] handle None for UDT

2016-06-28 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13878 Merging this into master and 2.0, thanks!

[GitHub] spark pull request #13948: [SPARK-16259] [PYSPARK] cleanup options in DataFr...

2016-06-28 Thread davies
GitHub user davies opened a pull request: https://github.com/apache/spark/pull/13948 [SPARK-16259] [PYSPARK] cleanup options in DataFrame read/write API ## What changes were proposed in this pull request? There is some duplicated code for options in the DataFrame reader/writer

[GitHub] spark issue #13948: [SPARK-16259] [PYSPARK] cleanup options in DataFrame rea...

2016-06-28 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13948 cc @rxin

[GitHub] spark issue #13931: [SPARK-16224] [SQL] [PYSPARK] SparkSession builder's con...

2016-06-28 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13931 LGTM, Merging this into master and 2.0, thanks!

[GitHub] spark issue #13027: [SPARK-4452][SPARK-11293][Core][BRANCH-1.6] Shuffle data...

2016-06-27 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13027 @tgravescs For minor releases, we (the community) usually do not put much effort into QA, so it's risky to pull a large change (like this one) into them. At least, I don't have enough confidence to merge

[GitHub] spark pull request #13107: [SPARK-13850] Force the sorter to Spill when numb...

2016-06-27 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13107#discussion_r68620470 --- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java --- @@ -72,7 +72,10 @@ private final TaskContext taskContext

[GitHub] spark issue #13900: [SPARK-16173][SQL] Can't join describe() of DataFrame in...

2016-06-24 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13900 @dongjoon-hyun had also merged into 1.5 branch

[GitHub] spark issue #13887: [SPARK-16186][SQL] Support partition batch pruning with ...

2016-06-24 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13887 Merged into master, thanks

[GitHub] spark issue #13902: [SPARK-16173] [SQL] Can't join describe() of DataFrame i...

2016-06-24 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13902 Merged into 1.6

[GitHub] spark issue #13900: [SPARK-16173][SQL] Can't join describe() of DataFrame in...

2016-06-24 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13900 @dongjoon-hyun Could you send a patch for 1.6?

[GitHub] spark issue #13887: [SPARK-16186][SQL] Support partition batch pruning with ...

2016-06-24 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13887 Reverted; will merge this again once it passes Jenkins.

[GitHub] spark issue #13887: [SPARK-16186][SQL] Support partition batch pruning with ...

2016-06-24 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13887 Sorry, Jenkins has not finished ...

[GitHub] spark issue #13887: [SPARK-16186][SQL] Support partition batch pruning with ...

2016-06-24 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13887 LGTM, merging this into master.

[GitHub] spark pull request #13887: [SPARK-16186][SQL] Support partition batch prunin...

2016-06-24 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13887#discussion_r68474760 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala --- @@ -79,6 +79,11 @@ private[sql] case class

[GitHub] spark issue #13887: [SPARK-16186][SQL] Support partition batch pruning with ...

2016-06-24 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13887 Let's go with the current patch; I will review it now. Those things can be considered later.

[GitHub] spark issue #13887: [SPARK-16186][SQL] Support partition batch pruning with ...

2016-06-24 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13887 @dongjoon-hyun That's a good point; the current patch is actually better for performance.

[GitHub] spark issue #13887: [SPARK-16186][SQL] Support partition batch pruning with ...

2016-06-24 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13887 @dongjoon-hyun Yes, 2) should check the constraints to make it idempotent

[GitHub] spark issue #13887: [SPARK-16186][SQL] Support partition batch pruning with ...

2016-06-24 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13887 @dongjoon-hyun There is only a single predicate in Filter; it could be AND or OR, which means we could control the order. For this case, I'm not sure the inserted GreaterThanOrEqual/LessThanOrEqual

[GitHub] spark issue #13900: [SPARK-16173][SQL] Can't join describe() of DataFrame in...

2016-06-24 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13900 LGTM

[GitHub] spark issue #13887: [SPARK-16186][SQL] Support partition batch pruning with ...

2016-06-24 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13887 For any IN that has more than one expression, we could add another GreaterThanOrEqual/LessThanOrEqual (without replacing the IN). For 2, it's not that obvious yet; we can do that later

[GitHub] spark pull request #13900: [SPARK-16173][SQL] Can't join describe() of DataF...

2016-06-24 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13900#discussion_r68468749 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -1908,7 +1908,7 @@ class Dataset[T] private[sql]( // All columns

[GitHub] spark issue #13883: [SPARK-16179] [PYSPARK] fix bugs for Python udf in gener...

2016-06-24 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13883 @rxin Could you review this?

[GitHub] spark issue #13788: [SPARK-16077] [PYSPARK] catch the exception from pickle....

2016-06-24 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13788 Merged into 1.6, 2.0 and master

[GitHub] spark issue #13887: [SPARK-16186][SQL] Support partition batch pruning with ...

2016-06-24 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13887 BTW, we could use constraints to implement this.

[GitHub] spark issue #13887: [SPARK-16186][SQL] Support partition batch pruning with ...

2016-06-24 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13887 @dongjoon-hyun Thanks for the patch; this optimization sounds reasonable. I'm wondering whether it is possible to make the optimization for IN/INSET more general. We could have an optimizer

[GitHub] spark issue #13883: [SPARK-16179] [PYSPARK] fix bugs for Python udf in gener...

2016-06-24 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13883 Jenkins, retest this please

[GitHub] spark issue #13883: [SPARK-16179] [PYSPARK] fix bugs for Python udf in gener...

2016-06-23 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13883 https://gist.github.com/vlad17/964c0a93510d79cb130c33700f6139b7
