[GitHub] spark issue #15106: [SPARK-16439] [SQL] bring back the separator in SQL UI

2016-09-19 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/15106 @srowen Could you leave an `lgtm` here? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #15089: [SPARK-15621] [SQL] Support spilling for Python UDF

2016-09-16 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/15089 @JoshRosen Answering your questions here: 1. This is specially designed for Python UDFs (we do not have a similar requirement elsewhere), so I'd rather not make it general without having a real use

[GitHub] spark pull request #15089: [SPARK-15621] [SQL] Support spilling for Python U...

2016-09-16 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/15089#discussion_r79253457 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/BatchEvalPythonExec.scala --- @@ -17,18 +17,270 @@ package

[GitHub] spark pull request #15089: [SPARK-15621] [SQL] Support spilling for Python U...

2016-09-16 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/15089#discussion_r79253378 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/BatchEvalPythonExec.scala --- @@ -17,18 +17,270 @@ package

[GitHub] spark issue #15106: [SPARK-16439] [SQL] bring back the separator in SQL UI

2016-09-16 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/15106 @srowen The previous problem was caused by using the host's default locale to format the numbers; that sounds perfect but caused some problems. So we fall back to only using English as the locale
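
The locale issue in the comment above can be illustrated with a minimal Python sketch. This is not the code from the pull request (that change is in Scala and is not shown in this listing); the helper name `format_metric` is hypothetical. It relies on Python's `,` format specifier, which always groups digits with a comma regardless of the host locale. That mirrors the idea of pinning the separator instead of following the machine's default.

```python
def format_metric(name, value):
    """Render a metric as 'name: 1,234,567' with a fixed separator.

    Hypothetical helper for illustration only; not part of Spark.
    """
    # The ',' format specifier ignores the host locale (unlike the 'n'
    # specifier), so the output is the same on every machine.
    return "{}: {:,}".format(name, value)


print(format_metric("number of rows", 1234567))
```

Pinning the separator trades localized output for output that is predictable everywhere, which appears to be the trade-off the comment describes.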

[GitHub] spark pull request #15089: [SPARK-15621] [SQL] Support spilling for Python U...

2016-09-15 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/15089#discussion_r79038500 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/BatchEvalPythonExec.scala --- @@ -17,18 +17,270 @@ package

[GitHub] spark pull request #15089: [SPARK-15621] [SQL] Support spilling for Python U...

2016-09-15 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/15089#discussion_r79037083 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/BatchEvalPythonExec.scala --- @@ -17,18 +17,270 @@ package

[GitHub] spark pull request #15089: [SPARK-15621] [SQL] Support spilling for Python U...

2016-09-15 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/15089#discussion_r79034499 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/BatchEvalPythonExec.scala --- @@ -98,7 +353,7 @@ case class BatchEvalPythonExec(udfs

[GitHub] spark issue #15106: [SPARK-16439] [SQL] bring back the separator in SQL UI

2016-09-14 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/15106 cc @maver1ck @srowen

[GitHub] spark pull request #15106: [SPARK-16439] [SQL] bring back the separator in S...

2016-09-14 Thread davies
GitHub user davies opened a pull request: https://github.com/apache/spark/pull/15106 [SPARK-16439] [SQL] bring back the separator in SQL UI ## What changes were proposed in this pull request? Currently, the SQL metrics look like `number of rows: `, it's very

[GitHub] spark issue #15101: [SPARK-17114][SQL] Fix aggregates grouped by literals wi...

2016-09-14 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/15101 LGTM

[GitHub] spark pull request #15103: [SPARK-17100] [SQL] fix Python udf in filter on t...

2016-09-14 Thread davies
GitHub user davies opened a pull request: https://github.com/apache/spark/pull/15103 [SPARK-17100] [SQL] fix Python udf in filter on top of outer join ## What changes were proposed in this pull request? In the optimizer, we try to evaluate the condition to see whether it's

[GitHub] spark issue #15026: [SPARK-17472] [PYSPARK] Better error message for seriali...

2016-09-14 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/15026 Merging this into master, thanks!

[GitHub] spark issue #15076: [SPARK-17114][SQL] Fix aggregates grouped by literals wi...

2016-09-14 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/15076 @hvanhovell Since the optimization rule changes the result (or semantics), I'd like to fix it rather than introduce more complexity in the physical layer. The few cycles saved may not be worth it, because people

[GitHub] spark issue #15026: [SPARK-17472] [PYSPARK] Better error message for seriali...

2016-09-14 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/15026 LGTM

[GitHub] spark issue #15076: [SPARK-17114][SQL] Fix aggregates grouped by literals wi...

2016-09-14 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/15076 @hvanhovell What's the reason that we cannot just use the literal as the grouping key?

[GitHub] spark issue #14537: [SPARK-16948][SQL] Use metastore schema instead of infer...

2016-09-14 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/14537 What's the progress on this one?

[GitHub] spark issue #15068: [SPARK-17514] df.take(1) and df.limit(1).collect() shoul...

2016-09-14 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/15068 Merging this into master and 2.0.

[GitHub] spark issue #15068: [SPARK-17514] df.take(1) and df.limit(1).collect() shoul...

2016-09-14 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/15068 @JoshRosen Just saw the other patch, LGTM

[GitHub] spark issue #15068: [SPARK-17514] df.take(1) and df.limit(1).collect() shoul...

2016-09-14 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/15068 @JoshRosen The current patch looks good to me. Could you also fix the case where LocalLimit is inserted when we turn a DataFrame with limit into a Python RDD?

[GitHub] spark issue #15089: [SPARK-15621] [SQL] Support spilling for Python UDF

2016-09-14 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/15089 @JoshRosen The current fix looks good to me. Could you also fix the case that LocalLimit is correctly inserted when we turn a Python DataFrame with limit into a Python RDD?

[GitHub] spark pull request #15082: [SPARK-17528][SQL] MutableProjection should not c...

2016-09-13 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/15082#discussion_r78667613 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -276,6 +276,11 @@ class CodegenContext

[GitHub] spark issue #15082: [SPARK-17528][SQL] MutableProjection should not cache co...

2016-09-13 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/15082 @JoshRosen I think this is a potential bug (not now).

[GitHub] spark issue #15089: [SPARK-15621] [SQL] Support spilling for Python UDF

2016-09-13 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/15089 cc @JoshRosen @rxin

[GitHub] spark pull request #15089: [SPARK-15621] [SQL] Support spilling for Python U...

2016-09-13 Thread davies
GitHub user davies opened a pull request: https://github.com/apache/spark/pull/15089 [SPARK-15621] [SQL] Support spilling for Python UDF ## What changes were proposed in this pull request? When executing a Python UDF, we buffer the input rows into a queue, then pull them
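
The description above is cut off, so the following is only a rough Python sketch of the general idea of a row queue that spills to disk. It is an assumption-laden illustration, not the implementation in `BatchEvalPythonExec.scala`. The class name `SpillableQueue`, the `max_in_memory` parameter, and the use of `pickle` with a temporary file are all hypothetical choices made for the example.

```python
import pickle
import tempfile
from collections import deque


class SpillableQueue:
    """FIFO queue that holds up to `max_in_memory` rows in RAM and
    writes further rows to a temporary file (hypothetical sketch)."""

    def __init__(self, max_in_memory=1000):
        self.max_in_memory = max_in_memory
        self._memory = deque()
        self._file = None          # created lazily on first spill
        self._read_pos = 0         # file offset of the next spilled row
        self._pending_on_disk = 0  # spilled rows not yet removed

    def add(self, row):
        # Keep rows in memory only while nothing is waiting on disk;
        # otherwise a newer row could overtake an older spilled one.
        if self._pending_on_disk == 0 and len(self._memory) < self.max_in_memory:
            self._memory.append(row)
            return
        if self._file is None:
            self._file = tempfile.TemporaryFile()
        self._file.seek(0, 2)  # move to the end of the file before writing
        pickle.dump(row, self._file)
        self._pending_on_disk += 1

    def remove(self):
        """Return the oldest row, or None when the queue is empty."""
        if self._memory:
            return self._memory.popleft()
        if self._pending_on_disk:
            self._file.seek(self._read_pos)
            row = pickle.load(self._file)
            self._read_pos = self._file.tell()
            self._pending_on_disk -= 1
            return row
        return None
```

In-memory rows are always older than spilled rows in this design, so draining memory first and then the file preserves insertion order.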

[GitHub] spark pull request #10840: [SPARK-12797] [SQL] Generated TungstenAggregate (...

2016-09-12 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/10840#discussion_r78495137 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala --- @@ -71,7 +71,9 @@ class SQLMetricsSuite extends

[GitHub] spark issue #15030: [SPARK-17474] [SQL] fix python udf in TakeOrderedAndProj...

2016-09-12 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/15030 Merging into 2.0 and master.

[GitHub] spark pull request #15030: [SPARK-17474] [SQL] fix python udf in TakeOrdered...

2016-09-12 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/15030#discussion_r78435023 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala --- @@ -148,8 +148,8 @@ case class TakeOrderedAndProjectExec

[GitHub] spark issue #15026: [SPARK-17472] [PYSPARK] Better error message for seriali...

2016-09-12 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/15026 Once we could log the original stacktrace, this looks good to me.

[GitHub] spark pull request #15026: [SPARK-17472] [PYSPARK] Better error message for ...

2016-09-12 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/15026#discussion_r78432383 --- Diff: python/pyspark/cloudpickle.py --- @@ -109,6 +109,15 @@ def dump(self, obj): if 'recursion' in e.args[0]: msg

[GitHub] spark pull request #15026: [SPARK-17472] [PYSPARK] Better error message for ...

2016-09-12 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/15026#discussion_r78432125 --- Diff: python/pyspark/broadcast.py --- @@ -75,7 +75,13 @@ def __init__(self, sc=None, value=None, pickle_registry=None, path=None): self

[GitHub] spark issue #14919: [SPARK-17354][SQL] Partitioning by dates/timestamps shou...

2016-09-09 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/14919 LGTM, merging into master and 2.0 branch, thanks!

[GitHub] spark pull request #15030: [SPARK-17474] [SQL] fix python udf in TakeOrdered...

2016-09-09 Thread davies
GitHub user davies opened a pull request: https://github.com/apache/spark/pull/15030 [SPARK-17474] [SQL] fix python udf in TakeOrderedAndProjectExec ## What changes were proposed in this pull request? When there is any Python UDF in the Project between Sort and Limit

[GitHub] spark issue #14671: [SPARK-17091][SQL] ParquetFilters rewrite IN to OR of Eq

2016-09-09 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/14671 @HyukjinKwon That sounds good, thanks!

[GitHub] spark issue #14671: [SPARK-17091][SQL] ParquetFilters rewrite IN to OR of Eq

2016-09-06 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/14671 @andreweduffy Good point, but we still use parquet-mr when there is any complex type in the schema.

[GitHub] spark issue #14671: [SPARK-17091][SQL] ParquetFilters rewrite IN to OR of Eq

2016-09-06 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/14671 Before we disable the record-level filter in the Parquet reader, I think pushing more inefficient predicates into the Parquet reader will be even worse, right?

[GitHub] spark issue #14944: [SPARK-16334][BACKPORT] Reusing same dictionary column f...

2016-09-06 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/14944 Merging this into 2.0 branch.

[GitHub] spark issue #14927: [SPARK-16922] [SPARK-17211] [SQL] make the address of va...

2016-09-06 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/14927 @hvanhovell has given an LGTM offline. Other contributors have also confirmed that this patch fixes the bug, so I'm going to merge this one into master and 2.0 branch.

[GitHub] spark pull request #14690: [SPARK-16980][SQL] Load only catalog table partit...

2016-09-02 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/14690#discussion_r77422471 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/TableFileCatalog.scala --- @@ -0,0 +1,102 @@ +/* + * Licensed

[GitHub] spark pull request #14517: [SPARK-16931][PYTHON] PySpark APIS for bucketBy a...

2016-09-02 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/14517#discussion_r77420949 --- Diff: python/pyspark/sql/readwriter.py --- @@ -692,8 +734,7 @@ def orc(self, path, mode=None, partitionBy=None, compression=None

[GitHub] spark pull request #14517: [SPARK-16931][PYTHON] PySpark APIS for bucketBy a...

2016-09-02 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/14517#discussion_r77420246 --- Diff: python/pyspark/sql/readwriter.py --- @@ -747,16 +800,25 @@ def _test(): except py4j.protocol.Py4JError: spark = SparkSession

[GitHub] spark issue #14866: [SPARK-17298][SQL] Require explicit CROSS join for carte...

2016-09-02 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/14866 LGTM

[GitHub] spark issue #14797: [SPARK-17230] [SQL] Should not pass optimized query into...

2016-09-02 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/14797 Merged this into master and 2.0 branch, thanks!

[GitHub] spark issue #14941: [SPARK-16334] Reusing same dictionary column for decodin...

2016-09-02 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/14941 Merging this into master and 2.0 branch, thanks!

[GitHub] spark issue #14941: [SPARK-16334] Reusing same dictionary column for decodin...

2016-09-02 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/14941 @heroldus decodeDictionaryIds() is only used when a batch spans pages with different encodings (dictionary or plain), so it's not in the hot path; I think the performance impact should be fine

[GitHub] spark issue #14941: [SPARK-16334] Reusing same dictionary column for decodin...

2016-09-02 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/14941 LGTM, pending Jenkins.

[GitHub] spark issue #12436: [SPARK-14649][CORE] DagScheduler should not run duplicat...

2016-09-02 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/12436 @sitalkedia Had a quick look at this one; the use case sounds good, and we should improve the stability of long-running tasks. Could you explain a bit more how the current patch works? (in the PR

[GitHub] spark pull request #14797: [SPARK-17230] [SQL] Should not pass optimized que...

2016-09-02 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/14797#discussion_r77394975 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala --- @@ -479,13 +480,23 @@ case class DataSource

[GitHub] spark issue #14857: [SPARK-17261][PYSPARK] Using HiveContext after re-creati...

2016-09-02 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/14857 Merging into master and 2.0

[GitHub] spark issue #14857: [SPARK-17261][PYSPARK] Using HiveContext after re-creati...

2016-09-02 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/14857 LGTM

[GitHub] spark issue #14176: [SPARK-16525][SQL] Enable Row Based HashMap in HashAggre...

2016-09-01 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/14176 LGTM, I will merge this one to master (to enable us to do more benchmarks with these two implementations).

[GitHub] spark issue #14515: [SPARK-16926] [SQL] Remove partition columns from partit...

2016-09-01 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/14515 LGTM, merging this into 2.0 and master, thanks!

[GitHub] spark pull request #14927: [SPARK-16922] [SPARK-17211] [SQL] make the addres...

2016-09-01 Thread davies
GitHub user davies opened a pull request: https://github.com/apache/spark/pull/14927 [SPARK-16922] [SPARK-17211] [SQL] make the address of values portable in LongToUnsafeRowMap ## What changes were proposed in this pull request? In LongToUnsafeRowMap, we use offset

[GitHub] spark issue #11956: [SPARK-14098][SQL] Generate Java code that gets a float/...

2016-08-29 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/11956 We could compress them in memory with MEMORY_AND_DISK_SER, this could be controlled by a flag.

[GitHub] spark issue #14607: [SPARK-17063] [SQL] Improve performance of MSCK REPAIR T...

2016-08-29 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/14607 Merging into master and 2.0 branch.

[GitHub] spark issue #11956: [SPARK-14098][SQL] Generate Java code that gets a float/...

2016-08-29 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/11956 cc @rxin

[GitHub] spark issue #11956: [SPARK-14098][SQL] Generate Java code that gets a float/...

2016-08-29 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/11956 @kiszk The current implementation uses ByteBuffer and smart compression algorithms; it is too slow to build the in-memory cache, which makes it useless. So we'd like to improve the performance of building

[GitHub] spark issue #14176: [SPARK-16525][SQL] Enable Row Based HashMap in HashAggre...

2016-08-25 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/14176 Can we make this `spark.sql.codegen.aggregate.map.twolevel.enable` internal? Otherwise we should have a better name.

[GitHub] spark issue #14797: [SPARK-17230] [SQL] Should not pass optimized query into...

2016-08-24 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/14797 cc @yhuai @JoshRosen @rxin

[GitHub] spark pull request #14797: [SPARK-17230] [SQL] Should not pass optimized que...

2016-08-24 Thread davies
GitHub user davies opened a pull request: https://github.com/apache/spark/pull/14797 [SPARK-17230] [SQL] Should not pass optimized query into QueryExecution in DataFrameWriter ## What changes were proposed in this pull request? Some analyzer rules have assumptions

[GitHub] spark issue #14722: [SPARK-13286] [SQL] add the next expression of SQLExcept...

2016-08-24 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/14722 @petermaxlee `Batch` here means inserting multiple rows into a JDBC data source using a single query; the context is the JDBC data source, not streaming. SPARK-13286 has more context on that.

[GitHub] spark issue #14722: [SPARK-13286] [SQL] add the next expression of SQLExcept...

2016-08-24 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/14722 @petermaxlee Before this PR, the underlying cause was NOT included in the logging; the description has been corrected.

[GitHub] spark pull request #14607: [SPARK-17063] [SQL] Improve performance of MSCK R...

2016-08-23 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/14607#discussion_r75949718 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -518,21 +550,87 @@ case class AlterTableRecoverPartitionsCommand

[GitHub] spark pull request #14607: [SPARK-17063] [SQL] Improve performance of MSCK R...

2016-08-23 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/14607#discussion_r75947797 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -518,21 +550,87 @@ case class AlterTableRecoverPartitionsCommand

[GitHub] spark pull request #14607: [SPARK-17063] [SQL] Improve performance of MSCK R...

2016-08-23 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/14607#discussion_r75947173 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -443,6 +446,31 @@ case class AlterTableDropPartitionCommand

[GitHub] spark issue #14722: [SPARK-13286] [SQL] add the next expression of SQLExcept...

2016-08-23 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/14722 Merging this into master and 2.0 branch.

[GitHub] spark pull request #14692: [SPARK-17115] [SQL] decrease the threshold when s...

2016-08-21 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/14692#discussion_r75615959 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -584,15 +584,18 @@ class CodegenContext

[GitHub] spark issue #14692: [SPARK-17115] [SQL] decrease the threshold when split ex...

2016-08-19 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/14692 @cloud-fan Could you review this one?

[GitHub] spark pull request #14567: [SPARK-16992][PYSPARK] Python Pep8 formatting and...

2016-08-19 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/14567#discussion_r75553470 --- Diff: dev/py-validate.sh --- @@ -0,0 +1,110 @@ +#!/usr/bin/env bash --- End diff -- Where is this file used?

[GitHub] spark pull request #14567: [SPARK-16992][PYSPARK] Python Pep8 formatting and...

2016-08-19 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/14567#discussion_r75553298 --- Diff: python/pyspark/cloudpickle.py --- @@ -42,17 +42,17 @@ """ --- End diff -- This file is copied, please ignore i

[GitHub] spark pull request #14567: [SPARK-16992][PYSPARK] Python Pep8 formatting and...

2016-08-19 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/14567#discussion_r75553255 --- Diff: python/pyspark/heapq3.py --- @@ -408,10 +408,12 @@ __all__ = ['heappush', 'heappop', 'heapify', 'heapreplace', 'merge

[GitHub] spark pull request #14567: [SPARK-16992][PYSPARK] Python Pep8 formatting and...

2016-08-19 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/14567#discussion_r75553180 --- Diff: python/pyspark/ml/param/shared.py --- @@ -25,7 +25,8 @@ class HasMaxIter(Params): Mixin for param maxIter: max number of iterations (>

[GitHub] spark pull request #14722: [SPARK-13286] [SQL] add the next expression of SQ...

2016-08-19 Thread davies
GitHub user davies opened a pull request: https://github.com/apache/spark/pull/14722 [SPARK-13286] [SQL] add the next expression of SQLException as cause ## What changes were proposed in this pull request? Some JDBC drivers (for example PostgreSQL) do not use

[GitHub] spark issue #14693: [SPARK-17113][Shuffle] Job failure due to Executor OOM i...

2016-08-19 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/14693 Merging this into master and 2.0 and 1.6 (hopefully), thanks

[GitHub] spark pull request #14693: [SPARK-17113][Shuffle] Job failure due to Executo...

2016-08-19 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/14693#discussion_r75529799 --- Diff: core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java --- @@ -522,7 +522,7 @@ public long spill() throws

[GitHub] spark issue #14693: [SPARK-17113][Shuffle] Job failure due to Executor OOM i...

2016-08-18 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/14693 I'm sorry that I did not pay enough attention to this part, thanks to the unit test!

[GitHub] spark issue #14693: [SPARK-17113][Shuffle] Job failure due to Executor OOM i...

2016-08-18 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/14693 @sitalkedia Your last commit is not correct, which fails this test. `upstream` is the one used for reading, `inMemIterator` is the one used for spilling, see #10142

[GitHub] spark issue #14452: [SPARK-16849][SQL] Improve subquery execution by dedupli...

2016-08-18 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/14452 @viirya Had a quick look at this one. It seems that we are trying to make the logical plans look better, but that does not help much with the actual physical plan or runtime. For example, for self join

[GitHub] spark issue #14693: [SPARK-17113][Shuffle] Job failure due to Executor OOM i...

2016-08-18 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/14693 LGTM

[GitHub] spark pull request #14692: decrease the threshold when split expressions

2016-08-17 Thread davies
GitHub user davies opened a pull request: https://github.com/apache/spark/pull/14692 decrease the threshold when split expressions ## What changes were proposed in this pull request? In 2.0, we changed the threshold of splitting expressions from 16K to 64K, which causes very
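
The general idea of splitting generated expression code at a size threshold can be sketched in Python as below. This is an illustration under assumptions, not the logic in Spark's `CodegenContext`. The function name `split_expressions` is hypothetical, the threshold is measured in characters purely for simplicity, and the listing above does not state the unit behind "16K" and "64K".

```python
def split_expressions(expressions, threshold):
    """Group code snippets into blocks whose combined length stays
    within `threshold` characters (hypothetical sketch)."""
    blocks, current, size = [], [], 0
    for code in expressions:
        # Close the current block when adding this snippet would exceed
        # the threshold. A single oversized snippet still gets its own block.
        if current and size + len(code) > threshold:
            blocks.append(current)
            current, size = [], 0
        current.append(code)
        size += len(code)
    if current:
        blocks.append(current)
    return blocks
```

A smaller threshold yields more, smaller blocks; a larger one yields fewer, bigger blocks. That is the knob the pull request title refers to.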

[GitHub] spark issue #14685: [SPARK-17106][SQL] Simplify the SubqueryExpression inter...

2016-08-17 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/14685 LGTM

[GitHub] spark pull request #14685: [SPARK-17106][SQL] Simplify the SubqueryExpressio...

2016-08-17 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/14685#discussion_r75128301 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/subquery.scala --- @@ -56,30 +44,29 @@ trait ExecSubqueryExpression extends

[GitHub] spark issue #14631: [SPARK-17035][SQL][PYSPARK] Improve Timestamp not to los...

2016-08-16 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/14631 LGTM, merging this into master.

[GitHub] spark issue #11956: [SPARK-14098][SQL] Generate Java code that gets a float/...

2016-08-15 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/11956 @kiszk I'm sorry that I do not have the bandwidth to review this. https://github.com/apache/spark/pull/13899/files sounds like an easier approach (I have not looked into the details); what do you think

[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...

2016-08-15 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13680#discussion_r74798477 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeArrayData.java --- @@ -25,55 +25,57 @@ import

[GitHub] spark issue #14631: [SPARK-17035][SQL][PYSPARK] Timestamp should preserve mi...

2016-08-14 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/14631 The description is not correct: we keep the microseconds but lose precision in some corner cases. Could you update that (and the title)?

[GitHub] spark pull request #14631: [SPARK-17035][SQL][PYSPARK] Timestamp should pres...

2016-08-14 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/14631#discussion_r74703839 --- Diff: python/pyspark/sql/tests.py --- @@ -183,6 +183,13 @@ def test_empty_row(self): self.assertEqual(len(row), 0) +class

[GitHub] spark issue #14607: [SPARK-16905] SQL DDL: MSCK REPAIR TABLE (follow-up)

2016-08-12 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/14607 @yhuai @sameeragarwal @rxin I have updated MSCK REPAIR TABLE to list all the leaf files in parallel to avoid the listing in the Hive metastore; hopefully this could speed it up a lot (not benchmarked

[GitHub] spark issue #14469: [SPARK-16700] [PYSPARK] [SQL] create DataFrame from dict...

2016-08-12 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/14469 @JoshRosen ping

[GitHub] spark pull request #14607: [SPARK-16905] SQL DDL: MSCK REPAIR TABLE (follow-...

2016-08-11 Thread davies
GitHub user davies opened a pull request: https://github.com/apache/spark/pull/14607 [SPARK-16905] SQL DDL: MSCK REPAIR TABLE (follow-up) ## What changes were proposed in this pull request? This PR splits the single `createPartitions()` call into smaller batches, which

[GitHub] spark pull request #14500: [SPARK-16905] SQL DDL: MSCK REPAIR TABLE

2016-08-11 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/14500#discussion_r74481008 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -425,6 +430,111 @@ case class AlterTableDropPartitionCommand

[GitHub] spark issue #14548: [SPARK-16958] [SQL] Reuse subqueries within the same que...

2016-08-11 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/14548 Merging it into master, thanks!

[GitHub] spark issue #14548: [SPARK-16958] [SQL] Reuse subqueries within the same que...

2016-08-10 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/14548 @hvanhovell I have posted a picture, check it out.

[GitHub] spark pull request #14548: [SPARK-16958] [SQL] Reuse subqueries within the s...

2016-08-10 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/14548#discussion_r74340389 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/subquery.scala --- @@ -17,42 +17,77 @@ package org.apache.spark.sql.execution

[GitHub] spark pull request #14548: [SPARK-16958] [SQL] Reuse subqueries within the s...

2016-08-10 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/14548#discussion_r74340710 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/subquery.scala --- @@ -68,14 +103,90 @@ case class ScalarSubquery

[GitHub] spark pull request #14548: [SPARK-16958] [SQL] Reuse subqueries within the s...

2016-08-10 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/14548#discussion_r74340195 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala --- @@ -502,15 +508,64 @@ case class OutputFakerExec(output

[GitHub] spark pull request #14548: [SPARK-16958] [SQL] Reuse subqueries within the s...

2016-08-10 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/14548#discussion_r74338195 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/subquery.scala --- @@ -68,14 +103,90 @@ case class ScalarSubquery

[GitHub] spark pull request #14469: [SPARK-16700] [PYSPARK] [SQL] create DataFrame fr...

2016-08-10 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/14469#discussion_r74337946 --- Diff: python/pyspark/sql/session.py --- @@ -432,14 +430,9 @@ def createDataFrame(self, data, schema=None, samplingRatio=None): ``byte

[GitHub] spark pull request #14469: [SPARK-16700] [PYSPARK] [SQL] create DataFrame fr...

2016-08-10 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/14469#discussion_r74337920 --- Diff: python/pyspark/sql/tests.py --- @@ -411,6 +411,21 @@ def test_infer_schema_to_local(self): df3 = self.spark.createDataFrame(rdd

[GitHub] spark issue #14513: [SPARK-16928][SQL] Recursive call of ColumnVector::getIn...

2016-08-10 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/14513 LGTM, merging into master, thanks!
