[GitHub] spark pull request #13883: [SPARK-16179] [PYSPARK] fix bugs for Python udf i...

2016-06-23 Thread davies
GitHub user davies opened a pull request: https://github.com/apache/spark/pull/13883 [SPARK-16179] [PYSPARK] fix bugs for Python udf in generate ## What changes were proposed in this pull request? This PR fixes the bug when a Python UDF is used in explode (generator

[GitHub] spark issue #13878: [SPARK-16175] [PYSPARK] handle None for UDT

2016-06-23 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13878 cc @JoshRosen --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so

[GitHub] spark pull request #13788: [SPARK-16077] [PYSPARK] catch the exception from ...

2016-06-23 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13788#discussion_r68324868 --- Diff: python/pyspark/cloudpickle.py --- @@ -169,7 +169,10 @@ def save_function(self, obj, name=None): if name is None

[GitHub] spark issue #13788: [SPARK-16077] [PYSPARK] catch the exception from pickle....

2016-06-23 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13788 cc @JoshRosen

[GitHub] spark pull request #13878: [SPARK-16175] [PYSPARK] handle None for UDT

2016-06-23 Thread davies
GitHub user davies opened a pull request: https://github.com/apache/spark/pull/13878 [SPARK-16175] [PYSPARK] handle None for UDT ## What changes were proposed in this pull request? Scala UDT will bypass all the nulls and will not pass them into serialize() and deserialize
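
A minimal Python sketch of the null-bypass behaviour described above, assuming a UDT can be modelled as a pair of user-supplied serialize/deserialize callables. `NullSafeUDTAdapter`, `to_internal` and `from_internal` are hypothetical names chosen for illustration; this is not PySpark's `UserDefinedType` code.

```python
from typing import Any, Callable, Optional


class NullSafeUDTAdapter:
    """Hypothetical wrapper around user-supplied serialize/deserialize hooks.

    None is passed straight through, so the hooks never see a null value.
    """

    def __init__(self, serialize: Callable[[Any], Any],
                 deserialize: Callable[[Any], Any]) -> None:
        self._serialize = serialize
        self._deserialize = deserialize

    def to_internal(self, obj: Optional[Any]) -> Optional[Any]:
        # A null value bypasses the user's serialize() entirely.
        return None if obj is None else self._serialize(obj)

    def from_internal(self, datum: Optional[Any]) -> Optional[Any]:
        # Likewise, a null datum is never handed to deserialize().
        return None if datum is None else self._deserialize(datum)


# Example: a 2-D point stored internally as a list of two floats.
point_udt = NullSafeUDTAdapter(lambda p: [p[0], p[1]], lambda d: (d[0], d[1]))
```

The user hooks can then assume they only ever receive real values, which is the contract the PR description attributes to Scala UDTs.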

[GitHub] spark issue #13870: [SPARK-16165][SQL] Fix the update logic for InMemoryTabl...

2016-06-23 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13870 @dongjoon-hyun Sorry, I misunderstood it, I thought it was batchStats. The changes look good.

[GitHub] spark issue #13871: [SPARK-16163] [SQL] Cache the statistics for logical pla...

2016-06-23 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13871 Merging this into master and 2.0, thanks!

[GitHub] spark issue #13870: [SPARK-16165][SQL] Fix the update logic for InMemoryTabl...

2016-06-23 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13870 @dongjoon-hyun I think that is also used to get a better estimate of the dataset, which the planner could use to choose a better physical plan.

[GitHub] spark pull request #13871: [SPARK-16163] [SQL] Cache the statistics for logi...

2016-06-23 Thread davies
GitHub user davies opened a pull request: https://github.com/apache/spark/pull/13871 [SPARK-16163] [SQL] Cache the statistics for logical plans ## What changes were proposed in this pull request? The calculation of statistics is not trivial anymore; it could be very slow
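
The idea of caching an expensive, recursively computed statistic can be sketched in Python with `functools.cached_property`. `PlanNode`, `size_in_bytes` and `leaf_size_in_bytes` are hypothetical names, and summing child sizes is only a stand-in for a real estimate; this is not Catalyst's `LogicalPlan` implementation.

```python
from functools import cached_property
from typing import Sequence


class PlanNode:
    """Hypothetical plan node, treated as immutable after construction."""

    def __init__(self, name: str, children: Sequence["PlanNode"] = (),
                 leaf_size_in_bytes: int = 0) -> None:
        self.name = name
        self.children = tuple(children)
        self.leaf_size_in_bytes = leaf_size_in_bytes
        self.stats_computations = 0  # counts how often the estimate is really computed

    @cached_property
    def size_in_bytes(self) -> int:
        # Stand-in for an expensive estimate that walks the whole subtree.
        # cached_property stores the result on the instance after the first call.
        self.stats_computations += 1
        return self.leaf_size_in_bytes + sum(c.size_in_bytes for c in self.children)
```

The cached value is never invalidated, so this only works if a node's children and inputs do not change after it is built.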

[GitHub] spark issue #13871: [SPARK-16163] [SQL] Cache the statistics for logical pla...

2016-06-23 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13871 cc @liancheng

[GitHub] spark issue #13814: [SPARK-16003] SerializationDebugger runs into infinite l...

2016-06-22 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13814 Looks good, merging this into master and 2.0 branch.

[GitHub] spark issue #13809: [SPARK-16104][SQL] Do not creaate CSV writer object for ...

2016-06-21 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13809 LGTM, Merging this into master, thanks!

[GitHub] spark issue #13814: [SPARK-16003] SerializationDebugger runs into infinite l...

2016-06-21 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13814 What does the error message look like in the case of an unserializable object (for example, an Iterator in Scala 2.10)?

[GitHub] spark issue #13812: [SPARK-16086] [SQL] [PYSPARK] create Row without any fie...

2016-06-21 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13812 Merging into 2.0 and master.

[GitHub] spark pull request #13812: [SPARK-16086] [SQL] create Row without any fields

2016-06-21 Thread davies
GitHub user davies opened a pull request: https://github.com/apache/spark/pull/13812 [SPARK-16086] [SQL] create Row without any fields ## What changes were proposed in this pull request? This PR allows us to create a Row without any fields. ## How was this patch

[GitHub] spark pull request #13800: [SPARK-13792][SQL] Addendum: Fix Python API

2016-06-21 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13800#discussion_r67905999 --- Diff: python/pyspark/sql/readwriter.py --- @@ -77,7 +77,7 @@ def _set_json_opts(self, schema, primitivesAsString, prefersDecimal, def

[GitHub] spark issue #13793: [SPARK-16086] [SQL] fix Python UDF without arguments (fo...

2016-06-21 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13793 @mengxr My bad: 2.0 and master do not have this bug, but it is still good to have the test and the change. I will send a new PR to fix the conflict.

[GitHub] spark issue #13793: [SPARK-16086] [SQL] fix Python UDF without arguments (fo...

2016-06-20 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13793 Merged into 1.5, 1.6, 2.0 and master.

[GitHub] spark pull request #13793: [SPARK-16086] [SQL] fix Python UDF without argume...

2016-06-20 Thread davies
Github user davies closed the pull request at: https://github.com/apache/spark/pull/13793

[GitHub] spark issue #13652: [SPARK-15613] [SQL] Fix incorrect days to millis convers...

2016-06-20 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13652 @ckadner The test function here tries to simulate this SQL function: cast(from_unix_timestamp(cast("2016-03-13 01:59:59" as timestamp), "PST") as string) In t

[GitHub] spark pull request #13793: [SPARK-16086] [SQL] fix Python UDF without argume...

2016-06-20 Thread davies
GitHub user davies opened a pull request: https://github.com/apache/spark/pull/13793 [SPARK-16086] [SQL] fix Python UDF without arguments (for 1.6) ## What changes were proposed in this pull request? Fix the bug for Python UDF that does not have any arguments

[GitHub] spark issue #13652: [SPARK-15613] [SQL] Fix incorrect days to millis convers...

2016-06-20 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13652 @ckadner That could fix the flaky test, but it would hide the actual bug.

[GitHub] spark issue #13784: [SPARK-16078] [SQL] from_utc_timestamp/to_utc_timestamp ...

2016-06-20 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13784 cc @hvanhovell

[GitHub] spark pull request #13788: [SPARK-16077] [PYSPARK] catch the exception from ...

2016-06-20 Thread davies
GitHub user davies opened a pull request: https://github.com/apache/spark/pull/13788 [SPARK-16077] [PYSPARK] catch the exception from pickle.whichmodule() ## What changes were proposed in this pull request? In the case that we don't know which module an object came from
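
A defensive wrapper around the standard library's `pickle.whichmodule(obj, name)` can be sketched as follows. `safe_whichmodule`, its `default` parameter and the `BrokenModuleAttr` demo class are hypothetical; what the actual PR falls back to is not shown in the snippet, so returning `default` is an assumption of this sketch.

```python
import json
import pickle
from typing import Any, Optional


def safe_whichmodule(obj: Any, name: str, default: Optional[str] = None) -> Optional[str]:
    """Call pickle.whichmodule(), but return `default` if the lookup raises."""
    try:
        return pickle.whichmodule(obj, name)
    except Exception:  # deliberately broad: arbitrary modules/objects can raise anything
        return default


class BrokenModuleAttr:
    """Demo object whose __module__ lookup raises something other than AttributeError."""

    @property
    def __module__(self):  # type: ignore[override]
        raise RuntimeError("simulated failure while locating the module")


# Usage: an ordinary function resolves normally; the broken object falls back.
resolved = safe_whichmodule(json.dumps, "dumps")
fallback = safe_whichmodule(BrokenModuleAttr(), "x")
```

Catching `Exception` rather than only `AttributeError` trades precision for robustness: a failure while locating one object's module no longer aborts the whole pickling attempt.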

[GitHub] spark issue #13652: [SPARK-15613] [SQL] Fix incorrect days to millis convers...

2016-06-20 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13652 @lw-lin @robbinspg Thanks for reporting this; I sent https://github.com/apache/spark/pull/13784 to fix the bug.

[GitHub] spark issue #13783: [HOTFIX][SPARK-15613][SQL]Set test runtime timezone for ...

2016-06-20 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13783 @adrian-wang Thanks for the PR, but there is actually a bug in toUTCTime/fromUTCTime, and this PR would hide it. I created https://github.com/apache/spark/pull/13784 to fix the actual bug

[GitHub] spark pull request #13784: [SPARK-16078] [SQL] from_utc_timestamp/to_utc_tim...

2016-06-20 Thread davies
GitHub user davies opened a pull request: https://github.com/apache/spark/pull/13784 [SPARK-16078] [SQL] from_utc_timestamp/to_utc_timestamp should not depends on local timezone ## What changes were proposed in this pull request? Currently, we use local timezone to parse
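
A minimal Python sketch of timezone conversion that never consults the host's local time zone, assuming `from_utc_timestamp` semantics of "interpret the input as UTC, render it as wall-clock time in the target zone". `from_utc`, `to_utc` and `PST_FIXED` are hypothetical names, and a fixed UTC-8 offset stands in for a real zone ID such as "PST" (which would also need DST rules).

```python
from datetime import datetime, timedelta, timezone, tzinfo


def from_utc(wall_clock_utc: datetime, tz: tzinfo) -> datetime:
    """Treat a naive timestamp as UTC and return the naive wall-clock time in `tz`."""
    # Attaching timezone.utc explicitly means the process-local zone is never used.
    return wall_clock_utc.replace(tzinfo=timezone.utc).astimezone(tz).replace(tzinfo=None)


def to_utc(wall_clock_local: datetime, tz: tzinfo) -> datetime:
    """Treat a naive timestamp as wall-clock time in `tz` and return naive UTC."""
    return wall_clock_local.replace(tzinfo=tz).astimezone(timezone.utc).replace(tzinfo=None)


# Assumption: a fixed -08:00 offset with no daylight saving time.
PST_FIXED = timezone(timedelta(hours=-8), "PST (fixed, no DST)")
```

Because both directions name their zones explicitly, the results are the same regardless of the `TZ` setting of the machine running the code.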

[GitHub] spark pull request #13778: [SPARK-16062][SPARK-15989][SQL] Fix two bugs of P...

2016-06-20 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13778#discussion_r67710677 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala --- @@ -427,8 +427,12 @@ case class MapObjects private

[GitHub] spark pull request #13778: [SPARK-16062][SPARK-15989][SQL] Fix two bugs of P...

2016-06-20 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13778#discussion_r67710501 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala --- @@ -427,8 +427,12 @@ case class MapObjects private

[GitHub] spark issue #13768: [SPARK-16053][R] Add `spark_partition_id` in SparkR

2016-06-20 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13768 LGTM

[GitHub] spark issue #13652: [SPARK-15613] [SQL] Fix incorrect days to millis convers...

2016-06-19 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13652 Merged into 1.6 and 2.0

[GitHub] spark issue #13652: [SPARK-15613] [SQL] Fix incorrect days to millis convers...

2016-06-18 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13652 Since we have to cut an RC for 2.0, I will merge this into master and 2.0 after it passes tests.

[GitHub] spark issue #13652: [SPARK-15613] [SQL] Fix incorrect days to millis convers...

2016-06-18 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13652 @ckadner I have updated the PR to fall back to java.sql.Timestamp for the reverse lookup, which should have better resolution (especially when converting a timestamp to another timezone

[GitHub] spark issue #13541: [SPARK-15803][PYSPARK] Support with statement syntax for...

2016-06-17 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13541 Merging this into master and 2.0, thanks!

[GitHub] spark issue #13541: [SPARK-15803][PYSPARK] Support with statement syntax for...

2016-06-17 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13541 LGTM

[GitHub] spark pull request #13728: [SPARK-16010] [SQL] Code Refactoring, Test Case I...

2016-06-17 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13728#discussion_r67464686 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala --- @@ -545,4 +545,28 @@ class

[GitHub] spark issue #13723: [SPARK-15822][SQL] Prevent byte array backed classes fro...

2016-06-16 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13723 LGTM, Merging this into master and 2.0, thanks!

[GitHub] spark issue #13652: [SPARK-15613] [SQL] Fix incorrect days to millis convers...

2016-06-16 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13652 @JoshRosen How does this look now?

[GitHub] spark issue #13717: [SPARK-15811] [SQL] fix the Python UDF in Scala 2.10

2016-06-16 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13717 @damnMeddlingKid This feature is covered well by unit tests (the PR builder runs with Scala 2.11 and Hadoop 2.x), but we did not have a Jenkins build that runs with Scala 2.10; we will have one to run again

[GitHub] spark issue #13717: [SPARK-15811] [SQL] fix the Python UDF in Scala 2.10

2016-06-16 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13717 @rxin Another thing I found is that the SerializationDebugger ran into an infinite loop (or was very, very slow) before the fix; I had to disable it to realize that an Iterator can't be serialized
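
A common way to keep an object-graph walk (such as a search for the unserializable field) from looping forever on reference cycles is a visited set keyed by object identity. The Python sketch below shows that generic technique; `find_path` and `container_children` are hypothetical helpers, and this is not the SerializationDebugger code or the fix made in #13814.

```python
from typing import Any, Callable, Iterable, List


def find_path(root: Any, is_target: Callable[[Any], bool],
              children: Callable[[Any], Iterable[Any]]) -> List[Any]:
    """Depth-first search from `root` for an object matching `is_target`.

    Returns the reference path root -> ... -> match, or [] if none is found.
    Each object is expanded at most once (tracked by id()), so reference
    cycles cannot make the search loop forever.
    """
    visited = set()
    stack = [(root, [root])]
    while stack:
        obj, path = stack.pop()
        if id(obj) in visited:
            continue
        visited.add(id(obj))
        if is_target(obj):
            return path
        for child in children(obj):
            if id(child) not in visited:
                stack.append((child, path + [child]))
    return []


def container_children(obj: Any) -> List[Any]:
    # Toy notion of "fields": the elements of built-in containers.
    if isinstance(obj, dict):
        return list(obj.values())
    if isinstance(obj, (list, tuple, set)):
        return list(obj)
    return []
```

Copying `path` for every pushed child keeps the sketch simple; a production version would more likely store parent links to avoid quadratic memory on deep graphs.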

[GitHub] spark issue #13717: [SPARK-15811] [SQL] fix the Python UDF in Scala 2.10

2016-06-16 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13717 cc @JoshRosen

[GitHub] spark pull request #13717: [SPARK-15811] [SQL] fix the Python UDF in Scala 2...

2016-06-16 Thread davies
GitHub user davies opened a pull request: https://github.com/apache/spark/pull/13717 [SPARK-15811] [SQL] fix the Python UDF in Scala 2.10 ## What changes were proposed in this pull request? Iterator can't be serialized in Scala 2.10, so we should force it into an array to make
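
The same "materialize before serializing" idea has a direct Python analogy: `pickle` refuses generator objects, but a list built from one serializes fine. This sketch only illustrates the pattern in Python; it does not reproduce the Scala 2.10 issue, and `is_picklable` and `materialize` are hypothetical helpers.

```python
import pickle
from typing import Any, Iterable, List, TypeVar

T = TypeVar("T")


def is_picklable(obj: Any) -> bool:
    """Return True if pickle can serialize `obj`."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, AttributeError, pickle.PicklingError):
        return False


def materialize(items: Iterable[T]) -> List[T]:
    # Force a lazy iterator into a concrete list. Note: this holds every
    # element in memory at once.
    return list(items)
```

The trade-off is memory: materializing is only reasonable when the iterator is known to be small enough to hold in full.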

[GitHub] spark issue #13652: [SPARK-15613] [SQL] Fix incorrect days to millis convers...

2016-06-16 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13652 @RussellSpitzer Unfortunately, most databases do not support that (casting an integer into a date), and neither do we. For Parquet, we use the days since epoch rather than java.sql.Date.

[GitHub] spark issue #13652: [SPARK-15613] [SQL] Fix incorrect days to millis convers...

2016-06-16 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13652 @JoshRosen If any human time representation is involved, for example, `2016-06-15 17:22:07.0`, a timezone could be used. In your SQL query, you are comparing an absolute timestamp against human time

[GitHub] spark issue #13618: [SPARK-15796] [CORE] Reduce spark.memory.fraction defaul...

2016-06-16 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13618 LGTM

[GitHub] spark issue #13693: [SPARK-15782][hotfix] Fix compilation with Scala 2.10.

2016-06-15 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13693 @nezihyigitbasi I reverted that commit both in master and 2.0.

[GitHub] spark issue #13667: [SPARK-15934] [SQL] Return binary mode in ThriftServer

2016-06-15 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13667 LGTM, Merging this into master and 2.0, thanks!

[GitHub] spark issue #13682: [SPARK-15888] [SQL] fix Python UDF with aggregate

2016-06-15 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13682 Merging this into master and 2.0, thanks!

[GitHub] spark issue #13682: [SPARK-15888] [SQL] fix Python UDF with aggregate

2016-06-15 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13682 cc @gatorsmile @JoshRosen

[GitHub] spark pull request #13682: [SPARK-15888] [SQL] fix Python UDF with aggregate

2016-06-15 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13682#discussion_r67109644 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/BatchEvalPythonExec.scala --- @@ -46,6 +46,8 @@ case class BatchEvalPythonExec(udfs

[GitHub] spark pull request #13682: [SPARK-15888] [SQL] fix Python UDF with aggregate

2016-06-15 Thread davies
GitHub user davies opened a pull request: https://github.com/apache/spark/pull/13682 [SPARK-15888] [SQL] fix Python UDF with aggregate ## What changes were proposed in this pull request? After we moved the ExtractPythonUDF rule into the physical plan, Python UDFs can't work

[GitHub] spark issue #13663: [SPARK-15950][SQL] Eliminate unreachable code at project...

2016-06-14 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13663 +1 to what @cloud-fan said.

[GitHub] spark pull request #13652: [SPARK-] Fix incorrect days to millis conversion

2016-06-14 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13652#discussion_r67044111 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/DateType.scala --- @@ -30,7 +30,7 @@ import

[GitHub] spark issue #13652: [SPARK-] Fix incorrect days to millis conversion

2016-06-14 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13652 @RussellSpitzer Unfortunately, TimeZone does not provide enough API to access the transitions, so we have to probe them as a black box. I pulled in the unit tests from @JoshRosen's branch to make sure
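
Probing a time zone "as a black box" can be illustrated with a binary search over an offset function that can only be called, not inspected. The Python sketch below is a generic illustration under stated assumptions (integer instants, exactly one transition between `lo` and `hi`); `find_transition` and `example_offset` are hypothetical, and this is not the code in #13652.

```python
from typing import Callable


def find_transition(offset_at: Callable[[int], int], lo: int, hi: int) -> int:
    """Return the first instant t in (lo, hi] whose offset differs from offset_at(lo).

    Assumes exactly one transition in the interval. Invariant: offset_at(lo)
    equals the base offset and offset_at(hi) does not.
    """
    base = offset_at(lo)
    if offset_at(hi) == base:
        raise ValueError("no transition detected in the interval")
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if offset_at(mid) == base:
            lo = mid
        else:
            hi = mid
    return hi


def example_offset(t: int) -> int:
    # Hypothetical zone whose UTC offset jumps from -8h to -7h at t = 1000.
    return -8 * 3600 if t < 1000 else -7 * 3600
```

With a real zone, an interval could contain more than one transition, so a coarse scan with a step shorter than the smallest gap between transitions would be needed before bisecting.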

[GitHub] spark issue #13539: [SPARK-15795] [SQL] Enable more optimizations in whole s...

2016-06-13 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13539 As @cloud-fan said, the implementation is hacky and the improvement is not obvious (I believe the JIT compiler does these very well; correct me if I'm wrong), so I'd rather not do this. There are millions

[GitHub] spark pull request #13652: [SPARK-] Fix incorrect days to millis conversion

2016-06-13 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13652#discussion_r66883860 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala --- @@ -503,5 +510,25 @@ class DateTimeUtilsSuite extends

[GitHub] spark issue #13652: [SPARK-] Fix incorrect days to millis conversion

2016-06-13 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13652 cc @JoshRosen

[GitHub] spark pull request #13652: [SPARK-] Fix incorrect days to millis conversion

2016-06-13 Thread davies
GitHub user davies opened a pull request: https://github.com/apache/spark/pull/13652 [SPARK-] Fix incorrect days to millis conversion ## What changes were proposed in this pull request? Internally, we use Int to represent a date (the days since 1970-01-01), when we
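
The "days since 1970-01-01" encoding, and the difference between UTC midnight and local midnight, can be sketched in Python. This only illustrates the representation described above under the assumption that a date maps to local midnight in a given zone; it does not reproduce Spark's DateTimeUtils, and all names here are hypothetical.

```python
from datetime import date, datetime, time, timedelta, timezone, tzinfo

EPOCH_DATE = date(1970, 1, 1)
EPOCH_INSTANT = datetime(1970, 1, 1, tzinfo=timezone.utc)
MILLIS_PER_DAY = 24 * 60 * 60 * 1000


def date_to_days(d: date) -> int:
    # Internal representation: whole days since 1970-01-01 (negative before it).
    return (d - EPOCH_DATE).days


def days_to_date(days: int) -> date:
    return EPOCH_DATE + timedelta(days=days)


def days_to_millis(days: int, tz: tzinfo) -> int:
    """Epoch milliseconds of local midnight, in `tz`, on the given day.

    The offset comes from `tz` at that specific local midnight instead of
    assuming one constant offset for every day.
    """
    local_midnight = datetime.combine(days_to_date(days), time(0), tzinfo=tz)
    # Aware-datetime subtraction accounts for the offset; // gives an exact int.
    return (local_midnight - EPOCH_INSTANT) // timedelta(milliseconds=1)
```

For UTC the result is simply `days * MILLIS_PER_DAY`; for any other zone it shifts by that zone's offset on that day, which is exactly the quantity that changes across daylight-saving transitions.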

[GitHub] spark issue #13501: [SPARK-15759] [SQL] Fallback to non-codegen when fail to...

2016-06-10 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13501 Merging this into master and 2.0, thanks!

[GitHub] spark issue #13566: [SPARK-15678] Add support to REFRESH data source paths

2016-06-10 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13566 LGTM, Merging this into master and 2.0, thanks!

[GitHub] spark issue #13161: [SPARK-14851] [Core] Support radix sort with nullable lo...

2016-06-10 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13161 LGTM.

[GitHub] spark pull request #13161: [SPARK-14851] [Core] Support radix sort with null...

2016-06-10 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13161#discussion_r66688459 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SortOrder.scala --- @@ -64,49 +64,57 @@ case class SortOrder(child

[GitHub] spark issue #13531: [SPARK-15654] [SQL] fix non-splitable files for text bas...

2016-06-10 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13531 Merging this into master and 2.0, thanks!

[GitHub] spark issue #13589: [SPARK-15825][SQL] Fix SMJ invalid results

2016-06-10 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13589 Merging this into master and 2.0, thanks!

[GitHub] spark issue #13589: [SPARK-15825][SQL] Fix SMJ invalid results

2016-06-10 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13589 This line ``` InternalRow i = null; ``` requires that INPUT_ROW should be `i` (it could be changed to a fresh name). The fix looks reasonable to me.

[GitHub] spark issue #13155: [SPARK-15370] [SQL] Update RewriteCorrelatedScalarSubque...

2016-06-10 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13155 @hvanhovell Could you take a final look at this?

[GitHub] spark issue #13155: [SPARK-15370] [SQL] Update RewriteCorrelatedScalarSubque...

2016-06-10 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13155 LGTM

[GitHub] spark pull request #13569: [SPARK-15791] Fix NPE in ScalarSubquery

2016-06-09 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13569#discussion_r66561018 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SubquerySuite.scala --- @@ -54,6 +54,10 @@ class SubquerySuite extends QueryTest

[GitHub] spark pull request #13566: [SPARK-15678] Add support to REFRESH data source ...

2016-06-09 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13566#discussion_r66541600 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalog/Catalog.scala --- @@ -226,4 +226,11 @@ abstract class Catalog { */ def

[GitHub] spark pull request #13566: [SPARK-15678] Add support to REFRESH data source ...

2016-06-09 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13566#discussion_r66541070 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala --- @@ -157,4 +161,49 @@ private[sql] class CacheManager extends Logging

[GitHub] spark pull request #13566: [SPARK-15678] Add support to REFRESH data source ...

2016-06-09 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13566#discussion_r66541029 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala --- @@ -157,4 +161,49 @@ private[sql] class CacheManager extends Logging

[GitHub] spark pull request #13566: [SPARK-15678] Add support to REFRESH data source ...

2016-06-09 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13566#discussion_r66540666 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala --- @@ -157,4 +161,49 @@ private[sql] class CacheManager extends Logging

[GitHub] spark pull request #13566: [SPARK-15678] Add support to REFRESH data source ...

2016-06-09 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13566#discussion_r66540478 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala --- @@ -157,4 +161,49 @@ private[sql] class CacheManager extends Logging

[GitHub] spark pull request #13566: [SPARK-15678] Add support to REFRESH data source ...

2016-06-09 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13566#discussion_r66540134 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalog/Catalog.scala --- @@ -226,4 +226,11 @@ abstract class Catalog { */ def

[GitHub] spark pull request #13531: [SPARK-15654] [SQL] fix non-splitable files for t...

2016-06-09 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13531#discussion_r66539899 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategySuite.scala --- @@ -340,6 +340,40 @@ class

[GitHub] spark issue #13569: [SPARK-15791] Fix NPE in ScalarSubquery

2016-06-09 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13569 LGTM

[GitHub] spark pull request #13531: [SPARK-15654] [SQL] fix non-splitable files for t...

2016-06-09 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13531#discussion_r66478617 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/fileSourceInterfaces.scala --- @@ -298,6 +309,28 @@ trait FileFormat

[GitHub] spark issue #13189: [SPARK-14670][SQL] allow updating driver side sql metric...

2016-06-08 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13189 LGTM, merging this into master and 2.0, thanks!

[GitHub] spark pull request #13531: [SPARK-15654] [SQL] fix non-splitable files for t...

2016-06-08 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13531#discussion_r66383788 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/fileSourceInterfaces.scala --- @@ -298,6 +309,28 @@ trait FileFormat

[GitHub] spark issue #10706: [SPARK-12543] [SPARK-4226] [SQL] Subquery in expression

2016-06-08 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/10706 Predicate subqueries (IN, EXISTS) in SELECT are not supported in 2.0; they are only supported in WHERE/HAVING.

[GitHub] spark pull request #13189: [SPARK-14670][SQL] allow updating driver side sql...

2016-06-06 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13189#discussion_r65990691 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala --- @@ -361,17 +377,22 @@ private[spark] class SQLHistoryListener(conf

[GitHub] spark pull request #13189: [SPARK-14670][SQL][WIP] allow updating driver sid...

2016-06-06 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13189#discussion_r65990142 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala --- @@ -361,17 +377,22 @@ private[spark] class SQLHistoryListener(conf

[GitHub] spark pull request #13517: [SPARK-14839][SQL] Support for other types as opt...

2016-06-06 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13517#discussion_r65975446 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala --- @@ -435,6 +434,37 @@ class SparkSqlAstBuilder(conf: SQLConf

[GitHub] spark pull request #13531: [SPARK-15654] [SQL] fix non-splitable files for t...

2016-06-06 Thread davies
GitHub user davies opened a pull request: https://github.com/apache/spark/pull/13531 [SPARK-15654] [SQL] fix non-splitable files for text based file formats ## What changes were proposed in this pull request? This PR is based on #13442 , fix the bug for non-splittable files

[GitHub] spark issue #13531: [SPARK-15654] [SQL] fix non-splitable files for text bas...

2016-06-06 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13531 cc @marmbrus @rxin @clockfly

[GitHub] spark issue #10572: SPARK-12619 Combine small files in a hadoop directory in...

2016-06-06 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/10572 This is fixed in 2.0, could you close this PR?

[GitHub] spark pull request #13442: [SPARK-15654][SQL] Check if all the input files a...

2016-06-06 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13442#discussion_r65937515 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala --- @@ -143,8 +143,18 @@ private[sql] object

[GitHub] spark pull request #13442: [SPARK-15654][SQL] Check if all the input files a...

2016-06-06 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13442#discussion_r65936126 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/fileSourceInterfaces.scala --- @@ -215,6 +216,13 @@ trait FileFormat

[GitHub] spark pull request #13107: [SPARK-13850] Force the sorter to Spill when numb...

2016-06-06 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13107#discussion_r65928927 --- Diff: core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java --- @@ -143,6 +151,8 @@ private UnsafeExternalSorter

[GitHub] spark pull request #13107: [SPARK-13850] Force the sorter to Spill when numb...

2016-06-06 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13107#discussion_r65928685 --- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java --- @@ -113,7 +117,7 @@ // Use getSizeAsKb (not bytes

[GitHub] spark pull request #13107: [SPARK-13850] Force the sorter to Spill when numb...

2016-06-06 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13107#discussion_r65928165 --- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java --- @@ -72,7 +72,11 @@ private final TaskContext taskContext

[GitHub] spark issue #13107: [SPARK-13850] Force the sorter to Spill when number of e...

2016-06-06 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13107 @sitalkedia I think it will trigger Full GC and eventually spilling in that case, could you provide more information on that (stacktrace or logging)?

[GitHub] spark issue #13501: [SPARK-15759] [SQL] Fallback to non-codegen when fail to...

2016-06-03 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13501 cc @rxin @marmbrus @sameeragarwal

[GitHub] spark pull request #13318: [SPARK-15391] [SQL] manage the temporary memory o...

2016-06-03 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13318#discussion_r65754375 --- Diff: core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java --- @@ -221,7 +221,8 @@ public BytesToBytesMap( SparkEnv.get

[GitHub] spark pull request #13318: [SPARK-15391] [SQL] manage the temporary memory o...

2016-06-03 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13318#discussion_r65753944 --- Diff: core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java --- @@ -221,7 +221,8 @@ public BytesToBytesMap( SparkEnv.get

[GitHub] spark pull request #13318: [SPARK-15391] [SQL] manage the temporary memory o...

2016-06-03 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13318#discussion_r65753668 --- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleInMemorySorter.java --- @@ -70,9 +70,14 @@ public int compare(PackedRecordPointer left

[GitHub] spark pull request #13501: [SPARK-15759] [SQL] Fallback to non-codegen when ...

2016-06-03 Thread davies
GitHub user davies opened a pull request: https://github.com/apache/spark/pull/13501 [SPARK-15759] [SQL] Fallback to non-codegen when fail to compile generated code ## What changes were proposed in this pull request? In case of any bugs in whole-stage codegen

[GitHub] spark pull request #13418: [SPARK-15677][SQL] Query with scalar sub-query in...

2016-06-01 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13418#discussion_r65441184 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1468,7 +1468,8 @@ object DecimalAggregates extends

[GitHub] spark issue #13443: [SPARK-15671] performance regression CoalesceRDD.pickBin...

2016-06-01 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13443 Merging this into master and 2.0, thanks!

[GitHub] spark issue #13443: [SPARK-15671] performance regression CoalesceRDD.pickBin...

2016-06-01 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13443 LGTM.
