[GitHub] spark issue #16875: [BACKPORT-2.1][SPARK-19512][SQL] codegen for compare str...

2017-02-10 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16875 @bogdanrdc can you close this? It won't auto close because it is not merged in master. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as

[GitHub] spark issue #16875: [BACKPORT-2.1][SPARK-19512][SQL] codegen for compare str...

2017-02-10 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16875 Merging in branch-2.1.

[GitHub] spark pull request #16864: [SPARK-19527][Core] Approximate Size of Intersect...

2017-02-10 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16864#discussion_r100503141 --- Diff: common/sketch/src/main/java/org/apache/spark/util/sketch/BloomFilter.java --- @@ -81,6 +81,11 @@ int getVersionNumber() { public abstract

[GitHub] spark pull request #16887: [SPARK-19549] Allow providing reason for stage/jo...

2017-02-10 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16887#discussion_r100552370 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -2207,20 +2207,22 @@ class SparkContext(config: SparkConf) extends Logging

[GitHub] spark pull request #16887: [SPARK-19549] Allow providing reason for stage/jo...

2017-02-10 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16887#discussion_r100552660 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -696,9 +696,9 @@ class DAGScheduler( /** * Cancel a job that

[GitHub] spark issue #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener callback...

2017-02-10 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16664 Sorry I'm really confused, probably because I haven't kept track of this PR. But the diff doesn't match the PR description. Are we fixing a bug here or introducing a bunch of new

[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...

2017-02-10 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16664#discussion_r100564925 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala --- @@ -218,7 +247,14 @@ final class DataFrameWriter[T] private[sql](ds: Dataset

[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...

2017-02-10 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16664#discussion_r100565522 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/util/QueryExecutionListener.scala --- @@ -44,27 +44,50 @@ trait QueryExecutionListener

[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...

2017-02-10 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16664#discussion_r100565585 --- Diff: docs/sql-programming-guide.md --- @@ -1300,10 +1300,28 @@ Configuration of in-memory caching can be done using the `setConf` method on `Sp

[GitHub] spark issue #16887: [SPARK-19549] Allow providing reason for stage/job cance...

2017-02-10 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16887 LGTM pending Jenkins.

[GitHub] spark issue #16885: Encryption of shuffle files

2017-02-10 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16885 Thanks - merging in master.

[GitHub] spark issue #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener callback...

2017-02-10 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16664 That's probably because you are not familiar with the SQL component. The existing API already has references to the QueryExecution object, which actually includes all of the information

[GitHub] spark issue #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener callback...

2017-02-10 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16664 Well it does. It contains the entire plan.

[GitHub] spark issue #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener callback...

2017-02-10 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16664 I think that's a separate "bug" we should fix, i.e. DataFrameWriter should use InsertIntoDataSourceCommand so we can consolidate the two paths.

[GitHub] spark issue #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener callback...

2017-02-10 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16664 Basically I see no reason to add some specific parameter to a listener API that is meant to be generic and that already contains a reference to QueryExecution. What are you going to do if next time you

[GitHub] spark issue #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener callback...

2017-02-10 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16664 Actually @cloud-fan are you sure it is a problem right now? DataSource.write itself creates the commands, and if the information is propagated correctly, the QueryExecution object should have a

[GitHub] spark issue #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener callback...

2017-02-10 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16664 Yea we should fix that.

[GitHub] spark issue #16888: [SPARK-19552] [BUILD] Upgrade Netty version to 4.1.8 fin...

2017-02-10 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16888 Shouldn't we use netty-4.0.44.Final rather than 4.1.x?

[GitHub] spark issue #16888: [SPARK-19552] [BUILD] Upgrade Netty version to 4.1.8 fin...

2017-02-10 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16888 BTW for Netty we shouldn't just bump to the highest version. We should use the maintenance branches.

[GitHub] spark issue #16887: [SPARK-19549] Allow providing reason for stage/job cance...

2017-02-10 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16887 Merging in master!

[GitHub] spark pull request #16386: [SPARK-18352][SQL] Support parsing multiline json...

2017-02-12 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16386#discussion_r100687458 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala --- @@ -48,69 +47,110 @@ class JacksonParser( // A

[GitHub] spark issue #16888: [WIP] [SPARK-19552] [BUILD] Upgrade Netty version to 4.1...

2017-02-13 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16888 Are there specific benefits brought by updating to 4.1 of Netty? Netty is so core to Spark that any bug in it would be extremely difficult to debug (yes we have found bugs in Netty and helped fix

[GitHub] spark pull request #16872: [SPARK-19514] Making range interruptible.

2017-02-13 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16872#discussion_r100789955 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameRangeSuite.scala --- @@ -127,4 +133,28 @@ class DataFrameRangeSuite extends QueryTest with

[GitHub] spark issue #16914: [SPARK-19514] Enhancing the test for Range interruption.

2017-02-13 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16914 LGTM pending Jenkins.

[GitHub] spark issue #16914: [SPARK-19514] Enhancing the test for Range interruption.

2017-02-13 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16914 Merging in master.

[GitHub] spark issue #14426: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2017-02-14 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/14426 @dongjoon-hyun do you have time to update the pull request now that the view canonicalization work is done? Basically we can remove all the SQL generation stuff.

[GitHub] spark issue #14426: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2017-02-14 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/14426 Actually I have some time. I will submit a pr based on this.

[GitHub] spark pull request #16925: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2017-02-14 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/16925 [SPARK-16475][SQL] Broadcast Hint for SQL Queries ## What changes were proposed in this pull request? This PR aims to achieve the following two goals in Spark SQL. 1. Generic Hint

[GitHub] spark issue #16925: [SPARK-16475][SQL] Broadcast Hint for SQL Queries - WIP

2017-02-14 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16925 Actually I'm going to completely rewrite this. I don't think the current implementation makes sense.

[GitHub] spark pull request #16925: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2017-02-14 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16925#discussion_r101088496 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/SubstituteHints.scala --- @@ -0,0 +1,85 @@ +/* + * Licensed to the

[GitHub] spark issue #16925: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2017-02-14 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16925 cc @dongjoon-hyun, @cloud-fan, @gatorsmile and @hvanhovell This should be ready for review. Note that the semantics are different from the earlier versions.

[GitHub] spark pull request #16925: [SPARK-16475][SQL] Broadcast hint for SQL Queries

2017-02-14 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16925#discussion_r101129453 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/SubstituteHintsSuite.scala --- @@ -0,0 +1,123 @@ +/* + * Licensed to the

[GitHub] spark pull request #16925: [SPARK-16475][SQL] Broadcast hint for SQL Queries

2017-02-14 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16925#discussion_r101129594 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/SubstituteHints.scala --- @@ -0,0 +1,103 @@ +/* + * Licensed to the

[GitHub] spark pull request #16925: [SPARK-16475][SQL] Broadcast hint for SQL Queries

2017-02-14 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16925#discussion_r101129634 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/SubstituteHints.scala --- @@ -0,0 +1,103 @@ +/* + * Licensed to the

[GitHub] spark pull request #16925: [SPARK-16475][SQL] Broadcast hint for SQL Queries

2017-02-14 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16925#discussion_r101137229 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/SubstituteHintsSuite.scala --- @@ -0,0 +1,123 @@ +/* + * Licensed to the

[GitHub] spark issue #16925: [SPARK-16475][SQL] Broadcast hint for SQL Queries

2017-02-14 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16925 the latest commit hasn't finished running tests yet ... but probably fine given the small change.

[GitHub] spark pull request #16939: [SPARK-16475][SQL] broadcast hint for SQL queries...

2017-02-15 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/16939 [SPARK-16475][SQL] broadcast hint for SQL queries - follow up ## What changes were proposed in this pull request? A small update to https://github.com/apache/spark/pull/16925 1. Rename

[GitHub] spark issue #16920: [MINOR][DOCS] Add jira url in pull request description

2017-02-15 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16920 Why is this necessary? It seems like an extra step needed and doesn't provide any real information. I suggest you use this: https://chrome.google.com/webstore/detail/j

[GitHub] spark issue #16940: [SPARK-19607] Finding QueryExecution that matches provid...

2017-02-15 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16940 LGTM (pending Jenkins).

[GitHub] spark issue #16920: [MINOR][DOCS] Add jira url in pull request description

2017-02-15 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16920 Yea the only issue is that it requires another manual update. Why not use the chrome plugin I sent?

[GitHub] spark pull request #16925: [SPARK-16475][SQL] Broadcast hint for SQL Queries

2017-02-15 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16925#discussion_r101288304 --- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 --- @@ -374,6 +374,16 @@ querySpecification windows

[GitHub] spark pull request #16925: [SPARK-16475][SQL] Broadcast hint for SQL Queries

2017-02-15 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16925#discussion_r101289574 --- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 --- @@ -374,6 +374,16 @@ querySpecification windows

[GitHub] spark pull request #16925: [SPARK-16475][SQL] Broadcast hint for SQL Queries

2017-02-15 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16925#discussion_r101289645 --- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 --- @@ -374,6 +374,16 @@ querySpecification windows

[GitHub] spark issue #16940: [SPARK-19607] Finding QueryExecution that matches provid...

2017-02-15 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16940 Merging in master.

[GitHub] spark pull request #16941: [SPARK-16475][SQL] broadcast hint for SQL queries...

2017-02-15 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16941#discussion_r101329235 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala --- @@ -524,7 +530,7 @@ class PlanParserSuite extends

[GitHub] spark issue #16941: [SPARK-16475][SQL] broadcast hint for SQL queries - disa...

2017-02-15 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16941 Merging in master.

[GitHub] spark issue #16943: [SPARK-19607][HOTFIX] Finding QueryExecution that matche...

2017-02-15 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16943 Merging in master.

[GitHub] spark pull request #16956: [SPARK-19598][SQL]Remove the alias parameter in U...

2017-02-16 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16956#discussion_r101530187 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveHints.scala --- @@ -54,10 +54,6 @@ object ResolveHints

[GitHub] spark pull request #16958: [SPARK-13721][SQL] Make GeneratorOuter unresolved...

2017-02-16 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/16958 [SPARK-13721][SQL] Make GeneratorOuter unresolved. ## What changes were proposed in this pull request? This is a small change to make GeneratorOuter always unresolved. It is mostly no-op change

[GitHub] spark issue #16958: [SPARK-13721][SQL] Make GeneratorOuter unresolved.

2017-02-16 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16958 cc @hvanhovell @bogdanrdc

[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings

2017-02-16 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16534 Change looks good to me but I didn't look super carefully. @holdenk can you take a look at this?

[GitHub] spark pull request #16611: [SPARK-17967][SPARK-17878][SQL][PYTHON] Support f...

2017-02-16 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16611#discussion_r101553890 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala --- @@ -97,6 +99,15 @@ class DataFrameReader private[sql](sparkSession

[GitHub] spark issue #16611: [SPARK-17967][SPARK-17878][SQL][PYTHON] Support for arra...

2017-02-16 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16611 For SQL, rather than "array", can we follow Python, e.g. ``` CREATE TEMPORARY TABLE tableA USING csv OPTIONS (nullValue ['NA', 'null'], ...) ```

[GitHub] spark issue #16826: [WIP][SPARK-19540][SQL] Add ability to clone SparkSessio...

2017-02-16 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16826 What's WIP about this?

[GitHub] spark issue #16958: [SPARK-13721][SQL] Make GeneratorOuter unresolved.

2017-02-16 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16958 So nice when I got two LGTMs and then Jenkins disagreed.

[GitHub] spark pull request #16960: [SPARK-19447] Make Range operator generate "recor...

2017-02-16 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16960#discussion_r101575199 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala --- @@ -309,4 +314,84 @@ class SQLMetricsSuite extends

[GitHub] spark pull request #16960: [SPARK-19447] Make Range operator generate "recor...

2017-02-16 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16960#discussion_r101575264 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala --- @@ -309,4 +314,84 @@ class SQLMetricsSuite extends

[GitHub] spark issue #16960: [SPARK-19447] Make Range operator generate "recordsRead"...

2017-02-16 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16960 cc @hvanhovell if you have a min to review this ...

[GitHub] spark issue #16960: [SPARK-19447] Make Range operator generate "recordsRead"...

2017-02-18 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16960 Merging in master.

[GitHub] spark issue #16977: [SPARK-19651][CORE] ParallelCollectionRDD.collect should...

2017-02-19 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16977 Are tests flaky right now? Otherwise it seems like this has introduced a legitimate issue with the test timing out. Three times in a row.

[GitHub] spark pull request #17002: [SPARK-19669][SQL] Open up visibility for sharedS...

2017-02-20 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/17002 [SPARK-19669][SQL] Open up visibility for sharedState, sessionState, and a few other functions ## What changes were proposed in this pull request? To ease debugging, most of Spark SQL internals

[GitHub] spark pull request #17002: [SPARK-19669][SQL] Open up visibility for sharedS...

2017-02-20 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/17002#discussion_r102070142 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala --- @@ -95,16 +95,26 @@ class SparkSession private( /** * State

[GitHub] spark issue #17002: [SPARK-19669][SQL] Open up visibility for sharedState, s...

2017-02-21 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17002 Yea @gatorsmile be careful in the future and check the commit hash.

[GitHub] spark pull request #17049: [SPARK-17495] [SQL] Add more tests for hive hash

2017-02-23 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/17049#discussion_r102881054 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HashExpressionsSuite.scala --- @@ -71,6 +75,242 @@ class HashExpressionsSuite

[GitHub] spark issue #17049: [SPARK-17495] [SQL] Add more tests for hive hash

2017-02-23 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17049 Looks good except that comment.

[GitHub] spark pull request #17053: [SPARK-18939][SQL] Timezone support in partition ...

2017-02-23 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/17053#discussion_r102889140 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala --- @@ -251,7 +251,8 @@ abstract class ExternalCatalog

[GitHub] spark issue #17049: [SPARK-17495] [SQL] Add more tests for hive hash

2017-02-24 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17049 Merging in master.

[GitHub] spark issue #16378: [SQL] Minor readability improvement for partition handli...

2016-12-21 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16378 cc @cloud-fan too

[GitHub] spark pull request #16379: [SPARK-18969][SQL] Support grouping by nondetermi...

2016-12-21 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16379#discussion_r93560851 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/PullOutNondeterministicSuite.scala --- @@ -0,0 +1,56 @@ +/* + * Licensed

[GitHub] spark pull request #16381: [SPARK-18973][SQL] Remove SortPartitions and Redi...

2016-12-21 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/16381 [SPARK-18973][SQL] Remove SortPartitions and RedistributeData ## What changes were proposed in this pull request? SortPartitions and RedistributeData logical operators are not actually used and

[GitHub] spark issue #16381: [SPARK-18973][SQL] Remove SortPartitions and Redistribut...

2016-12-21 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16381 Please also merge this into branch-2.1 to minimize backport conflicts ...

[GitHub] spark pull request #16381: [SPARK-18973][SQL] Remove SortPartitions and Redi...

2016-12-21 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16381#discussion_r93564401 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/partitioning.scala --- @@ -1,49 +0,0 @@ -/* - * Licensed to the

[GitHub] spark issue #16380: [SPARK-18972][Core]Fix the netty thread names for RPC

2016-12-21 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16380 LGTM.

[GitHub] spark issue #16381: [SPARK-18973][SQL] Remove SortPartitions and Redistribut...

2016-12-21 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16381 Note that this is code from the initial Spark SQL commit!

[GitHub] spark issue #16378: [SQL] Minor readability improvement for partition handli...

2016-12-21 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16378 I've also cherry-picked this into branch-2.1.

[GitHub] spark issue #16349: [Doc] bucketing is applicable to all file-based data sou...

2016-12-21 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16349 Merging in master/branch-2.1.

[GitHub] spark issue #14627: [SPARK-16975][SQL][FOLLOWUP] Do not duplicately check fi...

2016-12-22 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/14627 Thanks - merging in master/branch-2.1.

[GitHub] spark issue #14627: [SPARK-16975][SQL][FOLLOWUP] Do not duplicately check fi...

2016-12-22 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/14627 Actually there is a conflict. Does this fix any bug? If not we don't need to merge it in 2.1.

[GitHub] spark pull request #15923: [SPARK-4105] retry the fetch or stage if shuffle ...

2016-12-22 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/15923#discussion_r93669807 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -305,40 +316,84 @@ final class ShuffleBlockFetcherIterator

[GitHub] spark issue #16382: [SPARK-18975][Core] Add an API to remove SparkListener

2016-12-22 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16382 Merging in master.

[GitHub] spark issue #16382: [SPARK-18975][Core] Add an API to remove SparkListener

2016-12-22 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16382 LGTM

[GitHub] spark issue #16371: [SPARK-18932][SQL] Support partial aggregation for colle...

2016-12-22 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16371 sounds good.

[GitHub] spark pull request #16386: [SPARK-18352][SQL] Support parsing multiline json...

2016-12-22 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16386#discussion_r93731259 --- Diff: python/pyspark/sql/readwriter.py --- @@ -155,21 +155,24 @@ def load(self, path=None, format=None, schema=None, **options): return

[GitHub] spark issue #16395: [SPARK-17075][SQL][WIP] implemented filter estimation

2016-12-23 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16395 cc @srinathshankar

[GitHub] spark issue #16435: [SPARK-19027][SQL] estimate size of object buffer for ob...

2017-01-01 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16435 What exactly is the new policy? I don't think size in bytes is a good choice, since it's the number of objects that can destroy GC.

[GitHub] spark issue #16308: [SPARK-18936][SQL] Infrastructure for session local time...

2017-01-03 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16308 @hvanhovell anything else to do here other than bringing it up to date?

[GitHub] spark pull request #16401: [SPARK-18998] [SQL] Add a cbo conf to switch betw...

2017-01-04 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16401#discussion_r94708121 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala --- @@ -95,6 +96,29 @@ abstract class LogicalPlan extends

[GitHub] spark issue #16347: [SPARK-18934][SQL] Writing to dynamic partitions does no...

2017-01-04 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16347 Maybe we should make DataFrameWriter.sortBy work here.

[GitHub] spark pull request #16401: [SPARK-18998] [SQL] Add a cbo conf to switch betw...

2017-01-05 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16401#discussion_r94731500 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala --- @@ -95,6 +96,29 @@ abstract class LogicalPlan extends

[GitHub] spark pull request #16401: [SPARK-18998] [SQL] Add a cbo conf to switch betw...

2017-01-05 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16401#discussion_r94732192 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala --- @@ -95,6 +96,29 @@ abstract class LogicalPlan extends

[GitHub] spark issue #16475: [MINOR][CORE] Remove code duplication (so the interface ...

2017-01-05 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16475 Can we please close this?

[GitHub] spark issue #16347: [SPARK-18934][SQL] Writing to dynamic partitions does no...

2017-01-05 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16347 What I was suggesting was to allow sort by without bucketing.

[GitHub] spark pull request #16401: [SPARK-18998] [SQL] Add a cbo conf to switch betw...

2017-01-05 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16401#discussion_r94859595 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala --- @@ -95,6 +96,29 @@ abstract class LogicalPlan extends

[GitHub] spark issue #16337: [SPARK-18871][SQL] New test cases for IN/NOT IN subquery

2017-01-05 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16337 Go for it!

[GitHub] spark pull request #16474: [SPARK-19082][SQL] Make ignoreCorruptFiles work f...

2017-01-05 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16474#discussion_r94892001 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala --- @@ -593,13 +650,10 @@ object

[GitHub] spark pull request #16474: [SPARK-19082][SQL] Make ignoreCorruptFiles work f...

2017-01-05 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16474#discussion_r94892533 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala --- @@ -593,13 +650,10 @@ object

[GitHub] spark pull request #16474: [SPARK-19082][SQL] Make ignoreCorruptFiles work f...

2017-01-05 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16474#discussion_r94892571 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala --- @@ -543,6 +546,58 @@ object ParquetFileFormat

[GitHub] spark pull request #16430: [SPARK-17077] [SQL] Cardinality estimation for pr...

2017-01-05 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16430#discussion_r94901694 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/estimation/EstimationSuite.scala --- @@ -0,0 +1,81 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #16430: [SPARK-17077] [SQL] Cardinality estimation for pr...

2017-01-05 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16430#discussion_r94902084 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/estimation/EstimationUtils.scala --- @@ -0,0 +1,54

[GitHub] spark pull request #16430: [SPARK-17077] [SQL] Cardinality estimation for pr...

2017-01-05 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16430#discussion_r94902048 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/estimation/EstimationUtils.scala --- @@ -0,0 +1,54
