[GitHub] spark pull request: [SPARK-2042] Prevent unnecessary shuffle trigg...

2014-06-11 Thread sameeragarwal
GitHub user sameeragarwal opened a pull request: https://github.com/apache/spark/pull/1048 [SPARK-2042] Prevent unnecessary shuffle triggered by take() This PR implements `take()` on a `SchemaRDD` by inserting a logical limit that is followed by a `collect()`. This is also

[GitHub] spark pull request: [SPARK-2042] Prevent unnecessary shuffle trigg...

2014-06-11 Thread sameeragarwal
Github user sameeragarwal commented on a diff in the pull request: https://github.com/apache/spark/pull/1048#discussion_r13636591 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/CombiningLimitsSuite.scala --- @@ -0,0 +1,71 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-2042] Prevent unnecessary shuffle trigg...

2014-06-11 Thread sameeragarwal
Github user sameeragarwal commented on a diff in the pull request: https://github.com/apache/spark/pull/1048#discussion_r13636933 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SchemaRDD.scala --- @@ -374,6 +374,9 @@ class SchemaRDD( override def collect

[GitHub] spark pull request: [SPARK-2042] Prevent unnecessary shuffle trigg...

2014-06-11 Thread sameeragarwal
Github user sameeragarwal commented on a diff in the pull request: https://github.com/apache/spark/pull/1048#discussion_r13661368 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SchemaRDD.scala --- @@ -374,6 +374,9 @@ class SchemaRDD( override def collect

[GitHub] spark pull request: [SPARK-12662][SQL] Fix DataFrame.randomSplit t...

2016-01-06 Thread sameeragarwal
Github user sameeragarwal commented on a diff in the pull request: https://github.com/apache/spark/pull/10626#discussion_r49044814 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala --- @@ -1062,10 +1062,17 @@ class DataFrame private[sql]( * @since 1.4.0

[GitHub] spark pull request: [SPARK-12662][SQL] Fix DataFrame.randomSplit t...

2016-01-06 Thread sameeragarwal
GitHub user sameeragarwal opened a pull request: https://github.com/apache/spark/pull/10626 [SPARK-12662][SQL] Fix DataFrame.randomSplit to avoid creating overlapping splits https://issues.apache.org/jira/browse/SPARK-12662 cc @yhuai You can merge this pull request

[GitHub] spark pull request: [SPARK-12662][SQL] Fix DataFrame.randomSplit t...

2016-01-06 Thread sameeragarwal
Github user sameeragarwal commented on a diff in the pull request: https://github.com/apache/spark/pull/10626#discussion_r49047415 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameStatSuite.scala --- @@ -62,6 +62,32 @@ class DataFrameStatSuite extends QueryTest

[GitHub] spark pull request: [SPARK-12662][SQL] Fix DataFrame.randomSplit t...

2016-01-06 Thread sameeragarwal
Github user sameeragarwal commented on a diff in the pull request: https://github.com/apache/spark/pull/10626#discussion_r49047972 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameStatSuite.scala --- @@ -62,6 +62,32 @@ class DataFrameStatSuite extends QueryTest

[GitHub] spark pull request: [SPARK-12662][SQL] Fix DataFrame.randomSplit t...

2016-01-06 Thread sameeragarwal
Github user sameeragarwal commented on a diff in the pull request: https://github.com/apache/spark/pull/10626#discussion_r49045970 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameStatSuite.scala --- @@ -62,6 +62,32 @@ class DataFrameStatSuite extends QueryTest

[GitHub] spark pull request: [SPARK-12662][SQL] Fix DataFrame.randomSplit t...

2016-01-08 Thread sameeragarwal
Github user sameeragarwal commented on the pull request: https://github.com/apache/spark/pull/10626#issuecomment-169929536 @gatorsmile it seems like your PR is changing the behavior of SQL intersect that this test relies on. I can take a closer look at the PR but if you think

[GitHub] spark pull request: [SPARK-12662][SQL] Fix DataFrame.randomSplit t...

2016-01-08 Thread sameeragarwal
Github user sameeragarwal commented on the pull request: https://github.com/apache/spark/pull/10626#issuecomment-169935761 @gatorsmile I pulled your changes and verified that the new intersect implementation fails even when there is a deterministic sampling operator in the plan, i.e

[GitHub] spark pull request #13318: [SPARK-15391] [SQL] manage the temporary memory o...

2016-06-02 Thread sameeragarwal
Github user sameeragarwal commented on a diff in the pull request: https://github.com/apache/spark/pull/13318#discussion_r65628155 --- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleInMemorySorter.java --- @@ -70,9 +70,14 @@ public int compare(PackedRecordPointer

[GitHub] spark pull request #13318: [SPARK-15391] [SQL] manage the temporary memory o...

2016-06-02 Thread sameeragarwal
Github user sameeragarwal commented on a diff in the pull request: https://github.com/apache/spark/pull/13318#discussion_r65628414 --- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleInMemorySorter.java --- @@ -70,9 +70,14 @@ public int compare(PackedRecordPointer

[GitHub] spark pull request #13318: [SPARK-15391] [SQL] manage the temporary memory o...

2016-06-02 Thread sameeragarwal
Github user sameeragarwal commented on a diff in the pull request: https://github.com/apache/spark/pull/13318#discussion_r65629505 --- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleInMemorySorter.java --- @@ -70,9 +70,14 @@ public int compare(PackedRecordPointer

[GitHub] spark issue #13318: [SPARK-15391] [SQL] manage the temporary memory of timso...

2016-06-02 Thread sameeragarwal
Github user sameeragarwal commented on the issue: https://github.com/apache/spark/pull/13318 Looks great overall, just a few questions and a couple of minor comments. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your

[GitHub] spark pull request #13318: [SPARK-15391] [SQL] manage the temporary memory o...

2016-06-02 Thread sameeragarwal
Github user sameeragarwal commented on a diff in the pull request: https://github.com/apache/spark/pull/13318#discussion_r65629261 --- Diff: core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java --- @@ -221,7 +221,8 @@ public BytesToBytesMap( SparkEnv.get

[GitHub] spark pull request #13489: [SPARK-15745][SQL] Use classloader's getResource(...

2016-06-02 Thread sameeragarwal
GitHub user sameeragarwal opened a pull request: https://github.com/apache/spark/pull/13489 [SPARK-15745][SQL] Use classloader's getResource() for reading resource files in HiveTests ## What changes were proposed in this pull request? This is a cleaner approach in general

[GitHub] spark issue #13318: [SPARK-15391] [SQL] manage the temporary memory of timso...

2016-06-03 Thread sameeragarwal
Github user sameeragarwal commented on the issue: https://github.com/apache/spark/pull/13318 LGTM

[GitHub] spark pull request #13318: [SPARK-15391] [SQL] manage the temporary memory o...

2016-06-03 Thread sameeragarwal
Github user sameeragarwal commented on a diff in the pull request: https://github.com/apache/spark/pull/13318#discussion_r65757237 --- Diff: core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java --- @@ -221,7 +221,8 @@ public BytesToBytesMap( SparkEnv.get

[GitHub] spark pull request #13566: [SPARK-15678] Add support to REFRESH data source ...

2016-06-08 Thread sameeragarwal
GitHub user sameeragarwal opened a pull request: https://github.com/apache/spark/pull/13566 [SPARK-15678] Add support to REFRESH data source paths ## What changes were proposed in this pull request? Spark currently incorrectly continues to use cached data even

[GitHub] spark issue #13566: [SPARK-15678] Add support to REFRESH data source paths

2016-06-08 Thread sameeragarwal
Github user sameeragarwal commented on the issue: https://github.com/apache/spark/pull/13566 Thanks, I pulled it out in a separate function.

[GitHub] spark issue #13528: [SPARK-15783][CORE] still some flakiness in these blackl...

2016-06-06 Thread sameeragarwal
Github user sameeragarwal commented on the issue: https://github.com/apache/spark/pull/13528 @squito the last master build failed a number of tests on this commit: https://spark-tests.appspot.com/builds/spark-master-test-maven-hadoop-2.2/1206. Could those failures be related

[GitHub] spark issue #13528: [SPARK-15783][CORE] still some flakiness in these blackl...

2016-06-06 Thread sameeragarwal
Github user sameeragarwal commented on the issue: https://github.com/apache/spark/pull/13528 (most likely not; https://spark-tests.appspot.com/tests/org.apache.spark.deploy.LogUrlsStandaloneSuite/verify%20that%20log%20urls%20reflect%20SPARK_PUBLIC_DNS%20%28SPARK-6175%29 seems

[GitHub] spark issue #13155: [SPARK-15370] [SQL] Update RewriteCorrelatedScalarSubque...

2016-06-12 Thread sameeragarwal
Github user sameeragarwal commented on the issue: https://github.com/apache/spark/pull/13155 @hvanhovell maybe this broke the build? https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-branch-2.0-compile-maven-scala-2.10/310/

[GitHub] spark pull request #13161: [SPARK-14851] [Core] Support radix sort with null...

2016-06-10 Thread sameeragarwal
Github user sameeragarwal commented on a diff in the pull request: https://github.com/apache/spark/pull/13161#discussion_r66648811 --- Diff: core/src/main/java/org/apache/spark/util/collection/unsafe/sort/RadixSort.java --- @@ -205,13 +210,14 @@ public static int

[GitHub] spark pull request #13161: [SPARK-14851] [Core] Support radix sort with null...

2016-06-10 Thread sameeragarwal
Github user sameeragarwal commented on a diff in the pull request: https://github.com/apache/spark/pull/13161#discussion_r66650450 --- Diff: core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeInMemorySorter.java --- @@ -206,14 +216,27 @@ public void

[GitHub] spark pull request #13161: [SPARK-14851] [Core] Support radix sort with null...

2016-06-10 Thread sameeragarwal
Github user sameeragarwal commented on a diff in the pull request: https://github.com/apache/spark/pull/13161#discussion_r66648782 --- Diff: core/src/main/java/org/apache/spark/util/collection/unsafe/sort/RadixSort.java --- @@ -205,13 +210,14 @@ public static int

[GitHub] spark pull request #13161: [SPARK-14851] [Core] Support radix sort with null...

2016-06-10 Thread sameeragarwal
Github user sameeragarwal commented on a diff in the pull request: https://github.com/apache/spark/pull/13161#discussion_r66649155 --- Diff: core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeInMemorySorter.java --- @@ -93,6 +94,14 @@ public int compare

[GitHub] spark pull request #13161: [SPARK-14851] [Core] Support radix sort with null...

2016-06-10 Thread sameeragarwal
Github user sameeragarwal commented on a diff in the pull request: https://github.com/apache/spark/pull/13161#discussion_r66650922 --- Diff: core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeInMemorySorter.java --- @@ -301,6 +323,19 @@ public SortedIterator

[GitHub] spark pull request #13566: [SPARK-15678] Add support to REFRESH data source ...

2016-06-10 Thread sameeragarwal
Github user sameeragarwal commented on a diff in the pull request: https://github.com/apache/spark/pull/13566#discussion_r6584 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalog/Catalog.scala --- @@ -226,4 +226,11 @@ abstract class Catalog { */ def

[GitHub] spark pull request #13161: [SPARK-14851] [Core] Support radix sort with null...

2016-06-10 Thread sameeragarwal
Github user sameeragarwal commented on a diff in the pull request: https://github.com/apache/spark/pull/13161#discussion_r66653192 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SortOrder.scala --- @@ -64,49 +64,57 @@ case class SortOrder(child

[GitHub] spark pull request #13161: [SPARK-14851] [Core] Support radix sort with null...

2016-06-10 Thread sameeragarwal
Github user sameeragarwal commented on a diff in the pull request: https://github.com/apache/spark/pull/13161#discussion_r66653864 --- Diff: core/src/test/scala/org/apache/spark/util/collection/unsafe/sort/RadixSortSuite.scala --- @@ -152,7 +152,7 @@ class RadixSortSuite extends

[GitHub] spark pull request #13161: [SPARK-14851] [Core] Support radix sort with null...

2016-06-10 Thread sameeragarwal
Github user sameeragarwal commented on a diff in the pull request: https://github.com/apache/spark/pull/13161#discussion_r66653788 --- Diff: core/src/test/scala/org/apache/spark/util/collection/unsafe/sort/RadixSortSuite.scala --- @@ -152,7 +152,7 @@ class RadixSortSuite extends

[GitHub] spark pull request #13566: [SPARK-15678] Add support to REFRESH data source ...

2016-06-10 Thread sameeragarwal
Github user sameeragarwal commented on a diff in the pull request: https://github.com/apache/spark/pull/13566#discussion_r6348 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala --- @@ -157,4 +161,49 @@ private[sql] class CacheManager extends

[GitHub] spark pull request #13419: [SPARK-15678][SQL] Not use cache on appends and o...

2016-06-10 Thread sameeragarwal
Github user sameeragarwal closed the pull request at: https://github.com/apache/spark/pull/13419

[GitHub] spark issue #13419: [SPARK-15678][SQL] Not use cache on appends and overwrit...

2016-06-10 Thread sameeragarwal
Github user sameeragarwal commented on the issue: https://github.com/apache/spark/pull/13419 I ended up creating a small design doc describing the problem and presenting 2 possible solutions at https://docs.google.com/document/d/1h5SzfC5UsvIrRpeLNDKSMKrKJvohkkccFlXo-GBAwQQ/edit?ts

[GitHub] spark pull request #13566: [SPARK-15678] Add support to REFRESH data source ...

2016-06-10 Thread sameeragarwal
Github user sameeragarwal commented on a diff in the pull request: https://github.com/apache/spark/pull/13566#discussion_r66685151 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalog/Catalog.scala --- @@ -226,4 +226,11 @@ abstract class Catalog { */ def

[GitHub] spark issue #13161: [SPARK-14851] [Core] Support radix sort with nullable lo...

2016-06-10 Thread sameeragarwal
Github user sameeragarwal commented on the issue: https://github.com/apache/spark/pull/13161 LGTM pending @davies's approval for `SortOrder.scala`

[GitHub] spark issue #13419: [SPARK-15678][SQL] Not use cache on appends and overwrit...

2016-06-03 Thread sameeragarwal
Github user sameeragarwal commented on the issue: https://github.com/apache/spark/pull/13419 @tejasapatil if the nodes where the data was cached go down, the CacheManager should still consider that data as cached. In that case, the next time the data is accessed, the underlying RDD

[GitHub] spark issue #13489: [SPARK-15745][SQL] Use classloader's getResource() for r...

2016-06-03 Thread sameeragarwal
Github user sameeragarwal commented on the issue: https://github.com/apache/spark/pull/13489 cc @rxin

[GitHub] spark pull request: [SPARK-15327] [SQL] fix split expression in whole stage ...

2016-05-31 Thread sameeragarwal
Github user sameeragarwal commented on the pull request: https://github.com/apache/spark/pull/13235 LGTM

[GitHub] spark pull request: [SPARK-15678][SQL] Drop cache on appends and overwrites

2016-05-31 Thread sameeragarwal
Github user sameeragarwal commented on the pull request: https://github.com/apache/spark/pull/13419 @dongjoon-hyun no reason; old habits. I'll fix this. Thanks! :)

[GitHub] spark pull request: [SPARK-15601][CORE] CircularBuffer's toString() to print...

2016-05-31 Thread sameeragarwal
Github user sameeragarwal commented on the pull request: https://github.com/apache/spark/pull/13351 Thanks Tejas!

[GitHub] spark pull request: [SPARK-15533][SQL] Deprecate Dataset.explode

2016-05-25 Thread sameeragarwal
GitHub user sameeragarwal opened a pull request: https://github.com/apache/spark/pull/13312 [SPARK-15533][SQL] Deprecate Dataset.explode ## What changes were proposed in this pull request? This patch deprecates `Dataset.explode` and documents appropriate workarounds to use

[GitHub] spark pull request: [SPARK-15533][SQL] Deprecate Dataset.explode

2016-05-25 Thread sameeragarwal
Github user sameeragarwal commented on the pull request: https://github.com/apache/spark/pull/13312#issuecomment-221749009 Removed labels and converted the examples to use the alternatives.

[GitHub] spark pull request: [SPARK-15678][SQL] Not use cache on appends and overwrit...

2016-05-31 Thread sameeragarwal
Github user sameeragarwal commented on the pull request: https://github.com/apache/spark/pull/13419 @mengxr it seems like overwriting generates new files so we can achieve the same semantics without introducing an additional timestamp. The current solution should respect the contract

[GitHub] spark pull request: [SPARK-15678][SQL] Not use cache on appends and overwrit...

2016-05-31 Thread sameeragarwal
Github user sameeragarwal commented on the pull request: https://github.com/apache/spark/pull/13419 Also cc'ing @davies

[GitHub] spark pull request #13466: [SPARK-14752][SQL] Explicitly implement KryoSeria...

2016-06-02 Thread sameeragarwal
GitHub user sameeragarwal opened a pull request: https://github.com/apache/spark/pull/13466 [SPARK-14752][SQL] Explicitly implement KryoSerialization for LazilyGenerateOrdering ## What changes were proposed in this pull request? This patch fixes a number

[GitHub] spark pull request: [SPARK-8428][SPARK-13850] Fix integer overflow...

2016-05-26 Thread sameeragarwal
GitHub user sameeragarwal opened a pull request: https://github.com/apache/spark/pull/13336 [SPARK-8428][SPARK-13850] Fix integer overflows in TimSort ## What changes were proposed in this pull request? This patch fixes a few integer overflows

[GitHub] spark pull request: [SPARK-8428][SPARK-13850] Fix integer overflow...

2016-05-26 Thread sameeragarwal
Github user sameeragarwal commented on the pull request: https://github.com/apache/spark/pull/13336#issuecomment-221990354 cc @davies @rxin

[GitHub] spark pull request: [BUILD][1.6] Fix compilation

2016-05-26 Thread sameeragarwal
Github user sameeragarwal commented on the pull request: https://github.com/apache/spark/pull/13339#issuecomment-222024750 cc @rxin

[GitHub] spark pull request: [BUILD][1.6] Fix compilation

2016-05-26 Thread sameeragarwal
GitHub user sameeragarwal opened a pull request: https://github.com/apache/spark/pull/13339 [BUILD][1.6] Fix compilation ## What changes were proposed in this pull request? Makes `UnsafeSortDataFormat` and `RecordPointerAndKeyPrefix` public. These are already public

[GitHub] spark pull request: [BUILD][1.6] Fix compilation

2016-05-26 Thread sameeragarwal
Github user sameeragarwal closed the pull request at: https://github.com/apache/spark/pull/13339

[GitHub] spark pull request: [SPARK-15599][SQL][DOCS] API docs for `createD...

2016-05-26 Thread sameeragarwal
GitHub user sameeragarwal opened a pull request: https://github.com/apache/spark/pull/13345 [SPARK-15599][SQL][DOCS] API docs for `createDataset` functions in SparkSession ## What changes were proposed in this pull request? Adds API docs and usage examples for the 3

[GitHub] spark pull request: [SPARK-15599][SQL][DOCS] API docs for `createD...

2016-05-26 Thread sameeragarwal
Github user sameeragarwal commented on the pull request: https://github.com/apache/spark/pull/13345#issuecomment-222043997 cc @rxin

[GitHub] spark pull request: [SPARK-14400] [SQL] ScriptTransformation does ...

2016-05-27 Thread sameeragarwal
Github user sameeragarwal commented on the pull request: https://github.com/apache/spark/pull/12194#issuecomment-222080647 LGTM pending jenkins. Thanks!

[GitHub] spark pull request: [SPARK-15599][SQL][DOCS] API docs for `createD...

2016-05-27 Thread sameeragarwal
Github user sameeragarwal commented on the pull request: https://github.com/apache/spark/pull/13345#issuecomment-222069015 test this please

[GitHub] spark pull request: [SPARK-15678][SQL] Drop cache on appends and overwrites

2016-05-31 Thread sameeragarwal
Github user sameeragarwal commented on the pull request: https://github.com/apache/spark/pull/13419 @yhuai @mengxr what are your thoughts on this approach?

[GitHub] spark pull request: [SPARK-15678][SQL] Drop cache on appends and overwrites

2016-05-31 Thread sameeragarwal
GitHub user sameeragarwal opened a pull request: https://github.com/apache/spark/pull/13419 [SPARK-15678][SQL] Drop cache on appends and overwrites ## What changes were proposed in this pull request? SparkSQL currently doesn't drop caches if the underlying data

[GitHub] spark pull request #13832: [SPARK-16123] Avoid NegativeArraySizeException wh...

2016-06-21 Thread sameeragarwal
GitHub user sameeragarwal opened a pull request: https://github.com/apache/spark/pull/13832 [SPARK-16123] Avoid NegativeArraySizeException while reserving additional capacity in VectorizedColumnReader ## What changes were proposed in this pull request? This patch fixes
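
Note on the failure mode named in the title above: the snippet is truncated, so the actual change in pull request 13832 is not shown here. As a rough illustration only, the Java sketch below shows one common way a `NegativeArraySizeException` can arise when reserving capacity, namely doubling a size in 32-bit arithmetic, and one way to guard against it. The class `CapacityReserve`, its methods, and the `MAX_CAPACITY` bound are hypothetical names chosen for this example and are not taken from Spark.

```java
// Illustrative sketch only; this is NOT the code from apache/spark pull request 13832.
final class CapacityReserve {
    // Hypothetical upper bound on array length (assumption for this example);
    // some JVMs reject array sizes very close to Integer.MAX_VALUE.
    static final int MAX_CAPACITY = Integer.MAX_VALUE - 8;

    // Naive growth: doubling in int arithmetic wraps around to a negative
    // value once `required` exceeds Integer.MAX_VALUE / 2.
    static int naiveGrow(int required) {
        return required * 2;
    }

    // Safer growth: double in long arithmetic, then clamp to MAX_CAPACITY.
    static int safeGrow(int required) {
        if (required < 0 || required > MAX_CAPACITY) {
            throw new IllegalArgumentException("Cannot reserve " + required + " elements");
        }
        long doubled = (long) required * 2L;
        return (int) Math.min(doubled, (long) MAX_CAPACITY);
    }

    public static void main(String[] args) {
        int required = 1_500_000_000;
        // Negative result: passing it to `new int[...]` would throw NegativeArraySizeException.
        System.out.println(naiveGrow(required));
        // Clamped to MAX_CAPACITY instead of overflowing.
        System.out.println(safeGrow(required));
    }
}
```

Doing the multiplication in `long` and clamping keeps the requested size non-negative; whether the real patch takes this approach cannot be determined from the truncated snippet.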

[GitHub] spark issue #13832: [SPARK-16123] Avoid NegativeArraySizeException while res...

2016-06-21 Thread sameeragarwal
Github user sameeragarwal commented on the issue: https://github.com/apache/spark/pull/13832 cc @hvanhovell

[GitHub] spark issue #13726: Remove non-obvious conf settings from TPCDS benchmark

2016-06-16 Thread sameeragarwal
Github user sameeragarwal commented on the issue: https://github.com/apache/spark/pull/13726 cc @hvanhovell

[GitHub] spark pull request #13726: Remove non-obvious conf settings from TPCDS bench...

2016-06-16 Thread sameeragarwal
GitHub user sameeragarwal opened a pull request: https://github.com/apache/spark/pull/13726 Remove non-obvious conf settings from TPCDS benchmark ## What changes were proposed in this pull request? My fault -- these 2 conf entries are mysteriously hidden inside

[GitHub] spark issue #13737: [SPARK-15954][SQL][PySpark][TEST] Fix TestHiveContext in...

2016-06-19 Thread sameeragarwal
Github user sameeragarwal commented on the issue: https://github.com/apache/spark/pull/13737 Thanks, LGTM. It'd be great to add a comment here about the fallback and its implications on pyspark tests to prevent future regressions. Also, out of curiosity, do we know what

[GitHub] spark pull request #13832: [SPARK-16123] Avoid NegativeArraySizeException wh...

2016-06-23 Thread sameeragarwal
Github user sameeragarwal commented on a diff in the pull request: https://github.com/apache/spark/pull/13832#discussion_r68334002 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java --- @@ -847,6 +862,11 @@ public final int appendStruct

[GitHub] spark pull request: [SPARK-12682] Add support for (optionally) not...

2016-01-18 Thread sameeragarwal
GitHub user sameeragarwal opened a pull request: https://github.com/apache/spark/pull/10826 [SPARK-12682] Add support for (optionally) not storing tables in hive metadata format This PR adds a new table option (`skip_hive_metadata`) that'd allow the user to skip storing the table

[GitHub] spark pull request: [SPARK-12594] [SQL] Outer Join Elimination by ...

2016-02-10 Thread sameeragarwal
Github user sameeragarwal commented on a diff in the pull request: https://github.com/apache/spark/pull/10567#discussion_r52501622 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -769,6 +770,79 @@ object ReorderJoin extends

[GitHub] spark pull request: [SPARK-12613] [SQL] Outer Join Elimination by ...

2016-02-10 Thread sameeragarwal
Github user sameeragarwal commented on a diff in the pull request: https://github.com/apache/spark/pull/10566#discussion_r52503590 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -769,6 +770,107 @@ object ReorderJoin extends

[GitHub] spark pull request: [SPARK-13091][SQL] Rewrite/Propagate constrain...

2016-02-09 Thread sameeragarwal
Github user sameeragarwal commented on a diff in the pull request: https://github.com/apache/spark/pull/11144#discussion_r52399068 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/plans/ConstraintPropagationSuite.scala --- @@ -69,6 +72,20 @@ class

[GitHub] spark pull request: [SPARK-13091][SQL] Rewrite/Propagate constrain...

2016-02-09 Thread sameeragarwal
GitHub user sameeragarwal opened a pull request: https://github.com/apache/spark/pull/11144 [SPARK-13091][SQL] Rewrite/Propagate constraints for Aliases This PR adds support for rewriting constraints if there are aliases in the query plan. For e.g., if there is a query of form

[GitHub] spark pull request: [SPARK-13091][SQL] Rewrite/Propagate constrain...

2016-02-09 Thread sameeragarwal
Github user sameeragarwal commented on the pull request: https://github.com/apache/spark/pull/11144#issuecomment-182150683 test this please

[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

2016-02-01 Thread sameeragarwal
Github user sameeragarwal commented on a diff in the pull request: https://github.com/apache/spark/pull/10844#discussion_r51490255 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala --- @@ -180,6 +221,46 @@ case class Join

[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

2016-02-01 Thread sameeragarwal
Github user sameeragarwal commented on a diff in the pull request: https://github.com/apache/spark/pull/10844#discussion_r51506200 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala --- @@ -89,9 +89,27 @@ case class Generate

[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

2016-02-01 Thread sameeragarwal
Github user sameeragarwal commented on a diff in the pull request: https://github.com/apache/spark/pull/10844#discussion_r51506264 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/plans/ConstraintPropagationSuite.scala --- @@ -0,0 +1,138

[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

2016-02-01 Thread sameeragarwal
Github user sameeragarwal commented on the pull request: https://github.com/apache/spark/pull/10844#issuecomment-178280862 @marmbrus comments addressed!

[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

2016-02-01 Thread sameeragarwal
Github user sameeragarwal commented on a diff in the pull request: https://github.com/apache/spark/pull/10844#discussion_r51506239 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala --- @@ -180,6 +221,46 @@ case class Join

[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

2016-02-01 Thread sameeragarwal
Github user sameeragarwal commented on the pull request: https://github.com/apache/spark/pull/10844#issuecomment-178291758 test this please

[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

2016-02-02 Thread sameeragarwal
Github user sameeragarwal commented on a diff in the pull request: https://github.com/apache/spark/pull/10844#discussion_r51647871 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala --- @@ -17,16 +17,62 @@ package

[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

2016-02-02 Thread sameeragarwal
Github user sameeragarwal commented on the pull request: https://github.com/apache/spark/pull/10844#issuecomment-178868815 comments addressed!

[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

2016-02-02 Thread sameeragarwal
Github user sameeragarwal commented on a diff in the pull request: https://github.com/apache/spark/pull/10844#discussion_r51647661 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala --- @@ -157,6 +178,23 @@ case class Union

[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

2016-02-02 Thread sameeragarwal
Github user sameeragarwal commented on a diff in the pull request: https://github.com/apache/spark/pull/10844#discussion_r51647631 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala --- @@ -301,10 +301,12 @@ abstract class LeafNode

[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

2016-01-29 Thread sameeragarwal
Github user sameeragarwal commented on the pull request: https://github.com/apache/spark/pull/10844#issuecomment-177015688 test this please

[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

2016-01-29 Thread sameeragarwal
Github user sameeragarwal commented on a diff in the pull request: https://github.com/apache/spark/pull/10844#discussion_r51338060 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala --- @@ -17,16 +17,34 @@ package

[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

2016-01-29 Thread sameeragarwal
Github user sameeragarwal commented on the pull request: https://github.com/apache/spark/pull/10844#issuecomment-177051182 Thanks @marmbrus, all comments addressed!

[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

2016-01-29 Thread sameeragarwal
Github user sameeragarwal commented on a diff in the pull request: https://github.com/apache/spark/pull/10844#discussion_r51338405 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala --- @@ -180,6 +229,55 @@ case class Join

[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

2016-01-29 Thread sameeragarwal
Github user sameeragarwal commented on a diff in the pull request: https://github.com/apache/spark/pull/10844#discussion_r51338436 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala --- @@ -180,6 +229,55 @@ case class Join

[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

2016-01-29 Thread sameeragarwal
Github user sameeragarwal commented on the pull request: https://github.com/apache/spark/pull/10844#issuecomment-176948895 Thanks @marmbrus, all comments addressed!

[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...

2016-02-25 Thread sameeragarwal
Github user sameeragarwal commented on the pull request: https://github.com/apache/spark/pull/11359#issuecomment-188922487 Generated code: ```java /* 001 */ public Object generate(Object[] references) { /* 002 */ return new GeneratedIterator(references); /* 003

[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...

2016-02-24 Thread sameeragarwal
GitHub user sameeragarwal opened a pull request: https://github.com/apache/spark/pull/11359 [SPARK-13123][SQL] Implement whole state codegen for sort ## What changes were proposed in this pull request? This just builds on @nongli 's PR: https://github.com/apache/spark/pull

[GitHub] spark pull request: [WIP][SPARK-13495][SQL] Add Null Filters in th...

2016-02-25 Thread sameeragarwal
GitHub user sameeragarwal opened a pull request: https://github.com/apache/spark/pull/11372 [WIP][SPARK-13495][SQL] Add Null Filters in the query plan for Filters/Joins based on their data constraints ## What changes were proposed in this pull request? This PR adds

[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...

2016-02-25 Thread sameeragarwal
Github user sameeragarwal commented on the pull request: https://github.com/apache/spark/pull/11359#issuecomment-188933507 test this please

[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...

2016-02-25 Thread sameeragarwal
Github user sameeragarwal commented on the pull request: https://github.com/apache/spark/pull/11359#issuecomment-188686409 Thanks @rxin, added!

[GitHub] spark pull request: [WIP][SPARK-13495][SQL] Add Null Filters in th...

2016-02-29 Thread sameeragarwal
Github user sameeragarwal commented on a diff in the pull request: https://github.com/apache/spark/pull/11372#discussion_r54460830 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -586,6 +588,62 @@ object NullPropagation extends

[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...

2016-02-29 Thread sameeragarwal
Github user sameeragarwal commented on a diff in the pull request: https://github.com/apache/spark/pull/11359#discussion_r54458793 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Sort.scala --- @@ -93,4 +97,75 @@ case class Sort( sortedIterator

[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...

2016-02-29 Thread sameeragarwal
Github user sameeragarwal commented on a diff in the pull request: https://github.com/apache/spark/pull/11359#discussion_r54458824 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Sort.scala --- @@ -93,4 +97,75 @@ case class Sort( sortedIterator

[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...

2016-02-27 Thread sameeragarwal
Github user sameeragarwal commented on the pull request: https://github.com/apache/spark/pull/11359#issuecomment-189683500 @nongli this should be ready for your pass.

[GitHub] spark pull request: [WIP][SPARK-12957][SQL] Initial support for co...

2016-01-21 Thread sameeragarwal
Github user sameeragarwal commented on the pull request: https://github.com/apache/spark/pull/10844#issuecomment-173787321 @hvanhovell added, thanks!

[GitHub] spark pull request: [SPARK-12682][SQL] Add support for (optionally...

2016-01-25 Thread sameeragarwal
Github user sameeragarwal commented on the pull request: https://github.com/apache/spark/pull/10826#issuecomment-174654941 Thanks @yhuai, all comments addressed.

[GitHub] spark pull request: [SPARK-12682][SQL] Add support for (optionally...

2016-01-25 Thread sameeragarwal
Github user sameeragarwal commented on the pull request: https://github.com/apache/spark/pull/10826#issuecomment-174665294 jenkins test this please

[GitHub] spark pull request: [WIP][SQL] Initial support for constraint prop...

2016-01-20 Thread sameeragarwal
GitHub user sameeragarwal opened a pull request: https://github.com/apache/spark/pull/10844 [WIP][SQL] Initial support for constraint propagation in SparkSQL You can merge this pull request into a Git repository by running: $ git pull https://github.com/sameeragarwal/spark
