[GitHub] spark issue #17289: [SPARK-19948] Document that saveAsTable uses catalog as ...

2017-03-15 Thread juliuszsompolski
Github user juliuszsompolski commented on the issue: https://github.com/apache/spark/pull/17289 @gatorsmile It relates to JDBC source, because if you do e.g. .mode(SaveMode.ErrorIfExists).saveAsTable("foo") on a JDBC source, and the table exists in JDBC but does

[GitHub] spark pull request #17707: [SPARK-20412] Throw ParseException from visitNonO...

2017-04-21 Thread juliuszsompolski
Github user juliuszsompolski commented on a diff in the pull request: https://github.com/apache/spark/pull/17707#discussion_r112672271 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLCommandSuite.scala --- @@ -558,12 +558,12 @@ class DDLCommandSuite

[GitHub] spark pull request #17703: [SPARK-20367] Properly unescape column names of p...

2017-04-20 Thread juliuszsompolski
GitHub user juliuszsompolski opened a pull request: https://github.com/apache/spark/pull/17703 [SPARK-20367] Properly unescape column names of partitioning columns parsed from paths. ## What changes were proposed in this pull request? When infering partitioning schema from

[GitHub] spark issue #17703: [SPARK-20367] Properly unescape column names of partitio...

2017-04-20 Thread juliuszsompolski
Github user juliuszsompolski commented on the issue: https://github.com/apache/spark/pull/17703 cc @gatorsmile @cloud-fan --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark pull request #17707: [SPARK-20412] Throw ParseException from visitNonO...

2017-04-20 Thread juliuszsompolski
GitHub user juliuszsompolski opened a pull request: https://github.com/apache/spark/pull/17707 [SPARK-20412] Throw ParseException from visitNonOptionalPartitionSpec instead of returning null values. ## What changes were proposed in this pull request? If a partitionSpec

[GitHub] spark issue #17707: [SPARK-20412] Throw ParseException from visitNonOptional...

2017-04-20 Thread juliuszsompolski
Github user juliuszsompolski commented on the issue: https://github.com/apache/spark/pull/17707 cc @cloud-fan --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes

[GitHub] spark pull request #17289: [SPARK-19948] Document that saveAsTable uses cata...

2017-03-14 Thread juliuszsompolski
GitHub user juliuszsompolski opened a pull request: https://github.com/apache/spark/pull/17289 [SPARK-19948] Document that saveAsTable uses catalog as source of truth for table existence. It is quirky behaviour that saveAsTable to e.g. a JDBC source with SaveMode other than

[GitHub] spark pull request #18937: [MINOR] Remove false comment from planStreamingAg...

2017-08-14 Thread juliuszsompolski
GitHub user juliuszsompolski opened a pull request: https://github.com/apache/spark/pull/18937 [MINOR] Remove false comment from planStreamingAggregation ## What changes were proposed in this pull request? AggUtils.planStreamingAggregation has some comments about DISTINCT

[GitHub] spark pull request #18479: [SPARK-21273][SQL] Propagate logical plan stats u...

2017-07-17 Thread juliuszsompolski
Github user juliuszsompolski commented on a diff in the pull request: https://github.com/apache/spark/pull/18479#discussion_r127685579 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlanVisitor.scala --- @@ -0,0 +1,87

[GitHub] spark pull request #18479: [SPARK-21273][SQL] Propagate logical plan stats u...

2017-07-17 Thread juliuszsompolski
Github user juliuszsompolski commented on a diff in the pull request: https://github.com/apache/spark/pull/18479#discussion_r127680940 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/BasicStatsPlanVisitor.scala --- @@ -0,0 +1,82

[GitHub] spark pull request #18479: [SPARK-21273][SQL] Propagate logical plan stats u...

2017-07-17 Thread juliuszsompolski
Github user juliuszsompolski commented on a diff in the pull request: https://github.com/apache/spark/pull/18479#discussion_r127680994 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/BasicStatsPlanVisitor.scala --- @@ -0,0 +1,82

[GitHub] spark pull request #18494: [SPARK-21272] SortMergeJoin LeftAnti does not upd...

2017-06-30 Thread juliuszsompolski
GitHub user juliuszsompolski opened a pull request: https://github.com/apache/spark/pull/18494 [SPARK-21272] SortMergeJoin LeftAnti does not update numOutputRows ## What changes were proposed in this pull request? Updating numOutputRows metric was missing from one return

[GitHub] spark pull request #17875: [SPARK-20616] RuleExecutor logDebug of batch resu...

2017-05-05 Thread juliuszsompolski
GitHub user juliuszsompolski opened a pull request: https://github.com/apache/spark/pull/17875 [SPARK-20616] RuleExecutor logDebug of batch results should show diff to start of batch ## What changes were proposed in this pull request? Due to a likely typo, the logDebug msg

[GitHub] spark issue #19181: [SPARK-21907][CORE] oom during spill

2017-09-15 Thread juliuszsompolski
Github user juliuszsompolski commented on the issue: https://github.com/apache/spark/pull/19181 @eyalfa Thanks. I agree that this is a good fix to this issue and lgtm. I'm just worried that there are more lurking cases where a nested spill can trigger and cause something

[GitHub] spark issue #19181: [SPARK-21907][CORE] oom during spill

2017-09-14 Thread juliuszsompolski
Github user juliuszsompolski commented on the issue: https://github.com/apache/spark/pull/19181 LGTM. It'd be nicer if there was an invariant that a nested spill just never can happen, i.e. there are no places that can spill on the code path that already spills. But this fix

[GitHub] spark pull request #19324: [SPARK-22103] Move HashAggregateExec parent consu...

2017-09-22 Thread juliuszsompolski
GitHub user juliuszsompolski opened a pull request: https://github.com/apache/spark/pull/19324 [SPARK-22103] Move HashAggregateExec parent consume to a separate function in codegen ## What changes were proposed in this pull request? HashAggregateExec codegen uses two paths

[GitHub] spark pull request #19324: [SPARK-22103] Move HashAggregateExec parent consu...

2017-09-22 Thread juliuszsompolski
Github user juliuszsompolski commented on a diff in the pull request: https://github.com/apache/spark/pull/19324#discussion_r140532783 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala --- @@ -462,18 +464,36 @@ case class

[GitHub] spark issue #19324: [SPARK-22103] Move HashAggregateExec parent consume to a...

2017-09-22 Thread juliuszsompolski
Github user juliuszsompolski commented on the issue: https://github.com/apache/spark/pull/19324 @viirya This is related to https://github.com/apache/spark/pull/18931/, as it also separates out the consume function. Maybe it would be enough to do similar splits into functions

[GitHub] spark issue #19324: [SPARK-22103] Move HashAggregateExec parent consume to a...

2017-09-22 Thread juliuszsompolski
Github user juliuszsompolski commented on the issue: https://github.com/apache/spark/pull/19324 @hvanhovell @gatorsmile @cloud-fan @rednaxelafx --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

[GitHub] spark pull request #19324: [SPARK-22103] Move HashAggregateExec parent consu...

2017-09-22 Thread juliuszsompolski
Github user juliuszsompolski commented on a diff in the pull request: https://github.com/apache/spark/pull/19324#discussion_r140528883 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala --- @@ -329,6 +332,15 @@ case class

[GitHub] spark pull request #19324: [SPARK-22103] Move HashAggregateExec parent consu...

2017-09-22 Thread juliuszsompolski
Github user juliuszsompolski commented on a diff in the pull request: https://github.com/apache/spark/pull/19324#discussion_r140528662 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala --- @@ -599,10 +621,14 @@ case class

[GitHub] spark issue #19181: [SPARK-21907][CORE] oom during spill

2017-10-10 Thread juliuszsompolski
Github user juliuszsompolski commented on the issue: https://github.com/apache/spark/pull/19181 Looks good to me. What do you think @hvanhovell ? --- - To unsubscribe, e-mail: reviews-unsubscr

[GitHub] spark issue #19353: [SPARK-22103][FOLLOWUP] Rename addExtraCode to addInnerC...

2017-09-26 Thread juliuszsompolski
Github user juliuszsompolski commented on the issue: https://github.com/apache/spark/pull/19353 jenkins retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands

[GitHub] spark pull request #19324: [SPARK-22103] Move HashAggregateExec parent consu...

2017-09-26 Thread juliuszsompolski
Github user juliuszsompolski commented on a diff in the pull request: https://github.com/apache/spark/pull/19324#discussion_r141039264 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoinExec.scala --- @@ -186,8 +186,7 @@ case class

[GitHub] spark pull request #19324: [SPARK-22103] Move HashAggregateExec parent consu...

2017-09-26 Thread juliuszsompolski
Github user juliuszsompolski commented on a diff in the pull request: https://github.com/apache/spark/pull/19324#discussion_r141037818 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala --- @@ -462,18 +464,36 @@ case class

[GitHub] spark pull request #19324: [SPARK-22103] Move HashAggregateExec parent consu...

2017-09-26 Thread juliuszsompolski
Github user juliuszsompolski commented on a diff in the pull request: https://github.com/apache/spark/pull/19324#discussion_r141037588 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala --- @@ -462,18 +464,36 @@ case class

[GitHub] spark pull request #19353: [SPARK-22103][FOLLOWUP] Rename addExtraCode to ad...

2017-09-26 Thread juliuszsompolski
GitHub user juliuszsompolski opened a pull request: https://github.com/apache/spark/pull/19353 [SPARK-22103][FOLLOWUP] Rename addExtraCode to addInnerClass ## What changes were proposed in this pull request? Address PR comments that appeared post-merge, to rename

[GitHub] spark issue #19353: [SPARK-22103][FOLLOWUP] Rename addExtraCode to addInnerC...

2017-09-26 Thread juliuszsompolski
Github user juliuszsompolski commented on the issue: https://github.com/apache/spark/pull/19353 @cloud-fan @viirya I addressed the post-merge comments you had in https://github.com/apache/spark/pull/19324

[GitHub] spark pull request #19324: [SPARK-22103] Move HashAggregateExec parent consu...

2017-09-26 Thread juliuszsompolski
Github user juliuszsompolski commented on a diff in the pull request: https://github.com/apache/spark/pull/19324#discussion_r141039074 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala --- @@ -201,11 +201,14 @@ case class FilterExec

[GitHub] spark pull request #19181: [SPARK-21907][CORE] oom during spill

2017-09-29 Thread juliuszsompolski
Github user juliuszsompolski commented on a diff in the pull request: https://github.com/apache/spark/pull/19181#discussion_r141843414 --- Diff: core/src/test/scala/org/apache/spark/memory/TestMemoryManager.scala --- @@ -58,11 +58,15 @@ class TestMemoryManager(conf: SparkConf

[GitHub] spark pull request #19181: [SPARK-21907][CORE] oom during spill

2017-09-29 Thread juliuszsompolski
Github user juliuszsompolski commented on a diff in the pull request: https://github.com/apache/spark/pull/19181#discussion_r141843276 --- Diff: core/src/test/java/org/apache/spark/util/collection/unsafe/sort/UnsafeInMemorySorterSuite.java --- @@ -139,4 +139,44 @@ public int

[GitHub] spark pull request #19181: [SPARK-21907][CORE] oom during spill

2017-09-29 Thread juliuszsompolski
Github user juliuszsompolski commented on a diff in the pull request: https://github.com/apache/spark/pull/19181#discussion_r141843447 --- Diff: core/src/test/scala/org/apache/spark/memory/TestMemoryManager.scala --- @@ -58,11 +58,15 @@ class TestMemoryManager(conf: SparkConf

[GitHub] spark pull request #19181: [SPARK-21907][CORE] oom during spill

2017-09-29 Thread juliuszsompolski
Github user juliuszsompolski commented on a diff in the pull request: https://github.com/apache/spark/pull/19181#discussion_r141842424 --- Diff: core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java --- @@ -85,7 +85,7 @@ private final

[GitHub] spark pull request #19181: [SPARK-21907][CORE] oom during spill

2017-09-29 Thread juliuszsompolski
Github user juliuszsompolski commented on a diff in the pull request: https://github.com/apache/spark/pull/19181#discussion_r141842918 --- Diff: core/src/test/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorterSuite.java --- @@ -503,6 +511,39 @@ public void

[GitHub] spark pull request #19181: [SPARK-21907][CORE] oom during spill

2017-09-29 Thread juliuszsompolski
Github user juliuszsompolski commented on a diff in the pull request: https://github.com/apache/spark/pull/19181#discussion_r141843193 --- Diff: core/src/test/java/org/apache/spark/util/collection/unsafe/sort/UnsafeInMemorySorterSuite.java --- @@ -139,4 +139,44 @@ public int

[GitHub] spark pull request #19181: [SPARK-21907][CORE] oom during spill

2017-09-29 Thread juliuszsompolski
Github user juliuszsompolski commented on a diff in the pull request: https://github.com/apache/spark/pull/19181#discussion_r141843914 --- Diff: core/src/test/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorterSuite.java --- @@ -19,10 +19,18

[GitHub] spark pull request #19181: [SPARK-21907][CORE] oom during spill

2017-09-29 Thread juliuszsompolski
Github user juliuszsompolski commented on a diff in the pull request: https://github.com/apache/spark/pull/19181#discussion_r141843227 --- Diff: core/src/test/java/org/apache/spark/util/collection/unsafe/sort/UnsafeInMemorySorterSuite.java --- @@ -139,4 +139,44 @@ public int

[GitHub] spark pull request #19181: [SPARK-21907][CORE] oom during spill

2017-09-29 Thread juliuszsompolski
Github user juliuszsompolski commented on a diff in the pull request: https://github.com/apache/spark/pull/19181#discussion_r141843046 --- Diff: core/src/test/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorterSuite.java --- @@ -503,6 +511,39 @@ public void

[GitHub] spark pull request #19181: [SPARK-21907][CORE] oom during spill

2017-09-29 Thread juliuszsompolski
Github user juliuszsompolski commented on a diff in the pull request: https://github.com/apache/spark/pull/19181#discussion_r141842948 --- Diff: core/src/test/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorterSuite.java --- @@ -503,6 +511,39 @@ public void

[GitHub] spark pull request #19181: [SPARK-21907][CORE] oom during spill

2017-09-29 Thread juliuszsompolski
Github user juliuszsompolski commented on a diff in the pull request: https://github.com/apache/spark/pull/19181#discussion_r141843494 --- Diff: core/src/test/scala/org/apache/spark/memory/TestMemoryManager.scala --- @@ -58,11 +58,15 @@ class TestMemoryManager(conf: SparkConf

[GitHub] spark pull request #19181: [SPARK-21907][CORE] oom during spill

2017-09-29 Thread juliuszsompolski
Github user juliuszsompolski commented on a diff in the pull request: https://github.com/apache/spark/pull/19181#discussion_r141842522 --- Diff: core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeInMemorySorter.java --- @@ -162,14 +162,25 @@ private int

[GitHub] spark pull request #19181: [SPARK-21907][CORE] oom during spill

2017-09-29 Thread juliuszsompolski
Github user juliuszsompolski commented on a diff in the pull request: https://github.com/apache/spark/pull/19181#discussion_r141842730 --- Diff: core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeInMemorySorter.java --- @@ -162,14 +162,25 @@ private int

[GitHub] spark issue #19386: [SPARK-22161] [SQL] Add Impala-modified TPC-DS queries

2017-09-29 Thread juliuszsompolski
Github user juliuszsompolski commented on the issue: https://github.com/apache/spark/pull/19386 LGTM --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark pull request #19324: [SPARK-22103] Move HashAggregateExec parent consu...

2017-09-26 Thread juliuszsompolski
Github user juliuszsompolski commented on a diff in the pull request: https://github.com/apache/spark/pull/19324#discussion_r141033249 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala --- @@ -197,11 +197,14 @@ trait CodegenSupport

[GitHub] spark pull request #19324: [SPARK-22103] Move HashAggregateExec parent consu...

2017-09-26 Thread juliuszsompolski
Github user juliuszsompolski commented on a diff in the pull request: https://github.com/apache/spark/pull/19324#discussion_r141035313 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala --- @@ -654,18 +680,23 @@ case class

[GitHub] spark issue #19656: [SPARK-22445][SQL] move CodegenContext.copyResult to Cod...

2017-11-06 Thread juliuszsompolski
Github user juliuszsompolski commented on the issue: https://github.com/apache/spark/pull/19656 `isShouldStopRequired` is currently not respected by most operators (they insert `shouldStop()` code regardless of this setting). If you're refactoring this, maybe make sure that all

[GitHub] spark issue #19689: [SPARK-22462][SQL] Make rdd-based actions in Dataset tra...

2017-11-09 Thread juliuszsompolski
Github user juliuszsompolski commented on the issue: https://github.com/apache/spark/pull/19689 Thanks for the fix @viirya! But I'm not a Spark committer to approve it :-). --- - To unsubscribe, e-mail: reviews

[GitHub] spark issue #19923: [SPARK-22721] BytesToBytesMap peak memory not updated.

2017-12-07 Thread juliuszsompolski
Github user juliuszsompolski commented on the issue: https://github.com/apache/spark/pull/19923 jenkins retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands

[GitHub] spark pull request #19923: [SPARK-22721] BytesToBytesMap peak memory not upd...

2017-12-07 Thread juliuszsompolski
GitHub user juliuszsompolski opened a pull request: https://github.com/apache/spark/pull/19923 [SPARK-22721] BytesToBytesMap peak memory not updated. ## What changes were proposed in this pull request? Follow-up to earlier commit. The peak memory of BytesToBytesMap

[GitHub] spark issue #19923: [SPARK-22721] BytesToBytesMap peak memory not updated.

2017-12-07 Thread juliuszsompolski
Github user juliuszsompolski commented on the issue: https://github.com/apache/spark/pull/19923 Sorry @hvanhovell for not getting it fully right the first time... --- - To unsubscribe, e-mail: reviews-unsubscr

[GitHub] spark pull request #19915: [SPARK-22721] BytesToBytesMap peak memory usage n...

2017-12-06 Thread juliuszsompolski
GitHub user juliuszsompolski opened a pull request: https://github.com/apache/spark/pull/19915 [SPARK-22721] BytesToBytesMap peak memory usage not accurate after reset() ## What changes were proposed in this pull request? BytesToBytesMap doesn't update peak memory usage

[GitHub] spark issue #19915: [SPARK-22721] BytesToBytesMap peak memory usage not accu...

2017-12-06 Thread juliuszsompolski
Github user juliuszsompolski commented on the issue: https://github.com/apache/spark/pull/19915 cc @cloud-fan --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark pull request #21133: [SPARK-24013][SQL] Remove unneeded compress in Ap...

2018-04-27 Thread juliuszsompolski
Github user juliuszsompolski commented on a diff in the pull request: https://github.com/apache/spark/pull/21133#discussion_r184654132 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala --- @@ -238,12 +238,6

[GitHub] spark pull request #21133: [SPARK-24013][SQL] Remove unneeded compress in Ap...

2018-04-27 Thread juliuszsompolski
Github user juliuszsompolski commented on a diff in the pull request: https://github.com/apache/spark/pull/21133#discussion_r184656896 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala --- @@ -279,4 +282,11 @@ class

[GitHub] spark pull request #21133: [SPARK-24013][SQL] Remove unneeded compress in Ap...

2018-04-27 Thread juliuszsompolski
Github user juliuszsompolski commented on a diff in the pull request: https://github.com/apache/spark/pull/21133#discussion_r184662359 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala --- @@ -279,4 +282,11 @@ class

[GitHub] spark issue #21403: [SPARK-24341][WIP][SQL] Support IN subqueries with struc...

2018-05-29 Thread juliuszsompolski
Github user juliuszsompolski commented on the issue: https://github.com/apache/spark/pull/21403 @mgaido91 BTW: In SPARK-24395 I would consider the cases to still be valid, because I believe there is no other syntactic way to do a multi-column IN/NOT IN with list of literals

[GitHub] spark issue #21403: [SPARK-24341][WIP][SQL] Support IN subqueries with struc...

2018-05-29 Thread juliuszsompolski
Github user juliuszsompolski commented on the issue: https://github.com/apache/spark/pull/21403 @mgaido91 This also works, +1. What about `a in (select (b, c) from ...)` when `a` is a struct? - I guess allow it, but a potential gotcha during implementation

[GitHub] spark issue #21403: [SPARK-24341][WIP][SQL] Support IN subqueries with struc...

2018-05-29 Thread juliuszsompolski
Github user juliuszsompolski commented on the issue: https://github.com/apache/spark/pull/21403 I think that the way the columns are defined in the subquery should define the semantics. E.g.: `(a, b) IN (select c, d from ...)` - unpack (a, b) and treat it as a multi column

[GitHub] spark pull request #21171: [SPARK-24104] SQLAppStatusListener overwrites met...

2018-04-26 Thread juliuszsompolski
GitHub user juliuszsompolski opened a pull request: https://github.com/apache/spark/pull/21171 [SPARK-24104] SQLAppStatusListener overwrites metrics onDriverAccumUpdates instead of updating them ## What changes were proposed in this pull request? Event

[GitHub] spark issue #21171: [SPARK-24104] SQLAppStatusListener overwrites metrics on...

2018-04-26 Thread juliuszsompolski
Github user juliuszsompolski commented on the issue: https://github.com/apache/spark/pull/21171 cc @gengliangwang @vanzin --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands

[GitHub] spark issue #21133: [SPARK-24013][SQL] Remove unneeded compress in Approxima...

2018-04-30 Thread juliuszsompolski
Github user juliuszsompolski commented on the issue: https://github.com/apache/spark/pull/21133 Maybe we could add the former test as a benchmark to `AggregateBenchmark`? --- - To unsubscribe, e-mail: reviews

[GitHub] spark pull request #21133: [SPARK-24013][SQL] Remove unneeded compress in Ap...

2018-04-26 Thread juliuszsompolski
Github user juliuszsompolski commented on a diff in the pull request: https://github.com/apache/spark/pull/21133#discussion_r184347998 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala --- @@ -238,12 +238,6

[GitHub] spark pull request #21133: [SPARK-24013][SQL] Remove unneeded compress in Ap...

2018-04-26 Thread juliuszsompolski
Github user juliuszsompolski commented on a diff in the pull request: https://github.com/apache/spark/pull/21133#discussion_r184343803 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala --- @@ -279,4 +282,10 @@ class

[GitHub] spark pull request #21228: [SPARK-24171] Adding a note for non-deterministic...

2018-05-03 Thread juliuszsompolski
Github user juliuszsompolski commented on a diff in the pull request: https://github.com/apache/spark/pull/21228#discussion_r185792840 --- Diff: R/pkg/R/functions.R --- @@ -3184,6 +3191,7 @@ setMethod("create_map", #' collect(select(df2, collect_lis

[GitHub] spark pull request #21228: [SPARK-24171] Adding a note for non-deterministic...

2018-05-03 Thread juliuszsompolski
Github user juliuszsompolski commented on a diff in the pull request: https://github.com/apache/spark/pull/21228#discussion_r185791719 --- Diff: R/pkg/R/functions.R --- @@ -963,6 +964,7 @@ setMethod("kurtosis", #' last(df$c, TRUE) #' } #' @note last s

[GitHub] spark pull request #20136: [SPARK-22938] Assert that SQLConf.get is accessed...

2018-01-03 Thread juliuszsompolski
Github user juliuszsompolski commented on a diff in the pull request: https://github.com/apache/spark/pull/20136#discussion_r159408791 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -70,7 +72,7 @@ object SQLConf { * Default config

[GitHub] spark pull request #20555: [SPARK-23366] Improve hot reading path in ReadAhe...

2018-02-08 Thread juliuszsompolski
GitHub user juliuszsompolski opened a pull request: https://github.com/apache/spark/pull/20555 [SPARK-23366] Improve hot reading path in ReadAheadInputStream ## What changes were proposed in this pull request? `ReadAheadInputStream` was introduced in https://github.com

[GitHub] spark issue #20555: [SPARK-23366] Improve hot reading path in ReadAheadInput...

2018-02-08 Thread juliuszsompolski
Github user juliuszsompolski commented on the issue: https://github.com/apache/spark/pull/20555 cc @kiszk @sitalkedia @zsxwing --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark pull request #20555: [SPARK-23366] Improve hot reading path in ReadAhe...

2018-02-12 Thread juliuszsompolski
Github user juliuszsompolski commented on a diff in the pull request: https://github.com/apache/spark/pull/20555#discussion_r167651037 --- Diff: core/src/main/java/org/apache/spark/io/ReadAheadInputStream.java --- @@ -232,7 +229,9 @@ private void waitForAsyncReadComplete() throws

[GitHub] spark pull request #20555: [SPARK-23366] Improve hot reading path in ReadAhe...

2018-02-12 Thread juliuszsompolski
Github user juliuszsompolski commented on a diff in the pull request: https://github.com/apache/spark/pull/20555#discussion_r167646954 --- Diff: core/src/main/java/org/apache/spark/io/ReadAheadInputStream.java --- @@ -258,54 +262,43 @@ public int read(byte[] b, int offset, int len

[GitHub] spark pull request #20555: [SPARK-23366] Improve hot reading path in ReadAhe...

2018-02-13 Thread juliuszsompolski
Github user juliuszsompolski commented on a diff in the pull request: https://github.com/apache/spark/pull/20555#discussion_r167971273 --- Diff: core/src/main/java/org/apache/spark/io/ReadAheadInputStream.java --- @@ -258,54 +263,43 @@ public int read(byte[] b, int offset, int len

[GitHub] spark issue #20555: [SPARK-23366] Improve hot reading path in ReadAheadInput...

2018-02-13 Thread juliuszsompolski
Github user juliuszsompolski commented on the issue: https://github.com/apache/spark/pull/20555 @jiangxb1987 there is ReadAheadInputStreamSuite that extends GenericFileInputStreamSuite. I updated it and added more combination testing with different buffer sizes that should

[GitHub] spark pull request #20555: [SPARK-23366] Improve hot reading path in ReadAhe...

2018-02-13 Thread juliuszsompolski
Github user juliuszsompolski commented on a diff in the pull request: https://github.com/apache/spark/pull/20555#discussion_r168048795 --- Diff: core/src/main/java/org/apache/spark/io/ReadAheadInputStream.java --- @@ -230,24 +227,32 @@ private void signalAsyncReadComplete

[GitHub] spark pull request #20555: [SPARK-23366] Improve hot reading path in ReadAhe...

2018-02-13 Thread juliuszsompolski
Github user juliuszsompolski commented on a diff in the pull request: https://github.com/apache/spark/pull/20555#discussion_r168048937 --- Diff: core/src/main/java/org/apache/spark/io/ReadAheadInputStream.java --- @@ -78,9 +79,8 @@ // whether there is a read ahead task

[GitHub] spark pull request #20624: [SPARK-23445] ColumnStat refactoring

2018-02-15 Thread juliuszsompolski
GitHub user juliuszsompolski opened a pull request: https://github.com/apache/spark/pull/20624 [SPARK-23445] ColumnStat refactoring ## What changes were proposed in this pull request? Refactor ColumnStat to be more flexible. * Split `ColumnStat

[GitHub] spark issue #20624: [SPARK-23445] ColumnStat refactoring

2018-02-15 Thread juliuszsompolski
Github user juliuszsompolski commented on the issue: https://github.com/apache/spark/pull/20624 cc @gatorsmile @cloud-fan @marmbrus --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark pull request #20624: [SPARK-23445] ColumnStat refactoring

2018-02-22 Thread juliuszsompolski
Github user juliuszsompolski commented on a diff in the pull request: https://github.com/apache/spark/pull/20624#discussion_r170086812 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala --- @@ -387,6 +390,101 @@ case class

[GitHub] spark pull request #20624: [SPARK-23445] ColumnStat refactoring

2018-02-25 Thread juliuszsompolski
Github user juliuszsompolski commented on a diff in the pull request: https://github.com/apache/spark/pull/20624#discussion_r170463362 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala --- @@ -387,6 +390,101 @@ case class

[GitHub] spark pull request #20624: [SPARK-23445] ColumnStat refactoring

2018-02-25 Thread juliuszsompolski
Github user juliuszsompolski commented on a diff in the pull request: https://github.com/apache/spark/pull/20624#discussion_r170465726 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/Statistics.scala --- @@ -305,15 +260,15 @@ object ColumnStat

[GitHub] spark pull request #20624: [SPARK-23445] ColumnStat refactoring

2018-02-25 Thread juliuszsompolski
Github user juliuszsompolski commented on a diff in the pull request: https://github.com/apache/spark/pull/20624#discussion_r170465836 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala --- @@ -387,6 +390,101 @@ case class

[GitHub] spark pull request #20624: [SPARK-23445] ColumnStat refactoring

2018-02-25 Thread juliuszsompolski
Github user juliuszsompolski commented on a diff in the pull request: https://github.com/apache/spark/pull/20624#discussion_r170462960 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala --- @@ -387,6 +390,101 @@ case class

[GitHub] spark pull request #20624: [SPARK-23445] ColumnStat refactoring

2018-02-25 Thread juliuszsompolski
Github user juliuszsompolski commented on a diff in the pull request: https://github.com/apache/spark/pull/20624#discussion_r170465448 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala --- @@ -387,6 +390,101 @@ case class

[GitHub] spark issue #20136: [SPARK-22938] Assert that SQLConf.get is accessed only o...

2018-01-02 Thread juliuszsompolski
Github user juliuszsompolski commented on the issue: https://github.com/apache/spark/pull/20136 cc @cloud-fan --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark pull request #20136: [SPARK-22938] Assert that SQLConf.get is accessed...

2018-01-02 Thread juliuszsompolski
GitHub user juliuszsompolski opened a pull request: https://github.com/apache/spark/pull/20136 [SPARK-22938] Assert that SQLConf.get is accessed only on the driver. ## What changes were proposed in this pull request? Assert if code tries to access SQLConf.get on executor

[GitHub] spark pull request #20152: [SPARK-22957] ApproxQuantile breaks if the number...

2018-01-04 Thread juliuszsompolski
GitHub user juliuszsompolski opened a pull request: https://github.com/apache/spark/pull/20152 [SPARK-22957] ApproxQuantile breaks if the number of rows exceeds MaxInt ## What changes were proposed in this pull request? 32bit Int was used for row rank. That overflowed

[GitHub] spark issue #20152: [SPARK-22957] ApproxQuantile breaks if the number of row...

2018-01-04 Thread juliuszsompolski
Github user juliuszsompolski commented on the issue: https://github.com/apache/spark/pull/20152 If the serialized form change is a problem, that part can probably be reverted - it's far less likely that a single compressed stats chunk will overflow Int. The bug I hit

[GitHub] spark pull request #22268: [DOC] Fix comment on SparkPlanGraphEdge

2018-08-29 Thread juliuszsompolski
GitHub user juliuszsompolski opened a pull request: https://github.com/apache/spark/pull/22268 [DOC] Fix comment on SparkPlanGraphEdge ## What changes were proposed in this pull request? `fromId` is the child, and `toId` is the parent, see line 127

[GitHub] spark issue #22268: [DOC] Fix comment on SparkPlanGraphEdge

2018-08-29 Thread juliuszsompolski
Github user juliuszsompolski commented on the issue: https://github.com/apache/spark/pull/22268 cc @gatorsmile just a tiny nit... --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

2018-07-24 Thread juliuszsompolski
Github user juliuszsompolski commented on the issue: https://github.com/apache/spark/pull/21403 Looks good to me, though I'm not very familiar with analyzer. @cloud-fan, @hvanhovell ? --- - To unsubscribe, e

[GitHub] spark pull request #22286: [SPARK-25284] Spark UI: make sure skipped stages ...

2018-08-30 Thread juliuszsompolski
GitHub user juliuszsompolski opened a pull request: https://github.com/apache/spark/pull/22286 [SPARK-25284] Spark UI: make sure skipped stages are updated onJobEnd ## What changes were proposed in this pull request? Tiny bug in onJobEnd, not forcing the refresh of skipped

[GitHub] spark issue #22286: [SPARK-25284] Spark UI: make sure skipped stages are upd...

2018-08-30 Thread juliuszsompolski
Github user juliuszsompolski commented on the issue: https://github.com/apache/spark/pull/22286 cc @gatorsmile fyi @dbkerkela --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #22286: [SPARK-25284] Spark UI: make sure skipped stages are upd...

2018-08-31 Thread juliuszsompolski
Github user juliuszsompolski commented on the issue: https://github.com/apache/spark/pull/22286 Thanks pointing to #22209. Closing this one. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

[GitHub] spark pull request #22286: [SPARK-25284] Spark UI: make sure skipped stages ...

2018-08-31 Thread juliuszsompolski
Github user juliuszsompolski closed the pull request at: https://github.com/apache/spark/pull/22286 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark pull request #22209: [SPARK-24415][Core] Fixed the aggregated stage me...

2018-08-31 Thread juliuszsompolski
Github user juliuszsompolski commented on a diff in the pull request: https://github.com/apache/spark/pull/22209#discussion_r214320305 --- Diff: core/src/main/scala/org/apache/spark/status/AppStatusListener.scala --- @@ -350,11 +350,20 @@ private[spark] class AppStatusListener

[GitHub] spark pull request #20679: [SPARK-23514] Use SessionState.newHadoopConf() to...

2018-02-27 Thread juliuszsompolski
Github user juliuszsompolski commented on a diff in the pull request: https://github.com/apache/spark/pull/20679#discussion_r170981204 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SessionState.scala --- @@ -115,7 +115,9 @@ private[sql] class SessionState

[GitHub] spark issue #20679: [SPARK-23514] Use SessionState.newHadoopConf() to propag...

2018-02-27 Thread juliuszsompolski
Github user juliuszsompolski commented on the issue: https://github.com/apache/spark/pull/20679 jenkins retest this please Flaky: `sbt.ForkMain$ForkError: java.io.IOException: Failed to delete: /home/jenkins/workspace/SparkPullRequestBuilder/target/tmp/spark-c0f7c3f9-f48a-4bb8

[GitHub] spark issue #20718: [SPARK-23514][FOLLOW-UP] Remove more places using sparkC...

2018-03-02 Thread juliuszsompolski
Github user juliuszsompolski commented on the issue: https://github.com/apache/spark/pull/20718 cc @gatorsmile --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark pull request #20718: [SPARK-23514][FOLLOW-UP] Remove more places using...

2018-03-02 Thread juliuszsompolski
GitHub user juliuszsompolski opened a pull request: https://github.com/apache/spark/pull/20718 [SPARK-23514][FOLLOW-UP] Remove more places using sparkContext.hadoopConfiguration directly ## What changes were proposed in this pull request? In https://github.com/apache/spark

[GitHub] spark issue #20718: [SPARK-23514][FOLLOW-UP] Remove more places using sparkC...

2018-03-02 Thread juliuszsompolski
Github user juliuszsompolski commented on the issue: https://github.com/apache/spark/pull/20718 jenkins retest this please `hudson.plugins.git.GitException: Failed to fetch from https://github.com/apache/spark.git

[GitHub] spark issue #20718: [SPARK-23514][FOLLOW-UP] Remove more places using sparkC...

2018-03-02 Thread juliuszsompolski
Github user juliuszsompolski commented on the issue: https://github.com/apache/spark/pull/20718 jenkins retest this please ``` org.apache.spark.sql.FileBasedDataSourceSuite.(It is not a test it is a sbt.testing.SuiteSelector

  1   2   >