[GitHub] spark pull request #16872: [SPARK-19514] Making range interruptible.

2017-02-09 Thread ala
GitHub user ala opened a pull request: https://github.com/apache/spark/pull/16872 [SPARK-19514] Making range interruptible. ## What changes were proposed in this pull request? Previously, the range operator could not be interrupted. For example, using DAGScheduler.cancelStage
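
The problem is presumably a generating loop that never checks whether its task was cancelled. Below is a minimal sketch of the general remedy. It assumes the loop polls a cancellation flag periodically while producing values. In Spark, `TaskContext.isInterrupted()` is the usual flag; here a plain `AtomicBoolean` stands in for it. `InterruptibleRange` and `BATCH_SIZE` are hypothetical names, and the exception type is an arbitrary choice for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical model of an interruptible range generator. Values are produced one
// by one, and the cancellation flag is polled once every BATCH_SIZE values.
final class InterruptibleRange {
    // Assumed polling interval: checking on every element would add needless overhead.
    static final int BATCH_SIZE = 1000;

    static List<Long> generate(long start, long end, AtomicBoolean interrupted) {
        List<Long> out = new ArrayList<>();
        long produced = 0;
        for (long v = start; v < end; v++) {
            // Stand-in for TaskContext.isInterrupted(): stop promptly once cancelled.
            if (produced % BATCH_SIZE == 0 && interrupted.get()) {
                throw new IllegalStateException("range task was cancelled");
            }
            out.add(v);
            produced++;
        }
        return out;
    }
}
```

Polling once per batch, rather than once per value, keeps the check off the hot path while still bounding how long a cancelled task keeps running.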

[GitHub] spark issue #16872: [SPARK-19514] Making range interruptible.

2017-02-09 Thread ala
Github user ala commented on the issue: https://github.com/apache/spark/pull/16872 @hvanhovell @rxin --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if

[GitHub] spark pull request #16872: [SPARK-19514] Making range interruptible.

2017-02-09 Thread ala
Github user ala commented on a diff in the pull request: https://github.com/apache/spark/pull/16872#discussion_r100316320 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameRangeSuite.scala --- @@ -127,4 +133,28 @@ class DataFrameRangeSuite extends QueryTest with

[GitHub] spark pull request #16872: [SPARK-19514] Making range interruptible.

2017-02-09 Thread ala
Github user ala commented on a diff in the pull request: https://github.com/apache/spark/pull/16872#discussion_r100347811 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameRangeSuite.scala --- @@ -127,4 +133,28 @@ class DataFrameRangeSuite extends QueryTest with

[GitHub] spark pull request #16887: [SPARK-19549] Allow providing reason for stage/jo...

2017-02-10 Thread ala
GitHub user ala opened a pull request: https://github.com/apache/spark/pull/16887 [SPARK-19549] Allow providing reason for stage/job cancelling ## What changes were proposed in this pull request? This change adds an optional argument to `SparkContext.cancelStage()` and
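
The sketch below shows how an optional reason can be threaded through without breaking existing callers: the one-argument call delegates to a two-argument overload. `StageCanceller` and `failureMessage` are hypothetical. They do not reproduce Spark's scheduler or the exact signature added by this PR.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical scheduler facade. The reason is optional, so the original
// single-argument call keeps working and simply passes no reason along.
final class StageCanceller {
    private final Map<Integer, String> cancelled = new HashMap<>();

    void cancelStage(int stageId) {
        cancelStage(stageId, null);
    }

    void cancelStage(int stageId, String reason) {
        // Fall back to a generic message when no reason is supplied.
        String message = Optional.ofNullable(reason)
            .map(r -> "Stage " + stageId + " cancelled: " + r)
            .orElse("Stage " + stageId + " cancelled");
        cancelled.put(stageId, message);
    }

    // Message that would be reported for a cancelled stage, or null if never cancelled.
    String failureMessage(int stageId) {
        return cancelled.get(stageId);
    }
}
```

In Scala the same effect could come from a default parameter value. Java needs the explicit overload.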

[GitHub] spark issue #16887: [SPARK-19549] Allow providing reason for stage/job cance...

2017-02-10 Thread ala
Github user ala commented on the issue: https://github.com/apache/spark/pull/16887 @rxin

[GitHub] spark pull request #16872: [SPARK-19514] Making range interruptible.

2017-02-13 Thread ala
Github user ala commented on a diff in the pull request: https://github.com/apache/spark/pull/16872#discussion_r100764417 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameRangeSuite.scala --- @@ -127,4 +133,28 @@ class DataFrameRangeSuite extends QueryTest with

[GitHub] spark pull request #16872: [SPARK-19514] Making range interruptible.

2017-02-13 Thread ala
Github user ala commented on a diff in the pull request: https://github.com/apache/spark/pull/16872#discussion_r100825479 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameRangeSuite.scala --- @@ -127,4 +133,28 @@ class DataFrameRangeSuite extends QueryTest with

[GitHub] spark pull request #16914: [SPARK-19514] Enhancing the test for Range interr...

2017-02-13 Thread ala
GitHub user ala opened a pull request: https://github.com/apache/spark/pull/16914 [SPARK-19514] Enhancing the test for Range interruption. Improve the test for SPARK-19514, so that it's clear which stage is being cancelled.

[GitHub] spark issue #16914: [SPARK-19514] Enhancing the test for Range interruption.

2017-02-13 Thread ala
Github user ala commented on the issue: https://github.com/apache/spark/pull/16914 @rxin

[GitHub] spark pull request #16940: [SPARK-19607] Finding QueryExecution that matches...

2017-02-15 Thread ala
GitHub user ala opened a pull request: https://github.com/apache/spark/pull/16940 [SPARK-19607] Finding QueryExecution that matches provided executionId ## What changes were proposed in this pull request? Implementing a mapping between executionId and corresponding
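
One straightforward way to keep such a mapping is a concurrent map keyed by execution ID. It is filled when an execution starts and cleared when it ends. This is a hedged sketch only: `ExecutionRegistry` and its methods are hypothetical, and the type parameter `Q` stands in for Spark's `QueryExecution` so the example compiles without Spark.

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry from executionId to the query-execution object that ran it.
final class ExecutionRegistry<Q> {
    // Concurrent map, since listeners may look up IDs while queries start and finish.
    private final ConcurrentHashMap<Long, Q> byId = new ConcurrentHashMap<>();

    // Called when an execution starts.
    void register(long executionId, Q execution) {
        byId.put(executionId, execution);
    }

    // Empty if the ID is unknown or the execution has already been unregistered.
    Optional<Q> find(long executionId) {
        return Optional.ofNullable(byId.get(executionId));
    }

    // Called when the execution ends, so finished queries are not retained forever.
    void unregister(long executionId) {
        byId.remove(executionId);
    }
}
```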

[GitHub] spark issue #16940: [SPARK-19607] Finding QueryExecution that matches provid...

2017-02-15 Thread ala
Github user ala commented on the issue: https://github.com/apache/spark/pull/16940 @rxin

[GitHub] spark pull request #16960: [SPARK-19447] Make Range operator generate "recor...

2017-02-16 Thread ala
GitHub user ala opened a pull request: https://github.com/apache/spark/pull/16960 [SPARK-19447] Make Range operator generate "recordsRead" metric ## What changes were proposed in this pull request? The Range was modified to produce "recordsRead"
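
An operator that generates its own rows can expose a records-read count by wrapping its output iterator and counting each row handed out. The sketch below assumes that approach. It is not a claim about how the PR does it. `CountingIterator` is hypothetical, and a `LongAdder` stands in for Spark's task input metrics.

```java
import java.util.Iterator;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical counting wrapper. Every row returned by the source iterator is also
// added to a "records read" counter, standing in for a task input metric.
final class CountingIterator<T> implements Iterator<T> {
    private final Iterator<T> underlying;
    private final LongAdder recordsRead;

    CountingIterator(Iterator<T> underlying, LongAdder recordsRead) {
        this.underlying = underlying;
        this.recordsRead = recordsRead;
    }

    @Override
    public boolean hasNext() {
        return underlying.hasNext();
    }

    @Override
    public T next() {
        T value = underlying.next(); // throws NoSuchElementException when exhausted
        recordsRead.increment();     // count only rows that were actually returned
        return value;
    }
}
```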

[GitHub] spark pull request #16713: [SC-5550] Automatic killing of tasks that are pro...

2017-01-26 Thread ala
GitHub user ala opened a pull request: https://github.com/apache/spark/pull/16713 [SC-5550] Automatic killing of tasks that are producing too many output rows ## What changes were proposed in this pull request? This change implements TaskOutputListener, which continuously
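
Killing tasks that produce too many rows can be modelled as a listener that accumulates per-task row counts from periodic updates. It flags a task once that task's total crosses a threshold. `OutputRowLimiter`, `onOutputRows`, and the strict `>` comparison are assumptions for illustration. They are not the closed PR's `TaskOutputListener`.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical limiter. It sums the output row counts reported for each task and
// signals a kill the first time a task's total exceeds the configured limit.
final class OutputRowLimiter {
    private final long maxOutputRows;
    private final Map<Long, Long> rowsByTask = new HashMap<>();
    private final Set<Long> killed = new HashSet<>();

    OutputRowLimiter(long maxOutputRows) {
        this.maxOutputRows = maxOutputRows;
    }

    // Returns true exactly once per task: on the update that first crosses the limit.
    boolean onOutputRows(long taskId, long newRows) {
        long total = rowsByTask.merge(taskId, newRows, Long::sum);
        return total > maxOutputRows && killed.add(taskId);
    }
}
```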

[GitHub] spark pull request #16713: [SC-5550] Automatic killing of tasks that are pro...

2017-01-26 Thread ala
Github user ala closed the pull request at: https://github.com/apache/spark/pull/16713

[GitHub] spark pull request #16829: [SPARK-19447] Fixing input metrics for range oper...

2017-02-07 Thread ala
GitHub user ala opened a pull request: https://github.com/apache/spark/pull/16829 [SPARK-19447] Fixing input metrics for range operator. ## What changes were proposed in this pull request? This change introduces a new metric "number of generated rows".

[GitHub] spark issue #21206: [SPARK-24133][SQL] Check for integer overflows when resi...

2018-05-01 Thread ala
Github user ala commented on the issue: https://github.com/apache/spark/pull/21206 @hvanhovell --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark pull request #21206: [SPARK-24133][SQL] Check for integer overflows wh...

2018-05-01 Thread ala
GitHub user ala opened a pull request: https://github.com/apache/spark/pull/21206 [SPARK-24133][SQL] Check for integer overflows when resizing WritableColumnVectors ## What changes were proposed in this pull request? `ColumnVector`s store string data in one big byte array
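
One overflow hazard when resizing comes from doubling an `int` capacity: `requiredCapacity * 2` wraps negative once the request exceeds about 1.07 billion. The sketch below shows overflow-safe growth. It assumes a cap slightly below `Integer.MAX_VALUE`; the exact limit is cut off in this archive, so `Integer.MAX_VALUE - 15` is an assumption. The code doubles in `long` arithmetic and rejects requests that are negative or above the cap. `GrowableBytes` and `newCapacity` are hypothetical names, not Spark's `WritableColumnVector` code.

```java
import java.util.Arrays;

// Hypothetical growable byte buffer illustrating overflow-safe resizing.
final class GrowableBytes {
    // Assumed cap: JVMs may refuse arrays of exactly Integer.MAX_VALUE bytes.
    static final int MAX_CAPACITY = Integer.MAX_VALUE - 15;

    private byte[] data = new byte[16];

    int capacity() {
        return data.length;
    }

    // Computes the next capacity, doubling in long arithmetic so that
    // requiredCapacity * 2 cannot wrap around to a negative int.
    static int newCapacity(int current, int requiredCapacity) {
        if (requiredCapacity < 0) {
            // A negative request means the caller's own int arithmetic already overflowed.
            throw new IllegalArgumentException("Requested capacity overflowed: " + requiredCapacity);
        }
        if (requiredCapacity <= current) {
            return current;
        }
        if (requiredCapacity > MAX_CAPACITY) {
            throw new IllegalArgumentException("Cannot reserve " + requiredCapacity + " bytes");
        }
        return (int) Math.min(MAX_CAPACITY, requiredCapacity * 2L);
    }

    // Grows the backing array, preserving existing contents.
    void reserve(int requiredCapacity) {
        int target = newCapacity(data.length, requiredCapacity);
        if (target > data.length) {
            data = Arrays.copyOf(data, target);
        }
    }
}
```

Under this scheme, a request for 1.5 billion bytes is granted the full cap instead of a wrapped-around negative size, and a request for `Integer.MAX_VALUE` is rejected with a clear error.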

[GitHub] spark issue #21206: [SPARK-24133][SQL] Check for integer overflows when resi...

2018-05-01 Thread ala
Github user ala commented on the issue: https://github.com/apache/spark/pull/21206 retest this please

[GitHub] spark pull request #21206: [SPARK-24133][SQL] Check for integer overflows wh...

2018-05-01 Thread ala
Github user ala commented on a diff in the pull request: https://github.com/apache/spark/pull/21206#discussion_r185296241 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchSuite.scala --- @@ -1333,4 +1334,19 @@ class ColumnarBatchSuite

[GitHub] spark pull request #21206: [SPARK-24133][SQL] Check for integer overflows wh...

2018-05-01 Thread ala
Github user ala commented on a diff in the pull request: https://github.com/apache/spark/pull/21206#discussion_r185297840 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchSuite.scala --- @@ -1333,4 +1334,19 @@ class ColumnarBatchSuite

[GitHub] spark issue #21206: [SPARK-24133][SQL] Check for integer overflows when resi...

2018-05-02 Thread ala
Github user ala commented on the issue: https://github.com/apache/spark/pull/21206 retest this please

[GitHub] spark issue #21206: [SPARK-24133][SQL] Check for integer overflows when resi...

2018-05-02 Thread ala
Github user ala commented on the issue: https://github.com/apache/spark/pull/21206 Seems like something's broken with the R test, since it fails in the same way for other PRs too, e.g., https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/

[GitHub] spark pull request #21206: [SPARK-24133][SQL] Check for integer overflows wh...

2018-05-02 Thread ala
Github user ala commented on a diff in the pull request: https://github.com/apache/spark/pull/21206#discussion_r185493309 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java --- @@ -92,17 +92,22 @@ public void reserve(int

[GitHub] spark pull request #21206: [SPARK-24133][SQL] Check for integer overflows wh...

2018-05-02 Thread ala
Github user ala commented on a diff in the pull request: https://github.com/apache/spark/pull/21206#discussion_r185548756 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java --- @@ -92,17 +92,22 @@ public void reserve(int

[GitHub] spark pull request #21227: [SPARK-24133][SQL] Check for integer overflows wh...

2018-05-03 Thread ala
GitHub user ala opened a pull request: https://github.com/apache/spark/pull/21227 [SPARK-24133][SQL] Check for integer overflows when resizing WritableColumnVectors `ColumnVector`s store string data in one big byte array. Since the array size is capped at just under

[GitHub] spark issue #21227: [SPARK-24133][SQL] Check for integer overflows when resi...

2018-05-03 Thread ala
Github user ala commented on the issue: https://github.com/apache/spark/pull/21227 @gatorsmile

[GitHub] spark pull request #21227: Backport [SPARK-24133][SQL] Check for integer ove...

2018-05-03 Thread ala
Github user ala commented on a diff in the pull request: https://github.com/apache/spark/pull/21227#discussion_r185787003 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java --- @@ -81,7 +81,9 @@ public void close

[GitHub] spark pull request #21227: Backport [SPARK-24133][SQL] Check for integer ove...

2018-05-03 Thread ala
Github user ala closed the pull request at: https://github.com/apache/spark/pull/21227

[GitHub] spark pull request #20888: [SPARK-23775][TEST] Make DataFrameRangeSuite not ...

2018-04-16 Thread ala
Github user ala commented on a diff in the pull request: https://github.com/apache/spark/pull/20888#discussion_r181764082 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameRangeSuite.scala --- @@ -152,39 +154,54 @@ class DataFrameRangeSuite extends QueryTest with

[GitHub] spark pull request #20664: [SPARK-23496][CORE] Locality of coalesced partiti...

2018-02-23 Thread ala
GitHub user ala opened a pull request: https://github.com/apache/spark/pull/20664 [SPARK-23496][CORE] Locality of coalesced partitions can be severely skewed by the order of input partitions ## What changes were proposed in this pull request? The algorithm in

[GitHub] spark issue #20664: [SPARK-23496][CORE] Locality of coalesced partitions can...

2018-02-23 Thread ala
Github user ala commented on the issue: https://github.com/apache/spark/pull/20664 @hvanhovell

[GitHub] spark pull request #20664: [SPARK-23496][CORE] Locality of coalesced partiti...

2018-02-23 Thread ala
Github user ala commented on a diff in the pull request: https://github.com/apache/spark/pull/20664#discussion_r170277224 --- Diff: core/src/test/scala/org/apache/spark/rdd/RDDSuite.scala --- @@ -1129,6 +1129,36 @@ class RDDSuite extends SparkFunSuite with SharedSparkContext

[GitHub] spark issue #20664: [SPARK-23496][CORE] Locality of coalesced partitions can...

2018-02-23 Thread ala
Github user ala commented on the issue: https://github.com/apache/spark/pull/20664 Thanks for the comments. I don't think the users should be impacted by changing execution time. If the parameters of the job are constant, then the partition allocation should al
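
The sketch below contrasts picking duplicate preferred locations in input order with picking them pseudo-randomly. It assumes, as the comment about constant job parameters suggests, that the fix makes a seeded, deterministic pseudo-random choice rather than iterating in order. `LocalityPicker` and both methods are hypothetical. Each input partition is reduced to a single preferred host string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical illustration of choosing preferred hosts for extra coalesced groups
// when more groups are requested than there are distinct hosts.
final class LocalityPicker {
    // In-order picking: if input partitions are clustered by host, early hosts dominate.
    static List<String> pickInOrder(List<String> inputHosts, int count) {
        List<String> picked = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            picked.add(inputHosts.get(i % inputHosts.size()));
        }
        return picked;
    }

    // Seeded random picking: choices no longer depend on input order, and a fixed
    // seed keeps the result identical for identical job parameters.
    static List<String> pickSeeded(List<String> inputHosts, int count, long seed) {
        Random rnd = new Random(seed);
        List<String> picked = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            picked.add(inputHosts.get(rnd.nextInt(inputHosts.size())));
        }
        return picked;
    }
}
```

Consider 50 partitions on `hostA` followed by 50 on `hostB`. Picking 10 in order yields only `hostA`. The seeded version draws from both hosts in proportion to their share, in expectation, and repeats the same choice on every run with the same seed.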

[GitHub] spark pull request #20664: [SPARK-23496][CORE] Locality of coalesced partiti...

2018-02-26 Thread ala
Github user ala commented on a diff in the pull request: https://github.com/apache/spark/pull/20664#discussion_r170611946 --- Diff: core/src/test/scala/org/apache/spark/rdd/RDDSuite.scala --- @@ -1129,6 +1129,36 @@ class RDDSuite extends SparkFunSuite with SharedSparkContext

[GitHub] spark pull request #20664: [SPARK-23496][CORE] Locality of coalesced partiti...

2018-02-26 Thread ala
Github user ala commented on a diff in the pull request: https://github.com/apache/spark/pull/20664#discussion_r170612006 --- Diff: core/src/test/scala/org/apache/spark/rdd/RDDSuite.scala --- @@ -1129,6 +1129,36 @@ class RDDSuite extends SparkFunSuite with SharedSparkContext

[GitHub] spark issue #20664: [SPARK-23496][CORE] Locality of coalesced partitions can...

2018-02-27 Thread ala
Github user ala commented on the issue: https://github.com/apache/spark/pull/20664 retest this please

[GitHub] spark issue #20664: [SPARK-23496][CORE] Locality of coalesced partitions can...

2018-02-28 Thread ala
Github user ala commented on the issue: https://github.com/apache/spark/pull/20664 retest this please

[GitHub] spark issue #20664: [SPARK-23496][CORE] Locality of coalesced partitions can...

2018-03-01 Thread ala
Github user ala commented on the issue: https://github.com/apache/spark/pull/20664 retest this please

[GitHub] spark issue #20664: [SPARK-23496][CORE] Locality of coalesced partitions can...

2018-03-02 Thread ala
Github user ala commented on the issue: https://github.com/apache/spark/pull/20664 retest this please

[GitHub] spark issue #20664: [SPARK-23496][CORE] Locality of coalesced partitions can...

2018-03-05 Thread ala
Github user ala commented on the issue: https://github.com/apache/spark/pull/20664 retest this please

[GitHub] spark pull request #19308: [SPARK-22092] Reallocation in OffHeapColumnVector...

2017-09-21 Thread ala
GitHub user ala opened a pull request: https://github.com/apache/spark/pull/19308 [SPARK-22092] Reallocation in OffHeapColumnVector.reserveInternal corrupts array data ## What changes were proposed in this pull request? `OffHeapColumnVector.reserveInternal()` will only
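
The model below uses `Arrays.copyOf` in place of off-heap memory reallocation. It illustrates the invariant at stake: when a vector grows, the rows inserted so far must survive, including for struct and array vectors whose values are not stored in the vector's own value buffer. In this model, each buffer is copied based on its own presence and never gated on another buffer. `VectorModel` and its fields are hypothetical, and the exact condition the original code checked is cut off in this archive.

```java
import java.util.Arrays;

// Hypothetical on-heap model of a column vector reallocation. Struct-like vectors
// have no value buffer of their own (values == null) but still carry null flags,
// so reallocation must preserve every per-row buffer that exists.
final class VectorModel {
    byte[] nulls;
    long[] values; // null when the vector's data lives elsewhere, e.g. in child vectors

    VectorModel(int capacity, boolean hasValues) {
        nulls = new byte[capacity];
        values = hasValues ? new long[capacity] : null;
    }

    // Grows each existing buffer independently, always copying the rows already written.
    void reserveInternal(int newCapacity) {
        nulls = Arrays.copyOf(nulls, newCapacity);
        if (values != null) {
            values = Arrays.copyOf(values, newCapacity);
        }
    }
}
```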

[GitHub] spark issue #19308: [SPARK-22092] Reallocation in OffHeapColumnVector.reserv...

2017-09-21 Thread ala
Github user ala commented on the issue: https://github.com/apache/spark/pull/19308 @hvanhovell

[GitHub] spark pull request #19308: [SPARK-22092] Reallocation in OffHeapColumnVector...

2017-09-22 Thread ala
Github user ala commented on a diff in the pull request: https://github.com/apache/spark/pull/19308#discussion_r140459738 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OffHeapColumnVector.java --- @@ -517,6 +517,7 @@ public void loadBytes

[GitHub] spark issue #19308: [SPARK-22092] Reallocation in OffHeapColumnVector.reserv...

2017-09-22 Thread ala
Github user ala commented on the issue: https://github.com/apache/spark/pull/19308 @hvanhovell How about this?

[GitHub] spark pull request #19323: [SPARK-22092] Reallocation in OffHeapColumnVector...

2017-09-22 Thread ala
GitHub user ala opened a pull request: https://github.com/apache/spark/pull/19323 [SPARK-22092] Reallocation in OffHeapColumnVector.reserveInternal corrupts struct and array data `OffHeapColumnVector.reserveInternal()` will only copy already inserted values during reallocation if

[GitHub] spark issue #19323: [SPARK-22092] Reallocation in OffHeapColumnVector.reserv...

2017-09-22 Thread ala
Github user ala commented on the issue: https://github.com/apache/spark/pull/19323 @hvanhovell

[GitHub] spark pull request #19323: [SPARK-22092] Reallocation in OffHeapColumnVector...

2017-09-25 Thread ala
Github user ala closed the pull request at: https://github.com/apache/spark/pull/19323

[GitHub] spark issue #19323: [SPARK-22092] Reallocation in OffHeapColumnVector.reserv...

2017-09-25 Thread ala
Github user ala commented on the issue: https://github.com/apache/spark/pull/19323 @hvanhovell Sure. Thanks!

[GitHub] spark pull request #19367: [SPARK-22143][SQL] Fix memory leak in OffHeapColu...

2017-09-27 Thread ala
Github user ala commented on a diff in the pull request: https://github.com/apache/spark/pull/19367#discussion_r141371354 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchSuite.scala --- @@ -422,17 +420,14 @@ class ColumnarBatchSuite

[GitHub] spark pull request #19367: [SPARK-22143][SQL] Fix memory leak in OffHeapColu...

2017-09-27 Thread ala
Github user ala commented on a diff in the pull request: https://github.com/apache/spark/pull/19367#discussion_r141372248 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchSuite.scala --- @@ -829,13 +823,15 @@ class ColumnarBatchSuite

[GitHub] spark pull request #19367: [SPARK-22143][SQL] Fix memory leak in OffHeapColu...

2017-09-27 Thread ala
Github user ala commented on a diff in the pull request: https://github.com/apache/spark/pull/19367#discussion_r141370506 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchSuite.scala --- @@ -38,31 +38,45 @@ import

[GitHub] spark pull request #19367: [SPARK-22143][SQL] Fix memory leak in OffHeapColu...

2017-09-27 Thread ala
Github user ala commented on a diff in the pull request: https://github.com/apache/spark/pull/19367#discussion_r141369722 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnVectorSuite.scala --- @@ -25,19 +25,25 @@ import org.apache.spark.sql.types

[GitHub] spark pull request #19473: [SPARK-22251] Metric 'aggregate time' is incorrec...

2017-10-11 Thread ala
GitHub user ala opened a pull request: https://github.com/apache/spark/pull/19473 [SPARK-22251] Metric 'aggregate time' is incorrect when codegen is off ## What changes were proposed in this pull request? Adding the code for setting 'aggregate time' m
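
When whole-stage codegen is off, the interpreted operator has to measure its own time. The minimal sketch below assumes the metric is the wall-clock time spent consuming input and updating the hash map, measured with `System.nanoTime()`. `TimedAggregation` and `aggTimeNanos` are hypothetical stand-ins for the operator and its SQL metric.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical interpreted (non-codegen) hash aggregation that records its own
// elapsed time in an "aggregate time" metric.
final class TimedAggregation {
    final AtomicLong aggTimeNanos = new AtomicLong();

    Map<String, Long> sumByKey(Iterator<Map.Entry<String, Long>> input) {
        long start = System.nanoTime();
        Map<String, Long> sums = new HashMap<>();
        while (input.hasNext()) {
            Map.Entry<String, Long> row = input.next();
            sums.merge(row.getKey(), row.getValue(), Long::sum);
        }
        // Accumulate rather than overwrite, so several partitions can share one metric.
        aggTimeNanos.addAndGet(System.nanoTime() - start);
        return sums;
    }

    // Milliseconds are a convenient unit for display in a UI.
    long aggTimeMillis() {
        return TimeUnit.NANOSECONDS.toMillis(aggTimeNanos.get());
    }
}
```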

[GitHub] spark issue #19473: [SPARK-22251] Metric 'aggregate time' is incorrect when ...

2017-10-11 Thread ala
Github user ala commented on the issue: https://github.com/apache/spark/pull/19473 @hvanhovell

[GitHub] spark issue #19473: [SPARK-22251][SQL] Metric 'aggregate time' is incorrect ...

2017-10-12 Thread ala
Github user ala commented on the issue: https://github.com/apache/spark/pull/19473 @maropu I added aggregate time to `ObjectHashAggregateExec`. `SortAggregateExec` is a different case. First, it processes input gradually, so it's hard to measure the time precisely. And s

[GitHub] spark issue #19479: [SPARK-17074] [SQL] Generate equi-height histogram in co...

2017-10-16 Thread ala
Github user ala commented on the issue: https://github.com/apache/spark/pull/19479 cc @bogdanrdc

[GitHub] spark issue #16960: [SPARK-19447] Make Range operator generate "recordsRead"...

2017-05-10 Thread ala
Github user ala commented on the issue: https://github.com/apache/spark/pull/16960 True. A couple of lines that should have been removed with this change were left behind. `numGeneratedRows` should be gone.

[GitHub] spark pull request #17939: [SPARK-19447] Remove remaining references to gene...

2017-05-10 Thread ala
GitHub user ala opened a pull request: https://github.com/apache/spark/pull/17939 [SPARK-19447] Remove remaining references to generated rows metric ## What changes were proposed in this pull request? https://github.com/apache/spark/commit

[GitHub] spark issue #17939: [SPARK-19447] Remove remaining references to generated r...

2017-05-10 Thread ala
Github user ala commented on the issue: https://github.com/apache/spark/pull/17939 @hvanhovell

[GitHub] spark issue #16960: [SPARK-19447] Make Range operator generate "recordsRead"...

2017-05-11 Thread ala
Github user ala commented on the issue: https://github.com/apache/spark/pull/16960 Thanks @jaceklaskowski - it's already done: https://github.com/apache/spark/pull/17939

[GitHub] spark issue #18030: [SPARK-20798] GenerateUnsafeProjection should check if a...

2017-05-18 Thread ala
Github user ala commented on the issue: https://github.com/apache/spark/pull/18030 @hvanhovell

[GitHub] spark pull request #18030: [SPARK-20798] GenerateUnsafeProjection should che...

2017-05-18 Thread ala
GitHub user ala opened a pull request: https://github.com/apache/spark/pull/18030 [SPARK-20798] GenerateUnsafeProjection should check if a value is null before calling the getter ## What changes were proposed in this pull request
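
The rule in the title can be shown with a projection helper that consults the null flag before it ever calls the typed getter. `SimpleRow` and `NullSafeProjection` are hypothetical. They only mimic the shape of the check that generated projection code would perform, assuming a getter may misbehave when called on a null slot.

```java
// Hypothetical row interface. A getter called on a null slot may read garbage or
// throw, so isNullAt must be consulted first.
interface SimpleRow {
    boolean isNullAt(int ordinal);
    long getLong(int ordinal);
}

final class NullSafeProjection {
    // Copies column `ordinal` into a boxed result, never calling the getter for nulls.
    static Long project(SimpleRow row, int ordinal) {
        if (row.isNullAt(ordinal)) {
            return null; // emit a null marker instead of reading the value
        }
        return row.getLong(ordinal); // reached only for non-null slots
    }
}
```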

[GitHub] spark pull request #18030: [SPARK-20798] GenerateUnsafeProjection should che...

2017-05-19 Thread ala
Github user ala commented on a diff in the pull request: https://github.com/apache/spark/pull/18030#discussion_r117433160 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeProjection.scala --- @@ -50,10 +50,15 @@ object