Github user ala closed the pull request at:
https://github.com/apache/spark/pull/21227
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user ala commented on a diff in the pull request:
https://github.com/apache/spark/pull/21227#discussion_r185787003
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java
---
@@ -81,7 +81,9 @@ public void close
Github user ala commented on the issue:
https://github.com/apache/spark/pull/21227
@gatorsmile
---
GitHub user ala opened a pull request:
https://github.com/apache/spark/pull/21227
[SPARK-24133][SQL] Check for integer overflows when resizing
WritableColumnVectors
`ColumnVector`s store string data in one big byte array. Since the array
size is capped at just under
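The description is cut off, but the gist of SPARK-24133 is that `WritableColumnVector` keeps string data in one byte array whose size must stay below the JVM array limit, so doubling the capacity with plain `int` arithmetic can silently overflow. A minimal sketch of overflow-safe growth — the `CapacityGrowth` helper and its names are hypothetical, not Spark's actual code:

```java
public class CapacityGrowth {
    // JVM array sizes are capped slightly below Integer.MAX_VALUE (assumed headroom of 15).
    public static final int MAX_CAPACITY = Integer.MAX_VALUE - 15;

    /** Returns a doubled capacity that satisfies `required`, without int overflow. */
    public static int nextCapacity(int current, int required) {
        if (required > MAX_CAPACITY) {
            throw new RuntimeException(
                "Cannot allocate more than " + MAX_CAPACITY + " bytes");
        }
        // Do the doubling in long arithmetic so 2 * current cannot wrap around.
        long doubled = 2L * current;
        return (int) Math.min(Math.max(doubled, (long) required), (long) MAX_CAPACITY);
    }
}
```

Computing `2 * current` in `int` would wrap negative once `current` passes 2^30, which is exactly the class of bug an overflow check like this guards against.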
Github user ala commented on a diff in the pull request:
https://github.com/apache/spark/pull/21206#discussion_r185548756
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java
---
@@ -92,17 +92,22 @@ public void reserve(int
Github user ala commented on a diff in the pull request:
https://github.com/apache/spark/pull/21206#discussion_r185493309
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java
---
@@ -92,17 +92,22 @@ public void reserve(int
Github user ala commented on the issue:
https://github.com/apache/spark/pull/21206
Seems like something's broken with the R test, since it fails in the same
way for other PRs too, e.g.,
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/
Github user ala commented on the issue:
https://github.com/apache/spark/pull/21206
retest this please
---
Github user ala commented on a diff in the pull request:
https://github.com/apache/spark/pull/21206#discussion_r185297840
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchSuite.scala
---
@@ -1333,4 +1334,19 @@ class ColumnarBatchSuite
Github user ala commented on a diff in the pull request:
https://github.com/apache/spark/pull/21206#discussion_r185296241
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchSuite.scala
---
@@ -1333,4 +1334,19 @@ class ColumnarBatchSuite
Github user ala commented on the issue:
https://github.com/apache/spark/pull/21206
retest this please
---
GitHub user ala opened a pull request:
https://github.com/apache/spark/pull/21206
[SPARK-24133][SQL] Check for integer overflows when resizing
WritableColumnVectors
## What changes were proposed in this pull request?
`ColumnVector`s store string data in one big byte array
Github user ala commented on the issue:
https://github.com/apache/spark/pull/21206
@hvanhovell
---
Github user ala commented on a diff in the pull request:
https://github.com/apache/spark/pull/20888#discussion_r181764082
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/DataFrameRangeSuite.scala ---
@@ -152,39 +154,54 @@ class DataFrameRangeSuite extends QueryTest with
Github user ala commented on the issue:
https://github.com/apache/spark/pull/20664
retest this please
---
Github user ala commented on the issue:
https://github.com/apache/spark/pull/20664
retest this please
---
Github user ala commented on the issue:
https://github.com/apache/spark/pull/20664
retest this please
---
Github user ala commented on the issue:
https://github.com/apache/spark/pull/20664
retest this please
---
Github user ala commented on the issue:
https://github.com/apache/spark/pull/20664
retest this please
---
Github user ala commented on a diff in the pull request:
https://github.com/apache/spark/pull/20664#discussion_r170612006
--- Diff: core/src/test/scala/org/apache/spark/rdd/RDDSuite.scala ---
@@ -1129,6 +1129,36 @@ class RDDSuite extends SparkFunSuite with
SharedSparkContext
Github user ala commented on a diff in the pull request:
https://github.com/apache/spark/pull/20664#discussion_r170611946
--- Diff: core/src/test/scala/org/apache/spark/rdd/RDDSuite.scala ---
@@ -1129,6 +1129,36 @@ class RDDSuite extends SparkFunSuite with
SharedSparkContext
Github user ala commented on the issue:
https://github.com/apache/spark/pull/20664
Thanks for the comments.
I don't think the users should be impacted by changing execution time. If
the parameters of the job are constant, then the partition allocation should
al
Github user ala commented on a diff in the pull request:
https://github.com/apache/spark/pull/20664#discussion_r170277224
--- Diff: core/src/test/scala/org/apache/spark/rdd/RDDSuite.scala ---
@@ -1129,6 +1129,36 @@ class RDDSuite extends SparkFunSuite with
SharedSparkContext
Github user ala commented on the issue:
https://github.com/apache/spark/pull/20664
@hvanhovell
---
GitHub user ala opened a pull request:
https://github.com/apache/spark/pull/20664
[SPARK-23496][CORE] Locality of coalesced partitions can be severely skewed
by the order of input partitions
## What changes were proposed in this pull request?
The algorithm in
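The description is truncated, but the SPARK-23496 title says it all: which hosts each coalesced partition draws from depends on the order the input partitions arrive in. A toy illustration — purely hypothetical names, far simpler than Spark's `DefaultPartitionCoalescer` — showing how chunking host-sorted input concentrates each output group on one host, while a round-robin pass spreads hosts across groups:

```java
import java.util.*;

public class CoalesceSkew {
    /** Naive coalesce: contiguous chunks. If the input is ordered by host,
     *  every output group is dominated by a single host. */
    public static List<List<String>> chunk(List<String> hosts, int groups) {
        List<List<String>> out = new ArrayList<>();
        int size = hosts.size() / groups;
        for (int g = 0; g < groups; g++) {
            out.add(new ArrayList<>(hosts.subList(g * size, (g + 1) * size)));
        }
        return out;
    }

    /** Round-robin assignment spreads each host across all output groups. */
    public static List<List<String>> roundRobin(List<String> hosts, int groups) {
        List<List<String>> out = new ArrayList<>();
        for (int g = 0; g < groups; g++) out.add(new ArrayList<>());
        for (int i = 0; i < hosts.size(); i++) out.get(i % groups).add(hosts.get(i));
        return out;
    }

    /** Sum of distinct hosts per group: a crude measure of how spread out locality is. */
    public static long distinctHosts(List<List<String>> groups) {
        return groups.stream().mapToLong(g -> g.stream().distinct().count()).sum();
    }
}
```

The point is only that the same inputs in a different order yield very different locality per output partition, which is the skew the PR addresses.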
Github user ala commented on the issue:
https://github.com/apache/spark/pull/19479
cc @bogdanrdc
---
Github user ala commented on the issue:
https://github.com/apache/spark/pull/19473
@maropu I added aggregate time to `ObjectHashAggregateExec`.
`SortAggregateExec` is a different case. First, it processes input
gradually, so it's hard to measure the time precisely. And s
Github user ala commented on the issue:
https://github.com/apache/spark/pull/19473
@hvanhovell
---
GitHub user ala opened a pull request:
https://github.com/apache/spark/pull/19473
[SPARK-22251] Metric 'aggregate time' is incorrect when codegen is off
## What changes were proposed in this pull request?
Adding the code for setting 'aggregate time' m
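The SPARK-22251 description is cut short; the idea is that the 'aggregate time' SQL metric was only populated by the codegen path, so the interpreted path reported nothing. A minimal sketch of the pattern — time the aggregation loop and accumulate the elapsed time into a metric — using a hypothetical helper, not Spark's actual metric plumbing:

```java
public class AggTime {
    /** Runs an aggregation and returns { result, elapsed millis }, mirroring
     *  how an "aggregate time" metric brackets the non-codegen aggregation loop. */
    public static long[] sumWithTiming(long[] input) {
        long start = System.nanoTime();
        long sum = 0;
        for (long v : input) sum += v;                     // the aggregation being measured
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        return new long[] { sum, elapsedMs };
    }
}
```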
Github user ala commented on a diff in the pull request:
https://github.com/apache/spark/pull/19367#discussion_r141369722
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnVectorSuite.scala
---
@@ -25,19 +25,25 @@ import org.apache.spark.sql.types
Github user ala commented on a diff in the pull request:
https://github.com/apache/spark/pull/19367#discussion_r141370506
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchSuite.scala
---
@@ -38,31 +38,45 @@ import
Github user ala commented on a diff in the pull request:
https://github.com/apache/spark/pull/19367#discussion_r141372248
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchSuite.scala
---
@@ -829,13 +823,15 @@ class ColumnarBatchSuite
Github user ala commented on a diff in the pull request:
https://github.com/apache/spark/pull/19367#discussion_r141371354
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchSuite.scala
---
@@ -422,17 +420,14 @@ class ColumnarBatchSuite
Github user ala commented on the issue:
https://github.com/apache/spark/pull/19323
@hvanhovell Sure. Thanks!
---
Github user ala closed the pull request at:
https://github.com/apache/spark/pull/19323
---
Github user ala commented on the issue:
https://github.com/apache/spark/pull/19323
@hvanhovell
---
GitHub user ala opened a pull request:
https://github.com/apache/spark/pull/19323
[SPARK-22092] Reallocation in OffHeapColumnVector.reserveInternal corrupts
struct and array data
`OffHeapColumnVector.reserveInternal()` will only copy already inserted
values during reallocation if
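The description is truncated, but the SPARK-22092 title states the failure mode: during reallocation, `OffHeapColumnVector.reserveInternal()` copied fewer elements than had actually been written, corrupting struct and array data. A generic sketch of the invariant a grow-and-copy must uphold — hypothetical helper, not the Spark fix itself:

```java
public class Realloc {
    /** Grows a buffer, copying *all* previously written slots into the new one.
     *  Copying fewer elements than were written (the bug described above)
     *  silently drops data that is still reachable through offsets/lengths. */
    public static int[] grow(int[] old, int written, int newCapacity) {
        int[] fresh = new int[newCapacity];
        System.arraycopy(old, 0, fresh, 0, written);
        return fresh;
    }
}
```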
Github user ala commented on the issue:
https://github.com/apache/spark/pull/19308
@hvanhovell How about this?
---
Github user ala commented on a diff in the pull request:
https://github.com/apache/spark/pull/19308#discussion_r140459738
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OffHeapColumnVector.java
---
@@ -517,6 +517,7 @@ public void loadBytes
Github user ala commented on the issue:
https://github.com/apache/spark/pull/19308
@hvanhovell
---
GitHub user ala opened a pull request:
https://github.com/apache/spark/pull/19308
[SPARK-22092] Reallocation in OffHeapColumnVector.reserveInternal corrupts
array data
## What changes were proposed in this pull request?
`OffHeapColumnVector.reserveInternal()` will only
Github user ala commented on a diff in the pull request:
https://github.com/apache/spark/pull/18030#discussion_r117433160
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeProjection.scala
---
@@ -50,10 +50,15 @@ object
GitHub user ala opened a pull request:
https://github.com/apache/spark/pull/18030
[SPARK-20798] GenerateUnsafeProjection should check if a value is null
before calling the getter
## What changes were proposed in this pull request
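The body is cut off, but the SPARK-20798 title captures the contract: row getters are only safe to call after `isNullAt` has been checked, so generated projections must test for null first. A self-contained sketch of the pattern with a toy row class — the names here are illustrative, not Spark's `InternalRow` API:

```java
public class NullSafeProjection {
    /** Toy row: a single nullable int slot. */
    public static class Row {
        private final Integer value;
        public Row(Integer value) { this.value = value; }
        public boolean isNullAt(int i) { return value == null; }
        // Getter contract: behavior is undefined (here: NPE) when the slot is null.
        public int getInt(int i) { return value; }
    }

    /** Check isNullAt BEFORE calling the getter, as the generated code should. */
    public static int readOrDefault(Row row, int dflt) {
        return row.isNullAt(0) ? dflt : row.getInt(0);
    }
}
```

Calling `getInt` first and checking for null afterwards is exactly the ordering bug the PR title describes.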
Github user ala commented on the issue:
https://github.com/apache/spark/pull/18030
@hvanhovell
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the
Github user ala commented on the issue:
https://github.com/apache/spark/pull/16960
Thanks @jaceklaskowski - it's already done:
https://github.com/apache/spark/pull/17939
---
Github user ala commented on the issue:
https://github.com/apache/spark/pull/17939
@hvanhovell
---
GitHub user ala opened a pull request:
https://github.com/apache/spark/pull/17939
[SPARK-19447] Remove remaining references to generated rows metric
## What changes were proposed in this pull request?
https://github.com/apache/spark/commit
Github user ala commented on the issue:
https://github.com/apache/spark/pull/16960
True. There are a couple of leftover lines that should be removed with
this change; `numGeneratedRows` should be gone.
---
GitHub user ala opened a pull request:
https://github.com/apache/spark/pull/16960
[SPARK-19447] Make Range operator generate "recordsRead" metric
## What changes were proposed in this pull request?
The Range was modified to produce "recordsRead"
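The SPARK-19447 description is truncated; the change makes the `Range` operator report the rows it generates through the standard "records read" input metric rather than a custom one. A trivial sketch of the accounting, with a hypothetical helper in place of Spark's metric objects:

```java
public class RangeMetrics {
    /** Iterates range [start, end) and counts each generated row into a
     *  "records read"-style counter, as the PR description suggests. */
    public static long recordsRead(long start, long end) {
        long records = 0;
        for (long i = start; i < end; i++) {
            records++;  // each generated row is counted as read
        }
        return records;
    }
}
```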
Github user ala commented on the issue:
https://github.com/apache/spark/pull/16940
@rxin
---
GitHub user ala opened a pull request:
https://github.com/apache/spark/pull/16940
[SPARK-19607] Finding QueryExecution that matches provided executionId
## What changes were proposed in this pull request?
Implementing a mapping between executionId and corresponding
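The SPARK-19607 description breaks off mid-sentence; the change maintains a lookup from an execution id to its `QueryExecution`. A minimal sketch of such a registry using a concurrent map — the class and its query-description payload are hypothetical stand-ins, not Spark's `SQLExecution` internals:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ExecutionRegistry {
    /** Thread-safe executionId -> query description mapping; Spark would map
     *  to a QueryExecution object instead of a String. */
    private static final Map<Long, String> executions = new ConcurrentHashMap<>();

    public static void register(long id, String queryExecution) {
        executions.put(id, queryExecution);
    }

    public static String lookup(long id) {
        return executions.get(id);  // null when no execution matches the id
    }

    public static void unregister(long id) {
        executions.remove(id);
    }
}
```

Entries must be unregistered when an execution finishes, otherwise the map leaks one entry per query.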
Github user ala commented on the issue:
https://github.com/apache/spark/pull/16914
@rxin
---
GitHub user ala opened a pull request:
https://github.com/apache/spark/pull/16914
[SPARK-19514] Enhancing the test for Range interruption.
Improve the test for SPARK-19514, so that it's clear which stage is being
cancelled.
You can merge this pull request into a Git reposito
Github user ala commented on a diff in the pull request:
https://github.com/apache/spark/pull/16872#discussion_r100825479
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/DataFrameRangeSuite.scala ---
@@ -127,4 +133,28 @@ class DataFrameRangeSuite extends QueryTest with
Github user ala commented on a diff in the pull request:
https://github.com/apache/spark/pull/16872#discussion_r100764417
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/DataFrameRangeSuite.scala ---
@@ -127,4 +133,28 @@ class DataFrameRangeSuite extends QueryTest with
Github user ala commented on the issue:
https://github.com/apache/spark/pull/16887
@rxin
---
GitHub user ala opened a pull request:
https://github.com/apache/spark/pull/16887
[SPARK-19549] Allow providing reason for stage/job cancelling
## What changes were proposed in this pull request?
This change adds an optional argument to `SparkContext.cancelStage()` and
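The SPARK-19549 description is truncated, but the shape of the change is clear from the title: cancellation calls can carry a human-readable reason that surfaces in the failure message. A sketch of what such an API could look like — the helper below is hypothetical, not Spark's actual signature:

```java
import java.util.Optional;

public class Cancellation {
    /** Builds the failure message for a stage cancellation, appending the
     *  caller-supplied reason when one is given. */
    public static String cancelMessage(int stageId, Optional<String> reason) {
        String base = "Stage " + stageId + " cancelled";
        return reason.map(r -> base + ": " + r).orElse(base);
    }
}
```

Threading the reason through to the message is what turns an opaque "cancelled" failure into something a user can act on.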
Github user ala commented on a diff in the pull request:
https://github.com/apache/spark/pull/16872#discussion_r100347811
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/DataFrameRangeSuite.scala ---
@@ -127,4 +133,28 @@ class DataFrameRangeSuite extends QueryTest with
Github user ala commented on a diff in the pull request:
https://github.com/apache/spark/pull/16872#discussion_r100316320
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/DataFrameRangeSuite.scala ---
@@ -127,4 +133,28 @@ class DataFrameRangeSuite extends QueryTest with
Github user ala commented on the issue:
https://github.com/apache/spark/pull/16872
@hvanhovell @rxin
---
GitHub user ala opened a pull request:
https://github.com/apache/spark/pull/16872
[SPARK-19514] Making range interruptible.
## What changes were proposed in this pull request?
Previously range operator could not be interrupted. For example, using
DAGScheduler.cancelStage
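The SPARK-19514 description breaks off, but the problem is stated: a running range could not be interrupted, so cancelling its stage had no effect until it finished. The standard fix is to poll a cancellation flag inside the generation loop. A self-contained sketch under assumed names (Spark's real code checks `TaskContext`-level interruption):

```java
import java.util.function.BooleanSupplier;

public class InterruptibleRange {
    /** Emits the values of [0, end) but polls a cancellation flag periodically,
     *  so a long-running range can be interrupted mid-flight.
     *  Returns how many elements were actually produced. */
    public static long countTo(long end, BooleanSupplier cancelled) {
        long produced = 0;
        for (long i = 0; i < end; i++) {
            // Poll every 1024 elements to keep the per-element overhead negligible.
            if ((i & 1023) == 0 && cancelled.getAsBoolean()) break;
            produced++;
        }
        return produced;
    }
}
```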
GitHub user ala opened a pull request:
https://github.com/apache/spark/pull/16829
[SPARK-19447] Fixing input metrics for range operator.
## What changes were proposed in this pull request?
This change introduces a new metric "number of generated rows".
Github user ala closed the pull request at:
https://github.com/apache/spark/pull/16713
---
GitHub user ala opened a pull request:
https://github.com/apache/spark/pull/16713
[SC-5550] Automatic killing of tasks that are producing too many output rows
## What changes were proposed in this pull request?
This change implements TaskOutputListener, which continuously
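The SC-5550 description is cut off after "continuously"; the stated idea is a listener that watches how many output rows a task produces and kills it past a limit. A simplified, self-contained sketch of that bookkeeping — this class is an illustration of the idea, not the implementation the PR describes:

```java
public class TaskOutputListener {
    private final long maxRows;
    private long produced = 0;
    private boolean killed = false;

    public TaskOutputListener(long maxRows) { this.maxRows = maxRows; }

    /** Called once per output row. Returns false once the task should be
     *  killed, i.e. after the row count crosses the configured limit. */
    public boolean onRow() {
        if (killed) return false;
        if (++produced > maxRows) killed = true;
        return !killed;
    }

    public boolean isKilled() { return killed; }
}
```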