[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20633 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1308/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20633 Merged build finished. Test PASSed.
[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20633 **[Test build #87996 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87996/testReport)** for PR 20633 at commit [`166cdbb`](https://github.com/apache/spark/commit/166cdbb3e95315e0feb29fb26c6c98837747e22d).
[GitHub] spark issue #20745: [SPARK-23288][SS] Fix output metrics with parquet sink
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20745 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87992/ Test PASSed.
[GitHub] spark issue #20745: [SPARK-23288][SS] Fix output metrics with parquet sink
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20745 Merged build finished. Test PASSed.
[GitHub] spark issue #20745: [SPARK-23288][SS] Fix output metrics with parquet sink
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20745 **[Test build #87992 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87992/testReport)** for PR 20745 at commit [`55aa8bc`](https://github.com/apache/spark/commit/55aa8bca96b112a33cabb352afb4168c2d8f355c). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class CatalogColumnStat(` * `case class LocalRelation(` * `case class StreamingDataSourceV2Relation(`
[GitHub] spark issue #20433: [SPARK-23264][SQL] Support interval values without INTER...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/20433 You meant the Hive JIRA? If so, no (I was going to check now). Any point I should know?
[GitHub] spark issue #20659: [DNM] Try to update Hive to 2.3.2
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/20659 Yes, I'm doing it
[GitHub] spark issue #20433: [SPARK-23264][SQL] Support interval values without INTER...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/20433 ok, I'll update based on the comments soon
[GitHub] spark issue #20433: [SPARK-23264][SQL] Support interval values without INTER...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20433 Could you create `interval.sql` by adding the test cases in https://issues.apache.org/jira/browse/HIVE-13557 ?
[GitHub] spark pull request #20433: [SPARK-23264][SQL] Support interval values withou...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20433#discussion_r172427740 --- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 --- @@ -790,6 +796,16 @@ ASC: 'ASC'; DESC: 'DESC'; FOR: 'FOR'; INTERVAL: 'INTERVAL'; +YEAR: 'YEAR' | 'YEARS'; --- End diff -- Also update `TableIdentifierParserSuite`
[GitHub] spark pull request #20433: [SPARK-23264][SQL] Support interval values withou...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20433#discussion_r172427617 --- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 --- @@ -790,6 +796,16 @@ ASC: 'ASC'; DESC: 'DESC'; FOR: 'FOR'; INTERVAL: 'INTERVAL'; +YEAR: 'YEAR' | 'YEARS'; +MONTH: 'MONTH' | 'MONTHS'; +WEEK: 'WEEK' | 'WEEKS'; +DAY: 'DAY' | 'DAYS'; +HOUR: 'HOUR' | 'HOURS'; +MINUTE: 'MINUTE' | 'MINUTES'; +SECOND: 'SECOND' | 'SECONDS'; +MILLISECOND: 'MILLISECOND' | 'MILLISECONDS'; +MICROSECOND: 'MICROSECOND' | 'MICROSECONDS'; +NANOSECOND: 'NANOSECOND' | 'NANOSECONDS'; --- End diff -- We do not support `nanosecond`.
[GitHub] spark pull request #20433: [SPARK-23264][SQL] Support interval values withou...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/20433#discussion_r172427354 --- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 --- @@ -790,6 +796,16 @@ ASC: 'ASC'; DESC: 'DESC'; FOR: 'FOR'; INTERVAL: 'INTERVAL'; +YEAR: 'YEAR' | 'YEARS'; +MONTH: 'MONTH' | 'MONTHS'; +WEEK: 'WEEK' | 'WEEKS'; +DAY: 'DAY' | 'DAYS'; +HOUR: 'HOUR' | 'HOURS'; +MINUTE: 'MINUTE' | 'MINUTES'; +SECOND: 'SECOND' | 'SECONDS'; +MILLISECOND: 'MILLISECOND' | 'MILLISECONDS'; --- End diff -- yea.
[GitHub] spark pull request #20433: [SPARK-23264][SQL] Support interval values withou...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20433#discussion_r172426790 --- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 --- @@ -790,6 +796,16 @@ ASC: 'ASC'; DESC: 'DESC'; FOR: 'FOR'; INTERVAL: 'INTERVAL'; +YEAR: 'YEAR' | 'YEARS'; +MONTH: 'MONTH' | 'MONTHS'; +WEEK: 'WEEK' | 'WEEKS'; +DAY: 'DAY' | 'DAYS'; +HOUR: 'HOUR' | 'HOURS'; +MINUTE: 'MINUTE' | 'MINUTES'; +SECOND: 'SECOND' | 'SECONDS'; +MILLISECOND: 'MILLISECOND' | 'MILLISECONDS'; --- End diff -- nvm, it sounds like we already support them.
[GitHub] spark pull request #20433: [SPARK-23264][SQL] Support interval values withou...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20433#discussion_r172426643 --- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 --- @@ -790,6 +796,16 @@ ASC: 'ASC'; DESC: 'DESC'; FOR: 'FOR'; INTERVAL: 'INTERVAL'; +YEAR: 'YEAR' | 'YEARS'; +MONTH: 'MONTH' | 'MONTHS'; +WEEK: 'WEEK' | 'WEEKS'; +DAY: 'DAY' | 'DAYS'; +HOUR: 'HOUR' | 'HOURS'; +MINUTE: 'MINUTE' | 'MINUTES'; +SECOND: 'SECOND' | 'SECONDS'; +MILLISECOND: 'MILLISECOND' | 'MILLISECONDS'; --- End diff -- I am wondering which systems support `MILLISECOND`, `MICROSECOND` and `NANOSECOND`?
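The diff thread above adds pluralized unit keywords to the lexer. As a hedged illustration only (not the actual ANTLR grammar), the same accept-singular-or-plural behavior can be sketched as a token normalizer; the class and method names are made up for this example, and `NANOSECOND` is omitted per the review comment that it is unsupported.

```java
import java.util.Set;

// Illustrative sketch: normalize an interval-unit token, accepting both
// singular and plural spellings, before looking up the canonical unit.
final class IntervalUnits {
  private static final Set<String> UNITS = Set.of(
      "YEAR", "MONTH", "WEEK", "DAY", "HOUR",
      "MINUTE", "SECOND", "MILLISECOND", "MICROSECOND");

  static String normalize(String token) {
    String t = token.toUpperCase();
    if (UNITS.contains(t)) {
      return t.toLowerCase();
    }
    // Accept the plural form by stripping a trailing 'S'.
    if (t.endsWith("S") && UNITS.contains(t.substring(0, t.length() - 1))) {
      return t.substring(0, t.length() - 1).toLowerCase();
    }
    throw new IllegalArgumentException("Unknown interval unit: " + token);
  }
}
```

In the real grammar the alternation (`'YEAR' | 'YEARS'`) does this work inside the lexer rule itself; the sketch just makes the equivalence explicit.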
[GitHub] spark issue #20746: [SPARK-23594][SQL] GetExternalRowField should support in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20746 **[Test build #87995 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87995/testReport)** for PR 20746 at commit [`62a9814`](https://github.com/apache/spark/commit/62a98147a7a9aeb43e4827e3095577d0be6dee47).
[GitHub] spark issue #20746: [SPARK-23594][SQL] GetExternalRowField should support in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20746 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1307/ Test PASSed.
[GitHub] spark issue #20746: [SPARK-23594][SQL] GetExternalRowField should support in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20746 Merged build finished. Test PASSed.
[GitHub] spark pull request #20746: [SPARK-23594][SQL] GetExternalRowField should sup...
GitHub user maropu opened a pull request: https://github.com/apache/spark/pull/20746 [SPARK-23594][SQL] GetExternalRowField should support interpreted execution ## What changes were proposed in this pull request? This PR adds interpreted execution for `GetExternalRowField`. ## How was this patch tested? Added tests in `ObjectExpressionsSuite`. You can merge this pull request into a Git repository by running: $ git pull https://github.com/maropu/spark SPARK-23594 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20746.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20746 commit 62a98147a7a9aeb43e4827e3095577d0be6dee47 Author: Takeshi Yamamuro Date: 2018-03-06T07:04:30Z GetExternalRowField should support interpreted execution
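"Interpreted execution" here means evaluating the expression directly against the input row instead of generating Java source at runtime. The following is a minimal, hypothetical sketch of such a row-field accessor; the class name, field handling, and error messages are illustrative stand-ins, not Spark's actual `GetExternalRowField` implementation.

```java
// Sketch of an interpreted (non-codegen) row-field accessor: look up a field
// by ordinal and enforce the null checks that codegen would otherwise emit.
final class GetFieldInterpreted {
  private final int ordinal;
  private final String fieldName;

  GetFieldInterpreted(int ordinal, String fieldName) {
    this.ordinal = ordinal;
    this.fieldName = fieldName;
  }

  Object eval(Object[] row) {
    if (row == null) {
      throw new RuntimeException("The input external row cannot be null.");
    }
    if (row[ordinal] == null) {
      throw new RuntimeException(
          "Field '" + fieldName + "' (ordinal " + ordinal + ") of input row cannot be null.");
    }
    return row[ordinal];
  }
}
```

The interpreted path trades the startup cost of compiling generated code for a slower per-row call, which is why both paths coexist and are tested against each other in suites like `ObjectExpressionsSuite`.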
[GitHub] spark issue #20699: [SPARK-23544][SQL]Remove redundancy ShuffleExchange in t...
Github user heary-cao commented on the issue: https://github.com/apache/spark/pull/20699 `EnsureRequirements` can eliminate unnecessary shuffles when the child already has the same partitioning, or a compatible partitioning over the same distribution expressions. But when the child has a different partitioning, or a partitioning over different expressions, `EnsureRequirements` cannot eliminate the shuffle. This PR deals with the latter case. Thanks.
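The comment above can be reduced to one core check. Below is a minimal sketch under the simplifying assumption that "compatible partitioning" means the same partitioning expressions and the same partition count; the real `EnsureRequirements` rule operates on `SparkPlan` nodes with `Distribution` and `Partitioning` objects, so everything here is an illustrative stand-in.

```java
import java.util.List;

// Sketch: insert a shuffle (exchange) only when the child's output
// partitioning does not already satisfy the required distribution.
final class ShuffleCheck {
  static boolean needsShuffle(List<String> requiredExprs, int requiredPartitions,
                              List<String> childExprs, int childPartitions) {
    boolean satisfied = childExprs.equals(requiredExprs)
        && childPartitions == requiredPartitions;
    return !satisfied; // an exchange is redundant when already satisfied
  }
}
```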
[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19222 **[Test build #87994 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87994/testReport)** for PR 19222 at commit [`a62770b`](https://github.com/apache/spark/commit/a62770bdcd2cd83dc19d5f39a55b5186201ddc34).
[GitHub] spark issue #20716: [SPARK-23566][Minor][Doc] Argument name mismatch fixed
Github user animenon commented on the issue: https://github.com/apache/spark/pull/20716 @HyukjinKwon It's minor, so it may not be required. I had tagged Gator just for a check.
[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19222 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1306/ Test PASSed.
[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19222 Merged build finished. Test PASSed.
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
GitHub user kiszk reopened a pull request: https://github.com/apache/spark/pull/19222 [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks to choose several types of memory block

## What changes were proposed in this pull request?

This PR allows us to use one of several types of `MemoryBlock`, such as byte array, int array, long array, or `java.nio.DirectByteBuffer`. Using `java.nio.DirectByteBuffer` allows off-heap memory that is automatically deallocated by the JVM. The `MemoryBlock` class has primitive accessors like `Platform.getInt()`, `Platform.putInt()`, or `Platform.copyMemory()`.

This PR uses `MemoryBlock` for `OffHeapColumnVector`, `UTF8String`, and other places. It can improve the performance of operations involving memory accesses (e.g. `UTF8String.trim`) by 1.8x. For now, this PR does not use `MemoryBlock` for `BufferHolder`, based on @cloud-fan's [suggestion](https://github.com/apache/spark/pull/11494#issuecomment-309694290).

Since this PR is a successor of #11494, close #11494. Much of the code was ported from #11494, and much effort was put in there. **I think this PR should be credited to @yzotov.**

This PR can achieve **1.1-1.4x performance improvements** for operations in `UTF8String` or `Murmur3_x86_32`. Other operations show almost comparable performance.

Without this PR
```
OpenJDK 64-Bit Server VM 1.8.0_121-8u121-b13-0ubuntu1.16.04.2-b13 on Linux 4.4.0-22-generic
Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz
Hash byte arrays with length 268435487:  Best/Avg Time(ms)  Rate(M/s)  Per Row(ns)  Relative
Murmur3_x86_32                                 526 / 536          0.0  131399881.5      1.0X

UTF8String benchmark:                    Best/Avg Time(ms)  Rate(M/s)  Per Row(ns)  Relative
hashCode                                       525 / 552       1022.6          1.0      1.0X
substring                                      414 / 423       1298.0          0.8      1.3X
```
With this PR
```
OpenJDK 64-Bit Server VM 1.8.0_121-8u121-b13-0ubuntu1.16.04.2-b13 on Linux 4.4.0-22-generic
Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz
Hash byte arrays with length 268435487:  Best/Avg Time(ms)  Rate(M/s)  Per Row(ns)  Relative
Murmur3_x86_32                                 474 / 488          0.0  118552232.0      1.0X

UTF8String benchmark:                    Best/Avg Time(ms)  Rate(M/s)  Per Row(ns)  Relative
hashCode                                       476 / 480       1127.3          0.9      1.0X
substring                                      287 / 291       1869.9          0.5      1.7X
```
Benchmark program (`numArrays` and the loop index were not defined in the original message; a placeholder value and a loop over all arrays are used below):
```scala
test("benchmark Murmur3_x86_32") {
  val length = 8192 * 32768 + 31
  val seed = 42L
  val iters = 1 << 2
  val numArrays = 8  // not defined in the original snippet; placeholder
  val random = new Random(seed)
  val arrays = Array.fill[MemoryBlock](numArrays) {
    val bytes = new Array[Byte](length)
    random.nextBytes(bytes)
    new ByteArrayMemoryBlock(bytes, Platform.BYTE_ARRAY_OFFSET, length)
  }
  val benchmark = new Benchmark("Hash byte arrays with length " + length,
    iters * numArrays, minNumIters = 20)
  benchmark.addCase("HiveHasher") { _: Int =>
    var sum = 0L
    for (_ <- 0L until iters; block <- arrays) {
      sum += HiveHasher.hashUnsafeBytesBlock(block, Platform.BYTE_ARRAY_OFFSET, length)
    }
  }
  benchmark.run()
}

test("benchmark UTF8String") {
  val N = 512 * 1024 * 1024
  val iters = 2
  val benchmark = new Benchmark("UTF8String benchmark", N, minNumIters = 20)
  val str0 = new java.io.StringWriter() { { for (i <- 0 until N) { write(" ") } } }.toString
  val s0 = UTF8String.fromString(str0)
  benchmark.addCase("hashCode") { _: Int =>
    var h: Int = 0
    for (_ <- 0L until iters) { h += s0.hashCode }
  }
  benchmark.addCase("substring") { _: Int =>
    var s: UTF8String = null
    for (_ <- 0L until iters) { s = s0.substring(N / 2 - 5, N / 2 + 5) }
  }
  benchmark.run()
}
```
I run [this benchmark program](https://gist.github.com/kiszk/94f75b506c93a663bbbc372ffe8f05de) using [the
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user kiszk closed the pull request at: https://github.com/apache/spark/pull/19222
[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/19222 retest this please
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r172421399 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OffHeapColumnVector.java --- @@ -57,20 +59,20 @@ // The data stored in these two allocations need to maintain binary compatible. We can // directly pass this buffer to external components. - private long nulls; --- End diff -- I see. `Platform.reallocateMemory` does not exist. `MemoryAllocator.UNSAFE.reallocate()` returns `MemoryBlock`.
[GitHub] spark pull request #20633: [SPARK-23455][ML] Default Params in ML should be ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20633#discussion_r172420910 --- Diff: mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala --- @@ -351,17 +359,21 @@ private[ml] object DefaultParamsReader { timestamp: Long, sparkVersion: String, params: JValue, + defaultParams: JValue, metadata: JValue, metadataJson: String) { /** * Get the JSON value of the [[org.apache.spark.ml.param.Param]] of the given name. * This can be useful for getting a Param value before an instance of `Params` * is available. + * + * @param isDefaultParam Whether the given param name is a default param. Default is false. */ -def getParamValue(paramName: String): JValue = { +def getParamValue(paramName: String, isDefaultParam: Boolean = false): JValue = { --- End diff -- Sounds good. I will change this.
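For illustration only, the two-section lookup that the `getParamValue(paramName, isDefaultParam)` signature in the diff implies can be sketched with plain maps standing in for the JSON metadata; the class name and types here are simplified stand-ins, not the actual `DefaultParamsReader` code.

```java
import java.util.Map;

// Sketch: one accessor that reads either the explicitly-set params section
// or the default-params section, selected by the isDefaultParam flag.
final class MetadataSketch {
  private final Map<String, Object> params;
  private final Map<String, Object> defaultParams;

  MetadataSketch(Map<String, Object> params, Map<String, Object> defaultParams) {
    this.params = params;
    this.defaultParams = defaultParams;
  }

  Object getParamValue(String paramName, boolean isDefaultParam) {
    Map<String, Object> section = isDefaultParam ? defaultParams : params;
    if (!section.containsKey(paramName)) {
      throw new IllegalArgumentException("Param " + paramName + " not found");
    }
    return section.get(paramName);
  }
}
```

Keeping default params in a separate section is what lets a reader distinguish "user explicitly set this value" from "this was the estimator's default at save time".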
[GitHub] spark issue #20472: [SPARK-22751][ML]Improve ML RandomForest shuffle perform...
Github user lucio-yz commented on the issue: https://github.com/apache/spark/pull/20472 @srowen Any other problems?
[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19222 Merged build finished. Test FAILed.
[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19222 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87989/ Test FAILed.
[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19222 **[Test build #87989 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87989/testReport)** for PR 19222 at commit [`a62770b`](https://github.com/apache/spark/commit/a62770bdcd2cd83dc19d5f39a55b5186201ddc34). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20633 @WeichenXu123 I've added a unit test in `DefaultReadWriteSuite/DefaultReadWriteTest` to test if this can read old metadata back. Sounds like the backward compatibility test you suggested should be checked manually. I will test it. Thanks!
[GitHub] spark pull request #20449: [SPARK-23040][CORE]: Returns interruptible iterat...
Github user advancedxy commented on a diff in the pull request: https://github.com/apache/spark/pull/20449#discussion_r172416019 --- Diff: core/src/main/scala/org/apache/spark/shuffle/BlockStoreShuffleReader.scala --- @@ -104,9 +104,16 @@ private[spark] class BlockStoreShuffleReader[K, C]( context.taskMetrics().incMemoryBytesSpilled(sorter.memoryBytesSpilled) context.taskMetrics().incDiskBytesSpilled(sorter.diskBytesSpilled) context.taskMetrics().incPeakExecutionMemory(sorter.peakMemoryUsedBytes) +// Use completion callback to stop sorter if task was finished/cancelled. +context.addTaskCompletionListener(_ => { + sorter.stop() +}) CompletionIterator[Product2[K, C], Iterator[Product2[K, C]]](sorter.iterator, sorter.stop()) case None => aggregatedIter } +// Use another interruptible iterator here to support task cancellation as aggregator or(and) +// sorter may have consumed previous interruptible iterator. +new InterruptibleIterator[Product2[K, C]](context, resultIter) --- End diff -- Will do
[GitHub] spark issue #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to handle...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/20345 ping
[GitHub] spark pull request #20686: [SPARK-22915][MLlib] Streaming tests for spark.ml...
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20686#discussion_r172415192 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/NormalizerSuite.scala --- @@ -17,94 +17,72 @@ package org.apache.spark.ml.feature -import org.apache.spark.SparkFunSuite import org.apache.spark.ml.linalg.{DenseVector, SparseVector, Vector, Vectors} -import org.apache.spark.ml.util.DefaultReadWriteTest +import org.apache.spark.ml.util.{DefaultReadWriteTest, MLTest} import org.apache.spark.ml.util.TestingUtils._ -import org.apache.spark.mllib.util.MLlibTestSparkContext import org.apache.spark.sql.{DataFrame, Row} -class NormalizerSuite extends SparkFunSuite with MLlibTestSparkContext with DefaultReadWriteTest { +class NormalizerSuite extends MLTest with DefaultReadWriteTest { import testImplicits._ - @transient var data: Array[Vector] = _ - @transient var dataFrame: DataFrame = _ - @transient var normalizer: Normalizer = _ - @transient var l1Normalized: Array[Vector] = _ - @transient var l2Normalized: Array[Vector] = _ + @transient val data: Seq[Vector] = Seq( +Vectors.sparse(3, Seq((0, -2.0), (1, 2.3))), +Vectors.dense(0.0, 0.0, 0.0), +Vectors.dense(0.6, -1.1, -3.0), +Vectors.sparse(3, Seq((1, 0.91), (2, 3.2))), +Vectors.sparse(3, Seq((0, 5.7), (1, 0.72), (2, 2.7))), +Vectors.sparse(3, Seq())) --- End diff -- ok, it's a minor issue, let's ignore it.
[GitHub] spark issue #20433: [SPARK-23264][SQL] Support interval values without INTER...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/20433 ping
[GitHub] spark pull request #20449: [SPARK-23040][CORE]: Returns interruptible iterat...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20449#discussion_r172414393 --- Diff: core/src/main/scala/org/apache/spark/shuffle/BlockStoreShuffleReader.scala --- @@ -104,9 +104,16 @@ private[spark] class BlockStoreShuffleReader[K, C]( context.taskMetrics().incMemoryBytesSpilled(sorter.memoryBytesSpilled) context.taskMetrics().incDiskBytesSpilled(sorter.diskBytesSpilled) context.taskMetrics().incPeakExecutionMemory(sorter.peakMemoryUsedBytes) +// Use completion callback to stop sorter if task was finished/cancelled. +context.addTaskCompletionListener(_ => { + sorter.stop() +}) CompletionIterator[Product2[K, C], Iterator[Product2[K, C]]](sorter.iterator, sorter.stop()) case None => aggregatedIter } +// Use another interruptible iterator here to support task cancellation as aggregator or(and) +// sorter may have consumed previous interruptible iterator. +new InterruptibleIterator[Product2[K, C]](context, resultIter) --- End diff -- there is a chance that `resultIter` is already an `InterruptibleIterator`, and we should not double wrap it. Can you send a followup PR to fix this? then we can backport them to 2.3 together.
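The double-wrapping concern above can be expressed as an idempotent wrap: check whether the iterator is already interruptible before wrapping. `InterruptibleIter` and the cancellation flag below are simplified, hypothetical stand-ins for Spark's `InterruptibleIterator` and `TaskContext`, not the actual API.

```java
import java.util.Iterator;
import java.util.function.BooleanSupplier;

// Sketch: an iterator that checks a cancellation flag on each hasNext(), with
// a factory that never wraps an already-interruptible iterator twice.
final class InterruptibleIter<T> implements Iterator<T> {
  private final Iterator<T> delegate;
  private final BooleanSupplier cancelled;

  private InterruptibleIter(Iterator<T> delegate, BooleanSupplier cancelled) {
    this.delegate = delegate;
    this.cancelled = cancelled;
  }

  static <T> Iterator<T> wrapOnce(Iterator<T> it, BooleanSupplier cancelled) {
    if (it instanceof InterruptibleIter) {
      return it; // already interruptible: avoid double-wrapping
    }
    return new InterruptibleIter<>(it, cancelled);
  }

  @Override public boolean hasNext() {
    if (cancelled.getAsBoolean()) {
      throw new RuntimeException("task cancelled");
    }
    return delegate.hasNext();
  }

  @Override public T next() { return delegate.next(); }
}
```

Double-wrapping is not incorrect, but it adds a redundant flag check per element on a hot path, which is why the review asks for the guard.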
[GitHub] spark issue #20742: [SPARK-23572][docs] Bring "security.md" up to date.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20742 Merged build finished. Test PASSed.
[GitHub] spark issue #18610: [SPARK-21386] ML LinearRegression supports warm start fr...
Github user JohnHBrock commented on the issue: https://github.com/apache/spark/pull/18610 What else needs to be done before this can be merged?
[GitHub] spark issue #20464: [SPARK-23291][SQL][R] R's substr should not reduce start...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20464 Merged build finished. Test PASSed.
[GitHub] spark issue #20742: [SPARK-23572][docs] Bring "security.md" up to date.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20742 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87987/ Test PASSed.
[GitHub] spark issue #20464: [SPARK-23291][SQL][R] R's substr should not reduce start...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20464 **[Test build #87993 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87993/testReport)** for PR 20464 at commit [`0ebdf74`](https://github.com/apache/spark/commit/0ebdf74942e0894bfaf6cbede4c03fd3f5d26411). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20464: [SPARK-23291][SQL][R] R's substr should not reduce start...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20464 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87993/ Test PASSed.
[GitHub] spark issue #20742: [SPARK-23572][docs] Bring "security.md" up to date.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20742 **[Test build #87987 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87987/testReport)** for PR 20742 at commit [`c867373`](https://github.com/apache/spark/commit/c867373867b88cce4eed8a69bdf05585f7142dc1). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r172413460 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OffHeapColumnVector.java --- @@ -57,20 +59,20 @@ // The data stored in these two allocations need to maintain binary compatible. We can // directly pass this buffer to external components. - private long nulls; --- End diff -- Removing `Platform.reallocateMemory` is not a strong reason to migrate `OffHeapColumnVector` to memory block; we can do it later, and update `OnHeapColumnVector` too.
[GitHub] spark issue #19381: [SPARK-10884][ML] Support prediction on single instance ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19381 Merged build finished. Test PASSed.
[GitHub] spark issue #19381: [SPARK-10884][ML] Support prediction on single instance ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19381 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87991/ Test PASSed.
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r172412911 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OffHeapColumnVector.java --- @@ -57,20 +59,20 @@ // The data stored in these two allocations need to maintain binary compatible. We can // directly pass this buffer to external components. - private long nulls; --- End diff -- Ah, you want to allocate memory for `OffHeapColumnVector` by using `Platform` instead of `MemoryBlock`? Could you please explain why `OffHeapColumnVector` wants to allocate memory with `Platform`?
[GitHub] spark issue #19381: [SPARK-10884][ML] Support prediction on single instance ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19381 **[Test build #87991 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87991/testReport)** for PR 19381 at commit [`1420867`](https://github.com/apache/spark/commit/1420867e43e32f46e18dccf61720228a5b8f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20659: [DNM] Try to update Hive to 2.3.2
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20659 Nice try! Could you fix the remaining failure?
[GitHub] spark issue #20449: [SPARK-23040][CORE]: Returns interruptible iterator for ...
Github user advancedxy commented on the issue: https://github.com/apache/spark/pull/20449 @cloud-fan is it possible that we also merge this into branch-2.3, so this fix could be released in Spark 2.3.1?
[GitHub] spark issue #20464: [SPARK-23291][SQL][R] R's substr should not reduce start...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20464 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1305/ Test PASSed.
[GitHub] spark issue #20464: [SPARK-23291][SQL][R] R's substr should not reduce start...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20464 Merged build finished. Test PASSed.
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r172410077 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OffHeapColumnVector.java --- @@ -57,20 +59,20 @@ // The data stored in these two allocations need to maintain binary compatible. We can // directly pass this buffer to external components. - private long nulls; --- End diff -- what if we don't remove `Platform.reallocateMemory`?
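The thread above weighs keeping `Platform.reallocateMemory` (an in-place realloc of a raw off-heap pointer) against migrating `OffHeapColumnVector` to `MemoryBlock`. As a rough on-heap illustration only, not Spark's actual code: without realloc, growing a buffer means allocate-new, copy, drop-old.

```java
import java.util.Arrays;

// Illustration only, not Spark's implementation: without an in-place
// realloc (what Platform.reallocateMemory provides for raw off-heap
// memory), growing a buffer means allocating a new block, copying the
// old contents over, and letting the old block be freed.
public class GrowBufferDemo {
    // Grow a buffer to newCapacity, preserving existing contents.
    static long[] grow(long[] old, int newCapacity) {
        return Arrays.copyOf(old, newCapacity); // allocate + copy; old becomes garbage
    }

    public static void main(String[] args) {
        long[] nulls = {0, 1, 0, 1};
        long[] grown = grow(nulls, 8);
        System.out.println(grown.length); // 8
        System.out.println(grown[1]);     // 1
    }
}
```

A real realloc can sometimes extend the allocation in place and skip the copy, which is the convenience the review is debating giving up.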
[GitHub] spark issue #20464: [SPARK-23291][SQL][R] R's substr should not reduce start...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20464 **[Test build #87993 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87993/testReport)** for PR 20464 at commit [`0ebdf74`](https://github.com/apache/spark/commit/0ebdf74942e0894bfaf6cbede4c03fd3f5d26411).
[GitHub] spark pull request #20464: [SPARK-23291][SQL][R] R's substr should not reduc...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20464#discussion_r172409837 --- Diff: R/pkg/R/column.R --- @@ -169,7 +169,7 @@ setMethod("alias", #' @note substr since 1.4.0 setMethod("substr", signature(x = "Column"), function(x, start, stop) { -jc <- callJMethod(x@jc, "substr", as.integer(start - 1), as.integer(stop - start + 1)) +jc <- callJMethod(x@jc, "substr", as.integer(start), as.integer(stop - start + 1)) --- End diff -- Added to the func doc.
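The one-line R fix above converts R's 1-based, inclusive `(start, stop)` pair into the `(position, length)` pair passed to the JVM side's `substr`. A small Java sketch of that index math (helper names here are hypothetical, for illustration only):

```java
// Hypothetical helper names, for illustration only: R's substr(x, start, stop)
// is 1-based and inclusive at both ends, while the JVM-side substr takes a
// 1-based position plus a length. The fixed R line computes this (pos, len)
// pair without the off-by-one the old "start - 1" introduced.
public class SubstrIndexDemo {
    // Convert an inclusive 1-based (start, stop) range into (pos, len).
    static int[] toPosLen(int start, int stop) {
        return new int[] { start, stop - start + 1 };
    }

    // Apply a 1-based (pos, len) to a Java string (0-based) to check the math.
    static String apply(String s, int pos, int len) {
        return s.substring(pos - 1, pos - 1 + len);
    }

    public static void main(String[] args) {
        int[] pl = toPosLen(2, 4);                         // start=2, stop=4
        System.out.println(pl[0] + "," + pl[1]);           // 2,3
        System.out.println(apply("abcdef", pl[0], pl[1])); // bcd
    }
}
```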
[GitHub] spark pull request #20742: [SPARK-23572][docs] Bring "security.md" up to dat...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/20742#discussion_r172409490 --- Diff: R/pkg/DESCRIPTION --- @@ -57,6 +57,6 @@ Collate: 'types.R' 'utils.R' 'window.R' -RoxygenNote: 5.0.1 +RoxygenNote: 6.0.1 --- End diff -- pls revert this
[GitHub] spark issue #20699: [SPARK-23544][SQL]Remove redundancy ShuffleExchange in t...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20699 Sorry, I should make the question more specific: `EnsureRequirement#apply` has a hack to eliminate unnecessary shuffles. Do we still need that?
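For context, the shuffle-elimination idea under discussion can be sketched as a conceptual toy (this is not Spark's actual planner code, and the class names below are invented): insert an exchange only when the child's existing partitioning does not already satisfy what the parent requires.

```java
// Conceptual toy, not Spark's actual EnsureRequirements logic: only insert
// a shuffle (Exchange) when the child's partitioning fails to satisfy the
// parent's required distribution; otherwise reuse the child's output as-is.
public class ShuffleElisionDemo {
    static final class Partitioning {
        final String scheme;
        final int numPartitions;
        Partitioning(String scheme, int numPartitions) {
            this.scheme = scheme;
            this.numPartitions = numPartitions;
        }
    }

    // A trivially strict "satisfies" check: same scheme and partition count.
    static boolean satisfies(Partitioning child, Partitioning required) {
        return child.scheme.equals(required.scheme)
            && child.numPartitions == required.numPartitions;
    }

    static String plan(Partitioning child, Partitioning required) {
        return satisfies(child, required) ? "reuse child output" : "insert Exchange";
    }

    public static void main(String[] args) {
        Partitioning required = new Partitioning("hash(a)", 200);
        System.out.println(plan(new Partitioning("hash(a)", 200), required)); // reuse child output
        System.out.println(plan(new Partitioning("hash(b)", 200), required)); // insert Exchange
    }
}
```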
[GitHub] spark pull request #20686: [SPARK-22915][MLlib] Streaming tests for spark.ml...
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20686#discussion_r172408255 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/RFormulaSuite.scala --- @@ -313,13 +306,14 @@ class RFormulaSuite extends MLTest with DefaultReadWriteTest { Seq(("male", "foo", 4), ("female", "bar", 4), ("female", "bar", 5), ("male", "baz", 5)) .toDF("id", "a", "b") val model = formula.fit(original) +val attr = NominalAttribute.defaultAttr val expected = Seq( ("male", "foo", 4, Vectors.dense(0.0, 1.0, 4.0), 1.0), ("female", "bar", 4, Vectors.dense(1.0, 0.0, 4.0), 0.0), ("female", "bar", 5, Vectors.dense(1.0, 0.0, 5.0), 0.0), ("male", "baz", 5, Vectors.dense(0.0, 0.0, 5.0), 1.0) ).toDF("id", "a", "b", "features", "label") -// assert(result.schema.toString == resultSchema.toString) + .select($"id", $"a", $"b", $"features", $"label".as("label", attr.toMetadata())) --- End diff -- I am also confused about the alignment rule. @jkbradley what do you think?
[GitHub] spark pull request #20686: [SPARK-22915][MLlib] Streaming tests for spark.ml...
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20686#discussion_r172408009 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/QuantileDiscretizerSuite.scala --- @@ -324,19 +352,46 @@ class QuantileDiscretizerSuite .setStages(Array(discretizerForCol1, discretizerForCol2, discretizerForCol3)) .fit(df) -val resultForMultiCols = plForMultiCols.transform(df) - .select("result1", "result2", "result3") - .collect() - -val resultForSingleCol = plForSingleCol.transform(df) - .select("result1", "result2", "result3") - .collect() +val expected = Seq( + (0.0, 0.0, 0.0), + (0.0, 0.0, 1.0), + (0.0, 0.0, 1.0), + (0.0, 1.0, 2.0), + (0.0, 1.0, 2.0), + (0.0, 1.0, 2.0), + (0.0, 1.0, 3.0), + (0.0, 2.0, 4.0), + (0.0, 2.0, 4.0), + (1.0, 2.0, 5.0), + (1.0, 2.0, 5.0), + (1.0, 2.0, 5.0), + (1.0, 3.0, 6.0), + (1.0, 3.0, 6.0), + (1.0, 3.0, 7.0), + (1.0, 4.0, 8.0), + (1.0, 4.0, 8.0), + (1.0, 4.0, 9.0), + (1.0, 4.0, 9.0), + (1.0, 4.0, 9.0) + ).toDF("result1", "result2", "result3") +.collect().toSeq --- End diff -- But I prefer to avoid hardcoding a big literal array, so that the code is easier to maintain. I think the following is enough: ``` val expected = plForSingleCol.transform(df).select("result1", "result2", "result3").collect() testTransformerByGlobalCheckFunc[(Double, Double, Double)]( df, plForSingleCol, "result1", "result2", "result3") { rows => assert(rows == expected) } ``` There is a similar case here https://github.com/apache/spark/pull/20121#discussion_r172288890
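The suggestion above, computing the expected rows once from a trusted reference path instead of hardcoding a literal table, is a general testing pattern. A hedged, generic Java sketch of it (all names here are invented for illustration):

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Generic sketch of the review suggestion: compute expected output once via
// a trusted reference path, then assert the path under test produces the
// same rows, instead of maintaining a large hardcoded expected table.
public class GoldenCompare {
    static <I, O> void assertSameOutput(List<I> input,
                                        Function<I, O> reference,
                                        Function<I, O> underTest) {
        for (I row : input) {
            O expected = reference.apply(row);
            O actual = underTest.apply(row);
            if (!expected.equals(actual)) {
                throw new AssertionError("mismatch for input " + row);
            }
        }
    }

    public static void main(String[] args) {
        // Two implementations of the same bucketizing rule must agree
        // (these agree for non-negative inputs).
        Function<Double, Long> reference = x -> (long) Math.floor(x / 10.0);
        Function<Double, Long> underTest = x -> x.longValue() / 10;
        assertSameOutput(Arrays.asList(3.0, 17.0, 42.0), reference, underTest);
        System.out.println("outputs match");
    }
}
```

The trade-off the reviewer notes applies here too: the test stays short and maintainable, at the cost of not catching a regression that breaks the reference path and the path under test identically.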
[GitHub] spark pull request #20647: [SPARK-23303][SQL] improve the explain result for...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20647
[GitHub] spark issue #20295: [SPARK-23011] Support alternative function form with gro...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/20295 @icexelloss Could you add `[SQL][PYTHON]` to the PR title, please?
[GitHub] spark issue #20647: [SPARK-23303][SQL] improve the explain result for data s...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20647 thanks, merging to master!
[GitHub] spark issue #16006: [SPARK-18580] [DStreams] [external/kafka-0-10] Use spark...
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/16006 @omuravskiy can you comment on https://github.com/apache/spark/pull/19431 since it appears to be based on your PR?
[GitHub] spark issue #20745: [SPARK-23288][SS] Fix output metrics with parquet sink
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20745 **[Test build #87992 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87992/testReport)** for PR 20745 at commit [`55aa8bc`](https://github.com/apache/spark/commit/55aa8bca96b112a33cabb352afb4168c2d8f355c).
[GitHub] spark pull request #20464: [SPARK-23291][SQL][R] R's substr should not reduc...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20464#discussion_r172406939 --- Diff: R/pkg/R/column.R --- @@ -169,7 +169,7 @@ setMethod("alias", #' @note substr since 1.4.0 setMethod("substr", signature(x = "Column"), function(x, start, stop) { -jc <- callJMethod(x@jc, "substr", as.integer(start - 1), as.integer(stop - start + 1)) +jc <- callJMethod(x@jc, "substr", as.integer(start), as.integer(stop - start + 1)) --- End diff -- I think you mean 1-based.
[GitHub] spark issue #20745: [SPARK-23288][SS] Fix output metrics with parquet sink
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20745 Can one of the admins verify this patch?
[GitHub] spark pull request #20745: [SPARK-23288][SS] Fix output metrics with parquet...
GitHub user gaborgsomogyi opened a pull request: https://github.com/apache/spark/pull/20745 [SPARK-23288][SS] Fix output metrics with parquet sink ## What changes were proposed in this pull request? Output metrics were not filled when the parquet sink was used. This PR fixes the problem by passing a `BasicWriteJobStatsTracker` in `FileStreamSink`. ## How was this patch tested? Additional unit test added. You can merge this pull request into a Git repository by running: $ git pull https://github.com/gaborgsomogyi/spark SPARK-23288 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20745.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20745 commit 22e6ca1576bdeee2092afc8bc82a743e0700a959 Author: Gabor Somogyi Date: 2018-02-19T23:43:46Z [SPARK-23288][SS] Fix output metrics with parquet sink commit 55aa8bca96b112a33cabb352afb4168c2d8f355c Author: Gabor Somogyi Date: 2018-02-28T22:50:47Z Merge branch 'master' into SPARK-23288
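Wiring a stats tracker into the sink follows the usual pattern for output metrics: each write task counts the rows and bytes it emits, and the driver aggregates the per-task counts into job-level metrics. A simplified sketch with hypothetical names (this is not the actual `BasicWriteJobStatsTracker` API):

```java
import java.util.Arrays;
import java.util.List;

// Simplified sketch with hypothetical names: each write task tracks the
// rows and bytes it emits; the driver sums the per-task stats into the
// job-level output metrics that were previously left unfilled.
public class WriteStatsDemo {
    static final class TaskWriteStats {
        long rows;
        long bytes;
        void recordRow(long rowBytes) { rows++; bytes += rowBytes; }
    }

    // Driver-side aggregation of per-task stats: returns [totalRows, totalBytes].
    static long[] aggregate(List<TaskWriteStats> perTask) {
        long rows = 0, bytes = 0;
        for (TaskWriteStats t : perTask) {
            rows += t.rows;
            bytes += t.bytes;
        }
        return new long[] { rows, bytes };
    }

    public static void main(String[] args) {
        TaskWriteStats t1 = new TaskWriteStats();
        t1.recordRow(100);
        t1.recordRow(150);
        TaskWriteStats t2 = new TaskWriteStats();
        t2.recordRow(200);
        long[] total = aggregate(Arrays.asList(t1, t2));
        System.out.println(total[0] + " rows, " + total[1] + " bytes"); // 3 rows, 450 bytes
    }
}
```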
[GitHub] spark issue #20639: [SPARK-23288][SS] Fix output metrics with parquet sink
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/20639 God, seems like it's stuck somehow. I'll re-create the PR.
[GitHub] spark pull request #20639: [SPARK-23288][SS] Fix output metrics with parquet...
Github user gaborgsomogyi closed the pull request at: https://github.com/apache/spark/pull/20639
[GitHub] spark issue #19381: [SPARK-10884][ML] Support prediction on single instance ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19381 **[Test build #87991 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87991/testReport)** for PR 19381 at commit [`1420867`](https://github.com/apache/spark/commit/1420867e43e32f46e18dccf61720228a5b8f).
[GitHub] spark issue #19381: [SPARK-10884][ML] Support prediction on single instance ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19381 Merged build finished. Test PASSed.
[GitHub] spark issue #19381: [SPARK-10884][ML] Support prediction on single instance ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19381 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1304/ Test PASSed.
[GitHub] spark issue #20295: [SPARK-23011] Support alternative function form with gro...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/20295 LGTM except for @BryanCutler's suggestion (https://github.com/apache/spark/pull/20295#discussion_r172374978). Thanks!
[GitHub] spark issue #20743: [SPARK-23020][CORE][branch-2.3] Fix another race in the ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20743 Merged build finished. Test PASSed.
[GitHub] spark issue #20743: [SPARK-23020][CORE][branch-2.3] Fix another race in the ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20743 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87985/ Test PASSed.
[GitHub] spark issue #20743: [SPARK-23020][CORE][branch-2.3] Fix another race in the ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20743 **[Test build #87985 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87985/testReport)** for PR 20743 at commit [`06aa292`](https://github.com/apache/spark/commit/06aa292c15e61170b91f622dbce54a8149c1). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19381: [SPARK-10884][ML] Support prediction on single instance ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19381 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1303/ Test PASSed.
[GitHub] spark issue #19381: [SPARK-10884][ML] Support prediction on single instance ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19381 **[Test build #87990 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87990/testReport)** for PR 19381 at commit [`ab68214`](https://github.com/apache/spark/commit/ab68214028979b431de4fe605a843e4a0cb013db).
[GitHub] spark issue #19381: [SPARK-10884][ML] Support prediction on single instance ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19381 Build finished. Test PASSed.
[GitHub] spark issue #20706: [SPARK-23550][core] Cleanup `Utils`.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20706 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87984/ Test PASSed.
[GitHub] spark issue #20706: [SPARK-23550][core] Cleanup `Utils`.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20706 Merged build finished. Test PASSed.
[GitHub] spark issue #16006: [SPARK-18580] [DStreams] [external/kafka-0-10] Use spark...
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/16006 ok to test
[GitHub] spark issue #20706: [SPARK-23550][core] Cleanup `Utils`.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20706 **[Test build #87984 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87984/testReport)** for PR 20706 at commit [`427a977`](https://github.com/apache/spark/commit/427a977b33c6c3f2e436b43ac9f9263c64f835bb). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20726: [SPARK-23574][CORE] Report SinglePartition in DataSource...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20726 Btw, I think the title should be `[SQL]` instead of `[CORE]`.
[GitHub] spark issue #20726: [SPARK-23574][CORE] Report SinglePartition in DataSource...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20726 LGTM with one trivial doc point.
[GitHub] spark pull request #20726: [SPARK-23574][CORE] Report SinglePartition in Dat...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20726#discussion_r172403479 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/SupportsReportPartitioning.java --- @@ -23,6 +23,10 @@ /** * A mix in interface for {@link DataSourceReader}. Data source readers can implement this * interface to report data partitioning and try to avoid shuffle at Spark side. + * + * Note that Spark will always infer a + * {@link org.apache.spark.sql.catalyst.plans.physical.SinglePartition} partitioning when the + * reader creates exactly 1 {@link DataReaderFactory}. --- End diff -- nit: no matter whether the reader implements this interface or not.
[GitHub] spark issue #20649: [SPARK-23462][SQL] improve missing field error message i...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20649 I usually leave it open for a few more days in case other reviewers have some more review comments.
[GitHub] spark issue #20702: [SPARK-23547][SQL]Cleanup the .pipeout file when the Hiv...
Github user zuotingbing commented on the issue: https://github.com/apache/spark/pull/20702 Jenkins, ok to test
[GitHub] spark issue #20657: [SPARK-23361][yarn] Allow AM to restart after initial to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20657 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87983/ Test PASSed.
[GitHub] spark issue #20657: [SPARK-23361][yarn] Allow AM to restart after initial to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20657 Merged build finished. Test PASSed.
[GitHub] spark issue #20657: [SPARK-23361][yarn] Allow AM to restart after initial to...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20657 **[Test build #87983 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87983/testReport)** for PR 20657 at commit [`3294596`](https://github.com/apache/spark/commit/329459652fb40eb82b81ef66ad93cec05b9dd016). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19222 **[Test build #87989 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87989/testReport)** for PR 19222 at commit [`a62770b`](https://github.com/apache/spark/commit/a62770bdcd2cd83dc19d5f39a55b5186201ddc34).
[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19222 Merged build finished. Test PASSed.
[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19222 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1302/ Test PASSed.
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user Ngone51 commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r172395871 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/memory/OnHeapMemoryBlock.java --- @@ -0,0 +1,141 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.unsafe.memory; + +import org.apache.spark.unsafe.Platform; + +/** + * A consecutive block of memory with a long array on Java heap. + */ +public final class OnHeapMemoryBlock extends MemoryBlock { --- End diff -- 👍
[GitHub] spark issue #16770: [SPARK-15009][PYTHON][ML] Construct CountVectorizerModel...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16770 **[Test build #87988 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87988/testReport)** for PR 16770 at commit [`8860641`](https://github.com/apache/spark/commit/8860641487411d23cd86e932f0c50d06ecee626c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16770: [SPARK-15009][PYTHON][ML] Construct CountVectorizerModel...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16770 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87988/ Test PASSed.