[GitHub] spark issue #19273: Revert "[SPARK-21428] Turn IsolatedClientLoader off whil...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19273 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19130: [SPARK-21917][CORE][YARN] Supporting adding http(s) reso...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19130 **[Test build #81914 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81914/testReport)** for PR 19130 at commit [`0fb7943`](https://github.com/apache/spark/commit/0fb79432f5483f3fb83360d634dd0a5aacade157). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19266: [SPARK-22033][CORE] BufferHolder, other size chec...
Github user buryat commented on a diff in the pull request: https://github.com/apache/spark/pull/19266#discussion_r139616346 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala --- @@ -344,7 +344,7 @@ class Word2Vec extends Serializable with Logging { val newSentences = sentences.repartition(numPartitions).cache() val initRandom = new XORShiftRandom(seed) -if (vocabSize.toLong * vectorSize >= Int.MaxValue) { +if (vocabSize.toLong * vectorSize >= Int.MaxValue - 8) { --- End diff -- I think this should be just `>` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19266: [SPARK-22033][CORE] BufferHolder, other size chec...
Github user buryat commented on a diff in the pull request: https://github.com/apache/spark/pull/19266#discussion_r139616411 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala --- @@ -304,8 +304,8 @@ class BlockMatrix @Since("1.3.0") ( s"Int.MaxValue. Currently numRows: ${numRows()}") require(numCols() < Int.MaxValue, "The number of columns of this matrix should be less than " + s"Int.MaxValue. Currently numCols: ${numCols()}") -require(numRows() * numCols() < Int.MaxValue, "The length of the values array must be " + - s"less than Int.MaxValue. Currently numRows * numCols: ${numRows() * numCols()}") +require(numRows() * numCols() < Int.MaxValue - 8, "The length of the values array must be " + --- End diff -- `<=` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19266: [SPARK-22033][CORE] BufferHolder, other size chec...
Github user buryat commented on a diff in the pull request: https://github.com/apache/spark/pull/19266#discussion_r139616165 --- Diff: core/src/main/scala/org/apache/spark/util/collection/CompactBuffer.scala --- @@ -126,22 +126,20 @@ private[spark] class CompactBuffer[T: ClassTag] extends Seq[T] with Serializable /** Increase our size to newSize and grow the backing array if needed. */ private def growToSize(newSize: Int): Unit = { -if (newSize < 0) { - throw new UnsupportedOperationException("Can't grow buffer past Int.MaxValue elements") +val arrayMax = Int.MaxValue - 8 +if (newSize < 0 || newSize - 2 > arrayMax) { + throw new UnsupportedOperationException(s"Can't grow buffer past $arrayMax elements") } val capacity = if (otherElements != null) otherElements.length + 2 else 2 if (newSize > capacity) { - var newArrayLen = 8 + var newArrayLen = 8L while (newSize - 2 > newArrayLen) { --- End diff -- ah, I see that it's reserved, wasn't clear to me why `- 2`, seems like a magic number to me, I see it other places too. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19266: [SPARK-22033][CORE] BufferHolder, other size chec...
Github user buryat commented on a diff in the pull request: https://github.com/apache/spark/pull/19266#discussion_r139615624 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/array/LongArray.java --- @@ -39,7 +39,7 @@ private final long length; public LongArray(MemoryBlock memory) { -assert memory.size() < (long) Integer.MAX_VALUE * 8: "Array size > 4 billion elements"; +assert memory.size() < (long) Integer.MAX_VALUE * 8: "Array size > 2.1 billion elements"; --- End diff -- maybe also add the exact number in the error message instead of `2.1 billion` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19266: [SPARK-22033][CORE] BufferHolder, other size chec...
Github user buryat commented on a diff in the pull request: https://github.com/apache/spark/pull/19266#discussion_r139615418 --- Diff: core/src/main/scala/org/apache/spark/util/collection/PartitionedPairBuffer.scala --- @@ -96,5 +96,5 @@ private[spark] class PartitionedPairBuffer[K, V](initialCapacity: Int = 64) } private object PartitionedPairBuffer { - val MAXIMUM_CAPACITY = Int.MaxValue / 2 // 2 ^ 30 - 1 + val MAXIMUM_CAPACITY = (Int.MaxValue - 8) / 2 --- End diff -- maybe worth adding the `Int` type even though it's already an Int. Also the comment at the line 28 should be changed to `1073741819` i.e. `- 8/2` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19068: [SPARK-21428][SQL][FOLLOWUP]CliSessionState should point...
Github user yaooqinn commented on the issue: https://github.com/apache/spark/pull/19068 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19271: [SPARK-22053][SS] Stream-stream inner join in Append Mod...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19271 **[Test build #81913 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81913/testReport)** for PR 19271 at commit [`9004d46`](https://github.com/apache/spark/commit/9004d46a70ddb785f687d4a2726fe73719553ae3). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18754: [WIP][SPARK-21552][SQL] Add DecimalType support t...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18754#discussion_r139612110 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowWriter.scala --- @@ -224,6 +226,25 @@ private[arrow] class DoubleWriter(val valueVector: NullableFloat8Vector) extends } } +private[arrow] class DecimalWriter( +val valueVector: NullableDecimalVector, +precision: Int, +scale: Int) extends ArrowFieldWriter { + + override def valueMutator: NullableDecimalVector#Mutator = valueVector.getMutator() + + override def setNull(): Unit = { +valueMutator.setNull(count) + } + + override def setValue(input: SpecializedGetters, ordinal: Int): Unit = { +valueMutator.setIndexDefined(count) +val decimal = input.getDecimal(ordinal, precision, scale) +decimal.changePrecision(precision, scale) +DecimalUtility.writeBigDecimalToArrowBuf(decimal.toJavaBigDecimal, valueVector.getBuffer, count) --- End diff -- @BryanCutler Thanks, I'll update it to use `setSafe` after upgrading Arrow to 0.7. Btw, when I tested upgrading to 0.7 locally, `ArrowConvertersSuite.string type conversion` came to fail. Do you have any ideas of that? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19273: Revert "[SPARK-21428] Turn IsolatedClientLoader off whil...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19273 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81908/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19273: Revert "[SPARK-21428] Turn IsolatedClientLoader off whil...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19273 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19130: [SPARK-21917][CORE][YARN] Supporting adding http(s) reso...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19130 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81912/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19068: [SPARK-21428][SQL][FOLLOWUP]CliSessionState should point...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19068 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81910/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19068: [SPARK-21428][SQL][FOLLOWUP]CliSessionState should point...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19068 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19130: [SPARK-21917][CORE][YARN] Supporting adding http(s) reso...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19130 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19273: Revert "[SPARK-21428] Turn IsolatedClientLoader off whil...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19273 **[Test build #81908 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81908/testReport)** for PR 19273 at commit [`72c5a8e`](https://github.com/apache/spark/commit/72c5a8e1474915422f154ac29c9dbe472081b3d6). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19068: [SPARK-21428][SQL][FOLLOWUP]CliSessionState should point...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19068 **[Test build #81910 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81910/testReport)** for PR 19068 at commit [`c5c1c26`](https://github.com/apache/spark/commit/c5c1c2625d33dd08cbdde2e25041359bbbf50339). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19130: [SPARK-21917][CORE][YARN] Supporting adding http(s) reso...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19130 **[Test build #81912 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81912/testReport)** for PR 19130 at commit [`9a2c8c7`](https://github.com/apache/spark/commit/9a2c8c79451ab654e2a8fae8b67e4a31d9c8b97f). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19135: [SPARK-21923][CORE]Avoid calling reserveUnrollMem...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19135 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19130: [SPARK-21917][CORE][YARN] Supporting adding http(...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19130#discussion_r139609285 --- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala --- @@ -385,4 +385,14 @@ package object config { .checkValue(v => v > 0 && v <= Int.MaxValue, s"The buffer size must be greater than 0 and less than ${Int.MaxValue}.") .createWithDefault(1024 * 1024) + + private[spark] val FORCE_DOWNLOAD_SCHEMES = +ConfigBuilder("spark.yarn.dist.forceDownloadSchemes") + .doc("Comma-separated list of schemes for which files will be downloaded to the " + +"local disk prior to being added to YARN's distributed cache. For use in cases " + +"where the YARN service does not support schemes that are supported by Spark, like http, " + +"https, ftp.") --- End diff -- ah got it --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19135: [SPARK-21923][CORE]Avoid calling reserveUnrollMemoryForT...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19135 thanks, merging to master! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19130: [SPARK-21917][CORE][YARN] Supporting adding http(...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/19130#discussion_r139608374 --- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala --- @@ -385,4 +385,14 @@ package object config { .checkValue(v => v > 0 && v <= Int.MaxValue, s"The buffer size must be greater than 0 and less than ${Int.MaxValue}.") .createWithDefault(1024 * 1024) + + private[spark] val FORCE_DOWNLOAD_SCHEMES = +ConfigBuilder("spark.yarn.dist.forceDownloadSchemes") + .doc("Comma-separated list of schemes for which files will be downloaded to the " + +"local disk prior to being added to YARN's distributed cache. For use in cases " + +"where the YARN service does not support schemes that are supported by Spark, like http, " + +"https, ftp.") --- End diff -- It is not required, we still want to leverage Hadoop's http(s) FS to distribute resources by default if it is supported in Hadoop 2.9+ (https://issues.apache.org/jira/browse/HADOOP-14383) --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19145: [spark-21933][yarn] Spark Streaming request more executo...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/19145 And based on your fix: 1. looks like you don't have retention mechanism, which will potential introduce memory leak. 2. I don't see your logic to avoid requesting new containers, is your current logic enough to fix the issue. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19130: [SPARK-21917][CORE][YARN] Supporting adding http(...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19130#discussion_r139607663 --- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala --- @@ -385,4 +385,14 @@ package object config { .checkValue(v => v > 0 && v <= Int.MaxValue, s"The buffer size must be greater than 0 and less than ${Int.MaxValue}.") .createWithDefault(1024 * 1024) + + private[spark] val FORCE_DOWNLOAD_SCHEMES = +ConfigBuilder("spark.yarn.dist.forceDownloadSchemes") + .doc("Comma-separated list of schemes for which files will be downloaded to the " + +"local disk prior to being added to YARN's distributed cache. For use in cases " + +"where the YARN service does not support schemes that are supported by Spark, like http, " + +"https, ftp.") --- End diff -- shall we make these 3 the default value of this config? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19130: [SPARK-21917][CORE][YARN] Supporting adding http(...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19130#discussion_r139607620 --- Diff: docs/running-on-yarn.md --- @@ -212,6 +212,15 @@ To use a custom metrics.properties for the application master and executors, upd + spark.yarn.dist.forceDownloadSchemes + (none) + +Comma-separated list of schemes for which files will be downloaded to the local disk prior to +being added to YARN's distributed cache. For use in cases where the YARN service does not +support schemes that are supported by Spark. --- End diff -- update here too --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19145: [spark-21933][yarn] Spark Streaming request more executo...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/19145 >But if we restart the RM, then, the lost containers in the NM will be reported to RM as lost again because of recovery Since you already enabled RM and NM recovery, IIUC the failure of RM/NM will not lead to container exit. And after RM/NM restart, it will recover the persistent container metadata, so I think there should be no lost containers reported. Sorry I'm not so familiar with this part in YARN. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19229 Oh. That's what have done in the old PR #18902 .(Because the RDD version (not in master branch, only personal impl here (sorry for put wrong link, the code link is here: https://github.com/apache/spark/pull/18902/commits/8daffc9007c65f04e005ffe5dcfbeca634480465) will be faster than dataframe version based on current spark. Now your PR has some improvement on the perf, I would like to compare them again. We hope to track this performance gap and try to resolve it in the future. According to my other similar case, now the dataframe version will be about 2-3x slower than RDD version in the case numCols==100 for now. But if you have no time, I can help do it. Thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19211: [SPARK-18838][core] Add separate listener queues ...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/19211#discussion_r139606603 --- Diff: core/src/main/scala/org/apache/spark/scheduler/AsyncEventQueue.scala --- @@ -0,0 +1,196 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.scheduler + +import java.util.concurrent.LinkedBlockingQueue +import java.util.concurrent.atomic.{AtomicBoolean, AtomicLong} + +import com.codahale.metrics.{Gauge, Timer} + +import org.apache.spark.{SparkConf, SparkContext} +import org.apache.spark.internal.Logging +import org.apache.spark.internal.config._ +import org.apache.spark.util.Utils + +/** + * An asynchronous queue for events. All events posted to this queue will be delivered to the child + * listeners in a separate thread. + * + * Delivery will only begin when the `start()` method is called. The `stop()` method should be + * called when no more events need to be delivered. + */ +private class AsyncEventQueue(val name: String, conf: SparkConf, metrics: LiveListenerBusMetrics) + extends SparkListenerBus + with Logging { + + import AsyncEventQueue._ + + // Cap the capacity of the queue so we get an explicit error (rather than an OOM exception) if + // it's perpetually being added to more quickly than it's being drained. + private val eventQueue = new LinkedBlockingQueue[SparkListenerEvent]( +conf.get(LISTENER_BUS_EVENT_QUEUE_CAPACITY)) + + // Keep the event count separately, so that waitUntilEmpty() can be implemented properly; + // this allows that method to return only when the events in the queue have been fully + // processed (instead of just dequeued). + private val eventCount = new AtomicLong() + + /** A counter for dropped events. It will be reset every time we log it. */ + private val droppedEventsCounter = new AtomicLong(0L) + + /** When `droppedEventsCounter` was logged last time in milliseconds. */ + @volatile private var lastReportTimestamp = 0L + + private val logDroppedEvent = new AtomicBoolean(false) + + private var sc: SparkContext = null + + private val started = new AtomicBoolean(false) + private val stopped = new AtomicBoolean(false) + + private val droppedEvents = metrics.metricRegistry.counter(s"queue.$name.numDroppedEvents") + private val processingTime = metrics.metricRegistry.timer(s"queue.$name.listenerProcessingTime") + + // Remove the queue size gauge first, in case it was created by a previous incarnation of + // this queue that was removed from the listener bus. + metrics.metricRegistry.remove(s"queue.$name.size") + metrics.metricRegistry.register(s"queue.$name.size", new Gauge[Int] { +override def getValue: Int = eventQueue.size() + }) + + private val dispatchThread = new Thread(s"spark-listener-group-$name") { +setDaemon(true) +override def run(): Unit = Utils.tryOrStopSparkContext(sc) { + dispatch() +} + } + + private def dispatch(): Unit = LiveListenerBus.withinListenerThread.withValue(true) { +try { + var next: SparkListenerEvent = eventQueue.take() + while (next != POISON_PILL) { +val ctx = processingTime.time() +try { + super.postToAll(next) +} finally { + ctx.stop() +} +eventCount.decrementAndGet() +next = eventQueue.take() + } + eventCount.decrementAndGet() +} catch { + case ie: InterruptedException => +logInfo(s"Stopping listener queue $name.", ie) +} + } + + override protected def getTimer(listener: SparkListenerInterface): Option[Timer] = { + metrics.getTimerForListenerClass(listener.getClass.asSubclass(classOf[SparkListenerInterface])) + } + + /** + * Start an asynchronous thread to dispatch events to the underlying listeners.
[GitHub] spark issue #18704: [SPARK-20783][SQL] Create ColumnVector to abstract exist...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18704 LGTM, I think eventually we should simplify the columnar cache module and codegen most of it to reduce code duplication. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18704: [SPARK-20783][SQL] Create ColumnVector to abstrac...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18704#discussion_r139605958 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchSuite.scala --- @@ -1311,4 +1314,172 @@ class ColumnarBatchSuite extends SparkFunSuite { batch.close() allocator.close() } + + test("CachedBatch boolean Apis") { --- End diff -- move these to a new test suite --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17819: [SPARK-20542][ML][SQL] Add an API to Bucketizer that can...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17819 @WeichenXu123 Yeah, I'm merging it. I just want to clarify adding trait to a class doesn't necessarily makes java incompatible. :) Thanks. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17819: [SPARK-20542][ML][SQL] Add an API to Bucketizer that can...
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/17819 Yes you can only move `setInputCols` into the outer class to resolve this issue. But I prefer merge it together. I think we can unify the `transform` method. (First we check param `inputCol` and `inputCols`, and construct a col list and pass it to the transform code. it will simplify the code). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19229 @WeichenXu123 I'm not sure I understand it correctly. This change only replaces the chain of `withColumn` to a pass of `withColumns`. We don't have RDD version for this, so I'm not sure what version you want to compare? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17819: [SPARK-20542][ML][SQL] Add an API to Bucketizer that can...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17819 Btw, the reason that this change isn't java compatible, is not mainly because adding a trait to `Bucketizer`. Looks like It is because the params setter methods such as `setInputCols`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17819: [SPARK-20542][ML][SQL] Add an API to Bucketizer that can...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17819 @WeichenXu123 I see. That's correct this change is not java compatible. Thanks for pointing out. I'm merging the changes into `Bucketizer`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org