[GitHub] spark issue #19273: Revert "[SPARK-21428] Turn IsolatedClientLoader off whil...

2017-09-19 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19273
  
retest this please


---




[GitHub] spark issue #19130: [SPARK-21917][CORE][YARN] Supporting adding http(s) reso...

2017-09-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19130
  
**[Test build #81914 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81914/testReport)** for PR 19130 at commit [`0fb7943`](https://github.com/apache/spark/commit/0fb79432f5483f3fb83360d634dd0a5aacade157).


---




[GitHub] spark pull request #19266: [SPARK-22033][CORE] BufferHolder, other size chec...

2017-09-19 Thread buryat
Github user buryat commented on a diff in the pull request:

https://github.com/apache/spark/pull/19266#discussion_r139616346
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ---
@@ -344,7 +344,7 @@ class Word2Vec extends Serializable with Logging {
     val newSentences = sentences.repartition(numPartitions).cache()
     val initRandom = new XORShiftRandom(seed)
 
-    if (vocabSize.toLong * vectorSize >= Int.MaxValue) {
+    if (vocabSize.toLong * vectorSize >= Int.MaxValue - 8) {
--- End diff --

I think this should be just `>`
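Spelled out as a standalone sketch (hypothetical values; `Int.MaxValue - 8` is the array-length ceiling this PR adopts, roughly the largest array many JVMs will allocate):

```scala
// Hypothetical standalone version of the check; in Word2Vec these are fields.
val vocabSize: Int = 1000000
val vectorSize: Int = 100
val arrayMax: Long = Int.MaxValue - 8
val needed: Long = vocabSize.toLong * vectorSize
// Failing only when needed > arrayMax still allows an array of exactly arrayMax
// elements; `>=` would reject the largest size the JVM can actually allocate.
require(needed <= arrayMax,
  s"vocabSize * vectorSize ($needed) must not exceed $arrayMax")
```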


---




[GitHub] spark pull request #19266: [SPARK-22033][CORE] BufferHolder, other size chec...

2017-09-19 Thread buryat
Github user buryat commented on a diff in the pull request:

https://github.com/apache/spark/pull/19266#discussion_r139616411
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala ---
@@ -304,8 +304,8 @@ class BlockMatrix @Since("1.3.0") (
       s"Int.MaxValue. Currently numRows: ${numRows()}")
     require(numCols() < Int.MaxValue, "The number of columns of this matrix should be less than " +
       s"Int.MaxValue. Currently numCols: ${numCols()}")
-    require(numRows() * numCols() < Int.MaxValue, "The length of the values array must be " +
-      s"less than Int.MaxValue. Currently numRows * numCols: ${numRows() * numCols()}")
+    require(numRows() * numCols() < Int.MaxValue - 8, "The length of the values array must be " +
--- End diff --

`<=`
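That is, a sketch of the suggested check inside `BlockMatrix`, with the message wording adjusted to match the bound:

```scala
// Allow lengths up to the ceiling itself; `<` needlessly rejects Int.MaxValue - 8.
require(numRows() * numCols() <= Int.MaxValue - 8, "The length of the values array must be " +
  s"at most ${Int.MaxValue - 8}. Currently numRows * numCols: ${numRows() * numCols()}")
```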


---




[GitHub] spark pull request #19266: [SPARK-22033][CORE] BufferHolder, other size chec...

2017-09-19 Thread buryat
Github user buryat commented on a diff in the pull request:

https://github.com/apache/spark/pull/19266#discussion_r139616165
  
--- Diff: core/src/main/scala/org/apache/spark/util/collection/CompactBuffer.scala ---
@@ -126,22 +126,20 @@ private[spark] class CompactBuffer[T: ClassTag] extends Seq[T] with Serializable
 
   /** Increase our size to newSize and grow the backing array if needed. */
   private def growToSize(newSize: Int): Unit = {
-    if (newSize < 0) {
-      throw new UnsupportedOperationException("Can't grow buffer past Int.MaxValue elements")
+    val arrayMax = Int.MaxValue - 8
+    if (newSize < 0 || newSize - 2 > arrayMax) {
+      throw new UnsupportedOperationException(s"Can't grow buffer past $arrayMax elements")
     }
     val capacity = if (otherElements != null) otherElements.length + 2 else 2
     if (newSize > capacity) {
-      var newArrayLen = 8
+      var newArrayLen = 8L
       while (newSize - 2 > newArrayLen) {
--- End diff --

Ah, I see that it's reserved. It wasn't clear to me why `- 2`; it seems like a magic number, and I see it in other places too.
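For anyone else tripping over it: the `- 2` exists because `CompactBuffer` keeps its first two elements in dedicated fields (`element0`/`element1`) and spills only the rest into `otherElements`. A sketch of how the offset could be named (`InlineElements`/`ArrayMax` are hypothetical names):

```scala
// Hypothetical refactor: the backing array only ever needs newSize - InlineElements slots.
private val InlineElements = 2
private val ArrayMax = Int.MaxValue - 8

private def growToSize(newSize: Int): Unit = {
  if (newSize < 0 || newSize - InlineElements > ArrayMax) {
    throw new UnsupportedOperationException(s"Can't grow buffer past $ArrayMax elements")
  }
  // ... growth logic unchanged ...
}
```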


---




[GitHub] spark pull request #19266: [SPARK-22033][CORE] BufferHolder, other size chec...

2017-09-19 Thread buryat
Github user buryat commented on a diff in the pull request:

https://github.com/apache/spark/pull/19266#discussion_r139615624
  
--- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/array/LongArray.java ---
@@ -39,7 +39,7 @@
   private final long length;
 
   public LongArray(MemoryBlock memory) {
-    assert memory.size() < (long) Integer.MAX_VALUE * 8: "Array size > 4 billion elements";
+    assert memory.size() < (long) Integer.MAX_VALUE * 8: "Array size > 2.1 billion elements";
--- End diff --

Maybe also add the exact number in the error message instead of `2.1 billion`.
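A possible shape for that suggestion (a sketch; only the message string changes):

```java
// Spell out the exact element limit instead of "2.1 billion".
assert memory.size() < (long) Integer.MAX_VALUE * 8:
  "Array size > " + Integer.MAX_VALUE + " elements";
```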


---




[GitHub] spark pull request #19266: [SPARK-22033][CORE] BufferHolder, other size chec...

2017-09-19 Thread buryat
Github user buryat commented on a diff in the pull request:

https://github.com/apache/spark/pull/19266#discussion_r139615418
  
--- Diff: core/src/main/scala/org/apache/spark/util/collection/PartitionedPairBuffer.scala ---
@@ -96,5 +96,5 @@ private[spark] class PartitionedPairBuffer[K, V](initialCapacity: Int = 64)
 }
 
 private object PartitionedPairBuffer {
-  val MAXIMUM_CAPACITY = Int.MaxValue / 2 // 2 ^ 30 - 1
+  val MAXIMUM_CAPACITY = (Int.MaxValue - 8) / 2
--- End diff --

Maybe worth adding the explicit `Int` type even though it's already an `Int`.

Also, the comment at line 28 should be changed to `1073741819`, i.e. `(Int.MaxValue - 8) / 2`.
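Both suggestions combined, as a sketch (note that `(Int.MaxValue - 8) / 2` is 1073741819):

```scala
private object PartitionedPairBuffer {
  val MAXIMUM_CAPACITY: Int = (Int.MaxValue - 8) / 2 // 1073741819
}
```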


---




[GitHub] spark issue #19068: [SPARK-21428][SQL][FOLLOWUP]CliSessionState should point...

2017-09-19 Thread yaooqinn
Github user yaooqinn commented on the issue:

https://github.com/apache/spark/pull/19068
  
retest this please


---




[GitHub] spark issue #19271: [SPARK-22053][SS] Stream-stream inner join in Append Mod...

2017-09-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19271
  
**[Test build #81913 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81913/testReport)** for PR 19271 at commit [`9004d46`](https://github.com/apache/spark/commit/9004d46a70ddb785f687d4a2726fe73719553ae3).


---




[GitHub] spark pull request #18754: [WIP][SPARK-21552][SQL] Add DecimalType support t...

2017-09-19 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/18754#discussion_r139612110
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowWriter.scala ---
@@ -224,6 +226,25 @@ private[arrow] class DoubleWriter(val valueVector: NullableFloat8Vector) extends
   }
 }
 
+private[arrow] class DecimalWriter(
+    val valueVector: NullableDecimalVector,
+    precision: Int,
+    scale: Int) extends ArrowFieldWriter {
+
+  override def valueMutator: NullableDecimalVector#Mutator = valueVector.getMutator()
+
+  override def setNull(): Unit = {
+    valueMutator.setNull(count)
+  }
+
+  override def setValue(input: SpecializedGetters, ordinal: Int): Unit = {
+    valueMutator.setIndexDefined(count)
+    val decimal = input.getDecimal(ordinal, precision, scale)
+    decimal.changePrecision(precision, scale)
+    DecimalUtility.writeBigDecimalToArrowBuf(decimal.toJavaBigDecimal, valueVector.getBuffer, count)
--- End diff --

@BryanCutler Thanks, I'll update it to use `setSafe` after upgrading Arrow to 0.7.

Btw, when I tested the upgrade to 0.7 locally, `ArrowConvertersSuite.string type conversion` started failing. Do you have any idea why?


---




[GitHub] spark issue #19273: Revert "[SPARK-21428] Turn IsolatedClientLoader off whil...

2017-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19273
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81908/
Test FAILed.


---




[GitHub] spark issue #19273: Revert "[SPARK-21428] Turn IsolatedClientLoader off whil...

2017-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19273
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19130: [SPARK-21917][CORE][YARN] Supporting adding http(s) reso...

2017-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19130
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81912/
Test FAILed.


---




[GitHub] spark issue #19068: [SPARK-21428][SQL][FOLLOWUP]CliSessionState should point...

2017-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19068
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81910/
Test FAILed.


---




[GitHub] spark issue #19068: [SPARK-21428][SQL][FOLLOWUP]CliSessionState should point...

2017-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19068
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19130: [SPARK-21917][CORE][YARN] Supporting adding http(s) reso...

2017-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19130
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19273: Revert "[SPARK-21428] Turn IsolatedClientLoader off whil...

2017-09-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19273
  
**[Test build #81908 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81908/testReport)** for PR 19273 at commit [`72c5a8e`](https://github.com/apache/spark/commit/72c5a8e1474915422f154ac29c9dbe472081b3d6).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19068: [SPARK-21428][SQL][FOLLOWUP]CliSessionState should point...

2017-09-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19068
  
**[Test build #81910 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81910/testReport)** for PR 19068 at commit [`c5c1c26`](https://github.com/apache/spark/commit/c5c1c2625d33dd08cbdde2e25041359bbbf50339).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19130: [SPARK-21917][CORE][YARN] Supporting adding http(s) reso...

2017-09-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19130
  
**[Test build #81912 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81912/testReport)** for PR 19130 at commit [`9a2c8c7`](https://github.com/apache/spark/commit/9a2c8c79451ab654e2a8fae8b67e4a31d9c8b97f).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #19135: [SPARK-21923][CORE]Avoid calling reserveUnrollMem...

2017-09-19 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19135


---




[GitHub] spark pull request #19130: [SPARK-21917][CORE][YARN] Supporting adding http(...

2017-09-19 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19130#discussion_r139609285
  
--- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala ---
@@ -385,4 +385,14 @@ package object config {
       .checkValue(v => v > 0 && v <= Int.MaxValue,
         s"The buffer size must be greater than 0 and less than ${Int.MaxValue}.")
       .createWithDefault(1024 * 1024)
+
+  private[spark] val FORCE_DOWNLOAD_SCHEMES =
+    ConfigBuilder("spark.yarn.dist.forceDownloadSchemes")
+      .doc("Comma-separated list of schemes for which files will be downloaded to the " +
+        "local disk prior to being added to YARN's distributed cache. For use in cases " +
+        "where the YARN service does not support schemes that are supported by Spark, like http, " +
+        "https, ftp.")
--- End diff --

ah got it


---




[GitHub] spark issue #19135: [SPARK-21923][CORE]Avoid calling reserveUnrollMemoryForT...

2017-09-19 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/19135
  
thanks, merging to master!


---




[GitHub] spark pull request #19130: [SPARK-21917][CORE][YARN] Supporting adding http(...

2017-09-19 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/19130#discussion_r139608374
  
--- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala ---
@@ -385,4 +385,14 @@ package object config {
       .checkValue(v => v > 0 && v <= Int.MaxValue,
         s"The buffer size must be greater than 0 and less than ${Int.MaxValue}.")
       .createWithDefault(1024 * 1024)
+
+  private[spark] val FORCE_DOWNLOAD_SCHEMES =
+    ConfigBuilder("spark.yarn.dist.forceDownloadSchemes")
+      .doc("Comma-separated list of schemes for which files will be downloaded to the " +
+        "local disk prior to being added to YARN's distributed cache. For use in cases " +
+        "where the YARN service does not support schemes that are supported by Spark, like http, " +
+        "https, ftp.")
--- End diff --

It is not required; we still want to leverage Hadoop's http(s) FS to distribute resources by default where it is supported, i.e. in Hadoop 2.9+ (https://issues.apache.org/jira/browse/HADOOP-14383).


---




[GitHub] spark issue #19145: [spark-21933][yarn] Spark Streaming request more executo...

2017-09-19 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/19145
  
And based on your fix:

1. It looks like you don't have a retention mechanism, which will potentially introduce a memory leak.
2. I don't see your logic to avoid requesting new containers; is your current logic enough to fix the issue?


---




[GitHub] spark pull request #19130: [SPARK-21917][CORE][YARN] Supporting adding http(...

2017-09-19 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19130#discussion_r139607663
  
--- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala ---
@@ -385,4 +385,14 @@ package object config {
       .checkValue(v => v > 0 && v <= Int.MaxValue,
         s"The buffer size must be greater than 0 and less than ${Int.MaxValue}.")
       .createWithDefault(1024 * 1024)
+
+  private[spark] val FORCE_DOWNLOAD_SCHEMES =
+    ConfigBuilder("spark.yarn.dist.forceDownloadSchemes")
+      .doc("Comma-separated list of schemes for which files will be downloaded to the " +
+        "local disk prior to being added to YARN's distributed cache. For use in cases " +
+        "where the YARN service does not support schemes that are supported by Spark, like http, " +
+        "https, ftp.")
--- End diff --

shall we make these 3 the default value of this config?
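For concreteness, a sketch of what that would look like, assuming the `stringConf`/`toSequence` builder methods used elsewhere in this file (jerryshao's reply above explains why the default ultimately stays empty):

```scala
// Hypothetical: default the config to the three schemes named in the doc string.
private[spark] val FORCE_DOWNLOAD_SCHEMES =
  ConfigBuilder("spark.yarn.dist.forceDownloadSchemes")
    .doc("Comma-separated list of schemes for which files will be downloaded to the " +
      "local disk prior to being added to YARN's distributed cache.")
    .stringConf
    .toSequence
    .createWithDefault(Seq("http", "https", "ftp"))
```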


---




[GitHub] spark pull request #19130: [SPARK-21917][CORE][YARN] Supporting adding http(...

2017-09-19 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19130#discussion_r139607620
  
--- Diff: docs/running-on-yarn.md ---
@@ -212,6 +212,15 @@ To use a custom metrics.properties for the application master and executors, upd
   
 
 
+  spark.yarn.dist.forceDownloadSchemes
+  (none)
+  
+    Comma-separated list of schemes for which files will be downloaded to the local disk prior to
+    being added to YARN's distributed cache. For use in cases where the YARN service does not
+    support schemes that are supported by Spark.
--- End diff --

update here too


---




[GitHub] spark issue #19145: [spark-21933][yarn] Spark Streaming request more executo...

2017-09-19 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/19145
  
> But if we restart the RM, then the lost containers in the NM will be reported to the RM as lost again because of recovery

Since you have already enabled RM and NM recovery, IIUC a failure of the RM/NM will not lead to container exit. And after an RM/NM restart, it will recover the persisted container metadata, so I think there should be no lost containers reported. Sorry, I'm not so familiar with this part of YARN.


---




[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...

2017-09-19 Thread WeichenXu123
Github user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/19229
  
Oh, that's what was done in the old PR #18902. The RDD version there (not in the master branch, only a personal implementation; sorry for posting the wrong link earlier, the code is here: https://github.com/apache/spark/pull/18902/commits/8daffc9007c65f04e005ffe5dcfbeca634480465) is faster than the DataFrame version on current Spark. Now that your PR improves the performance, I would like to compare them again; we hope to track this performance gap and try to resolve it in the future. According to my other similar case, the DataFrame version is currently about 2-3x slower than the RDD version when numCols == 100. But if you have no time, I can help do it. Thanks!


---




[GitHub] spark pull request #19211: [SPARK-18838][core] Add separate listener queues ...

2017-09-19 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/19211#discussion_r139606603
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/AsyncEventQueue.scala ---
@@ -0,0 +1,196 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.scheduler
+
+import java.util.concurrent.LinkedBlockingQueue
+import java.util.concurrent.atomic.{AtomicBoolean, AtomicLong}
+
+import com.codahale.metrics.{Gauge, Timer}
+
+import org.apache.spark.{SparkConf, SparkContext}
+import org.apache.spark.internal.Logging
+import org.apache.spark.internal.config._
+import org.apache.spark.util.Utils
+
+/**
+ * An asynchronous queue for events. All events posted to this queue will be delivered to the child
+ * listeners in a separate thread.
+ *
+ * Delivery will only begin when the `start()` method is called. The `stop()` method should be
+ * called when no more events need to be delivered.
+ */
+private class AsyncEventQueue(val name: String, conf: SparkConf, metrics: LiveListenerBusMetrics)
+  extends SparkListenerBus
+  with Logging {
+
+  import AsyncEventQueue._
+
+  // Cap the capacity of the queue so we get an explicit error (rather than an OOM exception) if
+  // it's perpetually being added to more quickly than it's being drained.
+  private val eventQueue = new LinkedBlockingQueue[SparkListenerEvent](
+    conf.get(LISTENER_BUS_EVENT_QUEUE_CAPACITY))
+
+  // Keep the event count separately, so that waitUntilEmpty() can be implemented properly;
+  // this allows that method to return only when the events in the queue have been fully
+  // processed (instead of just dequeued).
+  private val eventCount = new AtomicLong()
+
+  /** A counter for dropped events. It will be reset every time we log it. */
+  private val droppedEventsCounter = new AtomicLong(0L)
+
+  /** When `droppedEventsCounter` was logged last time in milliseconds. */
+  @volatile private var lastReportTimestamp = 0L
+
+  private val logDroppedEvent = new AtomicBoolean(false)
+
+  private var sc: SparkContext = null
+
+  private val started = new AtomicBoolean(false)
+  private val stopped = new AtomicBoolean(false)
+
+  private val droppedEvents = metrics.metricRegistry.counter(s"queue.$name.numDroppedEvents")
+  private val processingTime = metrics.metricRegistry.timer(s"queue.$name.listenerProcessingTime")
+
+  // Remove the queue size gauge first, in case it was created by a previous incarnation of
+  // this queue that was removed from the listener bus.
+  metrics.metricRegistry.remove(s"queue.$name.size")
+  metrics.metricRegistry.register(s"queue.$name.size", new Gauge[Int] {
+    override def getValue: Int = eventQueue.size()
+  })
+
+  private val dispatchThread = new Thread(s"spark-listener-group-$name") {
+    setDaemon(true)
+    override def run(): Unit = Utils.tryOrStopSparkContext(sc) {
+      dispatch()
+    }
+  }
+
+  private def dispatch(): Unit = LiveListenerBus.withinListenerThread.withValue(true) {
+    try {
+      var next: SparkListenerEvent = eventQueue.take()
+      while (next != POISON_PILL) {
+        val ctx = processingTime.time()
+        try {
+          super.postToAll(next)
+        } finally {
+          ctx.stop()
+        }
+        eventCount.decrementAndGet()
+        next = eventQueue.take()
+      }
+      eventCount.decrementAndGet()
+    } catch {
+      case ie: InterruptedException =>
+        logInfo(s"Stopping listener queue $name.", ie)
+    }
+  }
+
+  override protected def getTimer(listener: SparkListenerInterface): Option[Timer] = {
+    metrics.getTimerForListenerClass(listener.getClass.asSubclass(classOf[SparkListenerInterface]))
+  }
+
+  /**
+   * Start an asynchronous thread to dispatch events to the underlying listeners.

[GitHub] spark issue #18704: [SPARK-20783][SQL] Create ColumnVector to abstract exist...

2017-09-19 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18704
  
LGTM, I think eventually we should simplify the columnar cache module and codegen most of it to reduce code duplication.


---




[GitHub] spark pull request #18704: [SPARK-20783][SQL] Create ColumnVector to abstrac...

2017-09-19 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18704#discussion_r139605958
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchSuite.scala ---
@@ -1311,4 +1314,172 @@ class ColumnarBatchSuite extends SparkFunSuite {
     batch.close()
     allocator.close()
   }
+
+  test("CachedBatch boolean Apis") {
--- End diff --

move these to a new test suite


---




[GitHub] spark issue #17819: [SPARK-20542][ML][SQL] Add an API to Bucketizer that can...

2017-09-19 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/17819
  
@WeichenXu123 Yeah, I'm merging it. I just wanted to clarify that adding a trait to a class doesn't necessarily make it Java-incompatible. :) Thanks.


---




[GitHub] spark issue #17819: [SPARK-20542][ML][SQL] Add an API to Bucketizer that can...

2017-09-19 Thread WeichenXu123
Github user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/17819
  
Yes, you could resolve this issue by moving only `setInputCols` into the outer class, but I'd prefer to merge them together. I think we can unify the `transform` method: first check the params `inputCol` and `inputCols`, construct a column list, and pass it to the transform code; that will simplify the code.
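Roughly, the unified entry point could start like this (a sketch, not the PR's code; `isSet`/`$(...)` are the standard `org.apache.spark.ml.param.Params` helpers):

```scala
// Normalize the single- and multi-column params into one list, then run a
// single transform path over it, covering both APIs.
val inputColumns: Seq[String] =
  if (isSet(inputCols)) $(inputCols).toSeq else Seq($(inputCol))
```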


---




[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...

2017-09-19 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/19229
  
@WeichenXu123 I'm not sure I understand correctly. This change only replaces the chain of `withColumn` calls with a single pass of `withColumns`. We don't have an RDD version of this, so I'm not sure which version you want to compare?
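For reference, the two shapes being compared, as a hedged sketch (hypothetical helper names; Imputer's logic simplified to NaN replacement):

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, lit, when}

// Chained form: each withColumn call layers another projection on the plan.
def chained(df: DataFrame, surrogates: Seq[(String, Double)]): DataFrame =
  surrogates.foldLeft(df) { case (acc, (c, v)) =>
    acc.withColumn(c + "_imputed", when(col(c).isNaN, lit(v)).otherwise(col(c)))
  }

// Single-pass form, conceptually what `withColumns` does: one projection total.
def onePass(df: DataFrame, surrogates: Seq[(String, Double)]): DataFrame =
  df.select(df.columns.map(col) ++ surrogates.map { case (c, v) =>
    when(col(c).isNaN, lit(v)).otherwise(col(c)).as(c + "_imputed")
  }: _*)
```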


---




[GitHub] spark issue #17819: [SPARK-20542][ML][SQL] Add an API to Bucketizer that can...

2017-09-19 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/17819
  
Btw, the reason this change isn't Java-compatible is not mainly the trait added to `Bucketizer`; it looks like it's the param setter methods such as `setInputCols`.


---




[GitHub] spark issue #17819: [SPARK-20542][ML][SQL] Add an API to Bucketizer that can...

2017-09-19 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/17819
  
@WeichenXu123 I see. You're correct that this change is not Java-compatible; thanks for pointing that out. I'm merging the changes into `Bucketizer`.


---



