date:20180315

[GitHub] spark issue #20767: [SPARK-23623] [SS] Avoid concurrent use of cached consum...

2018-03-15 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20767
  
**[Test build #88285 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88285/testReport)**
 for PR 20767 at commit 
[`5363ea8`](https://github.com/apache/spark/commit/5363ea8c4a22340fda257658aba122839fe5a15a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20767: [SPARK-23623] [SS] Avoid concurrent use of cached consum...

2018-03-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20767
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20767: [SPARK-23623] [SS] Avoid concurrent use of cached consum...

2018-03-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20767
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88285/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20687: [SPARK-23500][SQL] Fix complex type simplificatio...

2018-03-15 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/20687#discussion_r174974450
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/complexTypesSuite.scala
 ---
@@ -331,4 +330,47 @@ class ComplexTypesSuite extends PlanTest with 
ExpressionEvalHelper {
   .analyze
 comparePlans(Optimizer execute rel, expected)
   }
+
+  test("SPARK-23500: Simplify complex ops that aren't at the plan root") {
+val structRel = relation
+  .select(GetStructField(CreateNamedStruct(Seq("att1", 'nullable_id)), 
0, None) as "foo")
+  .groupBy($"foo")("1").analyze
+val structExpected = relation
+  .select('nullable_id as "foo")
+  .groupBy($"foo")("1").analyze
+comparePlans(Optimizer execute structRel, structExpected)
+
+// These tests must use nullable attributes from the base relation for 
the following reason:
+// in the 'original' plans below, the Aggregate node produced by 
groupBy() has a
+// nullable AttributeReference to a1, because both array indexing and 
map lookup are
+// nullable expressions. After optimization, the same attribute is now 
non-nullable,
+// but the AttributeReference is not updated to reflect this. In the 
'expected' plans,
+// the grouping expressions have the same nullability as the original 
attribute in the
+// relation. If that attribute is non-nullable, the tests will fail as 
the plans will
+// compare differently, so for these tests we must use a nullable 
attribute. See
+// SPARK-23634.
+val arrayRel = relation
+  .select(GetArrayItem(CreateArray(Seq('nullable_id, 'nullable_id + 
1L)), 0) as "a1")
+  .groupBy($"a1")("1").analyze
+val arrayExpected = relation.select('nullable_id as 
"a1").groupBy($"a1")("1").analyze
+comparePlans(Optimizer execute arrayRel, arrayExpected)
+
+val mapRel = relation
+  .select(GetMapValue(CreateMap(Seq("id", 'nullable_id)), "id") as 
"m1")
+  .groupBy($"m1")("1").analyze
+val mapExpected = relation
+  .select('nullable_id as "m1")
+  .groupBy($"m1")("1").analyze
+comparePlans(Optimizer execute mapRel, mapExpected)
+
--- End diff --

It seems that the current test case become too long. For the following 
negative cases, let's split to another test case. Maybe, with the following 
title?
```
test("SPARK-23500: Aggregation expressions should not be simplified.")
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20687: [SPARK-23500][SQL] Fix complex type simplificatio...

2018-03-15 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/20687#discussion_r174974033
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ComplexTypes.scala
 ---
@@ -19,57 +19,47 @@ package org.apache.spark.sql.catalyst.optimizer
 
 import org.apache.spark.sql.catalyst.expressions._
 import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.catalyst.plans.logical.Aggregate
--- End diff --

This should be before line 21 in alphabetical order.
You can check this locally with `dev/scalastyle`.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20767: [SPARK-23623] [SS] Avoid concurrent use of cached consum...

2018-03-15 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20767
  
**[Test build #88285 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88285/testReport)**
 for PR 20767 at commit 
[`5363ea8`](https://github.com/apache/spark/commit/5363ea8c4a22340fda257658aba122839fe5a15a).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20767: [SPARK-23623] [SS] Avoid concurrent use of cached consum...

2018-03-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20767
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20767: [SPARK-23623] [SS] Avoid concurrent use of cached consum...

2018-03-15 Thread tdas

Github user tdas commented on the issue:

https://github.com/apache/spark/pull/20767
  
@tedyu @zsxwing thank you very much for catching the bugs. I have 
simplified the logic quite a bit. Note that I removed the invariant that I had 
introduced earlier. Additionally, I locally ran the stress test with 100 
threads and 1 read attempts, which ran for 2 mins. It passed. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20767: [SPARK-23623] [SS] Avoid concurrent use of cached consum...

2018-03-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20767
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1545/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20767: [SPARK-23623] [SS] Avoid concurrent use of cached...

2018-03-15 Thread tdas

Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/20767#discussion_r174973494
  
--- Diff: 
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaDataConsumer.scala
 ---
@@ -342,80 +415,103 @@ private[kafka010] object CachedKafkaConsumer extends 
Logging {
 }
   }
 
-  def releaseKafkaConsumer(
-  topic: String,
-  partition: Int,
-  kafkaParams: ju.Map[String, Object]): Unit = {
-val groupId = 
kafkaParams.get(ConsumerConfig.GROUP_ID_CONFIG).asInstanceOf[String]
-val topicPartition = new TopicPartition(topic, partition)
-val key = CacheKey(groupId, topicPartition)
-
+  private def releaseConsumer(intConsumer: InternalKafkaConsumer): Unit = {
 synchronized {
-  val consumer = cache.get(key)
-  if (consumer != null) {
-consumer.inuse = false
-  } else {
-logWarning(s"Attempting to release consumer that does not exist")
-  }
-}
-  }
 
-  /**
-   * Removes (and closes) the Kafka Consumer for the given topic, 
partition and group id.
-   */
-  def removeKafkaConsumer(
-  topic: String,
-  partition: Int,
-  kafkaParams: ju.Map[String, Object]): Unit = {
-val groupId = 
kafkaParams.get(ConsumerConfig.GROUP_ID_CONFIG).asInstanceOf[String]
-val topicPartition = new TopicPartition(topic, partition)
-val key = CacheKey(groupId, topicPartition)
-
-synchronized {
-  val removedConsumer = cache.remove(key)
-  if (removedConsumer != null) {
-removedConsumer.close()
+  // If it has been marked for close, then do it any way
+  if (intConsumer.inuse && intConsumer.markedForClose) 
intConsumer.close()
--- End diff --

I rewrote the logic. Hopefully, it's simpler to reason about it now.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20799: [SPARK-23635][YARN] AM env variable should not overwrite...

2018-03-15 Thread tgravescs

Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/20799
  
looks good to me


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20827: [SPARK-23666][SQL] Do not display exprIds of Alias in us...

2018-03-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20827
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20827: [SPARK-23666][SQL] Do not display exprIds of Alias in us...

2018-03-15 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20827
  
**[Test build #88284 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88284/testReport)**
 for PR 20827 at commit 
[`68650ff`](https://github.com/apache/spark/commit/68650ff8c2f3a90c55b5bf4345c16a92fda3782a).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20827: [SPARK-23666][SQL] Do not display exprIds of Alias in us...

2018-03-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20827
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1544/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20813: [SPARK-23670][SQL] Fix memory leak on SparkPlanGr...

2018-03-15 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20813


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20827: [SPARK-23666][SQL] Do not display exprIds of Alias in us...

2018-03-15 Thread maropu

Github user maropu commented on the issue:

https://github.com/apache/spark/pull/20827
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20659: [DO-NOT-MERGE] Try to update Hive to 2.3.2

2018-03-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20659
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88275/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20659: [DO-NOT-MERGE] Try to update Hive to 2.3.2

2018-03-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20659
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20831: [SPARK-23614][SQL] Fix incorrect reuse exchange w...

2018-03-15 Thread maropu

Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/20831#discussion_r174969959
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala
 ---
@@ -68,6 +69,15 @@ case class InMemoryRelation(
 
   override protected def innerChildren: Seq[SparkPlan] = Seq(child)
 
+  override def doCanonicalize(): logical.LogicalPlan =
+copy(output = output.map(QueryPlan.normalizeExprId(_, child.output)),
+  storageLevel = StorageLevel.NONE,
+  child = child.canonicalized,
+  tableName = None)(
+  _cachedColumnBuffers,
+  sizeInBytesStats,
+  statsOfPlanToCache)
--- End diff --

`statsOfPlanToCache` and `sizeInBytesStats`, too? For instance, 
`ResolveHint`  drops hints in canonicalization: 
https://github.com/apache/spark/blob/3675af7247e841e9a689666dc20891ba55c612b3/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/hints.scala#L44


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20659: [DO-NOT-MERGE] Try to update Hive to 2.3.2

2018-03-15 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20659
  
**[Test build #88275 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88275/testReport)**
 for PR 20659 at commit 
[`e041704`](https://github.com/apache/spark/commit/e041704c8898234299b8af1d85b09c1acc2532ab).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20813: [SPARK-23670][SQL] Fix memory leak on SparkPlanGraphWrap...

2018-03-15 Thread vanzin

Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/20813
  
I forgot to publish my comment above when I looked at this 2 days ago, so 
I'll fix it during merge.

Merging to master / 2.3.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20744: [SPARK-23608][CORE][WebUI] Add synchronization in...

2018-03-15 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20744


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20839: [SPARK-23699][PYTHON][SQL] Raise same type of error caug...

2018-03-15 Thread BryanCutler

Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/20839
  
@HyukjinKwon, @ueshin please take a look when you can


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20814: [SPARK-23671][core] Fix condition to enable the S...

2018-03-15 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20814


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20744: [SPARK-23608][CORE][WebUI] Add synchronization in SHS be...

2018-03-15 Thread vanzin

Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/20744
  
This doesn't have anything to do with pyspark, so we can ignore those.

Merging to master / 2.3.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20814: [SPARK-23671][core] Fix condition to enable the SHS thre...

2018-03-15 Thread vanzin

Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/20814
  
Merging to master / 2.3.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20785: [SPARK-23640][CORE] Fix hadoop config may overrid...

2018-03-15 Thread vanzin

Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20785#discussion_r174968586
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -2434,7 +2434,8 @@ private[spark] object Utils extends Logging {
*/
   def getSparkOrYarnConfig(conf: SparkConf, key: String, default: String): 
String = {
 val sparkValue = conf.get(key, default)
-if (conf.get(SparkLauncher.SPARK_MASTER, null) == "yarn") {
+if (conf.get(SparkLauncher.SPARK_MASTER, null) == "yarn"
--- End diff --

No.

The logic you want here is the equivalent of:

```
if conf.contains(key)
  get spark conf
elif is_running_on_yarn()
  get conf from yarn
else
  return default
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20767: [SPARK-23623] [SS] Avoid concurrent use of cached...

2018-03-15 Thread tdas

Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/20767#discussion_r174968594
  
--- Diff: 
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaDataConsumer.scala
 ---
@@ -342,80 +415,103 @@ private[kafka010] object CachedKafkaConsumer extends 
Logging {
 }
   }
 
-  def releaseKafkaConsumer(
-  topic: String,
-  partition: Int,
-  kafkaParams: ju.Map[String, Object]): Unit = {
-val groupId = 
kafkaParams.get(ConsumerConfig.GROUP_ID_CONFIG).asInstanceOf[String]
-val topicPartition = new TopicPartition(topic, partition)
-val key = CacheKey(groupId, topicPartition)
-
+  private def releaseConsumer(intConsumer: InternalKafkaConsumer): Unit = {
 synchronized {
-  val consumer = cache.get(key)
-  if (consumer != null) {
-consumer.inuse = false
-  } else {
-logWarning(s"Attempting to release consumer that does not exist")
-  }
-}
-  }
 
-  /**
-   * Removes (and closes) the Kafka Consumer for the given topic, 
partition and group id.
-   */
-  def removeKafkaConsumer(
-  topic: String,
-  partition: Int,
-  kafkaParams: ju.Map[String, Object]): Unit = {
-val groupId = 
kafkaParams.get(ConsumerConfig.GROUP_ID_CONFIG).asInstanceOf[String]
-val topicPartition = new TopicPartition(topic, partition)
-val key = CacheKey(groupId, topicPartition)
-
-synchronized {
-  val removedConsumer = cache.remove(key)
-  if (removedConsumer != null) {
-removedConsumer.close()
+  // If it has been marked for close, then do it any way
+  if (intConsumer.inuse && intConsumer.markedForClose) 
intConsumer.close()
+  intConsumer.inuse = false
+
+  // Clear the consumer from the cache if this is indeed the consumer 
present in the cache
+  val key = new CacheKey(intConsumer.topicPartition, 
intConsumer.kafkaParams)
+  val cachedIntConsumer = cache.get(key)
+  if (cachedIntConsumer != null) {
+if (cachedIntConsumer.eq(intConsumer)) {
+  // The released consumer is indeed the cached one.
+  cache.remove(key)
+} else {
+  // The released consumer is not the cached one. Don't do 
anything.
+  // This should not happen as long as we maintain the invariant 
mentioned above.
+  logWarning(
+s"Cached consumer not the same one as the one being release" +
+  s"\ncached = $cachedIntConsumer 
[${System.identityHashCode(cachedIntConsumer)}]" +
+  s"\nreleased = $intConsumer 
[${System.identityHashCode(intConsumer)}]")
+}
+  } else {
+// The released consumer is not in the cache. Don't do anything.
+// This should not happen as long as we maintain the invariant 
mentioned above.
+logWarning(s"Attempting to release consumer that is not in the 
cache")
   }
 }
   }
 
   /**
* Get a cached consumer for groupId, assigned to topic and partition.
* If matching consumer doesn't already exist, will be created using 
kafkaParams.
+   * The returned consumer must be released explicitly using 
[[KafkaDataConsumer.release()]].
+   *
+   * Note: This method guarantees that the consumer returned is not 
currently in use by any one
+   * else. Within this guarantee, this will make a best effort attempt to 
re-use consumers by
+   * caching them and tracking when they are in use.
*/
-  def getOrCreate(
-  topic: String,
-  partition: Int,
-  kafkaParams: ju.Map[String, Object]): CachedKafkaConsumer = 
synchronized {
-val groupId = 
kafkaParams.get(ConsumerConfig.GROUP_ID_CONFIG).asInstanceOf[String]
-val topicPartition = new TopicPartition(topic, partition)
-val key = CacheKey(groupId, topicPartition)
-
-// If this is reattempt at running the task, then invalidate cache and 
start with
-// a new consumer
+  def acquire(
+  topicPartition: TopicPartition,
+  kafkaParams: ju.Map[String, Object],
+  useCache: Boolean): KafkaDataConsumer = synchronized {
+val key = new CacheKey(topicPartition, kafkaParams)
+val existingInternalConsumer = cache.get(key)
+
+lazy val newInternalConsumer = new 
InternalKafkaConsumer(topicPartition, kafkaParams)
+
 if (TaskContext.get != null && TaskContext.get.attemptNumber >= 1) {
-  removeKafkaConsumer(topic, partition, kafkaParams)
-  val consumer = new CachedKafkaConsumer(topicPartition, kafkaParams)
-  consumer.inuse = true
-  cache.put(key, consumer)
-  consumer
-} else {
-  if

[GitHub] spark issue #20327: [SPARK-12963][CORE] NM host for driver end points

2018-03-15 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20327
  
**[Test build #88283 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88283/testReport)**
 for PR 20327 at commit 
[`ae4ad4a`](https://github.com/apache/spark/commit/ae4ad4a7568cf5845861237d848468c4dc8cf840).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20815: [SPARK-23658][LAUNCHER] InProcessAppHandle uses t...

2018-03-15 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20815


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20767: [SPARK-23623] [SS] Avoid concurrent use of cached...

2018-03-15 Thread tdas

Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/20767#discussion_r174968294
  
--- Diff: 
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaDataConsumer.scala
 ---
@@ -342,80 +415,103 @@ private[kafka010] object CachedKafkaConsumer extends 
Logging {
 }
   }
 
-  def releaseKafkaConsumer(
-  topic: String,
-  partition: Int,
-  kafkaParams: ju.Map[String, Object]): Unit = {
-val groupId = 
kafkaParams.get(ConsumerConfig.GROUP_ID_CONFIG).asInstanceOf[String]
-val topicPartition = new TopicPartition(topic, partition)
-val key = CacheKey(groupId, topicPartition)
-
+  private def releaseConsumer(intConsumer: InternalKafkaConsumer): Unit = {
 synchronized {
-  val consumer = cache.get(key)
-  if (consumer != null) {
-consumer.inuse = false
-  } else {
-logWarning(s"Attempting to release consumer that does not exist")
-  }
-}
-  }
 
-  /**
-   * Removes (and closes) the Kafka Consumer for the given topic, 
partition and group id.
-   */
-  def removeKafkaConsumer(
-  topic: String,
-  partition: Int,
-  kafkaParams: ju.Map[String, Object]): Unit = {
-val groupId = 
kafkaParams.get(ConsumerConfig.GROUP_ID_CONFIG).asInstanceOf[String]
-val topicPartition = new TopicPartition(topic, partition)
-val key = CacheKey(groupId, topicPartition)
-
-synchronized {
-  val removedConsumer = cache.remove(key)
-  if (removedConsumer != null) {
-removedConsumer.close()
+  // If it has been marked for close, then do it any way
+  if (intConsumer.inuse && intConsumer.markedForClose) 
intConsumer.close()
+  intConsumer.inuse = false
+
+  // Clear the consumer from the cache if this is indeed the consumer 
present in the cache
+  val key = new CacheKey(intConsumer.topicPartition, 
intConsumer.kafkaParams)
+  val cachedIntConsumer = cache.get(key)
+  if (cachedIntConsumer != null) {
+if (cachedIntConsumer.eq(intConsumer)) {
+  // The released consumer is indeed the cached one.
+  cache.remove(key)
+} else {
+  // The released consumer is not the cached one. Don't do 
anything.
+  // This should not happen as long as we maintain the invariant 
mentioned above.
+  logWarning(
+s"Cached consumer not the same one as the one being release" +
+  s"\ncached = $cachedIntConsumer 
[${System.identityHashCode(cachedIntConsumer)}]" +
+  s"\nreleased = $intConsumer 
[${System.identityHashCode(intConsumer)}]")
+}
+  } else {
+// The released consumer is not in the cache. Don't do anything.
+// This should not happen as long as we maintain the invariant 
mentioned above.
+logWarning(s"Attempting to release consumer that is not in the 
cache")
   }
 }
   }
 
   /**
* Get a cached consumer for groupId, assigned to topic and partition.
* If matching consumer doesn't already exist, will be created using 
kafkaParams.
+   * The returned consumer must be released explicitly using 
[[KafkaDataConsumer.release()]].
+   *
+   * Note: This method guarantees that the consumer returned is not 
currently in use by any one
+   * else. Within this guarantee, this will make a best effort attempt to 
re-use consumers by
+   * caching them and tracking when they are in use.
*/
-  def getOrCreate(
-  topic: String,
-  partition: Int,
-  kafkaParams: ju.Map[String, Object]): CachedKafkaConsumer = 
synchronized {
-val groupId = 
kafkaParams.get(ConsumerConfig.GROUP_ID_CONFIG).asInstanceOf[String]
-val topicPartition = new TopicPartition(topic, partition)
-val key = CacheKey(groupId, topicPartition)
-
-// If this is reattempt at running the task, then invalidate cache and 
start with
-// a new consumer
+  def acquire(
+  topicPartition: TopicPartition,
+  kafkaParams: ju.Map[String, Object],
+  useCache: Boolean): KafkaDataConsumer = synchronized {
+val key = new CacheKey(topicPartition, kafkaParams)
+val existingInternalConsumer = cache.get(key)
+
+lazy val newInternalConsumer = new 
InternalKafkaConsumer(topicPartition, kafkaParams)
+
 if (TaskContext.get != null && TaskContext.get.attemptNumber >= 1) {
-  removeKafkaConsumer(topic, partition, kafkaParams)
-  val consumer = new CachedKafkaConsumer(topicPartition, kafkaParams)
-  consumer.inuse = true
-  cache.put(key, consumer)
-  consumer
-} else {
-  if

[GitHub] spark issue #20840: [SPARK-23702][SS] Forbid watermarks on both sides of a s...

2018-03-15 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20840
  
**[Test build #88282 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88282/testReport)**
 for PR 20840 at commit 
[`9d742ab`](https://github.com/apache/spark/commit/9d742ab4d22021654aa2778d601d806b7a7d106a).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20327: [SPARK-12963][CORE] NM host for driver end points

2018-03-15 Thread vanzin

Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/20327
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20840: [SPARK-23702][SS] Forbid watermarks on both sides of a s...

2018-03-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20840
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20840: [SPARK-23702][SS] Forbid watermarks on both sides...

2018-03-15 Thread jose-torres

GitHub user jose-torres opened a pull request:

https://github.com/apache/spark/pull/20840

[SPARK-23702][SS] Forbid watermarks on both sides of a streaming aggregate.

## What changes were proposed in this pull request?

Forbid watermarks on both sides of a streaming aggregate. As detailed in 
the jira, we don't currently support this and doing so would require 
significant revision to the execution model.

## How was this patch tested?

new unit test

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jose-torres/spark disallowAgg

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20840.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20840


commit 9d742ab4d22021654aa2778d601d806b7a7d106a
Author: Jose Torres 
Date:   2018-03-16T00:03:54Z

disallow watermark on both sides




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20815: [SPARK-23658][LAUNCHER] InProcessAppHandle uses the wron...

2018-03-15 Thread vanzin

Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/20815
  
Merging to master / 2.3.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20813: [SPARK-23670][SQL] Fix memory leak on SparkPlanGr...

2018-03-15 Thread vanzin

Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20813#discussion_r174202008
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLAppStatusStore.scala
 ---
@@ -53,6 +53,9 @@ class SQLAppStatusStore(
   def executionsCount(): Long = {
 store.count(classOf[SQLExecutionUIData])
   }
+  def planGraphCount(): Long = {
--- End diff --

nit: add empty line before


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20767: [SPARK-23623] [SS] Avoid concurrent use of cached consum...

2018-03-15 Thread tdas

Github user tdas commented on the issue:

https://github.com/apache/spark/pull/20767
  
@koeninger good question Cody! I think we should fix this limitation 
eventually. The only reason I am not doing that in this PR is to keep the 
changes minimum for backporting to 2.3.x. Eventually, we should not do such 
cache management, and rather use something like [Apache Common 
Pool](https://commons.apache.org/proper/commons-pool/index.html).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20839: [SPARK-23699][PYTHON][SQL] Raise same type of error caug...

2018-03-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20839
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20839: [SPARK-23699][PYTHON][SQL] Raise same type of error caug...

2018-03-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20839
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88281/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20839: [SPARK-23699][PYTHON][SQL] Raise same type of error caug...

2018-03-15 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20839
  
**[Test build #88281 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88281/testReport)**
 for PR 20839 at commit 
[`5caf63c`](https://github.com/apache/spark/commit/5caf63cc32a7546823e64d774faee9fb63a6b286).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20807: SPARK-23660: Fix exception in yarn cluster mode w...

2018-03-15 Thread vanzin

Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20807#discussion_r174964951
  
--- Diff: 
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala
 ---
@@ -417,8 +417,11 @@ private[spark] class ApplicationMaster(args: 
ApplicationMasterArguments) extends
 }
   }
 
-  private def sparkContextInitialized(sc: SparkContext) = {
+  private def sparkContextInitialized(sc: SparkContext) = synchronized {
--- End diff --

Some other code in this class uses synchronization on `this`, so I think it 
would be better to synchronize on `sparkContextPromise` in this case.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20807: SPARK-23660: Fix exception in yarn cluster mode w...

2018-03-15 Thread vanzin

Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20807#discussion_r174965130
  
--- Diff: 
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala
 ---
@@ -497,6 +500,8 @@ private[spark] class ApplicationMaster(args: 
ApplicationMasterArguments) extends
 // if the user app did not create a SparkContext.
 throw new IllegalStateException("User did not initialize spark 
context!")
   }
+  // After initialisation notify user class thread to continue
+  synchronized { notify() }
--- End diff --

Since you have to do this in two places, I'd create a method (e.g. 
`resumeDriver`) close to where `sparkContextInitialized` is declared, so that 
it's easier to find the context of why this is needed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20807: SPARK-23660: Fix exception in yarn cluster mode w...

2018-03-15 Thread vanzin

Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20807#discussion_r174965182
  
--- Diff: 
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala
 ---
@@ -497,6 +500,8 @@ private[spark] class ApplicationMaster(args: 
ApplicationMasterArguments) extends
 // if the user app did not create a SparkContext.
 throw new IllegalStateException("User did not initialize spark 
context!")
   }
+  // After initialisation notify user class thread to continue
--- End diff --

nit: rest of the code uses American spelling ("initialization"), so this 
should be consistent.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20829: [SPARK-23690][ML] Add handleinvalid to VectorAssembler

2018-03-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20829
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20829: [SPARK-23690][ML] Add handleinvalid to VectorAssembler

2018-03-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20829
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88278/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20829: [SPARK-23690][ML] Add handleinvalid to VectorAssembler

2018-03-15 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20829
  
**[Test build #88278 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88278/testReport)**
 for PR 20829 at commit 
[`5ce7671`](https://github.com/apache/spark/commit/5ce7671267a4dd240b80b6230e9e3729c96de7e6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20839: [SPARK-23699][PYTHON][SQL] Raise same type of error caug...

2018-03-15 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20839
  
**[Test build #88281 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88281/testReport)**
 for PR 20839 at commit 
[`5caf63c`](https://github.com/apache/spark/commit/5caf63cc32a7546823e64d774faee9fb63a6b286).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20839: [SPARK-23699][PYTHON][SQL] Raise same type of error caug...

2018-03-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20839
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20839: [SPARK-23699][PYTHON][SQL] Raise same type of error caug...

2018-03-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20839
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1543/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20838: [SPARK-23698] Resolve undefined names in Python 3

2018-03-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20838
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20838: [SPARK-23698] Resolve undefined names in Python 3

2018-03-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20838
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20839: [SPARK-23699][PYTHON][SQL] Raise same type of err...

2018-03-15 Thread BryanCutler

GitHub user BryanCutler opened a pull request:

https://github.com/apache/spark/pull/20839

[SPARK-23699][PYTHON][SQL] Raise same type of error caught with Arrow 
enabled

## What changes were proposed in this pull request?

When using Arrow for createDataFrame or toPandas and an error is 
encountered with fallback disabled, this will raise the same type of error 
instead of a RuntimeError.

## How was this patch tested?

Updated existing tests to verify error type.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/BryanCutler/spark 
arrow-raise-same-error-SPARK-23699

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20839.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20839


commit 5caf63cc32a7546823e64d774faee9fb63a6b286
Author: Bryan Cutler 
Date:   2018-03-15T23:07:27Z

raise same type of error when not falling back




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3

2018-03-15 Thread cclauss

GitHub user cclauss opened a pull request:

https://github.com/apache/spark/pull/20838

[SPARK-23698] Resolve undefined names in Python 3

## What changes were proposed in this pull request?

Fix issues arising from the fact that __file__, __long__, __raw_input()__, 
__unicode__, __xrange()__, etc. were remove from Python 3.  __Undefined names__ 
have the potential to raise NameError at runtime.  Apply the Python porting 
best practice: [Use feature detection instead of version 
detection](https://docs.python.org/3/howto/pyporting.html#use-feature-detection-instead-of-version-detection).
  

## How was this patch tested?
* $ __python2 -m flake8 . --count --select=E9,F82 --show-source 
--statistics__
* $ __python3 -m flake8 . --count --select=E9,F82 --show-source 
--statistics__

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cclauss/spark fix-undefined-names

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20838.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20838


commit bdcd740a5144efef0db1f2b6c29b64c85f3d8ef5
Author: cclauss 
Date:   2018-03-15T23:08:04Z

[SPARK-23698] Reduce undefined names in Python 3




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...

2018-03-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20687
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88280/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...

2018-03-15 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20687
  
**[Test build #88280 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88280/testReport)**
 for PR 20687 at commit 
[`8adaa47`](https://github.com/apache/spark/commit/8adaa4748b7b0d7990b45e58d82c7d9f2248923f).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...

2018-03-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20687
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...

2018-03-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20687
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1542/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...

2018-03-15 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20687
  
**[Test build #88280 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88280/testReport)**
 for PR 20687 at commit 
[`8adaa47`](https://github.com/apache/spark/commit/8adaa4748b7b0d7990b45e58d82c7d9f2248923f).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...

2018-03-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20687
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20744: [SPARK-23608][CORE][WebUI] Add synchronization in SHS be...

2018-03-15 Thread zhouyejoe

Github user zhouyejoe commented on the issue:

https://github.com/apache/spark/pull/20744
  
It is weird that the PySpark unit tests failed, I don't think it is related.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20829: [SPARK-23690][ML] Add handleinvalid to VectorAssembler

2018-03-15 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20829
  
**[Test build #88278 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88278/testReport)**
 for PR 20829 at commit 
[`5ce7671`](https://github.com/apache/spark/commit/5ce7671267a4dd240b80b6230e9e3729c96de7e6).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

2018-03-15 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20579
  
**[Test build #88279 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88279/testReport)**
 for PR 20579 at commit 
[`3588af3`](https://github.com/apache/spark/commit/3588af39eb889ab72f6800b546ba9f2107f15dc0).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

2018-03-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20579
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1541/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

2018-03-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20579
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20829: [SPARK-23690][ML] Add handleinvalid to VectorAssembler

2018-03-15 Thread yogeshg

Github user yogeshg commented on the issue:

https://github.com/apache/spark/pull/20829
  
I fixed code paths that failed tests, waiting for @SparkQA . Offline talk 
with @MrBago suggests that we can perhaps decrease the number of maps in 
`transform` method. Looking into that.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20831: [SPARK-23614][SQL] Fix incorrect reuse exchange when cac...

2018-03-15 Thread viirya

Github user viirya commented on the issue:

https://github.com/apache/spark/pull/20831
  
cc @cloud-fan 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20777: [SPARK-23615][ML][PYSPARK]Add maxDF Parameter to Python ...

2018-03-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20777
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20777: [SPARK-23615][ML][PYSPARK]Add maxDF Parameter to Python ...

2018-03-15 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20777
  
**[Test build #88276 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88276/testReport)**
 for PR 20777 at commit 
[`d6cd73a`](https://github.com/apache/spark/commit/d6cd73aacd0683a96a6250d6ab0f9cb9b5eca6f7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20777: [SPARK-23615][ML][PYSPARK]Add maxDF Parameter to Python ...

2018-03-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20777
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88276/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20829: [SPARK-23690][ML] Add handleinvalid to VectorAssembler

2018-03-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20829
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88277/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20829: [SPARK-23690][ML] Add handleinvalid to VectorAssembler

2018-03-15 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20829
  
**[Test build #88277 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88277/testReport)**
 for PR 20829 at commit 
[`bf2f5b3`](https://github.com/apache/spark/commit/bf2f5b39b3056b2301afd1e19582fea0e3700f3a).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20829: [SPARK-23690][ML] Add handleinvalid to VectorAssembler

2018-03-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20829
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20829: [SPARK-23690][ML] Add handleinvalid to VectorAssembler

2018-03-15 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20829
  
**[Test build #88277 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88277/testReport)**
 for PR 20829 at commit 
[`bf2f5b3`](https://github.com/apache/spark/commit/bf2f5b39b3056b2301afd1e19582fea0e3700f3a).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20829: [SPARK-23690][ML] Add handleinvalid to VectorAsse...

2018-03-15 Thread yogeshg

Github user yogeshg commented on a diff in the pull request:

https://github.com/apache/spark/pull/20829#discussion_r174941546
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala ---
@@ -234,7 +234,7 @@ class StringIndexerModel (
 val metadata = NominalAttribute.defaultAttr
   .withName($(outputCol)).withValues(filteredLabels).toMetadata()
 // If we are skipping invalid records, filter them out.
-val (filteredDataset, keepInvalid) = getHandleInvalid match {
--- End diff --

Thanks for picking this out! I changed this because I was matching on 
`$(handleInvalid)` in VectorAssembler and that seems to be the recommended way 
of doing this. Should I include this in the current PR and add a note or open a 
separate PR?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20777: [SPARK-23615][ML][PYSPARK]Add maxDF Parameter to Python ...

2018-03-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20777
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20777: [SPARK-23615][ML][PYSPARK]Add maxDF Parameter to Python ...

2018-03-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20777
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1540/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20777: [SPARK-23615][ML][PYSPARK]Add maxDF Parameter to Python ...

2018-03-15 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20777
  
**[Test build #88276 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88276/testReport)**
 for PR 20777 at commit 
[`d6cd73a`](https://github.com/apache/spark/commit/d6cd73aacd0683a96a6250d6ab0f9cb9b5eca6f7).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20799: [SPARK-23635][YARN] AM env variable should not overwrite...

2018-03-15 Thread mridulm

Github user mridulm commented on the issue:

https://github.com/apache/spark/pull/20799
  
Thanks for the changes @jerryshao, they look good to me.
Do you have any other thoughts/comments @tgravescs ? Since this is a 
behavior change, I want to make sure we are not missing anything !


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20777: [SPARK-23615][ML][PYSPARK]Add maxDF Parameter to ...

2018-03-15 Thread huaxingao

Github user huaxingao commented on a diff in the pull request:

https://github.com/apache/spark/pull/20777#discussion_r174935305
  
--- Diff: python/pyspark/ml/feature.py ---
@@ -465,26 +473,26 @@ class CountVectorizer(JavaEstimator, HasInputCol, 
HasOutputCol, JavaMLReadable,
 " Default False", typeConverter=TypeConverters.toBoolean)
 
 @keyword_only
-def __init__(self, minTF=1.0, minDF=1.0, vocabSize=1 << 18, 
binary=False, inputCol=None,
- outputCol=None):
+def __init__(self, minTF=1.0, minDF=1.0, maxDF=sys.maxsize, 
vocabSize=1 << 18, binary=False,
--- End diff --

Will make the change now. Thanks!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20828: [SPARK-23687][SS] Add a memory source for continuous pro...

2018-03-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20828
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88271/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20828: [SPARK-23687][SS] Add a memory source for continuous pro...

2018-03-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20828
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20611: [SPARK-23425][SQL]Support wildcard in HDFS path for load...

2018-03-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20611
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20828: [SPARK-23687][SS] Add a memory source for continuous pro...

2018-03-15 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20828
  
**[Test build #88271 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88271/testReport)**
 for PR 20828 at commit 
[`7c8d30e`](https://github.com/apache/spark/commit/7c8d30e00ca7497cd4ed621fc0f5dacc807efc04).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20611: [SPARK-23425][SQL]Support wildcard in HDFS path for load...

2018-03-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20611
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88272/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20611: [SPARK-23425][SQL]Support wildcard in HDFS path for load...

2018-03-15 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20611
  
**[Test build #88272 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88272/testReport)**
 for PR 20611 at commit 
[`8341465`](https://github.com/apache/spark/commit/8341465594f9316f9f52678fd70faa7f910edfcc).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

2018-03-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20579
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88273/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

2018-03-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20579
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

2018-03-15 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20579
  
**[Test build #88273 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88273/testReport)**
 for PR 20579 at commit 
[`ad15411`](https://github.com/apache/spark/commit/ad154115e520b82eec0b252fa19e66abdc1da832).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20786: [SPARK-14681][ML] Provide label/impurity stats fo...

2018-03-15 Thread jkbradley

Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/20786#discussion_r174921341
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/Node.scala ---
@@ -84,35 +86,73 @@ private[ml] object Node {
   /**
* Create a new Node from the old Node format, recursively creating 
child nodes as needed.
*/
-  def fromOld(oldNode: OldNode, categoricalFeatures: Map[Int, Int]): Node 
= {
+  def fromOld(oldNode: OldNode, categoricalFeatures: Map[Int, Int],
+isClassification: Boolean): Node = {
 if (oldNode.isLeaf) {
   // TODO: Once the implementation has been moved to this API, then 
include sufficient
   //   statistics here.
-  new LeafNode(prediction = oldNode.predict.predict,
-impurity = oldNode.impurity, impurityStats = null)
+  if (isClassification) {
+new ClassificationLeafNode(prediction = oldNode.predict.predict,
+  impurity = oldNode.impurity, impurityStats = null)
+  } else {
+new RegressionLeafNode(prediction = oldNode.predict.predict,
+  impurity = oldNode.impurity, impurityStats = null)
+  }
 } else {
   val gain = if (oldNode.stats.nonEmpty) {
 oldNode.stats.get.gain
   } else {
 0.0
   }
-  new InternalNode(prediction = oldNode.predict.predict, impurity = 
oldNode.impurity,
-gain = gain, leftChild = fromOld(oldNode.leftNode.get, 
categoricalFeatures),
-rightChild = fromOld(oldNode.rightNode.get, categoricalFeatures),
-split = Split.fromOld(oldNode.split.get, categoricalFeatures), 
impurityStats = null)
+  if (isClassification) {
+new ClassificationInternalNode(prediction = 
oldNode.predict.predict,
+  impurity = oldNode.impurity, gain = gain,
+  leftChild = fromOld(oldNode.leftNode.get, categoricalFeatures, 
true)
+.asInstanceOf[ClassificationNode],
+  rightChild = fromOld(oldNode.rightNode.get, categoricalFeatures, 
true)
+.asInstanceOf[ClassificationNode],
+  split = Split.fromOld(oldNode.split.get, categoricalFeatures), 
impurityStats = null)
+  } else {
+new RegressionInternalNode(prediction = oldNode.predict.predict,
+  impurity = oldNode.impurity, gain = gain,
+  leftChild = fromOld(oldNode.leftNode.get, categoricalFeatures, 
false)
+.asInstanceOf[RegressionNode],
+  rightChild = fromOld(oldNode.rightNode.get, categoricalFeatures, 
false)
+.asInstanceOf[RegressionNode],
+  split = Split.fromOld(oldNode.split.get, categoricalFeatures), 
impurityStats = null)
+  }
 }
   }
 }
 
-/**
- * Decision tree leaf node.
- * @param prediction  Prediction this node makes
- * @param impurity  Impurity measure at this node (for training data)
- */
-class LeafNode private[ml] (
--- End diff --

While it would be nice to have this be a trait instead of a class, I am 
worried about breaking public APIs.  However, one could argue that this isn't a 
public API since the constructor is private (though the class is public).  I'll 
CC people on https://issues.apache.org/jira/browse/SPARK-7131 where these 
classes were made public to get feedback.  Let's give a couple of days for 
feedback before proceeding.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20750: [SPARK-23581][SQL] Add interpreted unsafe projection

2018-03-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20750
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88270/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20750: [SPARK-23581][SQL] Add interpreted unsafe projection

2018-03-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20750
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20750: [SPARK-23581][SQL] Add interpreted unsafe projection

2018-03-15 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20750
  
**[Test build #88270 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88270/testReport)**
 for PR 20750 at commit 
[`7e15f30`](https://github.com/apache/spark/commit/7e15f3061b5fbd48344ce36e773766a83b2bf410).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20659: [DO-NOT-MERGE] Try to update Hive to 2.3.2

2018-03-15 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20659
  
**[Test build #88275 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88275/testReport)**
 for PR 20659 at commit 
[`e041704`](https://github.com/apache/spark/commit/e041704c8898234299b8af1d85b09c1acc2532ab).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20659: [DO-NOT-MERGE] Try to update Hive to 2.3.2

2018-03-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20659
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1539/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20659: [DO-NOT-MERGE] Try to update Hive to 2.3.2

2018-03-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20659
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20659: [DO-NOT-MERGE] Try to update Hive to 2.3.2

2018-03-15 Thread wangyum

Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/20659
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20777: [SPARK-23615][ML][PYSPARK]Add maxDF Parameter to ...

2018-03-15 Thread srowen

Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/20777#discussion_r174911085
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizer.scala ---
@@ -70,19 +70,21 @@ private[feature] trait CountVectorizerParams extends 
Params with HasInputCol wit
   def getMinDF: Double = $(minDF)
 
   /**
-   * Specifies the maximum number of different documents a term must 
appear in to be included
-   * in the vocabulary.
-   * If this is an integer greater than or equal to 1, this specifies the 
number of documents
-   * the term must appear in; if this is a double in [0,1), then this 
specifies the fraction of
-   * documents.
+   * Specifies the maximum number of different documents a term could 
appear in to be included
+   * in the vocabulary. A term that appears more than the threshold will 
be ignored. If this is an
+   * integer greater than or equal to 1, this specifies the maximum number 
of documents the term
+   * could appear in; if this is a double in [0,1), then this specifies 
the maximum fraction of
+   * documents the term could appear in.
--- End diff --

Agree, your wording is clearer.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20837: [SPARK-23686][ML][WIP] Better instrumentation

2018-03-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20837
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88274/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20837: [SPARK-23686][ML][WIP] Better instrumentation

2018-03-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20837
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

< 1 2 3 4 >

101 - 200 of 376 matches

Mail list logo