[GitHub] spark pull request #20201: [SPARK-22389][SQL] data source v2 partitioning re...

2018-01-19 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20201#discussion_r162733684
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/Partitioning.java 
---
@@ -0,0 +1,46 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.sources.v2.reader;
+
+import org.apache.spark.annotation.InterfaceStability;
+
+/**
+ * An interface to represent output data partitioning for a data source, which is returned by
+ * {@link SupportsReportPartitioning#outputPartitioning()}. Note that this should work like a
+ * snapshot, once created, it should be deterministic and always report same number of partitions
--- End diff --

`, once` -> `. Once`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20201: [SPARK-22389][SQL] data source v2 partitioning re...

2018-01-19 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20201#discussion_r162733629
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/Partitioning.java 
---
@@ -0,0 +1,46 @@
+package org.apache.spark.sql.sources.v2.reader;
+
+import org.apache.spark.annotation.InterfaceStability;
+
+/**
+ * An interface to represent output data partitioning for a data source, which is returned by
--- End diff --

`output` -> `the output`


---




[GitHub] spark pull request #20201: [SPARK-22389][SQL] data source v2 partitioning re...

2018-01-19 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20201#discussion_r162733463
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/Distribution.java 
---
@@ -0,0 +1,39 @@
+package org.apache.spark.sql.sources.v2.reader;
+
+import org.apache.spark.annotation.InterfaceStability;
+
+/**
+ * An interface to represent data distribution requirement, which specifies how the records should
+ * be distributed among the {@link ReadTask}s that are returned by
+ * {@link DataSourceV2Reader#createReadTasks()}. Note that this interface has nothing to do with
+ * the data ordering inside one partition(the output records of a single {@link ReadTask}).
+ *
+ * The instance of this interface is created and provided by Spark, then consumed by
+ * {@link Partitioning#satisfy(Distribution)}. This means users don't need to implement
--- End diff --

`users ` -> `data source developers`


---




[GitHub] spark pull request #20201: [SPARK-22389][SQL] data source v2 partitioning re...

2018-01-19 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20201#discussion_r162733351
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourcePartitioning.scala
 ---
@@ -0,0 +1,49 @@
+package org.apache.spark.sql.execution.datasources.v2
+
+import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeMap}
+import org.apache.spark.sql.catalyst.plans.physical
+import org.apache.spark.sql.sources.v2.reader.{ClusteredDistribution, Partitioning}
+
+/**
+ * An adapter from public data source partitioning to catalyst internal partitioning.
--- End diff --

`Partitioning `


---




[GitHub] spark issue #20299: [SPARK-23135][ui] Fix rendering of accumulators in the s...

2018-01-19 Thread sameeragarwal
Github user sameeragarwal commented on the issue:

https://github.com/apache/spark/pull/20299
  
LGTM. Merging this to master/2.3. Thanks!


---




[GitHub] spark pull request #20201: [SPARK-22389][SQL] data source v2 partitioning re...

2018-01-19 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20201#discussion_r162733141
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/ClusteredDistribution.java
 ---
@@ -0,0 +1,34 @@
+package org.apache.spark.sql.sources.v2.reader;
+
+import org.apache.spark.annotation.InterfaceStability;
+
+/**
+ * A concrete implementation of {@link Distribution}. Represents a distribution where records that
+ * share the same values for the {@link #clusteredColumns} will be produced by the same
+ * {@link ReadTask}.
+ */
+@InterfaceStability.Evolving
+public class ClusteredDistribution implements Distribution {
+  public String[] clusteredColumns;
--- End diff --

Need to emphasize that these columns are order-insensitive.
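
To make the order-insensitivity concrete, here is a minimal sketch of how a data source might compare its partition keys against `clusteredColumns`. It is an illustration only: `ClusteredColumnsCheck` and `satisfies` are hypothetical names, not part of the Spark API, and the subset rule is an assumption about how a partitioning would be matched.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper for illustration; not part of the Spark API. */
final class ClusteredColumnsCheck {
  private ClusteredColumnsCheck() {}

  /**
   * Returns true when data partitioned by partitionKeys is already clustered
   * by clusteredColumns. The comparison uses a set, so the order in which the
   * columns are listed never matters.
   */
  static boolean satisfies(String[] partitionKeys, String[] clusteredColumns) {
    if (partitionKeys.length == 0) {
      return false; // no known partition keys, so nothing can be guaranteed
    }
    Set<String> required = new HashSet<>(Arrays.asList(clusteredColumns));
    // Rows that agree on every clustered column also agree on any subset of
    // them, so a partitioning keyed on a subset keeps such rows together.
    return required.containsAll(Arrays.asList(partitionKeys));
  }
}
```

With this check, keys `{"b", "a"}` satisfy clustered columns `{"a", "b"}`, and so does the single key `{"a"}`, while `{"a", "c"}` does not.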


---




[GitHub] spark pull request #20201: [SPARK-22389][SQL] data source v2 partitioning re...

2018-01-19 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20201#discussion_r162732939
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/sources/v2/DataSourceV2Suite.scala 
---
@@ -95,6 +96,34 @@ class DataSourceV2Suite extends QueryTest with 
SharedSQLContext {
 }
   }
 
+  test("partitioning reporting") {
+    import org.apache.spark.sql.functions.{count, sum}
+    Seq(classOf[PartitionAwareDataSource], classOf[JavaPartitionAwareDataSource]).foreach { cls =>
+      withClue(cls.getName) {
+        val df = spark.read.format(cls.getName).load()
+        checkAnswer(df, Seq(Row(1, 4), Row(1, 4), Row(3, 6), Row(2, 6), Row(4, 2), Row(4, 2)))
+
+        val groupByColA = df.groupBy('a).agg(sum('b))
+        checkAnswer(groupByColA, Seq(Row(1, 8), Row(2, 6), Row(3, 6), Row(4, 4)))
+        assert(groupByColA.queryExecution.executedPlan.collectFirst {
+          case e: ShuffleExchangeExec => e
+        }.isEmpty)
+
+        val groupByColAB = df.groupBy('a, 'b).agg(count("*"))
--- End diff --

Try `df.groupBy('a + 'b).agg(count("*")).show()` 

At least, it should not fail, even if we do not support complex
`ClusteredDistribution` expressions.


---




[GitHub] spark issue #20203: [SPARK-22577] [core] executor page blacklist status shou...

2018-01-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20203
  
**[Test build #86400 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86400/testReport)**
 for PR 20203 at commit 
[`cf6e0c9`](https://github.com/apache/spark/commit/cf6e0c919e151c26772ec78a10abc6d2454f7dd5).


---




[GitHub] spark issue #20091: [SPARK-22465][FOLLOWUP] Update the number of partitions ...

2018-01-19 Thread jiangxb1987
Github user jiangxb1987 commented on the issue:

https://github.com/apache/spark/pull/20091
  
@mridulm Thank you!


---




[GitHub] spark issue #20091: [SPARK-22465][FOLLOWUP] Update the number of partitions ...

2018-01-19 Thread mridulm
Github user mridulm commented on the issue:

https://github.com/apache/spark/pull/20091
  
@jiangxb1987 Thanks for clarifying, looks good to me. I will merge it
later this evening (assuming someone else does not do so before then :) )


---




[GitHub] spark issue #20319: [SPARK-22884][ML][TESTS] ML test for StructuredStreaming...

2018-01-19 Thread smurakozi
Github user smurakozi commented on the issue:

https://github.com/apache/spark/pull/20319
  
@jkbradley could you check out this change, please?


---




[GitHub] spark issue #20235: [Spark-22887][ML][TESTS][WIP] ML test for StructuredStre...

2018-01-19 Thread smurakozi
Github user smurakozi commented on the issue:

https://github.com/apache/spark/pull/20235
  
@jkbradley could you check out this change, please?


---




[GitHub] spark pull request #20330: [SPARK-23121][core] Fix for ui becoming unaccessi...

2018-01-19 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20330#discussion_r162721177
  
--- Diff: core/src/main/scala/org/apache/spark/ui/jobs/AllJobsPage.scala ---
@@ -427,23 +435,21 @@ private[ui] class JobDataSource(
 val formattedDuration = duration.map(d => 
UIUtils.formatDuration(d)).getOrElse("Unknown")
 val submissionTime = jobData.submissionTime
 val formattedSubmissionTime = 
submissionTime.map(UIUtils.formatDate).getOrElse("Unknown")
-val lastStageAttempt = store.lastStageAttempt(jobData.stageIds.max)
-val lastStageDescription = lastStageAttempt.description.getOrElse("")
+val (lastStageName, lastStageDescription) = 
lastStageNameAndDescription(store, jobData)
 
-val formattedJobDescription =
-  UIUtils.makeDescription(lastStageDescription, basePath, plainText = 
false)
+val jobDescription = UIUtils.makeDescription(lastStageDescription, 
basePath, plainText = false)
--- End diff --

Sure, but don't you want the same behavior as above here (falling back to 
the job name)?
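
A minimal sketch of the fallback being asked about, assuming the description is optional and the job name is always present. `JobDescriptions` and `descriptionOrName` are hypothetical names used only for illustration; this is not the code in the pull request.

```java
import java.util.Optional;

/** Hypothetical helper for illustration; not the code under review. */
final class JobDescriptions {
  private JobDescriptions() {}

  /** Uses the description when it is present and non-empty, otherwise the job name. */
  static String descriptionOrName(Optional<String> description, String jobName) {
    return description.filter(d -> !d.isEmpty()).orElse(jobName);
  }
}
```

Treating an empty string like a missing description is a design choice of this sketch, so that the UI never shows a blank cell.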


---




[GitHub] spark issue #20330: [SPARK-23121][core] Fix for ui becoming unaccessible for...

2018-01-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20330
  
**[Test build #86399 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86399/testReport)**
 for PR 20330 at commit 
[`d5fdabb`](https://github.com/apache/spark/commit/d5fdabb678f4df7c101d8660cb7c37086e35489a).


---




[GitHub] spark pull request #19993: [SPARK-22799][ML] Bucketizer should throw excepti...

2018-01-19 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19993#discussion_r162719519
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala 
---
@@ -201,9 +184,13 @@ final class Bucketizer @Since("1.4.0") 
(@Since("1.4.0") override val uid: String
 
   @Since("1.4.0")
   override def transformSchema(schema: StructType): StructType = {
-if (isBucketizeMultipleColumns()) {
+ParamValidators.checkExclusiveParams(this, "inputCol", "inputCols")
--- End diff --

My initial implementation (with @hhbyyh's comments) was more generic and
checked what you said. Afterwards, @MLnick and @viirya asked to switch to a more
generic approach, which is the current one you see. I'm fine with either of those,
but I think we need to choose one way and go in that direction; otherwise we
just lose time.


---




[GitHub] spark issue #19993: [SPARK-22799][ML] Bucketizer should throw exception if s...

2018-01-19 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/19993
  
@jkbradley sure no problem, let me know how I can help.


---




[GitHub] spark pull request #19993: [SPARK-22799][ML] Bucketizer should throw excepti...

2018-01-19 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/19993#discussion_r162717263
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
@@ -166,6 +167,8 @@ private[ml] object Param {
 @DeveloperApi
 object ParamValidators {
 
+  private val LOGGER = LoggerFactory.getLogger(ParamValidators.getClass)
--- End diff --

Let's switch this to use the Logging trait, to match other MLlib patterns.


---




[GitHub] spark pull request #19993: [SPARK-22799][ML] Bucketizer should throw excepti...

2018-01-19 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/19993#discussion_r162717142
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala 
---
@@ -201,9 +184,13 @@ final class Bucketizer @Since("1.4.0") 
(@Since("1.4.0") override val uid: String
 
   @Since("1.4.0")
   override def transformSchema(schema: StructType): StructType = {
-if (isBucketizeMultipleColumns()) {
+ParamValidators.checkExclusiveParams(this, "inputCol", "inputCols")
--- End diff --

The problem with trying to use a general method like this is that it's hard 
to capture model-specific requirements.  This currently misses checking to make 
sure that exactly one (not just <= 1) of each pair is available, plus that all 
of the single-column OR all of the multi-column Params are available.  (The 
same issue occurs in https://github.com/apache/spark/pull/20146 )  It will also 
be hard to check these items and account for defaults.

I'd argue that it's not worth trying to use generic checking functions here.
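
As a rough illustration of what a model-specific check could look like, the sketch below validates the two Bucketizer Param groups directly. It assumes a Param counts as "available" when it is either set or has a default; `BucketizerParamCheck` and `isMultiColumn` are hypothetical names, and this is not MLlib code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of a Bucketizer-specific check; not MLlib code. */
final class BucketizerParamCheck {
  private static final List<String> SINGLE =
      Arrays.asList("inputCol", "outputCol", "splits");
  private static final List<String> MULTI =
      Arrays.asList("inputCols", "outputCols", "splitsArray");

  private BucketizerParamCheck() {}

  /**
   * Returns true for multi-column mode and false for single-column mode.
   * Throws if the groups are mixed, if neither is used, or if the chosen
   * group is incomplete.
   */
  static boolean isMultiColumn(Set<String> available) {
    boolean anySingle = SINGLE.stream().anyMatch(available::contains);
    boolean anyMulti = MULTI.stream().anyMatch(available::contains);
    if (anySingle && anyMulti) {
      throw new IllegalArgumentException(
          "Single-column and multi-column Params are mixed: " + available);
    }
    if (!anySingle && !anyMulti) {
      throw new IllegalArgumentException("No input Params are available");
    }
    List<String> required = anyMulti ? MULTI : SINGLE;
    if (!available.containsAll(required)) {
      throw new IllegalArgumentException(
          "Expected all of " + required + " but found " + available);
    }
    return anyMulti;
  }
}
```

The point of the sketch is that the "exactly one group, and all of it" rule is easy to state per model but awkward to express through a generic exclusive-Params helper.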


---




[GitHub] spark issue #20203: [SPARK-22577] [core] executor page blacklist status shou...

2018-01-19 Thread squito
Github user squito commented on the issue:

https://github.com/apache/spark/pull/20203
  
btw another way you could test out having a bad host would be something 
like this (untested):

```scala
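import java.net.InetAddress

// Collect the distinct host names the tasks ran on, then pick one to fail.
val hosts = sc.parallelize(1 to 1, 100).map { _ =>
  InetAddress.getLocalHost.getHostName
}.collect().toSet
val badHost = hosts.head

// Fail every task that runs on the chosen host; other tasks emit a key/value pair.
sc.parallelize(1 to 1, 10).map { x =>
  if (InetAddress.getLocalHost.getHostName == badHost) throw new RuntimeException("Bad host")
  else (x % 3, x)
}.reduceByKey((a, b) => a + b).collect()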
```

that way you make sure the failures are consistently on one host, not 
dependent on higher executor ids getting concentrated on one host.


---




[GitHub] spark pull request #20203: [SPARK-22577] [core] executor page blacklist stat...

2018-01-19 Thread attilapiros
Github user attilapiros commented on a diff in the pull request:

https://github.com/apache/spark/pull/20203#discussion_r162716271
  
--- Diff: 
core/src/test/scala/org/apache/spark/scheduler/TaskSetBlacklistSuite.scala ---
@@ -59,31 +60,55 @@ class TaskSetBlacklistSuite extends SparkFunSuite with 
BeforeAndAfterEach with M
   val shouldBeBlacklisted = (executor == "exec1" && index == 0)
   assert(taskSetBlacklist.isExecutorBlacklistedForTask(executor, 
index) === shouldBeBlacklisted)
 }
+
 assert(!taskSetBlacklist.isExecutorBlacklistedForTaskSet("exec1"))
+verify(listenerBusMock, never())
+  .post(isA(classOf[SparkListenerExecutorBlacklistedForStage]))
+
 assert(!taskSetBlacklist.isNodeBlacklistedForTaskSet("hostA"))
+verify(listenerBusMock, never())
+  .post(isA(classOf[SparkListenerNodeBlacklistedForStage]))
 
 // Mark task 1 failed on exec1 -- this pushes the executor into the 
blacklist
 taskSetBlacklist.updateBlacklistForFailedTask(
   "hostA", exec = "exec1", index = 1, failureReason = "testing")
+
 assert(taskSetBlacklist.isExecutorBlacklistedForTaskSet("exec1"))
-assert(!taskSetBlacklist.isNodeBlacklistedForTaskSet("hostA"))
 verify(listenerBusMock).post(
   SparkListenerExecutorBlacklistedForStage(0, "exec1", 2, 0, 
attemptId))
+
+assert(!taskSetBlacklist.isNodeBlacklistedForTaskSet("hostA"))
+verify(listenerBusMock, never())
+  .post(isA(classOf[SparkListenerNodeBlacklistedForStage]))
+
 // Mark one task as failed on exec2 -- not enough for any further 
blacklisting yet.
 taskSetBlacklist.updateBlacklistForFailedTask(
   "hostA", exec = "exec2", index = 0, failureReason = "testing")
 assert(taskSetBlacklist.isExecutorBlacklistedForTaskSet("exec1"))
+
 assert(!taskSetBlacklist.isExecutorBlacklistedForTaskSet("exec2"))
+verify(listenerBusMock, never()).post(
+  SparkListenerNodeBlacklistedForStage(0, "hostA", 2, 0, attemptId))
+
 assert(!taskSetBlacklist.isNodeBlacklistedForTaskSet("hostA"))
+verify(listenerBusMock, never())
+  .post(isA(classOf[SparkListenerNodeBlacklistedForStage]))
--- End diff --

yes, you are right


---




[GitHub] spark pull request #20146: [SPARK-11215][ML] Add multiple columns support to...

2018-01-19 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/20146#discussion_r162715788
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
@@ -249,6 +249,16 @@ object ParamValidators {
   def arrayLengthGt[T](lowerBound: Double): Array[T] => Boolean = { 
(value: Array[T]) =>
 value.length > lowerBound
   }
+
+  /** Check if more than one param in a set of exclusive params are set. */
+  def checkExclusiveParams(model: Params, params: String*): Unit = {
+if (params.filter(paramName => model.hasParam(paramName) &&
--- End diff --

Why is this checking to see if the Param belongs to the Model?  If this 
method is called with irrelevant Params, shouldn't it throw an error?
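
A minimal sketch of the stricter behaviour suggested here, assuming the model's known Param names and its currently set Param names are available as sets. `ExclusiveParams.check` is a hypothetical stand-in for `checkExclusiveParams`, not the method in the pull request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical strict variant of an exclusive-Params check; not MLlib code. */
final class ExclusiveParams {
  private ExclusiveParams() {}

  /**
   * Throws if any name is not a Param of the model, or if more than one of
   * the named Params is set.
   */
  static void check(Set<String> modelParams, Set<String> setParams, String... names) {
    List<String> found = new ArrayList<>();
    for (String name : names) {
      if (!modelParams.contains(name)) {
        // An irrelevant Param is a caller error, so fail fast instead of skipping it.
        throw new IllegalArgumentException("Unknown Param for this model: " + name);
      }
      if (setParams.contains(name)) {
        found.add(name);
      }
    }
    if (found.size() > 1) {
      throw new IllegalArgumentException("Mutually exclusive Params are set together: " + found);
    }
  }
}
```

Failing on an unknown name surfaces typos such as `"inputCol "` at the call site rather than silently passing the check.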


---




[GitHub] spark pull request #20203: [SPARK-22577] [core] executor page blacklist stat...

2018-01-19 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/20203#discussion_r162714257
  
--- Diff: 
core/src/test/scala/org/apache/spark/scheduler/TaskSetBlacklistSuite.scala ---
@@ -59,31 +60,55 @@ class TaskSetBlacklistSuite extends SparkFunSuite with 
BeforeAndAfterEach with M
   val shouldBeBlacklisted = (executor == "exec1" && index == 0)
   assert(taskSetBlacklist.isExecutorBlacklistedForTask(executor, 
index) === shouldBeBlacklisted)
 }
+
 assert(!taskSetBlacklist.isExecutorBlacklistedForTaskSet("exec1"))
+verify(listenerBusMock, never())
+  .post(isA(classOf[SparkListenerExecutorBlacklistedForStage]))
+
 assert(!taskSetBlacklist.isNodeBlacklistedForTaskSet("hostA"))
+verify(listenerBusMock, never())
+  .post(isA(classOf[SparkListenerNodeBlacklistedForStage]))
 
 // Mark task 1 failed on exec1 -- this pushes the executor into the 
blacklist
 taskSetBlacklist.updateBlacklistForFailedTask(
   "hostA", exec = "exec1", index = 1, failureReason = "testing")
+
 assert(taskSetBlacklist.isExecutorBlacklistedForTaskSet("exec1"))
-assert(!taskSetBlacklist.isNodeBlacklistedForTaskSet("hostA"))
 verify(listenerBusMock).post(
   SparkListenerExecutorBlacklistedForStage(0, "exec1", 2, 0, 
attemptId))
+
+assert(!taskSetBlacklist.isNodeBlacklistedForTaskSet("hostA"))
+verify(listenerBusMock, never())
+  .post(isA(classOf[SparkListenerNodeBlacklistedForStage]))
+
 // Mark one task as failed on exec2 -- not enough for any further 
blacklisting yet.
 taskSetBlacklist.updateBlacklistForFailedTask(
   "hostA", exec = "exec2", index = 0, failureReason = "testing")
 assert(taskSetBlacklist.isExecutorBlacklistedForTaskSet("exec1"))
+
 assert(!taskSetBlacklist.isExecutorBlacklistedForTaskSet("exec2"))
+verify(listenerBusMock, never()).post(
+  SparkListenerNodeBlacklistedForStage(0, "hostA", 2, 0, attemptId))
+
 assert(!taskSetBlacklist.isNodeBlacklistedForTaskSet("hostA"))
+verify(listenerBusMock, never())
+  .post(isA(classOf[SparkListenerNodeBlacklistedForStage]))
--- End diff --

The `verify` you added just above this one is redundant given this one, right?
I think you only need this one.


---




[GitHub] spark issue #20203: [SPARK-22577] [core] executor page blacklist status shou...

2018-01-19 Thread attilapiros
Github user attilapiros commented on the issue:

https://github.com/apache/spark/pull/20203
  
The node blacklisting is tested by unit tests:
- HistoryServerSuite
- TaskSetBlacklistSuite
- AppStatusListenerSuite

And manually with a 2 node cluster: 
https://issues.apache.org/jira/secure/attachment/12906833/node_blacklisting_for_stage.png

Here you can see that apiros3.gce.test.com was node blacklisted for the stage
because of failures on executors 4 and 5. As expected, executor 3 is also
blacklisted: it has no failures itself, but it shares the node with 4 and 5.

Spark was started as:

``` bash
./bin/spark-shell --master yarn --deploy-mode client --executor-memory=2G \
  --num-executors=8 --conf "spark.blacklist.enabled=true" \
  --conf "spark.blacklist.stage.maxFailedTasksPerExecutor=1" \
  --conf "spark.blacklist.stage.maxFailedExecutorsPerNode=1" \
  --conf "spark.blacklist.application.maxFailedTasksPerExecutor=10" \
  --conf "spark.eventLog.enabled=true"
```

And the job was:

``` scala
import org.apache.spark.SparkEnv

sc.parallelize(1 to 1, 10).map { x =>
  if (SparkEnv.get.executorId.toInt >= 4) throw new RuntimeException("Bad executor")
  else (x % 3, x)
}.reduceByKey((a, b) => a + b).collect()
```
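
The rule behind the screenshot (executor 3 is blacklisted only because it shares a node with 4 and 5) can be modelled roughly as below. This is a simplified sketch, not Spark's `TaskSetBlacklist`: `StageBlacklistModel` and its methods are hypothetical, and the two constructor arguments stand in for `spark.blacklist.stage.maxFailedTasksPerExecutor` and `spark.blacklist.stage.maxFailedExecutorsPerNode`.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Simplified, hypothetical model of per-stage blacklisting; not Spark code. */
final class StageBlacklistModel {
  private final int maxFailedTasksPerExecutor;
  private final int maxFailedExecutorsPerNode;
  // Distinct failed task indexes, keyed by executor id.
  private final Map<String, Set<Integer>> failedTasksByExecutor = new HashMap<>();
  // Executors blacklisted through their own failures, keyed by host.
  private final Map<String, Set<String>> blacklistedExecutorsByNode = new HashMap<>();

  StageBlacklistModel(int maxFailedTasksPerExecutor, int maxFailedExecutorsPerNode) {
    this.maxFailedTasksPerExecutor = maxFailedTasksPerExecutor;
    this.maxFailedExecutorsPerNode = maxFailedExecutorsPerNode;
  }

  /** Records one failed task and blacklists the executor once it hits the threshold. */
  void recordFailure(String host, String executor, int taskIndex) {
    Set<Integer> failed =
        failedTasksByExecutor.computeIfAbsent(executor, k -> new HashSet<>());
    failed.add(taskIndex);
    if (failed.size() >= maxFailedTasksPerExecutor) {
      blacklistedExecutorsByNode.computeIfAbsent(host, k -> new HashSet<>()).add(executor);
    }
  }

  /** A node is blacklisted once enough of its executors are blacklisted. */
  boolean isNodeBlacklisted(String host) {
    return blacklistedOn(host).size() >= maxFailedExecutorsPerNode;
  }

  /** An executor is blacklisted through its own failures or through its node. */
  boolean isExecutorBlacklisted(String host, String executor) {
    return isNodeBlacklisted(host) || blacklistedOn(host).contains(executor);
  }

  private Set<String> blacklistedOn(String host) {
    return blacklistedExecutorsByNode.getOrDefault(host, Collections.<String>emptySet());
  }
}
```

With both thresholds set to 1, as in the run above, a single failure on executor 4 is enough to blacklist its node, which in turn covers a healthy executor on the same host.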




---




[GitHub] spark issue #20332: [SPARK-23138][ML][DOC] Multiclass logistic regression su...

2018-01-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20332
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20332: [SPARK-23138][ML][DOC] Multiclass logistic regression su...

2018-01-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20332
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86397/
Test PASSed.


---




[GitHub] spark issue #20332: [SPARK-23138][ML][DOC] Multiclass logistic regression su...

2018-01-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20332
  
**[Test build #86397 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86397/testReport)**
 for PR 20332 at commit 
[`58d973e`](https://github.com/apache/spark/commit/58d973e204bd62128567fd3dfb2e5a335ac46bf1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #20284: [SPARK-23103][core] Ensure correct sort order for...

2018-01-19 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20284


---




[GitHub] spark pull request #20330: [SPARK-23121][core] Fix for ui becoming unaccessi...

2018-01-19 Thread smurakozi
Github user smurakozi commented on a diff in the pull request:

https://github.com/apache/spark/pull/20330#discussion_r162712767
  
--- Diff: core/src/main/scala/org/apache/spark/ui/jobs/AllJobsPage.scala ---
@@ -427,23 +435,21 @@ private[ui] class JobDataSource(
 val formattedDuration = duration.map(d => 
UIUtils.formatDuration(d)).getOrElse("Unknown")
 val submissionTime = jobData.submissionTime
 val formattedSubmissionTime = 
submissionTime.map(UIUtils.formatDate).getOrElse("Unknown")
-val lastStageAttempt = store.lastStageAttempt(jobData.stageIds.max)
-val lastStageDescription = lastStageAttempt.description.getOrElse("")
+val (lastStageName, lastStageDescription) = 
lastStageNameAndDescription(store, jobData)
 
-val formattedJobDescription =
-  UIUtils.makeDescription(lastStageDescription, basePath, plainText = 
false)
+val jobDescription = UIUtils.makeDescription(lastStageDescription, 
basePath, plainText = false)
--- End diff --

`lastStageDescription` may be empty, but that will not cause problems:
`makeDescription` handles it properly, just as in the version before
`lastStageAttempt` was used:
```
  val jobDescription = 
UIUtils.makeDescription(jobData.description.getOrElse(""), 
  basePath, plainText = false)
```


---




[GitHub] spark issue #20203: [SPARK-22577] [core] executor page blacklist status shou...

2018-01-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20203
  
**[Test build #86398 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86398/testReport)**
 for PR 20203 at commit 
[`41dd7bb`](https://github.com/apache/spark/commit/41dd7bbc1f62e093738e730bf3f5bfeb3dff16fb).


---




[GitHub] spark issue #20284: [SPARK-23103][core] Ensure correct sort order for negati...

2018-01-19 Thread squito
Github user squito commented on the issue:

https://github.com/apache/spark/pull/20284
  
even though we don't *know* of this causing a bug in 2.3, I still think we 
should merge it in there just because there may be some case we aren't thinking 
of, and this is a relatively small, safe fix.

so, I'm merging to master & 2.3


---




[GitHub] spark pull request #20138: [SPARK-20664][core] Delete stale application data...

2018-01-19 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20138


---




[GitHub] spark issue #20331: [SPARK-23158] [SQL] Move HadoopFsRelationTest test suite...

2018-01-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20331
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20138: [SPARK-20664][core] Delete stale application data from S...

2018-01-19 Thread squito
Github user squito commented on the issue:

https://github.com/apache/spark/pull/20138
  
As RC1 failed and RC2 is going to be cut soon, I'm going to merge this to 
master & 2.3.


---




[GitHub] spark issue #20331: [SPARK-23158] [SQL] Move HadoopFsRelationTest test suite...

2018-01-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20331
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86393/
Test FAILed.


---




[GitHub] spark issue #20331: [SPARK-23158] [SQL] Move HadoopFsRelationTest test suite...

2018-01-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20331
  
**[Test build #86393 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86393/testReport)**
 for PR 20331 at commit 
[`f7693f0`](https://github.com/apache/spark/commit/f7693f0abfe0923868c1918ddcaeaece2c107c5d).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `abstract class HadoopFsRelationTest extends QueryTest with SQLTestUtils with SharedSQLContext `
  * `class JsonHadoopFsRelationSuite extends HadoopFsRelationTest `
  * `class OrcHadoopFsRelationSuite extends HadoopFsRelationTest `
  * `class ParquetHadoopFsRelationSuite extends HadoopFsRelationTest `
  * `class SimpleTextHadoopFsRelationSuite extends HadoopFsRelationTest with PredicateHelper `
  * `class SimpleTextSource extends TextBasedFileFormat with DataSourceRegister `
  * `class SimpleTextOutputWriter(path: String, dataSchema: StructType, context: TaskAttemptContext)`
  * `class HiveOrcHadoopFsRelationSuite extends OrcHadoopFsRelationSuite `


---




[GitHub] spark issue #20332: [SPARK-23138][ML][DOC] Multiclass logistic regression su...

2018-01-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20332
  
**[Test build #86397 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86397/testReport)**
 for PR 20332 at commit 
[`58d973e`](https://github.com/apache/spark/commit/58d973e204bd62128567fd3dfb2e5a335ac46bf1).


---




[GitHub] spark issue #20332: [SPARK-23138][ML][DOC] Multiclass logistic regression su...

2018-01-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20332
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/48/
Test PASSed.


---




[GitHub] spark issue #20332: [SPARK-23138][ML][DOC] Multiclass logistic regression su...

2018-01-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20332
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20331: [SPARK-23158] [SQL] Move HadoopFsRelationTest test suite...

2018-01-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20331
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20331: [SPARK-23158] [SQL] Move HadoopFsRelationTest test suite...

2018-01-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20331
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86392/
Test FAILed.


---




[GitHub] spark issue #20331: [SPARK-23158] [SQL] Move HadoopFsRelationTest test suite...

2018-01-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20331
  
**[Test build #86392 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86392/testReport)**
 for PR 20331 at commit 
[`f7693f0`](https://github.com/apache/spark/commit/f7693f0abfe0923868c1918ddcaeaece2c107c5d).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `abstract class HadoopFsRelationTest extends QueryTest with SQLTestUtils with SharedSQLContext `
  * `class JsonHadoopFsRelationSuite extends HadoopFsRelationTest `
  * `class OrcHadoopFsRelationSuite extends HadoopFsRelationTest `
  * `class ParquetHadoopFsRelationSuite extends HadoopFsRelationTest `
  * `class SimpleTextHadoopFsRelationSuite extends HadoopFsRelationTest with PredicateHelper `
  * `class SimpleTextSource extends TextBasedFileFormat with DataSourceRegister `
  * `class SimpleTextOutputWriter(path: String, dataSchema: StructType, context: TaskAttemptContext)`
  * `class HiveOrcHadoopFsRelationSuite extends OrcHadoopFsRelationSuite `


---




[GitHub] spark issue #17123: [SPARK-19781][ML] Handle NULLs as well as NaNs in Bucket...

2018-01-19 Thread WeichenXu123
Github user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/17123
  
But please resolve the conflicts first. :) Bucketizer has since added 
multiple-column support, so the code is different now.


---




[GitHub] spark issue #19993: [SPARK-22799][ML] Bucketizer should throw exception if s...

2018-01-19 Thread jkbradley
Github user jkbradley commented on the issue:

https://github.com/apache/spark/pull/19993
  
Since RC1 for 2.3 failed, it'd be great to get this into 2.3.  @mgaido91 do 
you mind if I send my comments along with a PR to update this PR of yours?  I'm 
rushing because of the time pressure to get this into 2.3 (to avoid a behavior 
change between 2.3 and 2.4).  Thanks in advance!


---




[GitHub] spark issue #20332: [SPARK-23138][ML][DOC] Multiclass logistic regression su...

2018-01-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20332
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20332: [SPARK-23138][ML][DOC] Multiclass logistic regression su...

2018-01-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20332
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/47/
Test PASSed.


---




[GitHub] spark issue #20332: [SPARK-23138][ML][DOC] Multiclass logistic regression su...

2018-01-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20332
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86396/
Test FAILed.


---




[GitHub] spark issue #20332: [SPARK-23138][ML][DOC] Multiclass logistic regression su...

2018-01-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20332
  
**[Test build #86396 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86396/testReport)**
 for PR 20332 at commit 
[`cb6c811`](https://github.com/apache/spark/commit/cb6c811e98d9739a7c1608880b2d0037cdeb5990).
 * This patch **fails Python style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20332: [SPARK-23138][ML][DOC] Multiclass logistic regression su...

2018-01-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20332
  
Merged build finished. Test FAILed.


---




[GitHub] spark pull request #17123: [SPARK-19781][ML] Handle NULLs as well as NaNs in...

2018-01-19 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/17123#discussion_r162703711
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala ---
@@ -171,23 +176,23 @@ object Bucketizer extends DefaultParamsReadable[Bucketizer] {
* Binary searching in several buckets to place each data point.
* @param splits array of split points
* @param feature data point
-   * @param keepInvalid NaN flag.
-   *Set "true" to make an extra bucket for NaN values;
-   *Set "false" to report an error for NaN values
+   * @param keepInvalid NaN/NULL flag.
+   *Set "true" to make an extra bucket for NaN/NULL values;
+   *Set "false" to report an error for NaN/NULL values
* @return bucket for each data point
* @throws SparkException if a feature is < splits.head or > splits.last
*/
 
   private[feature] def binarySearchForBuckets(
   splits: Array[Double],
-  feature: Double,
+  feature: java.lang.Double,
--- End diff --

Also change to `Option[Double]` here.
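
To illustrate the suggestion, here is a minimal sketch of a bucket lookup that takes an optional value, so that a missing (NULL) input goes through the same branch as NaN. It is written in Java rather than the PR's Scala, with `Optional<Double>` standing in for `Option[Double]`. The class name, method name, and exception type are hypothetical, and this is not the actual Bucketizer code.

```java
import java.util.Arrays;
import java.util.Optional;

// Hypothetical sketch; names are illustrative, not Spark's API.
final class BucketSearchSketch {

    /**
     * Returns the bucket index for a feature, given sorted split points.
     * An absent value or NaN goes to an extra bucket when keepInvalid is
     * true, and is rejected otherwise.
     */
    static int findBucket(double[] splits, Optional<Double> feature, boolean keepInvalid) {
        // Treat NULL (absent) and NaN identically.
        if (!feature.isPresent() || feature.get().isNaN()) {
            if (keepInvalid) {
                // Regular buckets are 0 .. splits.length - 2; this is the extra one.
                return splits.length - 1;
            }
            throw new IllegalArgumentException("Invalid value: NaN or NULL");
        }
        double value = feature.get();
        // The last bucket is closed on the right, so the top split belongs to it.
        if (value == splits[splits.length - 1]) {
            return splits.length - 2;
        }
        int idx = Arrays.binarySearch(splits, value);
        if (idx >= 0) {
            return idx; // exactly on a split point: bucket starts here
        }
        int insertPos = -idx - 1;
        if (insertPos == 0 || insertPos == splits.length) {
            throw new IllegalArgumentException("Value out of split bounds: " + value);
        }
        return insertPos - 1;
    }
}
```

With an optional parameter the caller cannot pass a raw `null` by accident, which is the usual argument for preferring it over a boxed `java.lang.Double`.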


---




[GitHub] spark pull request #17123: [SPARK-19781][ML] Handle NULLs as well as NaNs in...

2018-01-19 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/17123#discussion_r162703633
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala ---
@@ -105,20 +106,21 @@ final class Bucketizer @Since("1.4.0") (@Since("1.4.0") override val uid: String
 transformSchema(dataset.schema)
 val (filteredDataset, keepInvalid) = {
   if (getHandleInvalid == Bucketizer.SKIP_INVALID) {
-// "skip" NaN option is set, will filter out NaN values in the dataset
+// "skip" NaN/NULL option is set, will filter out NaN/NULL values in the dataset
 (dataset.na.drop().toDF(), false)
   } else {
 (dataset.toDF(), getHandleInvalid == Bucketizer.KEEP_INVALID)
   }
 }
 
-val bucketizer: UserDefinedFunction = udf { (feature: Double) =>
--- End diff --

As @cloud-fan suggested, `Option[Double]` is better. :-)


---




[GitHub] spark issue #20332: [SPARK-23138][ML][DOC] Multiclass summary example and us...

2018-01-19 Thread sethah
Github user sethah commented on the issue:

https://github.com/apache/spark/pull/20332
  
@jkbradley @MLnick 


---




[GitHub] spark issue #20332: [SPARK-23138][ML][DOC] Multiclass summary example and us...

2018-01-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20332
  
**[Test build #86396 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86396/testReport)**
 for PR 20332 at commit 
[`cb6c811`](https://github.com/apache/spark/commit/cb6c811e98d9739a7c1608880b2d0037cdeb5990).


---




[GitHub] spark pull request #20332: [SPARK-23138][ML][DOC] Multiclass summary example...

2018-01-19 Thread sethah
GitHub user sethah opened a pull request:

https://github.com/apache/spark/pull/20332

[SPARK-23138][ML][DOC] Multiclass summary example and user guide

## What changes were proposed in this pull request?

User guide and examples are updated to reflect the multiclass logistic 
regression summary, which was added in 
[SPARK-17139](https://issues.apache.org/jira/browse/SPARK-17139).

I did not make a separate summary example, but added the summary code to 
the multiclass example that already existed. I don't see the need for a 
separate example for the summary. 

## How was this patch tested?

Docs and examples only. Ran all examples locally using spark-submit.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sethah/spark multiclass_summary_example

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20332.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20332


commit 9299fc83d2edab956bd13b2e1c985f64dcd2643e
Author: sethah 
Date:   2018-01-19T17:52:10Z

adding examples for python, scala, and java

commit bf076ed09abb3bb474e0925b3b9c4dbc6e90771a
Author: sethah 
Date:   2018-01-19T18:43:01Z

use binaryTrainingSummary

commit d0aa9f19550deb620e515ec33004be365c5439be
Author: sethah 
Date:   2018-01-19T18:46:16Z

import cleanup

commit cb6c811e98d9739a7c1608880b2d0037cdeb5990
Author: sethah 
Date:   2018-01-19T18:51:28Z

clarify user guide




---




[GitHub] spark issue #18983: [SPARK-21771][SQL]remove useless hive client in SparkSQL...

2018-01-19 Thread liufengdb
Github user liufengdb commented on the issue:

https://github.com/apache/spark/pull/18983
  
LGTM! It is only created once though.

Frankly, we should completely remove the implementation of the `newSession()` 
method.


---




[GitHub] spark issue #20091: [SPARK-22465][FOLLOWUP] Update the number of partitions ...

2018-01-19 Thread jiangxb1987
Github user jiangxb1987 commented on the issue:

https://github.com/apache/spark/pull/20091
  
@mridulm Great write-up! Yes, it's exactly as you described, and I've 
copied it into the PR description.


---




[GitHub] spark pull request #20025: [SPARK-22837][SQL]Session timeout checker does no...

2018-01-19 Thread liufengdb
Github user liufengdb commented on a diff in the pull request:

https://github.com/apache/spark/pull/20025#discussion_r162698093
  
--- Diff: sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/session/SessionManager.java ---
@@ -80,7 +76,6 @@ public synchronized void init(HiveConf hiveConf) {
 }
 createBackgroundOperationPool();
 addService(operationManager);
-super.init(hiveConf);
--- End diff --

Hmm, I think we should revert this and keep this line too.


---




[GitHub] spark issue #17894: [WIP][SPARK-17134][ML] Use level 2 BLAS operations in Lo...

2018-01-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17894
  
Build finished. Test PASSed.


---




[GitHub] spark issue #17894: [WIP][SPARK-17134][ML] Use level 2 BLAS operations in Lo...

2018-01-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17894
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/46/
Test PASSed.


---




[GitHub] spark issue #20328: [SPARK-23000] [TEST] Keep Derby DB Location Unchanged Af...

2018-01-19 Thread jiangxb1987
Github user jiangxb1987 commented on the issue:

https://github.com/apache/spark/pull/20328
  
A late LGTM! :)


---




[GitHub] spark pull request #20185: Branch 2.3

2018-01-19 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20185


---




[GitHub] spark pull request #20314: [SPARK-23104][K8S][Docs] Changes to Kubernetes sc...

2018-01-19 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20314


---




[GitHub] spark issue #20297: [SPARK-23020][CORE] Fix races in launcher code, test.

2018-01-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20297
  
**[Test build #86395 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86395/testReport)**
 for PR 20297 at commit 
[`95bac27`](https://github.com/apache/spark/commit/95bac2773ee7adab9f57aa4377ff2e998353f02f).


---




[GitHub] spark issue #20314: [SPARK-23104][K8S][Docs] Changes to Kubernetes scheduler...

2018-01-19 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/20314
  
Merging to master / 2.3.


---




[GitHub] spark issue #20297: [SPARK-23020][CORE] Fix races in launcher code, test.

2018-01-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20297
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/45/
Test PASSed.


---




[GitHub] spark issue #20297: [SPARK-23020][CORE] Fix races in launcher code, test.

2018-01-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20297
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20297: [SPARK-23020][CORE] Fix races in launcher code, test.

2018-01-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20297
  
**[Test build #4067 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4067/testReport)**
 for PR 20297 at commit 
[`95bac27`](https://github.com/apache/spark/commit/95bac2773ee7adab9f57aa4377ff2e998353f02f).


---




[GitHub] spark issue #20297: [SPARK-23020][CORE] Fix races in launcher code, test.

2018-01-19 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/20297
  
I kicked off a couple of extra builds, aside from the one that should 
auto-trigger.


---




[GitHub] spark issue #20297: [SPARK-23020][CORE] Fix races in launcher code, test.

2018-01-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20297
  
**[Test build #4066 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4066/testReport)**
 for PR 20297 at commit 
[`95bac27`](https://github.com/apache/spark/commit/95bac2773ee7adab9f57aa4377ff2e998353f02f).


---




[GitHub] spark pull request #20297: [SPARK-23020][CORE] Fix races in launcher code, t...

2018-01-19 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20297#discussion_r162694343
  
--- Diff: launcher/src/main/java/org/apache/spark/launcher/LauncherServer.java ---
@@ -331,23 +358,27 @@ protected void handle(Message msg) throws IOException {
   timeout.cancel();
 }
 close();
+if (handle != null) {
+  handle.dispose();
+}
   } finally {
 timeoutTimer.purge();
   }
 }
 
 @Override
 public void close() throws IOException {
+  if (!isOpen()) {
+return;
+  }
+
   synchronized (clients) {
 clients.remove(this);
   }
-  super.close();
-  if (handle != null) {
-if (!handle.getState().isFinal()) {
-  LOG.log(Level.WARNING, "Lost connection to spark application.");
-  handle.setState(SparkAppHandle.State.LOST);
-}
-handle.disconnect();
--- End diff --

See https://github.com/apache/spark/pull/20297#pullrequestreview-89568079


---




[GitHub] spark pull request #20297: [SPARK-23020][CORE] Fix races in launcher code, t...

2018-01-19 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20297#discussion_r162694174
  
--- Diff: launcher/src/main/java/org/apache/spark/launcher/LauncherServer.java ---
@@ -331,23 +358,27 @@ protected void handle(Message msg) throws IOException {
   timeout.cancel();
 }
 close();
+if (handle != null) {
+  handle.dispose();
+}
   } finally {
 timeoutTimer.purge();
   }
 }
 
 @Override
 public void close() throws IOException {
+  if (!isOpen()) {
+return;
+  }
+
   synchronized (clients) {
 clients.remove(this);
   }
-  super.close();
-  if (handle != null) {
-if (!handle.getState().isFinal()) {
-  LOG.log(Level.WARNING, "Lost connection to spark application.");
-  handle.setState(SparkAppHandle.State.LOST);
-}
-handle.disconnect();
+
+  synchronized (this) {
+super.close();
+notifyAll();
--- End diff --

See L239.
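
The shape of this hunk can be sketched in isolation: `close()` returns early if the connection is already closed, and the state change and `notifyAll()` happen while holding the object's monitor, so a thread blocked waiting for the close is woken. The class below and its `waitForClose` method are hypothetical stand-ins, not the launcher's classes.

```java
// Hypothetical sketch of an idempotent close that wakes waiting threads.
final class ClosableConnectionSketch {
    private boolean open = true;
    private int closeCount = 0; // counts how often the real close work ran

    synchronized boolean isOpen() {
        return open;
    }

    void close() {
        // Cheap early exit for repeated calls.
        if (!isOpen()) {
            return;
        }
        synchronized (this) {
            // Re-check under the lock in case another thread closed first.
            if (!open) {
                return;
            }
            open = false;
            closeCount++;
            // Wake any thread blocked in waitForClose().
            notifyAll();
        }
    }

    /** Blocks until closed or the timeout elapses; returns true if closed. */
    synchronized boolean waitForClose(long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (open) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                return false;
            }
            try {
                wait(remaining);
            } catch (InterruptedException e) {
                // Preserve the interrupt for the caller and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    synchronized int closeCount() {
        return closeCount;
    }
}
```

The waiter loops on the `open` flag rather than trusting a single wake-up, which is the standard guard against spurious wake-ups with `wait()`.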


---




[GitHub] spark pull request #20297: [SPARK-23020][CORE] Fix races in launcher code, t...

2018-01-19 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20297#discussion_r162693890
  
--- Diff: launcher/src/main/java/org/apache/spark/launcher/LauncherConnection.java ---
@@ -95,15 +95,15 @@ protected synchronized void send(Message msg) throws IOException {
   }
 
   @Override
-  public void close() throws IOException {
+  public synchronized void close() throws IOException {
--- End diff --

We never *needed* to change it, but the extra code wasn't doing anything 
useful, so I chose the simpler version.


---




[GitHub] spark pull request #20297: [SPARK-23020][CORE] Fix races in launcher code, t...

2018-01-19 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20297#discussion_r162693731
  
--- Diff: launcher/src/main/java/org/apache/spark/launcher/ChildProcAppHandle.java ---
@@ -48,14 +48,16 @@ public synchronized void disconnect() {
 
   @Override
   public synchronized void kill() {
-disconnect();
-if (childProc != null) {
-  if (childProc.isAlive()) {
-childProc.destroyForcibly();
+if (!isDisposed()) {
+  setState(State.KILLED);
--- End diff --

None of the calls below should raise exceptions. Even the socket close is 
wrapped in a try..catch.


---




[GitHub] spark issue #19340: [SPARK-22119][ML] Add cosine distance to KMeans

2018-01-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19340
  
**[Test build #4065 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4065/testReport)**
 for PR 19340 at commit 
[`fda93ae`](https://github.com/apache/spark/commit/fda93aeadd782d520f32eb34475e3a7fa349c425).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20330: [SPARK-23121][core] Fix for ui becoming unaccessible for...

2018-01-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20330
  
**[Test build #4063 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4063/testReport)**
 for PR 20330 at commit 
[`6525ef4`](https://github.com/apache/spark/commit/6525ef4eda0bf65bbbcb842495341afc8c5971ad).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20324: [SPARK-23091][ML] Incorrect unit test for approxQuantile

2018-01-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20324
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86390/
Test PASSed.


---




[GitHub] spark issue #20324: [SPARK-23091][ML] Incorrect unit test for approxQuantile

2018-01-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20324
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20324: [SPARK-23091][ML] Incorrect unit test for approxQuantile

2018-01-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20324
  
**[Test build #86390 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86390/testReport)**
 for PR 20324 at commit 
[`673c520`](https://github.com/apache/spark/commit/673c52042a70b5dfc061dd053ae2e6553a4a2612).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20277: [SPARK-23090][SQL] polish ColumnVector

2018-01-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20277
  
**[Test build #86394 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86394/testReport)**
 for PR 20277 at commit 
[`3972093`](https://github.com/apache/spark/commit/397209342646a253a56650df8a00dfb6d66c834e).


---




[GitHub] spark issue #20277: [SPARK-23090][SQL] polish ColumnVector

2018-01-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20277
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20277: [SPARK-23090][SQL] polish ColumnVector

2018-01-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20277
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/44/
Test PASSed.


---




[GitHub] spark issue #20277: [SPARK-23090][SQL] polish ColumnVector

2018-01-19 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20277
  
retest this please, since the `ColumnarBatch` PR has been merged.


---




[GitHub] spark issue #20326: [SPARK-23155][DEPLOY] log.server.url links in SHS

2018-01-19 Thread gerashegalov
Github user gerashegalov commented on the issue:

https://github.com/apache/spark/pull/20326
  
@vanzin do you mind considering this issue?


---




[GitHub] spark pull request #20330: [SPARK-23121][core] Fix for ui becoming unaccessi...

2018-01-19 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20330#discussion_r162687023
  
--- Diff: core/src/main/scala/org/apache/spark/ui/jobs/AllJobsPage.scala ---
@@ -65,10 +68,13 @@ private[ui] class AllJobsPage(parent: JobsTab, store: AppStatusStore) extends We
 }.map { job =>
   val jobId = job.jobId
   val status = job.status
-  val jobDescription = store.lastStageAttempt(job.stageIds.max).description
-  val displayJobDescription = jobDescription
-.map(UIUtils.makeDescription(_, "", plainText = true).text)
-.getOrElse("")
+  val (_, lastStageDescription) = lastStageNameAndDescription(store, job)
+  val displayJobDescription =
+if (lastStageDescription.isEmpty) {
--- End diff --

nit: I generally prefer the opposite check.

```
if (data is good) 
  do something with data 
else 
  fallback to something else
```
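
A concrete version of that shape, as a small Java sketch (the PR code itself is Scala). The method name and the fallback to a job name are made up for illustration, with `Optional<String>` standing in for the possibly missing description.

```java
import java.util.Optional;

// Hypothetical sketch: handle the "data is good" case first, then the fallback.
final class DescriptionSketch {

    static String displayDescription(Optional<String> description, String jobName) {
        if (description.isPresent()) {
            // Data is good: do something with it.
            return description.get().trim();
        } else {
            // Otherwise fall back to something else.
            return jobName;
        }
    }
}
```

Putting the positive branch first keeps the main path at the top, and the fallback reads as the exception rather than the rule.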


---




[GitHub] spark pull request #20330: [SPARK-23121][core] Fix for ui becoming unaccessi...

2018-01-19 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20330#discussion_r162687347
  
--- Diff: core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala ---
@@ -18,7 +18,7 @@
 package org.apache.spark.ui.jobs
 
 import java.net.URLEncoder
-import java.util.Date
+import java.util.{Collections, Date}
--- End diff --

New import is unused?


---




[GitHub] spark pull request #20330: [SPARK-23121][core] Fix for ui becoming unaccessi...

2018-01-19 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20330#discussion_r162687444
  
--- Diff: core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala ---
@@ -31,6 +31,7 @@ import org.apache.spark.SparkConf
 import org.apache.spark.internal.config._
 import org.apache.spark.scheduler.TaskLocality
 import org.apache.spark.status._
+import org.apache.spark.status.api.v1
--- End diff --

Just use `JobData` since it's already imported?


---




[GitHub] spark pull request #20330: [SPARK-23121][core] Fix for ui becoming unaccessi...

2018-01-19 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20330#discussion_r162687232
  
--- Diff: core/src/main/scala/org/apache/spark/ui/jobs/JobPage.scala ---
@@ -336,8 +336,14 @@ private[ui] class JobPage(parent: JobsTab, store: AppStatusStore) extends WebUIP
 content ++= makeTimeline(activeStages ++ completedStages ++ failedStages,
   store.executorList(false), appStartTime)
 
-content ++= UIUtils.showDagVizForJob(
-  jobId, store.operationGraphForJob(jobId))
+val operationGraphContent = store.asOption(store.operationGraphForJob(jobId)) match {
+  case Some(operationGraph) => UIUtils.showDagVizForJob(jobId, operationGraph)
+  case None =>
+  
--- End diff --

Indentation is off.


---




[GitHub] spark pull request #20330: [SPARK-23121][core] Fix for ui becoming unaccessi...

2018-01-19 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20330#discussion_r162687160
  
--- Diff: core/src/main/scala/org/apache/spark/ui/jobs/AllJobsPage.scala ---
@@ -427,23 +435,21 @@ private[ui] class JobDataSource(
 val formattedDuration = duration.map(d => UIUtils.formatDuration(d)).getOrElse("Unknown")
 val submissionTime = jobData.submissionTime
 val formattedSubmissionTime = submissionTime.map(UIUtils.formatDate).getOrElse("Unknown")
-val lastStageAttempt = store.lastStageAttempt(jobData.stageIds.max)
-val lastStageDescription = lastStageAttempt.description.getOrElse("")
+val (lastStageName, lastStageDescription) = lastStageNameAndDescription(store, jobData)
 
-val formattedJobDescription =
-  UIUtils.makeDescription(lastStageDescription, basePath, plainText = false)
+val jobDescription = UIUtils.makeDescription(lastStageDescription, basePath, plainText = false)
--- End diff --

No need to check for empty description here?


---




[GitHub] spark issue #20330: [SPARK-23121][core] Fix for ui becoming unaccessible for...

2018-01-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20330
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20330: [SPARK-23121][core] Fix for ui becoming unaccessible for...

2018-01-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20330
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86389/
Test FAILed.


---




[GitHub] spark issue #20330: [SPARK-23121][core] Fix for ui becoming unaccessible for...

2018-01-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20330
  
**[Test build #86389 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86389/testReport)** for PR 20330 at commit [`6525ef4`](https://github.com/apache/spark/commit/6525ef4eda0bf65bbbcb842495341afc8c5971ad).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20331: [SPARK-23158] [SQL] Move HadoopFsRelationTest test suite...

2018-01-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20331
  
**[Test build #86393 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86393/testReport)** for PR 20331 at commit [`f7693f0`](https://github.com/apache/spark/commit/f7693f0abfe0923868c1918ddcaeaece2c107c5d).


---




[GitHub] spark issue #20331: [SPARK-23158] [SQL] Move HadoopFsRelationTest test suite...

2018-01-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20331
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/43/
Test PASSed.


---




[GitHub] spark issue #20331: [SPARK-23158] [SQL] Move HadoopFsRelationTest test suite...

2018-01-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20331
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20331: [SPARK-23158] [SQL] Move HadoopFsRelationTest test suite...

2018-01-19 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20331
  
retest this please


---




[GitHub] spark pull request #20331: [SPARK-23158] [SQL] Move HadoopFsRelationTest tes...

2018-01-19 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20331#discussion_r162683746
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcHadoopFsRelationSuite.scala ---
@@ -82,44 +80,4 @@ class OrcHadoopFsRelationSuite extends HadoopFsRelationTest {
   }
 }
   }
-
-  test("SPARK-13543: Support for specifying compression codec for ORC via option()") {
-withTempPath { dir =>
-  val path = s"${dir.getCanonicalPath}/table1"
-  val df = (1 to 5).map(i => (i, (i % 2).toString)).toDF("a", "b")
-  df.write
-.option("compression", "ZlIb")
-.orc(path)
-
-  // Check if this is compressed as ZLIB.
-  val maybeOrcFile = new File(path).listFiles().find { f =>
-!f.getName.startsWith("_") && f.getName.endsWith(".zlib.orc")
-  }
-  assert(maybeOrcFile.isDefined)
-  val orcFilePath = maybeOrcFile.get.toPath.toString
-  val expectedCompressionKind =
-OrcFileOperator.getFileReader(orcFilePath).get.getCompression
--- End diff --

The same here.


---




[GitHub] spark pull request #20331: [SPARK-23158] [SQL] Move HadoopFsRelationTest tes...

2018-01-19 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20331#discussion_r162683705
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcHadoopFsRelationSuite.scala ---
@@ -82,44 +80,4 @@ class OrcHadoopFsRelationSuite extends HadoopFsRelationTest {
   }
 }
   }
-
-  test("SPARK-13543: Support for specifying compression codec for ORC via option()") {
-withTempPath { dir =>
-  val path = s"${dir.getCanonicalPath}/table1"
-  val df = (1 to 5).map(i => (i, (i % 2).toString)).toDF("a", "b")
-  df.write
-.option("compression", "ZlIb")
-.orc(path)
-
-  // Check if this is compressed as ZLIB.
-  val maybeOrcFile = new File(path).listFiles().find { f =>
-!f.getName.startsWith("_") && f.getName.endsWith(".zlib.orc")
-  }
-  assert(maybeOrcFile.isDefined)
-  val orcFilePath = maybeOrcFile.get.toPath.toString
-  val expectedCompressionKind =
-OrcFileOperator.getFileReader(orcFilePath).get.getCompression
-  assert("ZLIB" === expectedCompressionKind.name())
-
-  val copyDf = spark
-.read
-.orc(path)
-  checkAnswer(df, copyDf)
-}
-  }
-
-  test("Default compression codec is snappy for ORC compression") {
-withTempPath { file =>
-  spark.range(0, 10).write
-.orc(file.getCanonicalPath)
-  val expectedCompressionKind =
-OrcFileOperator.getFileReader(file.getCanonicalPath).get.getCompression
--- End diff --

`OrcFileOperator` is defined in `sql/hive`.


---




[GitHub] spark issue #20331: [SPARK-23158] [SQL] Move HadoopFsRelationTest test suite...

2018-01-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20331
  
**[Test build #86392 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86392/testReport)** for PR 20331 at commit [`f7693f0`](https://github.com/apache/spark/commit/f7693f0abfe0923868c1918ddcaeaece2c107c5d).


---




[GitHub] spark pull request #20331: [SPARK-23158] [SQL] Move HadoopFsRelationTest tes...

2018-01-19 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20331#discussion_r162683627
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcHadoopFsRelationSuite.scala ---
@@ -82,44 +80,4 @@ class OrcHadoopFsRelationSuite extends HadoopFsRelationTest {
   }
 }
   }
-
-  test("SPARK-13543: Support for specifying compression codec for ORC via option()") {
-withTempPath { dir =>
-  val path = s"${dir.getCanonicalPath}/table1"
-  val df = (1 to 5).map(i => (i, (i % 2).toString)).toDF("a", "b")
-  df.write
-.option("compression", "ZlIb")
-.orc(path)
-
-  // Check if this is compressed as ZLIB.
-  val maybeOrcFile = new File(path).listFiles().find { f =>
-!f.getName.startsWith("_") && f.getName.endsWith(".zlib.orc")
-  }
-  assert(maybeOrcFile.isDefined)
-  val orcFilePath = maybeOrcFile.get.toPath.toString
-  val expectedCompressionKind =
-OrcFileOperator.getFileReader(orcFilePath).get.getCompression
-  assert("ZLIB" === expectedCompressionKind.name())
-
-  val copyDf = spark
-.read
-.orc(path)
-  checkAnswer(df, copyDf)
-}
-  }
-
-  test("Default compression codec is snappy for ORC compression") {
-withTempPath { file =>
-  spark.range(0, 10).write
-.orc(file.getCanonicalPath)
-  val expectedCompressionKind =
-OrcFileOperator.getFileReader(file.getCanonicalPath).get.getCompression
-  assert("SNAPPY" === expectedCompressionKind.name())
-}
-  }
-}
-
-class HiveOrcHadoopFsRelationSuite extends OrcHadoopFsRelationSuite {
--- End diff --

This is Hive only.


---




[GitHub] spark issue #20331: [SPARK-23158] [SQL] Move HadoopFsRelationTest test suite...

2018-01-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20331
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/42/
Test PASSed.


---




[GitHub] spark issue #20331: [SPARK-23158] [SQL] Move HadoopFsRelationTest test suite...

2018-01-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20331
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


