date:20180521

[GitHub] spark issue #21165: [Spark-20087][CORE] Attach accumulators / metrics to 'Ta...

2018-05-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21165
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90891/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21165: [Spark-20087][CORE] Attach accumulators / metrics to 'Ta...

2018-05-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21165
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21165: [Spark-20087][CORE] Attach accumulators / metrics to 'Ta...

2018-05-21 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21165
  
**[Test build #90891 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90891/testReport)**
 for PR 21165 at commit 
[`74911b7`](https://github.com/apache/spark/commit/74911b7a8d7714618ab060b3227e33505b0c5d05).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21067: [SPARK-23980][K8S] Resilient Spark driver on Kubernetes

2018-05-21 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21067
  
**[Test build #90895 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90895/testReport)**
 for PR 21067 at commit 
[`95f6886`](https://github.com/apache/spark/commit/95f6886a29ed09eeeb0254c5289fb832328f1581).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21370: [SPARK-24215][PySpark] Implement _repr_html_ for datafra...

2018-05-21 Thread xuanyuanking

Github user xuanyuanking commented on the issue:

https://github.com/apache/spark/pull/21370
  
Thanks all reviewer's comments, I address all comments in this commit. 
Please have a look.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21381: refactor ExecuteWriteTask

2018-05-21 Thread gengliangwang

GitHub user gengliangwang opened a pull request:

https://github.com/apache/spark/pull/21381

refactor ExecuteWriteTask

## What changes were proposed in this pull request?
As I am working on File data source V2 write path [in my repo 
](https://github.com/gengliangwang/spark/blob/47f39e1f54bc748e116ae9580413fae317898327/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileSourceWriter.scala#L78),
 I find it essential to refactor ExecuteWriteTask in FileFormatWriter with 
DataWriter of Data source V2:
1. Reuse the code in both `FileFormat` and Data Source V2
2. Better abstraction, callers only need to call `commit()` or `abort` at 
the end of task. Also there is less code in `SingleDirectoryWriteTask` and 
`DynamicPartitionWriteTask`.

This PR is part of data source V2 migration. Definitions of related classes 
is moved to a new file, and `ExecuteWriteTask` is rename to 
`FileFormatDataWriter`


## How was this patch tested?
Existing unit test


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gengliangwang/spark refactorExecuteWriteTask

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21381.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21381


commit cbd4ce2959bdfe63dff32d0c36b2982fcde22aac
Author: Gengliang Wang 
Date:   2018-05-21T12:16:14Z

refactor ExecuteWriteTask




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21363: [SPARK-19228][SQL] Migrate on Java 8 time from Fa...

2018-05-21 Thread sergey-rubtsov

Github user sergey-rubtsov commented on a diff in the pull request:

https://github.com/apache/spark/pull/21363#discussion_r189597989
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala
 ---
@@ -140,14 +141,23 @@ private[csv] object CSVInferSchema {
   private def tryParseDouble(field: String, options: CSVOptions): DataType 
= {
 if ((allCatch opt field.toDouble).isDefined || isInfOrNan(field, 
options)) {
   DoubleType
+} else {
+  tryParseDate(field, options)
--- End diff --

At the moment, DateType here is ignored at all, I'm not sure that it was 
conceived when the type was created


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21342: [SPARK-24294] Throw SparkException when OOM in Br...

2018-05-21 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21342#discussion_r189603209
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala
 ---
@@ -111,12 +112,18 @@ case class BroadcastExchangeExec(
   SQLMetrics.postDriverMetricUpdates(sparkContext, executionId, 
metrics.values.toSeq)
   broadcasted
 } catch {
+  // SPARK-24294: To bypass scala bug: 
https://github.com/scala/bug/issues/9554, we throw
+  // SparkFatalException, which is a subclass of Exception. 
ThreadUtils.awaitResult
+  // will catch this exception and re-throw the wrapped fatal 
throwable.
   case oe: OutOfMemoryError =>
--- End diff --

not related to this PR, but I'm a little worried about catching OOM here. 
Spark has `SparkOutOfMemoryError`, and it seems more reasonable to catch 
`SparkOutOfMemoryError`. This can be fixed in another PR.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-21 Thread ueshin

Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r189606444
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -292,31 +297,25 @@ class Dataset[T] private[sql](
   }
 
   // Create SeparateLine
-  val sep: String = colWidths.map("-" * _).addString(sb, "+", "+", 
"+\n").toString()
+  val sep: String = if (html) {
+// Initial append table label
+sb.append("\n")
+"\n"
+  } else {
+colWidths.map("-" * _).addString(sb, "+", "+", "+\n").toString()
+  }
 
   // column names
-  rows.head.zipWithIndex.map { case (cell, i) =>
-if (truncate > 0) {
-  StringUtils.leftPad(cell, colWidths(i))
-} else {
-  StringUtils.rightPad(cell, colWidths(i))
-}
-  }.addString(sb, "|", "|", "|\n")
-
+  appendRowString(rows.head, truncate, colWidths, html, true, sb)
   sb.append(sep)
 
   // data
-  rows.tail.foreach {
-_.zipWithIndex.map { case (cell, i) =>
-  if (truncate > 0) {
-StringUtils.leftPad(cell.toString, colWidths(i))
-  } else {
-StringUtils.rightPad(cell.toString, colWidths(i))
-  }
-}.addString(sb, "|", "|", "|\n")
+  rows.tail.foreach { row =>
+appendRowString(row.map(_.toString), truncate, colWidths, html, 
false, sb)
--- End diff --

I know this is not your change, but the `rows` is already 
`Seq[Seq[String]]` and the `row` is `Seq[String]`, so I think we can remove it.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-21 Thread ueshin

Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r189606746
  
--- Diff: docs/configuration.md ---
@@ -456,6 +456,29 @@ Apart from these, the following properties are also 
available, and may be useful
 from JVM to Python worker for every task.
   
 
+
+  spark.jupyter.eagerEval.enabled
--- End diff --

Oh, yes, I'd prefer to add `sql`.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-21 Thread ueshin

Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r189606695
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -358,6 +357,43 @@ class Dataset[T] private[sql](
 sb.toString()
   }
 
+  /**
+   * Transform current row string and append to builder
+   *
+   * @param row   Current row of string
+   * @param truncate  If set to more than 0, truncates strings to 
`truncate` characters and
+   *all cells will be aligned right.
+   * @param colWidths The width of each column
+   * @param html  If set to true, return output as html table.
+   * @param head  Set to true while current row is table head.
+   * @param sbStringBuilder for current row.
+   */
+  private[sql] def appendRowString(
+  row: Seq[String],
+  truncate: Int,
+  colWidths: Array[Int],
+  html: Boolean,
+  head: Boolean,
+  sb: StringBuilder): Unit = {
+val data = row.zipWithIndex.map { case (cell, i) =>
+  if (truncate > 0) {
+StringUtils.leftPad(cell, colWidths(i))
+  } else {
+StringUtils.rightPad(cell, colWidths(i))
+  }
+}
+(html, head) match {
+  case (true, true) =>
+data.map(StringEscapeUtils.escapeHtml).addString(
+  sb, "", "", "")
--- End diff --

Hmm, the header looks okay, but the data section will be a long line 
without `\n`? How about adding `\n` here and the data section, and just using 
`""` for the seperatedLine?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21381: [SPARK-24330][SQL]Refactor ExecuteWriteTask in FileForma...

2018-05-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21381
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21381: [SPARK-24330][SQL]Refactor ExecuteWriteTask in FileForma...

2018-05-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21381
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90897/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21363: [SPARK-19228][SQL] Migrate on Java 8 time from FastDateF...

2018-05-21 Thread mgaido91

Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/21363
  
@sergey-rubtsov we have to keep backward compatibility. If a user upgrades, 
with your change a running application may break because of data not being 
anymore timestamp but date. We can add a new entry in `SQLConf` for that.

Moreover, we have to mention in the migration guide all the behavioral 
changes we introduce.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-05-21 Thread squito

Github user squito commented on the issue:

https://github.com/apache/spark/pull/21068
  
ping @tgravescs .  honestly I still don't love the blacklist limit, 
especially since it makes reporting back to the driver pretty confusing, and I 
don't think it buys us much.  But I can live with it.  and otherwise I think 
this is ready.

I've also looked at Attila's tests on a real cluster


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20636: [SPARK-23415][SQL][TEST] Make behavior of BufferHolderSp...

2018-05-21 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/20636
  
ping @hvanhovell


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21370: [SPARK-24215][PySpark] Implement _repr_html_ for datafra...

2018-05-21 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21370
  
**[Test build #90896 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90896/testReport)**
 for PR 21370 at commit 
[`a798cf2`](https://github.com/apache/spark/commit/a798cf20d20b9b0f899ce477a56502549672482a).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-21 Thread xuanyuanking

Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r189603851
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -347,13 +347,26 @@ def show(self, n=20, truncate=True, vertical=False):
  name | Bob
 """
 if isinstance(truncate, bool) and truncate:
-print(self._jdf.showString(n, 20, vertical))
+print(self._jdf.showString(n, 20, vertical, False))
 else:
-print(self._jdf.showString(n, int(truncate), vertical))
+print(self._jdf.showString(n, int(truncate), vertical, False))
 
 def __repr__(self):
 return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in 
self.dtypes))
 
+def _repr_html_(self):
--- End diff --

No problem, is the SQLTests in pyspark/sql/tests.py the right place?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-21 Thread xuanyuanking

Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r189614067
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -347,13 +347,26 @@ def show(self, n=20, truncate=True, vertical=False):
  name | Bob
 """
 if isinstance(truncate, bool) and truncate:
-print(self._jdf.showString(n, 20, vertical))
+print(self._jdf.showString(n, 20, vertical, False))
 else:
-print(self._jdf.showString(n, int(truncate), vertical))
+print(self._jdf.showString(n, int(truncate), vertical, False))
 
 def __repr__(self):
 return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in 
self.dtypes))
 
+def _repr_html_(self):
--- End diff --

No problem, I'll added in `SQLTests` in next commit.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-21 Thread xuanyuanking

Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r189614136
  
--- Diff: docs/configuration.md ---
@@ -456,6 +456,29 @@ Apart from these, the following properties are also 
available, and may be useful
 from JVM to Python worker for every task.
   
 
+
+  spark.jupyter.eagerEval.enabled
--- End diff --

Got it, fix it in next commit.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to handle...

2018-05-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20345
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to handle...

2018-05-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20345
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3419/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21342: [SPARK-24294] Throw SparkException when OOM in Br...

2018-05-21 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21342#discussion_r189632062
  
--- Diff: 
core/src/main/scala/org/apache/spark/util/SparkFatalException.scala ---
@@ -0,0 +1,24 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.util
+
+/**
+ * SPARK-24294: To bypass scala bug: 
https://github.com/scala/bug/issues/9554, we catch
+ * fatal throwable in {@link scala.concurrent.Future}'s body, and re-throw
+ * SparkFatalException, which wraps the fatal throwable inside.
+ */
+private[spark] final class SparkFatalException(val throwable: Throwable) 
extends Exception
--- End diff --

I believe it will not. The `SparkFatalException` has a short life cycle: it 
is created inside `Future {}` and then caught and stripped by 
`ThreadUtils.awaitResult`.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmar...

2018-05-21 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21288#discussion_r189635143
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala
 ---
@@ -0,0 +1,437 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.benchmark
+
+import java.io.File
+
+import scala.util.{Random, Try}
+
+import org.apache.spark.SparkConf
+import org.apache.spark.sql.{DataFrame, SparkSession}
+import org.apache.spark.sql.functions.monotonically_increasing_id
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.util.{Benchmark, Utils}
+
+
+/**
+ * Benchmark to measure read performance with Filter pushdown.
+ * To run this:
+ *  spark-submit --class  
+ */
+object FilterPushdownBenchmark {
+  val conf = new SparkConf()
+.setAppName("FilterPushdownBenchmark")
+.setIfMissing("spark.master", "local[1]")
+.setIfMissing("spark.driver.memory", "3g")
+.setIfMissing("spark.executor.memory", "3g")
+.setIfMissing("orc.compression", "snappy")
+.setIfMissing("spark.sql.parquet.compression.codec", "snappy")
+
+  private val spark = SparkSession.builder().config(conf).getOrCreate()
+
+  def withTempPath(f: File => Unit): Unit = {
+val path = Utils.createTempDir()
+path.delete()
+try f(path) finally Utils.deleteRecursively(path)
+  }
+
+  def withTempTable(tableNames: String*)(f: => Unit): Unit = {
+try f finally tableNames.foreach(spark.catalog.dropTempView)
+  }
+
+  def withSQLConf(pairs: (String, String)*)(f: => Unit): Unit = {
+val (keys, values) = pairs.unzip
+val currentValues = keys.map(key => Try(spark.conf.get(key)).toOption)
+(keys, values).zipped.foreach(spark.conf.set)
+try f finally {
+  keys.zip(currentValues).foreach {
+case (key, Some(value)) => spark.conf.set(key, value)
+case (key, None) => spark.conf.unset(key)
+  }
+}
+  }
+
+  private def prepareTable(
+  dir: File, numRows: Int, width: Int, useStringForValue: Boolean): 
Unit = {
+import spark.implicits._
+val selectExpr = (1 to width).map(i => s"CAST(value AS STRING) c$i")
+val valueCol = if (useStringForValue) {
+  monotonically_increasing_id().cast("string")
+} else {
+  monotonically_increasing_id()
+}
+val df = spark.range(numRows).map(_ => 
Random.nextLong).selectExpr(selectExpr: _*)
+  .withColumn("value", valueCol)
+  .sort("value")
+
+saveAsOrcTable(df, dir.getCanonicalPath + "/orc")
+saveAsParquetTable(df, dir.getCanonicalPath + "/parquet")
+  }
+
+  private def prepareStringDictTable(
+  dir: File, numRows: Int, numDistinctValues: Int, width: Int): Unit = 
{
+val selectExpr = (0 to width).map {
+  case 0 => s"CAST(id % $numDistinctValues AS STRING) AS value"
+  case i => s"CAST(rand() AS STRING) c$i"
+}
+val df = spark.range(numRows).selectExpr(selectExpr: _*).sort("value")
+
+saveAsOrcTable(df, dir.getCanonicalPath + "/orc")
+saveAsParquetTable(df, dir.getCanonicalPath + "/parquet")
+  }
+
+  private def saveAsOrcTable(df: DataFrame, dir: String): Unit = {
+df.write.mode("overwrite").orc(dir)
+spark.read.orc(dir).createOrReplaceTempView("orcTable")
+  }
+
+  private def saveAsParquetTable(df: DataFrame, dir: String): Unit = {
+df.write.mode("overwrite").parquet(dir)
+spark.read.parquet(dir).createOrReplaceTempView("parquetTable")
+  }
+
+  def filterPushDownBenchmark(
+  values: Int,
+  title: String,
+  whereExpr: String,
+  selectExpr: String = "*"): Unit = {
+val benchmark = new Benchmark(title, values, minNumIters = 5)
+
+Seq(false, true).foreach { pushDownEnabled =>
+  val

[GitHub] spark pull request #21236: [SPARK-23935][SQL] Adding map_entries function

2018-05-21 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21236


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21361: [SPARK-24313][SQL] Fix collection operations' interprete...

2018-05-21 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21361
  
**[Test build #90903 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90903/testReport)**
 for PR 21361 at commit 
[`6315775`](https://github.com/apache/spark/commit/63157756fde2b4e9dc80fce0296da68067c57422).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21356: [SPARK-24309][CORE] AsyncEventQueue should stop o...

2018-05-21 Thread squito

Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/21356#discussion_r189604443
  
--- Diff: core/src/main/scala/org/apache/spark/util/ListenerBus.scala ---
@@ -80,7 +89,16 @@ private[spark] trait ListenerBus[L <: AnyRef, E] extends 
Logging {
   }
   try {
 doPostEvent(listener, event)
+if (Thread.interrupted()) {
+  logError(s"Interrupted while posting to 
${Utils.getFormattedClassName(listener)}.  " +
+s"Removing that listener.")
+  removeListenerOnError(listener)
--- End diff --

`Thread.interrupted()` also clears the interrupted state.  So that alone 
isn't a problem -- we're basically declaring that we've handled the interrupt 
and nobody else gets to know about it anymore.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-21 Thread xuanyuanking

Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r189613358
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -292,31 +297,25 @@ class Dataset[T] private[sql](
   }
 
   // Create SeparateLine
-  val sep: String = colWidths.map("-" * _).addString(sb, "+", "+", 
"+\n").toString()
+  val sep: String = if (html) {
+// Initial append table label
+sb.append("\n")
+"\n"
+  } else {
+colWidths.map("-" * _).addString(sb, "+", "+", "+\n").toString()
+  }
 
   // column names
-  rows.head.zipWithIndex.map { case (cell, i) =>
-if (truncate > 0) {
-  StringUtils.leftPad(cell, colWidths(i))
-} else {
-  StringUtils.rightPad(cell, colWidths(i))
-}
-  }.addString(sb, "|", "|", "|\n")
-
+  appendRowString(rows.head, truncate, colWidths, html, true, sb)
   sb.append(sep)
 
   // data
-  rows.tail.foreach {
-_.zipWithIndex.map { case (cell, i) =>
-  if (truncate > 0) {
-StringUtils.leftPad(cell.toString, colWidths(i))
-  } else {
-StringUtils.rightPad(cell.toString, colWidths(i))
-  }
-}.addString(sb, "|", "|", "|\n")
+  rows.tail.foreach { row =>
+appendRowString(row.map(_.toString), truncate, colWidths, html, 
false, sb)
--- End diff --

I see, the `cell.toString` has been called here. 
https://github.com/apache/spark/pull/21370/files/f2bb8f334631734869ddf5d8ef1eca1fa29d334a#diff-7a46f10c3cedbf013cf255564d9483cdR271
Got it, I'll fix this in next commit.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21193: [SPARK-24121][SQL] Add API for handling expressio...

2018-05-21 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21193#discussion_r189617272
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/javaCode.scala
 ---
@@ -167,9 +170,40 @@ object Block {
   case other => throw new IllegalArgumentException(
 s"Can not interpolate ${other.getClass.getName} into code 
block.")
 }
-CodeBlock(sc.parts, args)
+
+val (codeParts, blockInputs) = foldLiteralArgs(sc.parts, args)
+CodeBlock(codeParts, blockInputs)
+  }
+}
+  }
+
+  // Folds eagerly the literal args into the code parts.
+  private def foldLiteralArgs(parts: Seq[String], args: Seq[Any]): 
(Seq[String], Seq[Any]) = {
+val codeParts = ArrayBuffer.empty[String]
+val blockInputs = ArrayBuffer.empty[Any]
+
+val strings = parts.iterator
+val inputs = args.iterator
+val buf = new StringBuilder(Block.CODE_BLOCK_BUFFER_LENGTH)
+
+buf append strings.next
--- End diff --

can we use java style here? `buf.append(strings.next)`.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21342: [SPARK-24294] Throw SparkException when OOM in Br...

2018-05-21 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21342#discussion_r189627101
  
--- Diff: 
core/src/main/scala/org/apache/spark/util/SparkFatalException.scala ---
@@ -0,0 +1,24 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.util
+
+/**
+ * SPARK-24294: To bypass scala bug: 
https://github.com/scala/bug/issues/9554, we catch
+ * fatal throwable in {@link scala.concurrent.Future}'s body, and re-throw
+ * SparkFatalException, which wraps the fatal throwable inside.
+ */
+private[spark] final class SparkFatalException(val throwable: Throwable) 
extends Exception
--- End diff --

Will this change impact 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/SparkUncaughtExceptionHandler.scala#L42
 ? cc @zsxwing @JoshRosen 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21342: [SPARK-24294] Throw SparkException when OOM in Br...

2018-05-21 Thread kiszk

Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21342#discussion_r189634704
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala
 ---
@@ -111,12 +112,18 @@ case class BroadcastExchangeExec(
   SQLMetrics.postDriverMetricUpdates(sparkContext, executionId, 
metrics.values.toSeq)
   broadcasted
 } catch {
+  // SPARK-24294: To bypass scala bug: 
https://github.com/scala/bug/issues/9554, we throw
+  // SparkFatalException, which is a subclass of Exception. 
ThreadUtils.awaitResult
+  // will catch this exception and re-throw the wrapped fatal 
throwable.
   case oe: OutOfMemoryError =>
-throw new OutOfMemoryError(s"Not enough memory to build and 
broadcast the table to " +
+throw new SparkFatalException(
+  new OutOfMemoryError(s"Not enough memory to build and 
broadcast the table to " +
--- End diff --

Just curious: Can we perform object operations (allocate 
`OutOfMemoryError`, allocate and concatenate `String`s) when we caught ` 
OutOfMemoryError`?
I think that we have space since we failed to allocate a large object.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21366: [SPARK-24248][K8S][WIP] Use the Kubernetes API to...

2018-05-21 Thread mccheah

Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/21366#discussion_r189642632
  
--- Diff: pom.xml ---
@@ -150,6 +150,7 @@
 
 4.5.4
 4.4.8
+3.0.1
--- End diff --

These are data structures optimized for storing primitives. We could use 
standard Scala here functionally speaking.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21382: [SPARK-24332][SS][MESOS]Fix places reading 'spark...

2018-05-21 Thread zsxwing

GitHub user zsxwing opened a pull request:

https://github.com/apache/spark/pull/21382

[SPARK-24332][SS][MESOS]Fix places reading 'spark.network.timeout' as 
milliseconds

## What changes were proposed in this pull request?

This PR replaces `getTimeAsMs` with `getTimeAsSeconds` to fix the issue 
that reading "spark.network.timeout" using a wrong time unit when the user 
doesn't specify a time out.

## How was this patch tested?

Jenkins


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zsxwing/spark fix-network-timeout-conf

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21382.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21382


commit 145dd3b2a7a696d13c0d6286029f0ae47ba96d3b
Author: Shixiong Zhu 
Date:   2018-05-21T16:34:48Z

Fix places reading 'spark.network.timeout' as milliseconds




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21366: [SPARK-24248][K8S][WIP] Use the Kubernetes API to popula...

2018-05-21 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21366
  
Kubernetes integration test status failure
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/3311/



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20272: [SPARK-23078] [CORE] [K8s] allow Spark Thrift Server to ...

2018-05-21 Thread ah-

Github user ah- commented on the issue:

https://github.com/apache/spark/pull/20272
  
Is anyone still looking at this? It seems like it should just work?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19691: [SPARK-14922][SPARK-17732][SQL]ALTER TABLE DROP PARTITIO...

2018-05-21 Thread mgaido91

Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/19691
  
the failed UTs are valid failures


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmar...

2018-05-21 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21288#discussion_r189639582
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala
 ---
@@ -0,0 +1,437 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.benchmark
+
+import java.io.File
+
+import scala.util.{Random, Try}
+
+import org.apache.spark.SparkConf
+import org.apache.spark.sql.{DataFrame, SparkSession}
+import org.apache.spark.sql.functions.monotonically_increasing_id
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.util.{Benchmark, Utils}
+
+
+/**
+ * Benchmark to measure read performance with Filter pushdown.
+ * To run this:
+ *  spark-submit --class  
+ */
+object FilterPushdownBenchmark {
+  val conf = new SparkConf()
+.setAppName("FilterPushdownBenchmark")
+.setIfMissing("spark.master", "local[1]")
+.setIfMissing("spark.driver.memory", "3g")
+.setIfMissing("spark.executor.memory", "3g")
+.setIfMissing("orc.compression", "snappy")
+.setIfMissing("spark.sql.parquet.compression.codec", "snappy")
+
+  private val spark = SparkSession.builder().config(conf).getOrCreate()
+
+  def withTempPath(f: File => Unit): Unit = {
+val path = Utils.createTempDir()
+path.delete()
+try f(path) finally Utils.deleteRecursively(path)
+  }
+
+  def withTempTable(tableNames: String*)(f: => Unit): Unit = {
+try f finally tableNames.foreach(spark.catalog.dropTempView)
+  }
+
+  def withSQLConf(pairs: (String, String)*)(f: => Unit): Unit = {
+val (keys, values) = pairs.unzip
+val currentValues = keys.map(key => Try(spark.conf.get(key)).toOption)
+(keys, values).zipped.foreach(spark.conf.set)
+try f finally {
+  keys.zip(currentValues).foreach {
+case (key, Some(value)) => spark.conf.set(key, value)
+case (key, None) => spark.conf.unset(key)
+  }
+}
+  }
+
+  private def prepareTable(
+  dir: File, numRows: Int, width: Int, useStringForValue: Boolean): 
Unit = {
+import spark.implicits._
+val selectExpr = (1 to width).map(i => s"CAST(value AS STRING) c$i")
+val valueCol = if (useStringForValue) {
+  monotonically_increasing_id().cast("string")
+} else {
+  monotonically_increasing_id()
+}
+val df = spark.range(numRows).map(_ => 
Random.nextLong).selectExpr(selectExpr: _*)
+  .withColumn("value", valueCol)
+  .sort("value")
+
+saveAsOrcTable(df, dir.getCanonicalPath + "/orc")
+saveAsParquetTable(df, dir.getCanonicalPath + "/parquet")
+  }
+
+  private def prepareStringDictTable(
+  dir: File, numRows: Int, numDistinctValues: Int, width: Int): Unit = 
{
+val selectExpr = (0 to width).map {
+  case 0 => s"CAST(id % $numDistinctValues AS STRING) AS value"
+  case i => s"CAST(rand() AS STRING) c$i"
+}
+val df = spark.range(numRows).selectExpr(selectExpr: _*).sort("value")
+
+saveAsOrcTable(df, dir.getCanonicalPath + "/orc")
+saveAsParquetTable(df, dir.getCanonicalPath + "/parquet")
+  }
+
+  private def saveAsOrcTable(df: DataFrame, dir: String): Unit = {
+df.write.mode("overwrite").orc(dir)
+spark.read.orc(dir).createOrReplaceTempView("orcTable")
+  }
+
+  private def saveAsParquetTable(df: DataFrame, dir: String): Unit = {
+df.write.mode("overwrite").parquet(dir)
+spark.read.parquet(dir).createOrReplaceTempView("parquetTable")
+  }
+
+  def filterPushDownBenchmark(
+  values: Int,
+  title: String,
+  whereExpr: String,
+  selectExpr: String = "*"): Unit = {
+val benchmark = new Benchmark(title, values, minNumIters = 5)
+
+Seq(false, true).foreach { pushDownEnabled =>
+  val

[GitHub] spark issue #21381: [SPARK-24330][SQL]Refactor ExecuteWriteTask in FileForma...

2018-05-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21381
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3420/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21381: [SPARK-24330][SQL]Refactor ExecuteWriteTask in FileForma...

2018-05-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21381
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90902/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21317: [SPARK-24232][k8s] Add support for secret env var...

2018-05-21 Thread liyinan926

Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21317#discussion_r189643350
  
--- Diff: 
resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/EnvSecretsFeatureStepSuite.scala
 ---
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.features
+
+import io.fabric8.kubernetes.api.model.PodBuilder
+
+import org.apache.spark.{SparkConf, SparkFunSuite}
+import org.apache.spark.deploy.k8s._
+
+class EnvSecretsFeatureStepSuite extends SparkFunSuite{
+  private val KEY_REF_NAME_FOO = "foo"
+  private val KEY_REF_NAME_BAR = "bar"
+  private val KEY_REF_KEY_FOO = "key_foo"
+  private val KEY_REF_KEY_BAR = "key_bar"
+  private val ENV_NAME_FOO = "MY_FOO"
+  private val ENV_NAME_BAR = "MY_bar"
+
+  test("sets up all keyRefs") {
+val baseDriverPod = SparkPod.initialPod()
+val envVarsToKeys = Map(
+  ENV_NAME_BAR -> s"${KEY_REF_NAME_BAR}:${KEY_REF_KEY_BAR}",
+  ENV_NAME_FOO -> s"${KEY_REF_NAME_FOO}:${KEY_REF_KEY_FOO}")
+val sparkConf = new SparkConf(false)
+val kubernetesConf = KubernetesConf(
+  sparkConf,
+  KubernetesExecutorSpecificConf("1", new PodBuilder().build()),
+  "resource-name-prefix",
+  "app-id",
+  Map.empty,
+  Map.empty,
+  Map.empty,
+  envVarsToKeys,
+  Map.empty)
+
+val step = new EnvSecretsFeatureStep(kubernetesConf)
+val driverContainerWithEnvSecrets = 
step.configurePod(baseDriverPod).container
+
+val expectedVars =
+  Seq(s"${ENV_NAME_BAR}", s"${ENV_NAME_FOO}")
+
+expectedVars.foreach { envName =>
+  
assert(SecretEnvUtils.containerHasEnvVar(driverContainerWithEnvSecrets, 
envName))
--- End diff --

It's better if the bests also verify the values of the env vars.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21317: [SPARK-24232][k8s] Add support for secret env var...

2018-05-21 Thread liyinan926

Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21317#discussion_r189639396
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -140,6 +140,13 @@ namespace as that of the driver and executor pods. For 
example, to mount a secre
 --conf spark.kubernetes.executor.secrets.spark-secret=/etc/secrets
 ```
 
+To add a secret as an env variable to either the driver or the executor 
container use the following options to the
--- End diff --

s/`To add a secret as an ...` /`To use a secret through an environment 
variable`/.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21381: [SPARK-24330][SQL]Refactor ExecuteWriteTask in FileForma...

2018-05-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21381
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21381: [SPARK-24330][SQL]Refactor ExecuteWriteTask in FileForma...

2018-05-21 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21381
  
**[Test build #90902 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90902/testReport)**
 for PR 21381 at commit 
[`28e33ff`](https://github.com/apache/spark/commit/28e33fffa4d7fb1a747e1f0d15b46ca08be23dbb).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21317: [SPARK-24232][k8s] Add support for secret env var...

2018-05-21 Thread liyinan926

Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21317#discussion_r189640364
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/EnvSecretsFeatureStep.scala
 ---
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.features
+
+import scala.collection.JavaConverters._
+
+import io.fabric8.kubernetes.api.model.{ContainerBuilder, EnvVarBuilder, 
HasMetadata}
+
+import org.apache.spark.deploy.k8s.{KubernetesConf, 
KubernetesRoleSpecificConf, SparkPod}
+
+private[spark] class EnvSecretsFeatureStep(
+kubernetesConf: KubernetesConf[_ <: KubernetesRoleSpecificConf])
+  extends KubernetesFeatureConfigStep {
+  override def configurePod(pod: SparkPod): SparkPod = {
+val addedEnvSecrets = kubernetesConf
+  .roleSecretEnvNamesToKeyRefs
+  .map{ case (envName, keyRef) =>
+// Keyref parts
+val keyRefParts = keyRef.split(":")
+require(keyRefParts.size == 2, "KeyRef must be in the form 
name:key.")
+val name = keyRefParts(0)
+val key = keyRefParts(1)
+new EnvVarBuilder()
+  .withName(envName)
+  .withNewValueFrom()
+  .withNewSecretKeyRef()
+  .withKey(key)
--- End diff --

The indention seems incorrect.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21317: [SPARK-24232][k8s] Add support for secret env var...

2018-05-21 Thread liyinan926

Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21317#discussion_r189642295
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/EnvSecretsFeatureStep.scala
 ---
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.features
+
+import scala.collection.JavaConverters._
+
+import io.fabric8.kubernetes.api.model.{ContainerBuilder, EnvVarBuilder, 
HasMetadata}
+
+import org.apache.spark.deploy.k8s.{KubernetesConf, 
KubernetesRoleSpecificConf, SparkPod}
+
+private[spark] class EnvSecretsFeatureStep(
+kubernetesConf: KubernetesConf[_ <: KubernetesRoleSpecificConf])
+  extends KubernetesFeatureConfigStep {
+  override def configurePod(pod: SparkPod): SparkPod = {
+val addedEnvSecrets = kubernetesConf
+  .roleSecretEnvNamesToKeyRefs
+  .map{ case (envName, keyRef) =>
+// Keyref parts
+val keyRefParts = keyRef.split(":")
+require(keyRefParts.size == 2, "KeyRef must be in the form 
name:key.")
--- End diff --

s/`KeyRef`/`SecretKeyRef`/.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21381: [SPARK-24330][SQL]Refactor ExecuteWriteTask in FileForma...

2018-05-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21381
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21366: [SPARK-24248][K8S][WIP] Use the Kubernetes API to popula...

2018-05-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21366
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21366: [SPARK-24248][K8S][WIP] Use the Kubernetes API to popula...

2018-05-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21366
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3421/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21356: [SPARK-24309][CORE] AsyncEventQueue should stop o...

2018-05-21 Thread vanzin

Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21356#discussion_r189647689
  
--- Diff: core/src/main/scala/org/apache/spark/util/ListenerBus.scala ---
@@ -80,7 +89,16 @@ private[spark] trait ListenerBus[L <: AnyRef, E] extends 
Logging {
   }
   try {
 doPostEvent(listener, event)
+if (Thread.interrupted()) {
+  logError(s"Interrupted while posting to 
${Utils.getFormattedClassName(listener)}.  " +
+s"Removing that listener.")
+  removeListenerOnError(listener)
--- End diff --

You could just throw `InterruptedException` here to avoid duplicating the 
error handling code.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21366: [SPARK-24248][K8S][WIP] Use the Kubernetes API to popula...

2018-05-21 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21366
  
Kubernetes integration test starting
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/3311/



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21381: [SPARK-24330][SQL]Refactor ExecuteWriteTask in FileForma...

2018-05-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21381
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3416/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21381: [SPARK-24330][SQL]Refactor ExecuteWriteTask in FileForma...

2018-05-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21381
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21361: [SPARK-24313][SQL] Fix collection operations' interprete...

2018-05-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21361
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3417/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21361: [SPARK-24313][SQL] Fix collection operations' interprete...

2018-05-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21361
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

2018-05-21 Thread maropu

Github user maropu commented on the issue:

https://github.com/apache/spark/pull/21288
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21372: [SPARK-24322][BUILD] Upgrade Apache ORC to 1.4.4

2018-05-21 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/21372
  
Sure, @maropu . In addition, I reviewed the nine patches, almost trivial 
ones. I'll update the PR description more.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21366: [SPARK-24248][K8S][WIP] Use the Kubernetes API to popula...

2018-05-21 Thread mccheah

Github user mccheah commented on the issue:

https://github.com/apache/spark/pull/21366
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21372: [SPARK-24322][BUILD] Upgrade Apache ORC to 1.4.4

2018-05-21 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21372#discussion_r189644114
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcColumnVector.java
 ---
@@ -136,7 +136,7 @@ public int getInt(int rowId) {
   public long getLong(int rowId) {
 int index = getRowIndex(rowId);
 if (isTimestamp) {
-  return timestampData.time[index] * 1000 + timestampData.nanos[index] 
/ 1000;
+  return timestampData.time[index] * 1000 + timestampData.nanos[index] 
/ 1000 % 1000;
--- End diff --

Add a test case? 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21370: [SPARK-24215][PySpark] Implement _repr_html_ for datafra...

2018-05-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21370
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90896/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21370: [SPARK-24215][PySpark] Implement _repr_html_ for datafra...

2018-05-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21370
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21342: [SPARK-24294] Throw SparkException when OOM in Broadcast...

2018-05-21 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21342
  
LGTM, which scala version has fixed this bug?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19691: [SPARK-14922][SPARK-17732][SQL]ALTER TABLE DROP PARTITIO...

2018-05-21 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19691
  
**[Test build #90894 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90894/testReport)**
 for PR 19691 at commit 
[`dd5d482`](https://github.com/apache/spark/commit/dd5d4825d247d9949eb2bf10c05ee7661c389709).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19691: [SPARK-14922][SPARK-17732][SQL]ALTER TABLE DROP PARTITIO...

2018-05-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19691
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90894/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19691: [SPARK-14922][SPARK-17732][SQL]ALTER TABLE DROP PARTITIO...

2018-05-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19691
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmar...

2018-05-21 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/21288#discussion_r189638131
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala
 ---
@@ -0,0 +1,437 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.benchmark
+
+import java.io.File
+
+import scala.util.{Random, Try}
+
+import org.apache.spark.SparkConf
+import org.apache.spark.sql.{DataFrame, SparkSession}
+import org.apache.spark.sql.functions.monotonically_increasing_id
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.util.{Benchmark, Utils}
+
+
+/**
+ * Benchmark to measure read performance with Filter pushdown.
+ * To run this:
+ *  spark-submit --class  
+ */
+object FilterPushdownBenchmark {
+  val conf = new SparkConf()
+.setAppName("FilterPushdownBenchmark")
+.setIfMissing("spark.master", "local[1]")
+.setIfMissing("spark.driver.memory", "3g")
+.setIfMissing("spark.executor.memory", "3g")
+.setIfMissing("orc.compression", "snappy")
+.setIfMissing("spark.sql.parquet.compression.codec", "snappy")
+
+  private val spark = SparkSession.builder().config(conf).getOrCreate()
+
+  def withTempPath(f: File => Unit): Unit = {
+val path = Utils.createTempDir()
+path.delete()
+try f(path) finally Utils.deleteRecursively(path)
+  }
+
+  def withTempTable(tableNames: String*)(f: => Unit): Unit = {
+try f finally tableNames.foreach(spark.catalog.dropTempView)
+  }
+
+  def withSQLConf(pairs: (String, String)*)(f: => Unit): Unit = {
+val (keys, values) = pairs.unzip
+val currentValues = keys.map(key => Try(spark.conf.get(key)).toOption)
+(keys, values).zipped.foreach(spark.conf.set)
+try f finally {
+  keys.zip(currentValues).foreach {
+case (key, Some(value)) => spark.conf.set(key, value)
+case (key, None) => spark.conf.unset(key)
+  }
+}
+  }
+
+  private def prepareTable(
+  dir: File, numRows: Int, width: Int, useStringForValue: Boolean): 
Unit = {
+import spark.implicits._
+val selectExpr = (1 to width).map(i => s"CAST(value AS STRING) c$i")
+val valueCol = if (useStringForValue) {
+  monotonically_increasing_id().cast("string")
+} else {
+  monotonically_increasing_id()
+}
+val df = spark.range(numRows).map(_ => 
Random.nextLong).selectExpr(selectExpr: _*)
+  .withColumn("value", valueCol)
+  .sort("value")
+
+saveAsOrcTable(df, dir.getCanonicalPath + "/orc")
+saveAsParquetTable(df, dir.getCanonicalPath + "/parquet")
+  }
+
+  private def prepareStringDictTable(
+  dir: File, numRows: Int, numDistinctValues: Int, width: Int): Unit = 
{
+val selectExpr = (0 to width).map {
+  case 0 => s"CAST(id % $numDistinctValues AS STRING) AS value"
+  case i => s"CAST(rand() AS STRING) c$i"
+}
+val df = spark.range(numRows).selectExpr(selectExpr: _*).sort("value")
+
+saveAsOrcTable(df, dir.getCanonicalPath + "/orc")
+saveAsParquetTable(df, dir.getCanonicalPath + "/parquet")
+  }
+
+  private def saveAsOrcTable(df: DataFrame, dir: String): Unit = {
+df.write.mode("overwrite").orc(dir)
+spark.read.orc(dir).createOrReplaceTempView("orcTable")
+  }
+
+  private def saveAsParquetTable(df: DataFrame, dir: String): Unit = {
+df.write.mode("overwrite").parquet(dir)
+spark.read.parquet(dir).createOrReplaceTempView("parquetTable")
+  }
+
+  def filterPushDownBenchmark(
+  values: Int,
+  title: String,
+  whereExpr: String,
+  selectExpr: String = "*"): Unit = {
+val benchmark = new Benchmark(title, values, minNumIters = 5)
+
+Seq(false, true).foreach { pushDownEnabled =>
+

[GitHub] spark pull request #21376: [SPARK-24250][SQL] support accessing SQLConf insi...

2018-05-21 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21376


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21381: [SPARK-24330][SQL]Refactor ExecuteWriteTask in FileForma...

2018-05-21 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21381
  
**[Test build #90906 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90906/testReport)**
 for PR 21381 at commit 
[`3c29628`](https://github.com/apache/spark/commit/3c29628b2f7c9c5a817791866fa450fe4ebafced).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21366: [SPARK-24248][K8S][WIP] Use the Kubernetes API to popula...

2018-05-21 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21366
  
**[Test build #90907 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90907/testReport)**
 for PR 21366 at commit 
[`931529a`](https://github.com/apache/spark/commit/931529a74c07b3212dcbe34e7329384555777146).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21368: [SPARK-16451][repl] Fail shell if SparkSession fa...

2018-05-21 Thread vanzin

Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21368#discussion_r189649450
  
--- Diff: 
repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkILoop.scala ---
@@ -44,7 +44,14 @@ class SparkILoop(in0: Option[BufferedReader], out: 
JPrintWriter)
 @transient val spark = if (org.apache.spark.repl.Main.sparkSession != 
null) {
 org.apache.spark.repl.Main.sparkSession
   } else {
-org.apache.spark.repl.Main.createSparkSession()
+try {
+  org.apache.spark.repl.Main.createSparkSession()
+} catch {
+  case e: Exception =>
+println("Failed to initialize Spark session:")
+e.printStackTrace()
+sys.exit(1)
--- End diff --

My usual response is "this is not a public class" (it's not in the public 
API docs), but let me see if it's easy to restrict the `sys.exit` to 
spark-shell invocations.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21382: [SPARK-24332][SS][MESOS]Fix places reading 'spark.networ...

2018-05-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21382
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3422/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21382: [SPARK-24332][SS][MESOS]Fix places reading 'spark.networ...

2018-05-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21382
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21356: [SPARK-24309][CORE] AsyncEventQueue should stop o...

2018-05-21 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21356#discussion_r189607562
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/LiveListenerBus.scala ---
@@ -111,6 +111,12 @@ private[spark] class LiveListenerBus(conf: SparkConf) {
 }
   }
 
+  private[scheduler] def removeQueue(queue: String): Unit = synchronized {
--- End diff --

can we remove it?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-21 Thread xuanyuanking

Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r189611792
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -358,6 +357,43 @@ class Dataset[T] private[sql](
 sb.toString()
   }
 
+  /**
+   * Transform current row string and append to builder
+   *
+   * @param row   Current row of string
+   * @param truncate  If set to more than 0, truncates strings to 
`truncate` characters and
+   *all cells will be aligned right.
+   * @param colWidths The width of each column
+   * @param html  If set to true, return output as html table.
+   * @param head  Set to true while current row is table head.
+   * @param sbStringBuilder for current row.
+   */
+  private[sql] def appendRowString(
+  row: Seq[String],
+  truncate: Int,
+  colWidths: Array[Int],
+  html: Boolean,
+  head: Boolean,
+  sb: StringBuilder): Unit = {
+val data = row.zipWithIndex.map { case (cell, i) =>
+  if (truncate > 0) {
+StringUtils.leftPad(cell, colWidths(i))
+  } else {
+StringUtils.rightPad(cell, colWidths(i))
+  }
+}
+(html, head) match {
+  case (true, true) =>
+data.map(StringEscapeUtils.escapeHtml).addString(
+  sb, "", "", "")
--- End diff --

Ah, I understand your consideration. I'll add this in next commit.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to handle...

2018-05-21 Thread maropu

Github user maropu commented on the issue:

https://github.com/apache/spark/pull/20345
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21381: [SPARK-24330][SQL]Refactor ExecuteWriteTask in FileForma...

2018-05-21 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21381
  
**[Test build #90897 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90897/testReport)**
 for PR 21381 at commit 
[`cbd4ce2`](https://github.com/apache/spark/commit/cbd4ce2959bdfe63dff32d0c36b2982fcde22aac).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `abstract class FileFormatDataWriter(`
  * `class EmptyDirectoryDataWriter(`
  * `class SingleDirectoryDataWriter(`
  * `class DynamicPartitionDataWriter(`
  * `class WriteJobDescription(`
  * `case class WriteTaskResult(commitMsg: TaskCommitMessage, summary: 
ExecutedWriteSummary)`
  * `case class ExecutedWriteSummary(`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

2018-05-21 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21288
  
**[Test build #90904 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90904/testReport)**
 for PR 21288 at commit 
[`39e5a50`](https://github.com/apache/spark/commit/39e5a507fe22cade6bed0613eefbccab15cf45ff).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to handle...

2018-05-21 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20345
  
**[Test build #90905 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90905/testReport)**
 for PR 20345 at commit 
[`94d9171`](https://github.com/apache/spark/commit/94d9171b8ec26c21724dd393cf4fc83ff52623e7).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21193: [SPARK-24121][SQL] Add API for handling expressio...

2018-05-21 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21193#discussion_r189616918
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/javaCode.scala
 ---
@@ -167,9 +170,40 @@ object Block {
   case other => throw new IllegalArgumentException(
 s"Can not interpolate ${other.getClass.getName} into code 
block.")
 }
-CodeBlock(sc.parts, args)
+
+val (codeParts, blockInputs) = foldLiteralArgs(sc.parts, args)
+CodeBlock(codeParts, blockInputs)
+  }
+}
+  }
+
+  // Folds eagerly the literal args into the code parts.
+  private def foldLiteralArgs(parts: Seq[String], args: Seq[Any]): 
(Seq[String], Seq[Any]) = {
+val codeParts = ArrayBuffer.empty[String]
+val blockInputs = ArrayBuffer.empty[Any]
--- End diff --

shall we make the type `JavaCode` instead of `Any`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

2018-05-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21288
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

2018-05-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21288
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3418/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21045: [SPARK-23931][SQL] Adds zip function to sparksql

2018-05-21 Thread mgaido91

Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/21045
  
cc @ueshin 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21376: [SPARK-24250][SQL] support accessing SQLConf inside task...

2018-05-21 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21376
  
thanks, merging to master!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21067: [SPARK-23980][K8S] Resilient Spark driver on Kubernetes

2018-05-21 Thread liyinan926

Github user liyinan926 commented on the issue:

https://github.com/apache/spark/pull/21067
  
@foxish on concerns of the lack of exactly-one semantics.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21382: [SPARK-24332][SS][MESOS]Fix places reading 'spark.networ...

2018-05-21 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21382
  
**[Test build #90908 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90908/testReport)**
 for PR 21382 at commit 
[`145dd3b`](https://github.com/apache/spark/commit/145dd3b2a7a696d13c0d6286029f0ae47ba96d3b).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21368: [SPARK-16451][repl] Fail shell if SparkSession fa...

2018-05-21 Thread vanzin

Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21368#discussion_r189649746
  
--- Diff: 
repl/scala-2.12/src/main/scala/org/apache/spark/repl/SparkILoop.scala ---
@@ -37,7 +37,14 @@ class SparkILoop(in0: Option[BufferedReader], out: 
JPrintWriter)
 @transient val spark = if (org.apache.spark.repl.Main.sparkSession != 
null) {
 org.apache.spark.repl.Main.sparkSession
   } else {
-org.apache.spark.repl.Main.createSparkSession()
+try {
+  org.apache.spark.repl.Main.createSparkSession()
+} catch {
+  case e: Exception =>
+println("Failed to initialize Spark session:")
+e.printStackTrace()
+sys.exit(1)
+}
--- End diff --

I'm not that familiar with those two shells but I'll give it a try.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21368: [SPARK-16451][repl] Fail shell if SparkSession fa...

2018-05-21 Thread vanzin

Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21368#discussion_r189649541
  
--- Diff: python/pyspark/shell.py ---
@@ -38,25 +41,29 @@
 SparkContext._ensure_initialized()
 
 try:
-# Try to access HiveConf, it will raise exception if Hive is not added
-conf = SparkConf()
-if conf.get('spark.sql.catalogImplementation', 'hive').lower() == 
'hive':
-SparkContext._jvm.org.apache.hadoop.hive.conf.HiveConf()
-spark = SparkSession.builder\
-.enableHiveSupport()\
-.getOrCreate()
-else:
+try:
+# Try to access HiveConf, it will raise exception if Hive is not 
added
+conf = SparkConf()
+if conf.get('spark.sql.catalogImplementation', 'hive').lower() == 
'hive':
+SparkContext._jvm.org.apache.hadoop.hive.conf.HiveConf()
+spark = SparkSession.builder\
+.enableHiveSupport()\
+.getOrCreate()
+else:
+spark = SparkSession.builder.getOrCreate()
+except py4j.protocol.Py4JError:
+if conf.get('spark.sql.catalogImplementation', '').lower() == 
'hive':
+warnings.warn("Fall back to non-hive support because failing 
to access HiveConf, "
+  "please make sure you build spark with hive")
+spark = SparkSession.builder.getOrCreate()
+except TypeError:
+if conf.get('spark.sql.catalogImplementation', '').lower() == 
'hive':
+warnings.warn("Fall back to non-hive support because failing 
to access HiveConf, "
+  "please make sure you build spark with hive")
 spark = SparkSession.builder.getOrCreate()
-except py4j.protocol.Py4JError:
-if conf.get('spark.sql.catalogImplementation', '').lower() == 'hive':
-warnings.warn("Fall back to non-hive support because failing to 
access HiveConf, "
-  "please make sure you build spark with hive")
-spark = SparkSession.builder.getOrCreate()
-except TypeError:
-if conf.get('spark.sql.catalogImplementation', '').lower() == 'hive':
-warnings.warn("Fall back to non-hive support because failing to 
access HiveConf, "
-  "please make sure you build spark with hive")
-spark = SparkSession.builder.getOrCreate()
+except Exception as e:
+print("Failed to initialize Spark session:", e, file=sys.stderr)
--- End diff --

Printing the exception shows its traceback.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21155: [SPARK-23927][SQL] Add "sequence" expression

2018-05-21 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21155
  
**[Test build #90909 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90909/testReport)**
 for PR 21155 at commit 
[`2ccb9bb`](https://github.com/apache/spark/commit/2ccb9bb73f729c361ee819316e9c95224797a178).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to handle...

2018-05-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20345
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to handle...

2018-05-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20345
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90885/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to handle...

2018-05-21 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20345
  
**[Test build #90885 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90885/testReport)**
 for PR 20345 at commit 
[`94d9171`](https://github.com/apache/spark/commit/94d9171b8ec26c21724dd393cf4fc83ff52623e7).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function

2018-05-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21236
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function

2018-05-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21236
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90887/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-21 Thread xuanyuanking

Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r189567315
  
--- Diff: docs/configuration.md ---
@@ -456,6 +456,29 @@ Apart from these, the following properties are also 
available, and may be useful
 from JVM to Python worker for every task.
   
 
+
+  spark.jupyter.eagerEval.enabled
+  false
+  
+Open eager evaluation on jupyter or not. If yes, dataframe will be ran 
automatically
+and html table will feedback the queries user have defined (see
+https://issues.apache.org/jira/browse/SPARK-24215;>SPARK-24215 for 
more details).
+  
+
+
+  spark.jupyter.default.showRows
--- End diff --

change to spark.jupyter.eagerEval.showRows,thanks


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-21 Thread xuanyuanking

Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r189567259
  
--- Diff: docs/configuration.md ---
@@ -456,6 +456,29 @@ Apart from these, the following properties are also 
available, and may be useful
 from JVM to Python worker for every task.
   
 
+
+  spark.jupyter.eagerEval.enabled
+  false
+  
+Open eager evaluation on jupyter or not. If yes, dataframe will be ran 
automatically
--- End diff --

Got it, thanks.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-21 Thread xuanyuanking

Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r189567350
  
--- Diff: docs/configuration.md ---
@@ -456,6 +456,29 @@ Apart from these, the following properties are also 
available, and may be useful
 from JVM to Python worker for every task.
   
 
+
+  spark.jupyter.eagerEval.enabled
+  false
+  
+Open eager evaluation on jupyter or not. If yes, dataframe will be ran 
automatically
+and html table will feedback the queries user have defined (see
+https://issues.apache.org/jira/browse/SPARK-24215;>SPARK-24215 for 
more details).
+  
+
+
+  spark.jupyter.default.showRows
+  20
+  
+Default number of rows in jupyter html table.
+  
+
+
+  spark.jupyter.default.truncate
--- End diff --

Yep, change to spark.jupyter.eagerEval.truncate


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21363: [SPARK-19228][SQL] Migrate on Java 8 time from Fa...

2018-05-21 Thread sergey-rubtsov

Github user sergey-rubtsov commented on a diff in the pull request:

https://github.com/apache/spark/pull/21363#discussion_r189583023
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala
 ---
@@ -90,6 +90,7 @@ private[csv] object CSVInferSchema {
   // DecimalTypes have different precisions and scales, so we try 
to find the common type.
   findTightestCommonType(typeSoFar, tryParseDecimal(field, 
options)).getOrElse(StringType)
 case DoubleType => tryParseDouble(field, options)
+case DateType => tryParseDate(field, options)
--- End diff --

I can do it, but where exactly it should be documented?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21363: [SPARK-19228][SQL] Migrate on Java 8 time from Fa...

2018-05-21 Thread sergey-rubtsov

Github user sergey-rubtsov commented on a diff in the pull request:

https://github.com/apache/spark/pull/21363#discussion_r189582909
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchemaSuite.scala
 ---
@@ -59,13 +59,21 @@ class CSVInferSchemaSuite extends SparkFunSuite {
 assert(CSVInferSchema.inferField(IntegerType, textValueOne, options) 
== expectedTypeOne)
   }
 
-  test("Timestamp field types are inferred correctly via custom data 
format") {
-var options = new CSVOptions(Map("timestampFormat" -> "-mm"), 
"GMT")
+  test("Timestamp field types are inferred correctly via custom date 
format") {
+var options = new CSVOptions(Map("timestampFormat" -> "-MM"), 
"GMT")
--- End diff --

"-mm" means years and minutes, this is date format, this is time format
-MM" means years and months, but I do not insist on this change


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21317: [SPARK-24232][k8s] Add support for secret env vars

2018-05-21 Thread skonto

Github user skonto commented on the issue:

https://github.com/apache/spark/pull/21317
  
@thanx @liyinan926 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21266: [SPARK-24206][SQL] Improve DataSource read benchmark cod...

2018-05-21 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21266
  
**[Test build #90884 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90884/testReport)**
 for PR 21266 at commit 
[`d8c308f`](https://github.com/apache/spark/commit/d8c308fa43a001328b8645e0d339875342c25c67).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21363: [SPARK-19228][SQL] Migrate on Java 8 time from Fa...

2018-05-21 Thread sergey-rubtsov

Github user sergey-rubtsov commented on a diff in the pull request:

https://github.com/apache/spark/pull/21363#discussion_r189588135
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala
 ---
@@ -140,14 +141,23 @@ private[csv] object CSVInferSchema {
   private def tryParseDouble(field: String, options: CSVOptions): DataType 
= {
 if ((allCatch opt field.toDouble).isDefined || isInfOrNan(field, 
options)) {
   DoubleType
+} else {
+  tryParseDate(field, options)
--- End diff --

For example, by mistake we have identical "timestampFormat" and 
"dateFormat" options.
Let it be "-MM-dd"
'TimestampType' (8 bytes) is larger than 'DateType' (4 bytes)
So if they can overlap, we need to try parse it as date firstly, because 
both of these types are suitable, but you need to try to use a more compact by 
default and it will be correct inferring of type


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

1 2 3 4 5 6 >

1 - 100 of 562 matches

Mail list logo