[GitHub] spark issue #21165: [Spark-20087][CORE] Attach accumulators / metrics to 'Ta...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21165 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90891/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21165: [Spark-20087][CORE] Attach accumulators / metrics to 'Ta...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21165 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21165: [Spark-20087][CORE] Attach accumulators / metrics to 'Ta...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21165 **[Test build #90891 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90891/testReport)** for PR 21165 at commit [`74911b7`](https://github.com/apache/spark/commit/74911b7a8d7714618ab060b3227e33505b0c5d05). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21067: [SPARK-23980][K8S] Resilient Spark driver on Kubernetes
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21067 **[Test build #90895 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90895/testReport)** for PR 21067 at commit [`95f6886`](https://github.com/apache/spark/commit/95f6886a29ed09eeeb0254c5289fb832328f1581). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21370: [SPARK-24215][PySpark] Implement _repr_html_ for datafra...
Github user xuanyuanking commented on the issue: https://github.com/apache/spark/pull/21370 Thanks all reviewer's comments, I address all comments in this commit. Please have a look. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21381: refactor ExecuteWriteTask
GitHub user gengliangwang opened a pull request: https://github.com/apache/spark/pull/21381 refactor ExecuteWriteTask ## What changes were proposed in this pull request? As I am working on File data source V2 write path [in my repo ](https://github.com/gengliangwang/spark/blob/47f39e1f54bc748e116ae9580413fae317898327/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileSourceWriter.scala#L78), I find it essential to refactor ExecuteWriteTask in FileFormatWriter with DataWriter of Data source V2: 1. Reuse the code in both `FileFormat` and Data Source V2 2. Better abstraction, callers only need to call `commit()` or `abort` at the end of task. Also there is less code in `SingleDirectoryWriteTask` and `DynamicPartitionWriteTask`. This PR is part of data source V2 migration. Definitions of related classes is moved to a new file, and `ExecuteWriteTask` is rename to `FileFormatDataWriter` ## How was this patch tested? Existing unit test You can merge this pull request into a Git repository by running: $ git pull https://github.com/gengliangwang/spark refactorExecuteWriteTask Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21381.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21381 commit cbd4ce2959bdfe63dff32d0c36b2982fcde22aac Author: Gengliang WangDate: 2018-05-21T12:16:14Z refactor ExecuteWriteTask --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21363: [SPARK-19228][SQL] Migrate on Java 8 time from Fa...
Github user sergey-rubtsov commented on a diff in the pull request: https://github.com/apache/spark/pull/21363#discussion_r189597989 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala --- @@ -140,14 +141,23 @@ private[csv] object CSVInferSchema { private def tryParseDouble(field: String, options: CSVOptions): DataType = { if ((allCatch opt field.toDouble).isDefined || isInfOrNan(field, options)) { DoubleType +} else { + tryParseDate(field, options) --- End diff -- At the moment, DateType here is ignored at all, I'm not sure that it was conceived when the type was created --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21342: [SPARK-24294] Throw SparkException when OOM in Br...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21342#discussion_r189603209 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala --- @@ -111,12 +112,18 @@ case class BroadcastExchangeExec( SQLMetrics.postDriverMetricUpdates(sparkContext, executionId, metrics.values.toSeq) broadcasted } catch { + // SPARK-24294: To bypass scala bug: https://github.com/scala/bug/issues/9554, we throw + // SparkFatalException, which is a subclass of Exception. ThreadUtils.awaitResult + // will catch this exception and re-throw the wrapped fatal throwable. case oe: OutOfMemoryError => --- End diff -- not related to this PR, but I'm a little worried about catching OOM here. Spark has `SparkOutOfMemoryError`, and it seems more reasonable to catch `SparkOutOfMemoryError`. This can be fixed in another PR. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r189606444 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -292,31 +297,25 @@ class Dataset[T] private[sql]( } // Create SeparateLine - val sep: String = colWidths.map("-" * _).addString(sb, "+", "+", "+\n").toString() + val sep: String = if (html) { +// Initial append table label +sb.append("\n") +"\n" + } else { +colWidths.map("-" * _).addString(sb, "+", "+", "+\n").toString() + } // column names - rows.head.zipWithIndex.map { case (cell, i) => -if (truncate > 0) { - StringUtils.leftPad(cell, colWidths(i)) -} else { - StringUtils.rightPad(cell, colWidths(i)) -} - }.addString(sb, "|", "|", "|\n") - + appendRowString(rows.head, truncate, colWidths, html, true, sb) sb.append(sep) // data - rows.tail.foreach { -_.zipWithIndex.map { case (cell, i) => - if (truncate > 0) { -StringUtils.leftPad(cell.toString, colWidths(i)) - } else { -StringUtils.rightPad(cell.toString, colWidths(i)) - } -}.addString(sb, "|", "|", "|\n") + rows.tail.foreach { row => +appendRowString(row.map(_.toString), truncate, colWidths, html, false, sb) --- End diff -- I know this is not your change, but the `rows` is already `Seq[Seq[String]]` and the `row` is `Seq[String]`, so I think we can remove it. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r189606746 --- Diff: docs/configuration.md --- @@ -456,6 +456,29 @@ Apart from these, the following properties are also available, and may be useful from JVM to Python worker for every task. + + spark.jupyter.eagerEval.enabled --- End diff -- Oh, yes, I'd prefer to add `sql`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r189606695 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -358,6 +357,43 @@ class Dataset[T] private[sql]( sb.toString() } + /** + * Transform current row string and append to builder + * + * @param row Current row of string + * @param truncate If set to more than 0, truncates strings to `truncate` characters and + *all cells will be aligned right. + * @param colWidths The width of each column + * @param html If set to true, return output as html table. + * @param head Set to true while current row is table head. + * @param sbStringBuilder for current row. + */ + private[sql] def appendRowString( + row: Seq[String], + truncate: Int, + colWidths: Array[Int], + html: Boolean, + head: Boolean, + sb: StringBuilder): Unit = { +val data = row.zipWithIndex.map { case (cell, i) => + if (truncate > 0) { +StringUtils.leftPad(cell, colWidths(i)) + } else { +StringUtils.rightPad(cell, colWidths(i)) + } +} +(html, head) match { + case (true, true) => +data.map(StringEscapeUtils.escapeHtml).addString( + sb, "", "", "") --- End diff -- Hmm, the header looks okay, but the data section will be a long line without `\n`? How about adding `\n` here and the data section, and just using `""` for the seperatedLine? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21381: [SPARK-24330][SQL]Refactor ExecuteWriteTask in FileForma...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21381 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21381: [SPARK-24330][SQL]Refactor ExecuteWriteTask in FileForma...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21381 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90897/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21363: [SPARK-19228][SQL] Migrate on Java 8 time from FastDateF...
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/21363 @sergey-rubtsov we have to keep backward compatibility. If a user upgrades, with your change a running application may break because of data not being anymore timestamp but date. We can add a new entry in `SQLConf` for that. Moreover, we have to mention in the migration guide all the behavioral changes we introduce. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...
Github user squito commented on the issue: https://github.com/apache/spark/pull/21068 ping @tgravescs . honestly I still don't love the blacklist limit, especially since it makes reporting back to the driver pretty confusing, and I don't think it buys us much. But I can live with it. and otherwise I think this is ready. I've also looked at Attila's tests on a real cluster --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20636: [SPARK-23415][SQL][TEST] Make behavior of BufferHolderSp...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/20636 ping @hvanhovell --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21370: [SPARK-24215][PySpark] Implement _repr_html_ for datafra...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21370 **[Test build #90896 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90896/testReport)** for PR 21370 at commit [`a798cf2`](https://github.com/apache/spark/commit/a798cf20d20b9b0f899ce477a56502549672482a). * This patch **fails SparkR unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r189603851 --- Diff: python/pyspark/sql/dataframe.py --- @@ -347,13 +347,26 @@ def show(self, n=20, truncate=True, vertical=False): name | Bob """ if isinstance(truncate, bool) and truncate: -print(self._jdf.showString(n, 20, vertical)) +print(self._jdf.showString(n, 20, vertical, False)) else: -print(self._jdf.showString(n, int(truncate), vertical)) +print(self._jdf.showString(n, int(truncate), vertical, False)) def __repr__(self): return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in self.dtypes)) +def _repr_html_(self): --- End diff -- No problem, is the SQLTests in pyspark/sql/tests.py the right place? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r189614067 --- Diff: python/pyspark/sql/dataframe.py --- @@ -347,13 +347,26 @@ def show(self, n=20, truncate=True, vertical=False): name | Bob """ if isinstance(truncate, bool) and truncate: -print(self._jdf.showString(n, 20, vertical)) +print(self._jdf.showString(n, 20, vertical, False)) else: -print(self._jdf.showString(n, int(truncate), vertical)) +print(self._jdf.showString(n, int(truncate), vertical, False)) def __repr__(self): return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in self.dtypes)) +def _repr_html_(self): --- End diff -- No problem, I'll added in `SQLTests` in next commit. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r189614136 --- Diff: docs/configuration.md --- @@ -456,6 +456,29 @@ Apart from these, the following properties are also available, and may be useful from JVM to Python worker for every task. + + spark.jupyter.eagerEval.enabled --- End diff -- Got it, fix it in next commit. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to handle...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20345 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to handle...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20345 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3419/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21342: [SPARK-24294] Throw SparkException when OOM in Br...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21342#discussion_r189632062 --- Diff: core/src/main/scala/org/apache/spark/util/SparkFatalException.scala --- @@ -0,0 +1,24 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.util + +/** + * SPARK-24294: To bypass scala bug: https://github.com/scala/bug/issues/9554, we catch + * fatal throwable in {@link scala.concurrent.Future}'s body, and re-throw + * SparkFatalException, which wraps the fatal throwable inside. + */ +private[spark] final class SparkFatalException(val throwable: Throwable) extends Exception --- End diff -- I believe it will not. The `SparkFatalException` has a short life cycle: it is created inside `Future {}` and then caught and stripped by `ThreadUtils.awaitResult`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmar...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21288#discussion_r189635143 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala --- @@ -0,0 +1,437 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.benchmark + +import java.io.File + +import scala.util.{Random, Try} + +import org.apache.spark.SparkConf +import org.apache.spark.sql.{DataFrame, SparkSession} +import org.apache.spark.sql.functions.monotonically_increasing_id +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.util.{Benchmark, Utils} + + +/** + * Benchmark to measure read performance with Filter pushdown. + * To run this: + * spark-submit --class + */ +object FilterPushdownBenchmark { + val conf = new SparkConf() +.setAppName("FilterPushdownBenchmark") +.setIfMissing("spark.master", "local[1]") +.setIfMissing("spark.driver.memory", "3g") +.setIfMissing("spark.executor.memory", "3g") +.setIfMissing("orc.compression", "snappy") +.setIfMissing("spark.sql.parquet.compression.codec", "snappy") + + private val spark = SparkSession.builder().config(conf).getOrCreate() + + def withTempPath(f: File => Unit): Unit = { +val path = Utils.createTempDir() +path.delete() +try f(path) finally Utils.deleteRecursively(path) + } + + def withTempTable(tableNames: String*)(f: => Unit): Unit = { +try f finally tableNames.foreach(spark.catalog.dropTempView) + } + + def withSQLConf(pairs: (String, String)*)(f: => Unit): Unit = { +val (keys, values) = pairs.unzip +val currentValues = keys.map(key => Try(spark.conf.get(key)).toOption) +(keys, values).zipped.foreach(spark.conf.set) +try f finally { + keys.zip(currentValues).foreach { +case (key, Some(value)) => spark.conf.set(key, value) +case (key, None) => spark.conf.unset(key) + } +} + } + + private def prepareTable( + dir: File, numRows: Int, width: Int, useStringForValue: Boolean): Unit = { +import spark.implicits._ +val selectExpr = (1 to width).map(i => s"CAST(value AS STRING) c$i") +val valueCol = if (useStringForValue) { + monotonically_increasing_id().cast("string") +} else { + monotonically_increasing_id() +} +val df = spark.range(numRows).map(_ => Random.nextLong).selectExpr(selectExpr: _*) + .withColumn("value", valueCol) + .sort("value") + +saveAsOrcTable(df, dir.getCanonicalPath + "/orc") +saveAsParquetTable(df, dir.getCanonicalPath + "/parquet") + } + + private def prepareStringDictTable( + dir: File, numRows: Int, numDistinctValues: Int, width: Int): Unit = { +val selectExpr = (0 to width).map { + case 0 => s"CAST(id % $numDistinctValues AS STRING) AS value" + case i => s"CAST(rand() AS STRING) c$i" +} +val df = spark.range(numRows).selectExpr(selectExpr: _*).sort("value") + +saveAsOrcTable(df, dir.getCanonicalPath + "/orc") +saveAsParquetTable(df, dir.getCanonicalPath + "/parquet") + } + + private def saveAsOrcTable(df: DataFrame, dir: String): Unit = { +df.write.mode("overwrite").orc(dir) +spark.read.orc(dir).createOrReplaceTempView("orcTable") + } + + private def saveAsParquetTable(df: DataFrame, dir: String): Unit = { +df.write.mode("overwrite").parquet(dir) +spark.read.parquet(dir).createOrReplaceTempView("parquetTable") + } + + def filterPushDownBenchmark( + values: Int, + title: String, + whereExpr: String, + selectExpr: String = "*"): Unit = { +val benchmark = new Benchmark(title, values, minNumIters = 5) + +Seq(false, true).foreach { pushDownEnabled => + val
[GitHub] spark pull request #21236: [SPARK-23935][SQL] Adding map_entries function
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21236 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21361: [SPARK-24313][SQL] Fix collection operations' interprete...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21361 **[Test build #90903 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90903/testReport)** for PR 21361 at commit [`6315775`](https://github.com/apache/spark/commit/63157756fde2b4e9dc80fce0296da68067c57422). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21356: [SPARK-24309][CORE] AsyncEventQueue should stop o...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/21356#discussion_r189604443 --- Diff: core/src/main/scala/org/apache/spark/util/ListenerBus.scala --- @@ -80,7 +89,16 @@ private[spark] trait ListenerBus[L <: AnyRef, E] extends Logging { } try { doPostEvent(listener, event) +if (Thread.interrupted()) { + logError(s"Interrupted while posting to ${Utils.getFormattedClassName(listener)}. " + +s"Removing that listener.") + removeListenerOnError(listener) --- End diff -- `Thread.interrupted()` also clears the interrupted state. So that alone isn't a problem -- we're basically declaring that we've handled the interrupt and nobody else gets to know about it anymore. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r189613358 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -292,31 +297,25 @@ class Dataset[T] private[sql]( } // Create SeparateLine - val sep: String = colWidths.map("-" * _).addString(sb, "+", "+", "+\n").toString() + val sep: String = if (html) { +// Initial append table label +sb.append("\n") +"\n" + } else { +colWidths.map("-" * _).addString(sb, "+", "+", "+\n").toString() + } // column names - rows.head.zipWithIndex.map { case (cell, i) => -if (truncate > 0) { - StringUtils.leftPad(cell, colWidths(i)) -} else { - StringUtils.rightPad(cell, colWidths(i)) -} - }.addString(sb, "|", "|", "|\n") - + appendRowString(rows.head, truncate, colWidths, html, true, sb) sb.append(sep) // data - rows.tail.foreach { -_.zipWithIndex.map { case (cell, i) => - if (truncate > 0) { -StringUtils.leftPad(cell.toString, colWidths(i)) - } else { -StringUtils.rightPad(cell.toString, colWidths(i)) - } -}.addString(sb, "|", "|", "|\n") + rows.tail.foreach { row => +appendRowString(row.map(_.toString), truncate, colWidths, html, false, sb) --- End diff -- I see, the `cell.toString` has been called here. https://github.com/apache/spark/pull/21370/files/f2bb8f334631734869ddf5d8ef1eca1fa29d334a#diff-7a46f10c3cedbf013cf255564d9483cdR271 Got it, I'll fix this in next commit. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21193: [SPARK-24121][SQL] Add API for handling expressio...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21193#discussion_r189617272 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/javaCode.scala --- @@ -167,9 +170,40 @@ object Block { case other => throw new IllegalArgumentException( s"Can not interpolate ${other.getClass.getName} into code block.") } -CodeBlock(sc.parts, args) + +val (codeParts, blockInputs) = foldLiteralArgs(sc.parts, args) +CodeBlock(codeParts, blockInputs) + } +} + } + + // Folds eagerly the literal args into the code parts. + private def foldLiteralArgs(parts: Seq[String], args: Seq[Any]): (Seq[String], Seq[Any]) = { +val codeParts = ArrayBuffer.empty[String] +val blockInputs = ArrayBuffer.empty[Any] + +val strings = parts.iterator +val inputs = args.iterator +val buf = new StringBuilder(Block.CODE_BLOCK_BUFFER_LENGTH) + +buf append strings.next --- End diff -- can we use java style here? `buf.append(strings.next)`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21342: [SPARK-24294] Throw SparkException when OOM in Br...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21342#discussion_r189627101 --- Diff: core/src/main/scala/org/apache/spark/util/SparkFatalException.scala --- @@ -0,0 +1,24 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.util + +/** + * SPARK-24294: To bypass scala bug: https://github.com/scala/bug/issues/9554, we catch + * fatal throwable in {@link scala.concurrent.Future}'s body, and re-throw + * SparkFatalException, which wraps the fatal throwable inside. + */ +private[spark] final class SparkFatalException(val throwable: Throwable) extends Exception --- End diff -- Will this change impact https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/SparkUncaughtExceptionHandler.scala#L42 ? cc @zsxwing @JoshRosen --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21342: [SPARK-24294] Throw SparkException when OOM in Br...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/21342#discussion_r189634704 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala --- @@ -111,12 +112,18 @@ case class BroadcastExchangeExec( SQLMetrics.postDriverMetricUpdates(sparkContext, executionId, metrics.values.toSeq) broadcasted } catch { + // SPARK-24294: To bypass scala bug: https://github.com/scala/bug/issues/9554, we throw + // SparkFatalException, which is a subclass of Exception. ThreadUtils.awaitResult + // will catch this exception and re-throw the wrapped fatal throwable. case oe: OutOfMemoryError => -throw new OutOfMemoryError(s"Not enough memory to build and broadcast the table to " + +throw new SparkFatalException( + new OutOfMemoryError(s"Not enough memory to build and broadcast the table to " + --- End diff -- Just curious: Can we perform object operations (allocate `OutOfMemoryError`, allocate and concatenate `String`s) when we caught ` OutOfMemoryError`? I think that we have space since we failed to allocate a large object. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21366: [SPARK-24248][K8S][WIP] Use the Kubernetes API to...
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/21366#discussion_r189642632 --- Diff: pom.xml --- @@ -150,6 +150,7 @@ 4.5.4 4.4.8 +3.0.1 --- End diff -- These are data structures optimized for storing primitives. We could use standard Scala here functionally speaking. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21382: [SPARK-24332][SS][MESOS]Fix places reading 'spark...
GitHub user zsxwing opened a pull request: https://github.com/apache/spark/pull/21382 [SPARK-24332][SS][MESOS]Fix places reading 'spark.network.timeout' as milliseconds ## What changes were proposed in this pull request? This PR replaces `getTimeAsMs` with `getTimeAsSeconds` to fix the issue that reading "spark.network.timeout" using a wrong time unit when the user doesn't specify a time out. ## How was this patch tested? Jenkins You can merge this pull request into a Git repository by running: $ git pull https://github.com/zsxwing/spark fix-network-timeout-conf Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21382.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21382 commit 145dd3b2a7a696d13c0d6286029f0ae47ba96d3b Author: Shixiong ZhuDate: 2018-05-21T16:34:48Z Fix places reading 'spark.network.timeout' as milliseconds --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21366: [SPARK-24248][K8S][WIP] Use the Kubernetes API to popula...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21366 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/3311/ --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20272: [SPARK-23078] [CORE] [K8s] allow Spark Thrift Server to ...
Github user ah- commented on the issue: https://github.com/apache/spark/pull/20272 Is anyone still looking at this? It seems like it should just work? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19691: [SPARK-14922][SPARK-17732][SQL]ALTER TABLE DROP PARTITIO...
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/19691 the failed UTs are valid failures --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmar...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21288#discussion_r189639582 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala --- @@ -0,0 +1,437 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.benchmark + +import java.io.File + +import scala.util.{Random, Try} + +import org.apache.spark.SparkConf +import org.apache.spark.sql.{DataFrame, SparkSession} +import org.apache.spark.sql.functions.monotonically_increasing_id +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.util.{Benchmark, Utils} + + +/** + * Benchmark to measure read performance with Filter pushdown. + * To run this: + * spark-submit --class + */ +object FilterPushdownBenchmark { + val conf = new SparkConf() +.setAppName("FilterPushdownBenchmark") +.setIfMissing("spark.master", "local[1]") +.setIfMissing("spark.driver.memory", "3g") +.setIfMissing("spark.executor.memory", "3g") +.setIfMissing("orc.compression", "snappy") +.setIfMissing("spark.sql.parquet.compression.codec", "snappy") + + private val spark = SparkSession.builder().config(conf).getOrCreate() + + def withTempPath(f: File => Unit): Unit = { +val path = Utils.createTempDir() +path.delete() +try f(path) finally Utils.deleteRecursively(path) + } + + def withTempTable(tableNames: String*)(f: => Unit): Unit = { +try f finally tableNames.foreach(spark.catalog.dropTempView) + } + + def withSQLConf(pairs: (String, String)*)(f: => Unit): Unit = { +val (keys, values) = pairs.unzip +val currentValues = keys.map(key => Try(spark.conf.get(key)).toOption) +(keys, values).zipped.foreach(spark.conf.set) +try f finally { + keys.zip(currentValues).foreach { +case (key, Some(value)) => spark.conf.set(key, value) +case (key, None) => spark.conf.unset(key) + } +} + } + + private def prepareTable( + dir: File, numRows: Int, width: Int, useStringForValue: Boolean): Unit = { +import spark.implicits._ +val selectExpr = (1 to width).map(i => s"CAST(value AS STRING) c$i") +val valueCol = if (useStringForValue) { + monotonically_increasing_id().cast("string") +} else { + monotonically_increasing_id() +} +val df = spark.range(numRows).map(_ => Random.nextLong).selectExpr(selectExpr: _*) + .withColumn("value", valueCol) + .sort("value") + +saveAsOrcTable(df, dir.getCanonicalPath + "/orc") +saveAsParquetTable(df, dir.getCanonicalPath + "/parquet") + } + + private def prepareStringDictTable( + dir: File, numRows: Int, numDistinctValues: Int, width: Int): Unit = { +val selectExpr = (0 to width).map { + case 0 => s"CAST(id % $numDistinctValues AS STRING) AS value" + case i => s"CAST(rand() AS STRING) c$i" +} +val df = spark.range(numRows).selectExpr(selectExpr: _*).sort("value") + +saveAsOrcTable(df, dir.getCanonicalPath + "/orc") +saveAsParquetTable(df, dir.getCanonicalPath + "/parquet") + } + + private def saveAsOrcTable(df: DataFrame, dir: String): Unit = { +df.write.mode("overwrite").orc(dir) +spark.read.orc(dir).createOrReplaceTempView("orcTable") + } + + private def saveAsParquetTable(df: DataFrame, dir: String): Unit = { +df.write.mode("overwrite").parquet(dir) +spark.read.parquet(dir).createOrReplaceTempView("parquetTable") + } + + def filterPushDownBenchmark( + values: Int, + title: String, + whereExpr: String, + selectExpr: String = "*"): Unit = { +val benchmark = new Benchmark(title, values, minNumIters = 5) + +Seq(false, true).foreach { pushDownEnabled => + val
[GitHub] spark issue #21381: [SPARK-24330][SQL]Refactor ExecuteWriteTask in FileForma...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21381 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3420/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21381: [SPARK-24330][SQL]Refactor ExecuteWriteTask in FileForma...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21381 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90902/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21317: [SPARK-24232][k8s] Add support for secret env var...
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/21317#discussion_r189643350 --- Diff: resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/EnvSecretsFeatureStepSuite.scala --- @@ -0,0 +1,59 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s.features + +import io.fabric8.kubernetes.api.model.PodBuilder + +import org.apache.spark.{SparkConf, SparkFunSuite} +import org.apache.spark.deploy.k8s._ + +class EnvSecretsFeatureStepSuite extends SparkFunSuite{ + private val KEY_REF_NAME_FOO = "foo" + private val KEY_REF_NAME_BAR = "bar" + private val KEY_REF_KEY_FOO = "key_foo" + private val KEY_REF_KEY_BAR = "key_bar" + private val ENV_NAME_FOO = "MY_FOO" + private val ENV_NAME_BAR = "MY_bar" + + test("sets up all keyRefs") { +val baseDriverPod = SparkPod.initialPod() +val envVarsToKeys = Map( + ENV_NAME_BAR -> s"${KEY_REF_NAME_BAR}:${KEY_REF_KEY_BAR}", + ENV_NAME_FOO -> s"${KEY_REF_NAME_FOO}:${KEY_REF_KEY_FOO}") +val sparkConf = new SparkConf(false) +val kubernetesConf = KubernetesConf( + sparkConf, + KubernetesExecutorSpecificConf("1", new PodBuilder().build()), + "resource-name-prefix", + "app-id", + Map.empty, + Map.empty, + Map.empty, + envVarsToKeys, + Map.empty) + +val step = new EnvSecretsFeatureStep(kubernetesConf) +val driverContainerWithEnvSecrets = step.configurePod(baseDriverPod).container + +val expectedVars = + Seq(s"${ENV_NAME_BAR}", s"${ENV_NAME_FOO}") + +expectedVars.foreach { envName => + assert(SecretEnvUtils.containerHasEnvVar(driverContainerWithEnvSecrets, envName)) --- End diff -- It's better if the bests also verify the values of the env vars. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21317: [SPARK-24232][k8s] Add support for secret env var...
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/21317#discussion_r189639396 --- Diff: docs/running-on-kubernetes.md --- @@ -140,6 +140,13 @@ namespace as that of the driver and executor pods. For example, to mount a secre --conf spark.kubernetes.executor.secrets.spark-secret=/etc/secrets ``` +To add a secret as an env variable to either the driver or the executor container use the following options to the --- End diff -- s/`To add a secret as an ...` /`To use a secret through an environment variable`/. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21381: [SPARK-24330][SQL]Refactor ExecuteWriteTask in FileForma...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21381 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21381: [SPARK-24330][SQL]Refactor ExecuteWriteTask in FileForma...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21381 **[Test build #90902 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90902/testReport)** for PR 21381 at commit [`28e33ff`](https://github.com/apache/spark/commit/28e33fffa4d7fb1a747e1f0d15b46ca08be23dbb). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21317: [SPARK-24232][k8s] Add support for secret env var...
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/21317#discussion_r189640364 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/EnvSecretsFeatureStep.scala --- @@ -0,0 +1,57 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s.features + +import scala.collection.JavaConverters._ + +import io.fabric8.kubernetes.api.model.{ContainerBuilder, EnvVarBuilder, HasMetadata} + +import org.apache.spark.deploy.k8s.{KubernetesConf, KubernetesRoleSpecificConf, SparkPod} + +private[spark] class EnvSecretsFeatureStep( +kubernetesConf: KubernetesConf[_ <: KubernetesRoleSpecificConf]) + extends KubernetesFeatureConfigStep { + override def configurePod(pod: SparkPod): SparkPod = { +val addedEnvSecrets = kubernetesConf + .roleSecretEnvNamesToKeyRefs + .map{ case (envName, keyRef) => +// Keyref parts +val keyRefParts = keyRef.split(":") +require(keyRefParts.size == 2, "KeyRef must be in the form name:key.") +val name = keyRefParts(0) +val key = keyRefParts(1) +new EnvVarBuilder() + .withName(envName) + .withNewValueFrom() + .withNewSecretKeyRef() + .withKey(key) --- End diff -- The indention seems incorrect. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21317: [SPARK-24232][k8s] Add support for secret env var...
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/21317#discussion_r189642295 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/EnvSecretsFeatureStep.scala --- @@ -0,0 +1,57 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s.features + +import scala.collection.JavaConverters._ + +import io.fabric8.kubernetes.api.model.{ContainerBuilder, EnvVarBuilder, HasMetadata} + +import org.apache.spark.deploy.k8s.{KubernetesConf, KubernetesRoleSpecificConf, SparkPod} + +private[spark] class EnvSecretsFeatureStep( +kubernetesConf: KubernetesConf[_ <: KubernetesRoleSpecificConf]) + extends KubernetesFeatureConfigStep { + override def configurePod(pod: SparkPod): SparkPod = { +val addedEnvSecrets = kubernetesConf + .roleSecretEnvNamesToKeyRefs + .map{ case (envName, keyRef) => +// Keyref parts +val keyRefParts = keyRef.split(":") +require(keyRefParts.size == 2, "KeyRef must be in the form name:key.") --- End diff -- s/`KeyRef`/`SecretKeyRef`/. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21381: [SPARK-24330][SQL]Refactor ExecuteWriteTask in FileForma...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21381 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21366: [SPARK-24248][K8S][WIP] Use the Kubernetes API to popula...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21366 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21366: [SPARK-24248][K8S][WIP] Use the Kubernetes API to popula...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21366 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3421/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21356: [SPARK-24309][CORE] AsyncEventQueue should stop o...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21356#discussion_r189647689 --- Diff: core/src/main/scala/org/apache/spark/util/ListenerBus.scala --- @@ -80,7 +89,16 @@ private[spark] trait ListenerBus[L <: AnyRef, E] extends Logging { } try { doPostEvent(listener, event) +if (Thread.interrupted()) { + logError(s"Interrupted while posting to ${Utils.getFormattedClassName(listener)}. " + +s"Removing that listener.") + removeListenerOnError(listener) --- End diff -- You could just throw `InterruptedException` here to avoid duplicating the error handling code. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21366: [SPARK-24248][K8S][WIP] Use the Kubernetes API to popula...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21366 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/3311/ --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21381: [SPARK-24330][SQL]Refactor ExecuteWriteTask in FileForma...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21381 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3416/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21381: [SPARK-24330][SQL]Refactor ExecuteWriteTask in FileForma...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21381 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21361: [SPARK-24313][SQL] Fix collection operations' interprete...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21361 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3417/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21361: [SPARK-24313][SQL] Fix collection operations' interprete...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21361 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/21288 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21372: [SPARK-24322][BUILD] Upgrade Apache ORC to 1.4.4
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/21372 Sure, @maropu . In addition, I reviewed the nine patches, almost trivial ones. I'll update the PR description more. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21366: [SPARK-24248][K8S][WIP] Use the Kubernetes API to popula...
Github user mccheah commented on the issue: https://github.com/apache/spark/pull/21366 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21372: [SPARK-24322][BUILD] Upgrade Apache ORC to 1.4.4
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21372#discussion_r189644114 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcColumnVector.java --- @@ -136,7 +136,7 @@ public int getInt(int rowId) { public long getLong(int rowId) { int index = getRowIndex(rowId); if (isTimestamp) { - return timestampData.time[index] * 1000 + timestampData.nanos[index] / 1000; + return timestampData.time[index] * 1000 + timestampData.nanos[index] / 1000 % 1000; --- End diff -- Add a test case? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21370: [SPARK-24215][PySpark] Implement _repr_html_ for datafra...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21370 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90896/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21370: [SPARK-24215][PySpark] Implement _repr_html_ for datafra...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21370 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21342: [SPARK-24294] Throw SparkException when OOM in Broadcast...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21342 LGTM, which scala version has fixed this bug? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19691: [SPARK-14922][SPARK-17732][SQL]ALTER TABLE DROP PARTITIO...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19691 **[Test build #90894 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90894/testReport)** for PR 19691 at commit [`dd5d482`](https://github.com/apache/spark/commit/dd5d4825d247d9949eb2bf10c05ee7661c389709). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19691: [SPARK-14922][SPARK-17732][SQL]ALTER TABLE DROP PARTITIO...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19691 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90894/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19691: [SPARK-14922][SPARK-17732][SQL]ALTER TABLE DROP PARTITIO...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19691 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmar...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/21288#discussion_r189638131 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala --- @@ -0,0 +1,437 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.benchmark + +import java.io.File + +import scala.util.{Random, Try} + +import org.apache.spark.SparkConf +import org.apache.spark.sql.{DataFrame, SparkSession} +import org.apache.spark.sql.functions.monotonically_increasing_id +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.util.{Benchmark, Utils} + + +/** + * Benchmark to measure read performance with Filter pushdown. + * To run this: + * spark-submit --class + */ +object FilterPushdownBenchmark { + val conf = new SparkConf() +.setAppName("FilterPushdownBenchmark") +.setIfMissing("spark.master", "local[1]") +.setIfMissing("spark.driver.memory", "3g") +.setIfMissing("spark.executor.memory", "3g") +.setIfMissing("orc.compression", "snappy") +.setIfMissing("spark.sql.parquet.compression.codec", "snappy") + + private val spark = SparkSession.builder().config(conf).getOrCreate() + + def withTempPath(f: File => Unit): Unit = { +val path = Utils.createTempDir() +path.delete() +try f(path) finally Utils.deleteRecursively(path) + } + + def withTempTable(tableNames: String*)(f: => Unit): Unit = { +try f finally tableNames.foreach(spark.catalog.dropTempView) + } + + def withSQLConf(pairs: (String, String)*)(f: => Unit): Unit = { +val (keys, values) = pairs.unzip +val currentValues = keys.map(key => Try(spark.conf.get(key)).toOption) +(keys, values).zipped.foreach(spark.conf.set) +try f finally { + keys.zip(currentValues).foreach { +case (key, Some(value)) => spark.conf.set(key, value) +case (key, None) => spark.conf.unset(key) + } +} + } + + private def prepareTable( + dir: File, numRows: Int, width: Int, useStringForValue: Boolean): Unit = { +import spark.implicits._ +val selectExpr = (1 to width).map(i => s"CAST(value AS STRING) c$i") +val valueCol = if (useStringForValue) { + monotonically_increasing_id().cast("string") +} else { + monotonically_increasing_id() +} +val df = spark.range(numRows).map(_ => Random.nextLong).selectExpr(selectExpr: _*) + .withColumn("value", valueCol) + .sort("value") + +saveAsOrcTable(df, dir.getCanonicalPath + "/orc") +saveAsParquetTable(df, dir.getCanonicalPath + "/parquet") + } + + private def prepareStringDictTable( + dir: File, numRows: Int, numDistinctValues: Int, width: Int): Unit = { +val selectExpr = (0 to width).map { + case 0 => s"CAST(id % $numDistinctValues AS STRING) AS value" + case i => s"CAST(rand() AS STRING) c$i" +} +val df = spark.range(numRows).selectExpr(selectExpr: _*).sort("value") + +saveAsOrcTable(df, dir.getCanonicalPath + "/orc") +saveAsParquetTable(df, dir.getCanonicalPath + "/parquet") + } + + private def saveAsOrcTable(df: DataFrame, dir: String): Unit = { +df.write.mode("overwrite").orc(dir) +spark.read.orc(dir).createOrReplaceTempView("orcTable") + } + + private def saveAsParquetTable(df: DataFrame, dir: String): Unit = { +df.write.mode("overwrite").parquet(dir) +spark.read.parquet(dir).createOrReplaceTempView("parquetTable") + } + + def filterPushDownBenchmark( + values: Int, + title: String, + whereExpr: String, + selectExpr: String = "*"): Unit = { +val benchmark = new Benchmark(title, values, minNumIters = 5) + +Seq(false, true).foreach { pushDownEnabled => +
[GitHub] spark pull request #21376: [SPARK-24250][SQL] support accessing SQLConf insi...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21376 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21381: [SPARK-24330][SQL]Refactor ExecuteWriteTask in FileForma...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21381 **[Test build #90906 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90906/testReport)** for PR 21381 at commit [`3c29628`](https://github.com/apache/spark/commit/3c29628b2f7c9c5a817791866fa450fe4ebafced). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21366: [SPARK-24248][K8S][WIP] Use the Kubernetes API to popula...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21366 **[Test build #90907 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90907/testReport)** for PR 21366 at commit [`931529a`](https://github.com/apache/spark/commit/931529a74c07b3212dcbe34e7329384555777146). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21368: [SPARK-16451][repl] Fail shell if SparkSession fa...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21368#discussion_r189649450 --- Diff: repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkILoop.scala --- @@ -44,7 +44,14 @@ class SparkILoop(in0: Option[BufferedReader], out: JPrintWriter) @transient val spark = if (org.apache.spark.repl.Main.sparkSession != null) { org.apache.spark.repl.Main.sparkSession } else { -org.apache.spark.repl.Main.createSparkSession() +try { + org.apache.spark.repl.Main.createSparkSession() +} catch { + case e: Exception => +println("Failed to initialize Spark session:") +e.printStackTrace() +sys.exit(1) --- End diff -- My usual response is "this is not a public class" (it's not in the public API docs), but let me see if it's easy to restrict the `sys.exit` to spark-shell invocations. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21382: [SPARK-24332][SS][MESOS]Fix places reading 'spark.networ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21382 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3422/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21382: [SPARK-24332][SS][MESOS]Fix places reading 'spark.networ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21382 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21356: [SPARK-24309][CORE] AsyncEventQueue should stop o...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21356#discussion_r189607562 --- Diff: core/src/main/scala/org/apache/spark/scheduler/LiveListenerBus.scala --- @@ -111,6 +111,12 @@ private[spark] class LiveListenerBus(conf: SparkConf) { } } + private[scheduler] def removeQueue(queue: String): Unit = synchronized { --- End diff -- can we remove it? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r189611792 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -358,6 +357,43 @@ class Dataset[T] private[sql]( sb.toString() } + /** + * Transform current row string and append to builder + * + * @param row Current row of string + * @param truncate If set to more than 0, truncates strings to `truncate` characters and + *all cells will be aligned right. + * @param colWidths The width of each column + * @param html If set to true, return output as html table. + * @param head Set to true while current row is table head. + * @param sbStringBuilder for current row. + */ + private[sql] def appendRowString( + row: Seq[String], + truncate: Int, + colWidths: Array[Int], + html: Boolean, + head: Boolean, + sb: StringBuilder): Unit = { +val data = row.zipWithIndex.map { case (cell, i) => + if (truncate > 0) { +StringUtils.leftPad(cell, colWidths(i)) + } else { +StringUtils.rightPad(cell, colWidths(i)) + } +} +(html, head) match { + case (true, true) => +data.map(StringEscapeUtils.escapeHtml).addString( + sb, "", "", "") --- End diff -- Ah, I understand your consideration. I'll add this in next commit. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to handle...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/20345 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21381: [SPARK-24330][SQL]Refactor ExecuteWriteTask in FileForma...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21381 **[Test build #90897 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90897/testReport)** for PR 21381 at commit [`cbd4ce2`](https://github.com/apache/spark/commit/cbd4ce2959bdfe63dff32d0c36b2982fcde22aac). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `abstract class FileFormatDataWriter(` * `class EmptyDirectoryDataWriter(` * `class SingleDirectoryDataWriter(` * `class DynamicPartitionDataWriter(` * `class WriteJobDescription(` * `case class WriteTaskResult(commitMsg: TaskCommitMessage, summary: ExecutedWriteSummary)` * `case class ExecutedWriteSummary(` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21288 **[Test build #90904 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90904/testReport)** for PR 21288 at commit [`39e5a50`](https://github.com/apache/spark/commit/39e5a507fe22cade6bed0613eefbccab15cf45ff). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to handle...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20345 **[Test build #90905 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90905/testReport)** for PR 20345 at commit [`94d9171`](https://github.com/apache/spark/commit/94d9171b8ec26c21724dd393cf4fc83ff52623e7). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21193: [SPARK-24121][SQL] Add API for handling expressio...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21193#discussion_r189616918 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/javaCode.scala --- @@ -167,9 +170,40 @@ object Block { case other => throw new IllegalArgumentException( s"Can not interpolate ${other.getClass.getName} into code block.") } -CodeBlock(sc.parts, args) + +val (codeParts, blockInputs) = foldLiteralArgs(sc.parts, args) +CodeBlock(codeParts, blockInputs) + } +} + } + + // Folds eagerly the literal args into the code parts. + private def foldLiteralArgs(parts: Seq[String], args: Seq[Any]): (Seq[String], Seq[Any]) = { +val codeParts = ArrayBuffer.empty[String] +val blockInputs = ArrayBuffer.empty[Any] --- End diff -- shall we make the type `JavaCode` instead of `Any`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21288 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21288 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3418/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21045: [SPARK-23931][SQL] Adds zip function to sparksql
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/21045 cc @ueshin --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21376: [SPARK-24250][SQL] support accessing SQLConf inside task...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21376 thanks, merging to master! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21067: [SPARK-23980][K8S] Resilient Spark driver on Kubernetes
Github user liyinan926 commented on the issue: https://github.com/apache/spark/pull/21067 @foxish on concerns of the lack of exactly-one semantics. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21382: [SPARK-24332][SS][MESOS]Fix places reading 'spark.networ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21382 **[Test build #90908 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90908/testReport)** for PR 21382 at commit [`145dd3b`](https://github.com/apache/spark/commit/145dd3b2a7a696d13c0d6286029f0ae47ba96d3b). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21368: [SPARK-16451][repl] Fail shell if SparkSession fa...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21368#discussion_r189649746 --- Diff: repl/scala-2.12/src/main/scala/org/apache/spark/repl/SparkILoop.scala --- @@ -37,7 +37,14 @@ class SparkILoop(in0: Option[BufferedReader], out: JPrintWriter) @transient val spark = if (org.apache.spark.repl.Main.sparkSession != null) { org.apache.spark.repl.Main.sparkSession } else { -org.apache.spark.repl.Main.createSparkSession() +try { + org.apache.spark.repl.Main.createSparkSession() +} catch { + case e: Exception => +println("Failed to initialize Spark session:") +e.printStackTrace() +sys.exit(1) +} --- End diff -- I'm not that familiar with those two shells but I'll give it a try. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21368: [SPARK-16451][repl] Fail shell if SparkSession fa...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21368#discussion_r189649541 --- Diff: python/pyspark/shell.py --- @@ -38,25 +41,29 @@ SparkContext._ensure_initialized() try: -# Try to access HiveConf, it will raise exception if Hive is not added -conf = SparkConf() -if conf.get('spark.sql.catalogImplementation', 'hive').lower() == 'hive': -SparkContext._jvm.org.apache.hadoop.hive.conf.HiveConf() -spark = SparkSession.builder\ -.enableHiveSupport()\ -.getOrCreate() -else: +try: +# Try to access HiveConf, it will raise exception if Hive is not added +conf = SparkConf() +if conf.get('spark.sql.catalogImplementation', 'hive').lower() == 'hive': +SparkContext._jvm.org.apache.hadoop.hive.conf.HiveConf() +spark = SparkSession.builder\ +.enableHiveSupport()\ +.getOrCreate() +else: +spark = SparkSession.builder.getOrCreate() +except py4j.protocol.Py4JError: +if conf.get('spark.sql.catalogImplementation', '').lower() == 'hive': +warnings.warn("Fall back to non-hive support because failing to access HiveConf, " + "please make sure you build spark with hive") +spark = SparkSession.builder.getOrCreate() +except TypeError: +if conf.get('spark.sql.catalogImplementation', '').lower() == 'hive': +warnings.warn("Fall back to non-hive support because failing to access HiveConf, " + "please make sure you build spark with hive") spark = SparkSession.builder.getOrCreate() -except py4j.protocol.Py4JError: -if conf.get('spark.sql.catalogImplementation', '').lower() == 'hive': -warnings.warn("Fall back to non-hive support because failing to access HiveConf, " - "please make sure you build spark with hive") -spark = SparkSession.builder.getOrCreate() -except TypeError: -if conf.get('spark.sql.catalogImplementation', '').lower() == 'hive': -warnings.warn("Fall back to non-hive support because failing to access HiveConf, " - "please make sure you build spark with hive") -spark = SparkSession.builder.getOrCreate() +except Exception as e: +print("Failed to initialize Spark session:", e, file=sys.stderr) --- End diff -- Printing the exception shows its traceback. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21155: [SPARK-23927][SQL] Add "sequence" expression
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21155 **[Test build #90909 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90909/testReport)** for PR 21155 at commit [`2ccb9bb`](https://github.com/apache/spark/commit/2ccb9bb73f729c361ee819316e9c95224797a178). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to handle...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20345 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to handle...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20345 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90885/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to handle...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20345 **[Test build #90885 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90885/testReport)** for PR 20345 at commit [`94d9171`](https://github.com/apache/spark/commit/94d9171b8ec26c21724dd393cf4fc83ff52623e7). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21236 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21236 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90887/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r189567315 --- Diff: docs/configuration.md --- @@ -456,6 +456,29 @@ Apart from these, the following properties are also available, and may be useful from JVM to Python worker for every task. + + spark.jupyter.eagerEval.enabled + false + +Open eager evaluation on jupyter or not. If yes, dataframe will be ran automatically +and html table will feedback the queries user have defined (see +https://issues.apache.org/jira/browse/SPARK-24215;>SPARK-24215 for more details). + + + + spark.jupyter.default.showRows --- End diff -- change to spark.jupyter.eagerEval.showRows,thanks --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r189567259 --- Diff: docs/configuration.md --- @@ -456,6 +456,29 @@ Apart from these, the following properties are also available, and may be useful from JVM to Python worker for every task. + + spark.jupyter.eagerEval.enabled + false + +Open eager evaluation on jupyter or not. If yes, dataframe will be ran automatically --- End diff -- Got it, thanks. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r189567350 --- Diff: docs/configuration.md --- @@ -456,6 +456,29 @@ Apart from these, the following properties are also available, and may be useful from JVM to Python worker for every task. + + spark.jupyter.eagerEval.enabled + false + +Open eager evaluation on jupyter or not. If yes, dataframe will be ran automatically +and html table will feedback the queries user have defined (see +https://issues.apache.org/jira/browse/SPARK-24215;>SPARK-24215 for more details). + + + + spark.jupyter.default.showRows + 20 + +Default number of rows in jupyter html table. + + + + spark.jupyter.default.truncate --- End diff -- Yep, change to spark.jupyter.eagerEval.truncate --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21363: [SPARK-19228][SQL] Migrate on Java 8 time from Fa...
Github user sergey-rubtsov commented on a diff in the pull request: https://github.com/apache/spark/pull/21363#discussion_r189583023 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala --- @@ -90,6 +90,7 @@ private[csv] object CSVInferSchema { // DecimalTypes have different precisions and scales, so we try to find the common type. findTightestCommonType(typeSoFar, tryParseDecimal(field, options)).getOrElse(StringType) case DoubleType => tryParseDouble(field, options) +case DateType => tryParseDate(field, options) --- End diff -- I can do it, but where exactly it should be documented? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21363: [SPARK-19228][SQL] Migrate on Java 8 time from Fa...
Github user sergey-rubtsov commented on a diff in the pull request: https://github.com/apache/spark/pull/21363#discussion_r189582909 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchemaSuite.scala --- @@ -59,13 +59,21 @@ class CSVInferSchemaSuite extends SparkFunSuite { assert(CSVInferSchema.inferField(IntegerType, textValueOne, options) == expectedTypeOne) } - test("Timestamp field types are inferred correctly via custom data format") { -var options = new CSVOptions(Map("timestampFormat" -> "-mm"), "GMT") + test("Timestamp field types are inferred correctly via custom date format") { +var options = new CSVOptions(Map("timestampFormat" -> "-MM"), "GMT") --- End diff -- "-mm" means years and minutes, this is date format, this is time format -MM" means years and months, but I do not insist on this change --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21317: [SPARK-24232][k8s] Add support for secret env vars
Github user skonto commented on the issue: https://github.com/apache/spark/pull/21317 @thanx @liyinan926 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21266: [SPARK-24206][SQL] Improve DataSource read benchmark cod...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21266 **[Test build #90884 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90884/testReport)** for PR 21266 at commit [`d8c308f`](https://github.com/apache/spark/commit/d8c308fa43a001328b8645e0d339875342c25c67). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21363: [SPARK-19228][SQL] Migrate on Java 8 time from Fa...
Github user sergey-rubtsov commented on a diff in the pull request: https://github.com/apache/spark/pull/21363#discussion_r189588135 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala --- @@ -140,14 +141,23 @@ private[csv] object CSVInferSchema { private def tryParseDouble(field: String, options: CSVOptions): DataType = { if ((allCatch opt field.toDouble).isDefined || isInfOrNan(field, options)) { DoubleType +} else { + tryParseDate(field, options) --- End diff -- For example, by mistake we have identical "timestampFormat" and "dateFormat" options. Let it be "-MM-dd" 'TimestampType' (8 bytes) is larger than 'DateType' (4 bytes) So if they can overlap, we need to try parse it as date firstly, because both of these types are suitable, but you need to try to use a more compact by default and it will be correct inferring of type --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org