[GitHub] spark pull request #21501: [SPARK-15064][ML] Locale support in StopWordsRemo...
Github user dongjinleekr commented on a diff in the pull request: https://github.com/apache/spark/pull/21501#discussion_r194296251 --- Diff: python/pyspark/ml/feature.py --- @@ -2610,6 +2610,9 @@ def setParams(self, inputCol=None, outputCol=None, stopWords=None, caseSensitive Sets params for this StopWordRemover. """ kwargs = self._input_kwargs +if locale is None: +sc = SparkContext._active_spark_context +kwargs['locale'] = sc._gateway.jvm.org.spark.ml.util.LocaleUtils.getDefaultLocale() --- End diff -- @viirya You mean using `locale=SparkContext._active_spark_context.(...)` instead of `locale=None` with the ugly if statement, right? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21514: [SPARK-22860] [Core] - hide key password from lin...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/21514#discussion_r194296152 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/StandaloneSchedulerBackend.scala --- @@ -100,7 +100,7 @@ private[spark] class StandaloneSchedulerBackend( val sparkJavaOpts = Utils.sparkJavaOpts(conf, SparkConf.isExecutorStartupConf) val javaOpts = sparkJavaOpts ++ extraJavaOpts val command = Command("org.apache.spark.executor.CoarseGrainedExecutorBackend", - args, sc.executorEnvs, classPathEntries ++ testingClassPath, libraryPathEntries, javaOpts) + args, sc.executorEnvs, classPathEntries ++ testingClassPath, libraryPathEntries, javaOpts.filterNot(_.startsWith("-Dspark.ssl.keyStorePassword")).filterNot(_.startsWith("-Dspark.ssl.keyPassword"))) --- End diff -- If you really have to do this, I'd have: ``` javaOpts.filterNot { opt => opt.startsWith("-Dspark.ssl.keyStorePassword") || opt.startsWith("-Dspark.ssl.keyPassword") } ```
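The point of the suggestion is one pass with one combined predicate instead of two chained `filterNot` calls. A minimal Python rendering of the same idea (not Spark code), assuming the options are plain `-Dkey=value` strings; `SENSITIVE_PREFIXES` and `strip_sensitive_opts` are hypothetical names:

```python
# Hypothetical prefixes of JVM system properties that carry secrets.
SENSITIVE_PREFIXES = (
    "-Dspark.ssl.keyStorePassword",
    "-Dspark.ssl.keyPassword",
)


def strip_sensitive_opts(java_opts):
    """Return java_opts without any option that starts with a sensitive prefix."""
    # str.startswith accepts a tuple, so a single call checks every prefix,
    # mirroring the single combined predicate suggested in the review.
    return [opt for opt in java_opts if not opt.startswith(SENSITIVE_PREFIXES)]


opts = [
    "-Dspark.ssl.keyStorePassword=secret",
    "-Dspark.ssl.keyPassword=secret2",
    "-Dspark.app.name=demo",
]
print(strip_sensitive_opts(opts))
```

Adding another secret means extending the tuple rather than chaining one more filter.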
[GitHub] spark issue #21514: [SPARK-22860] [Core] - hide key password from linux ps l...
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/21514 Have you tried the config "spark.redaction.regex"?
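For context, regex-based redaction masks the value of any config entry whose key matches a pattern. A minimal sketch of that idea, assuming a pattern along the lines of `(?i)secret|password` and a replacement text of `*********(redacted)`; check the configuration docs for your Spark version for the real defaults. `redact` is a hypothetical helper that checks keys only. Whether this kind of redaction reaches the executor launch command that `ps` shows is exactly what the question asks, and the sketch does not settle it.

```python
import re

# Assumed pattern and replacement text, modeled on spark.redaction.regex;
# verify both against your Spark version before relying on them.
REDACTION_PATTERN = re.compile(r"(?i)secret|password")
REDACTED = "*********(redacted)"


def redact(conf_pairs, pattern=REDACTION_PATTERN):
    """Mask the value of every (key, value) pair whose key matches pattern."""
    return [(k, REDACTED if pattern.search(k) else v) for k, v in conf_pairs]


pairs = [("spark.ssl.keyPassword", "hunter2"), ("spark.app.name", "demo")]
print(redact(pairs))
```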
[GitHub] spark issue #21506: [SPARK-24485][SS] Measure and log elapsed time for files...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21506 **[Test build #91649 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91649/testReport)** for PR 21506 at commit [`3d0e23f`](https://github.com/apache/spark/commit/3d0e23f7460976a33d6f86178d04f04e488bfaa8).
[GitHub] spark pull request #21506: [SPARK-24485][SS] Measure and log elapsed time fo...
Github user HeartSaVioR commented on a diff in the pull request: https://github.com/apache/spark/pull/21506#discussion_r194295068 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala --- @@ -280,38 +278,49 @@ private[state] class HDFSBackedStateStoreProvider extends StateStoreProvider wit if (loadedCurrentVersionMap.isDefined) { return loadedCurrentVersionMap.get } -val snapshotCurrentVersionMap = readSnapshotFile(version) -if (snapshotCurrentVersionMap.isDefined) { - synchronized { loadedMaps.put(version, snapshotCurrentVersionMap.get) } - return snapshotCurrentVersionMap.get -} -// Find the most recent map before this version that we can. -// [SPARK-22305] This must be done iteratively to avoid stack overflow. -var lastAvailableVersion = version -var lastAvailableMap: Option[MapType] = None -while (lastAvailableMap.isEmpty) { - lastAvailableVersion -= 1 +logWarning(s"The state for version $version doesn't exist in loadedMaps. " + + "Reading snapshot file and delta files if needed..." + + "Note that this is normal for the first batch of starting query.") - if (lastAvailableVersion <= 0) { -// Use an empty map for versions 0 or less. -lastAvailableMap = Some(new MapType) - } else { -lastAvailableMap = - synchronized { loadedMaps.get(lastAvailableVersion) } -.orElse(readSnapshotFile(lastAvailableVersion)) +val (result, elapsedMs) = Utils.timeTakenMs { + val snapshotCurrentVersionMap = readSnapshotFile(version) + if (snapshotCurrentVersionMap.isDefined) { +synchronized { loadedMaps.put(version, snapshotCurrentVersionMap.get) } +return snapshotCurrentVersionMap.get + } + + // Find the most recent map before this version that we can. + // [SPARK-22305] This must be done iteratively to avoid stack overflow. 
+ var lastAvailableVersion = version + var lastAvailableMap: Option[MapType] = None + while (lastAvailableMap.isEmpty) { +lastAvailableVersion -= 1 + +if (lastAvailableVersion <= 0) { + // Use an empty map for versions 0 or less. + lastAvailableMap = Some(new MapType) +} else { + lastAvailableMap = +synchronized { loadedMaps.get(lastAvailableVersion) } + .orElse(readSnapshotFile(lastAvailableVersion)) +} + } + + // Load all the deltas from the version after the last available one up to the target version. + // The last available version is the one with a full snapshot, so it doesn't need deltas. + val resultMap = new MapType(lastAvailableMap.get) + for (deltaVersion <- lastAvailableVersion + 1 to version) { +updateFromDeltaFile(deltaVersion, resultMap) } -} -// Load all the deltas from the version after the last available one up to the target version. -// The last available version is the one with a full snapshot, so it doesn't need deltas. -val resultMap = new MapType(lastAvailableMap.get) -for (deltaVersion <- lastAvailableVersion + 1 to version) { - updateFromDeltaFile(deltaVersion, resultMap) + synchronized { loadedMaps.put(version, resultMap) } + resultMap } -synchronized { loadedMaps.put(version, resultMap) } -resultMap +logWarning(s"Loading state for $version takes $elapsedMs ms.") --- End diff -- Changed log level to DEBUG.
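The loading logic in this diff walks backwards to the newest version that is already in memory or has a snapshot, then replays deltas forward. A minimal Python sketch of that control flow, assuming plain dicts stand in for `MapType`, `loadedMaps`, and the snapshot and delta files, and that a `None` delta value means "remove this key" (an assumption of the sketch). `load_map` and its parameters are hypothetical.

```python
def load_map(version, loaded, snapshots, deltas):
    """Rebuild the state map for `version`.

    loaded:    dict version -> map already in memory (stand-in for loadedMaps)
    snapshots: dict version -> full snapshot map
    deltas:    dict version -> dict of updates; a None value removes the key
    """
    if version in loaded:
        return loaded[version]
    if version in snapshots:
        loaded[version] = dict(snapshots[version])
        return loaded[version]

    # Walk back iteratively, not recursively (the SPARK-22305 concern),
    # until we find a base map to start from.
    last_version = version
    base = None
    while base is None:
        last_version -= 1
        if last_version <= 0:
            base = {}  # versions 0 or less start from an empty map
        else:
            base = loaded.get(last_version, snapshots.get(last_version))

    # Copy the base so it is not mutated, then replay every later delta.
    result = dict(base)
    for v in range(last_version + 1, version + 1):
        for key, value in deltas[v].items():
            if value is None:
                result.pop(key, None)
            else:
                result[key] = value
    loaded[version] = result
    return result
```

Because the walk is a loop, a long chain of delta files costs iterations rather than stack frames.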
[GitHub] spark pull request #21505: [SPARK-24457][SQL] Improving performance of strin...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21505#discussion_r194294961 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala --- @@ -125,7 +125,6 @@ object DateTimeUtils { .getOrElseUpdate(timeZone, { Calendar.getInstance(timeZone) }) -c.clear() --- End diff -- It seems `setTimeInMillis` results in all fields being set. If so, `clear` is redundant.
[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21501 Merged build finished. Test FAILed.
[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21501 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91648/ Test FAILed.
[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21501 **[Test build #91648 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91648/testReport)** for PR 21501 at commit [`b4249c3`](https://github.com/apache/spark/commit/b4249c342a92dc840a1f1d5290c24a5fe165417d). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #21495: [SPARK-24418][Build] Upgrade Scala to 2.11.12 and...
Github user som-snytt commented on a diff in the pull request: https://github.com/apache/spark/pull/21495#discussion_r194294485 --- Diff: repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkILoopInterpreter.scala --- @@ -21,8 +21,22 @@ import scala.collection.mutable import scala.tools.nsc.Settings import scala.tools.nsc.interpreter._ -class SparkILoopInterpreter(settings: Settings, out: JPrintWriter) extends IMain(settings, out) { - self => +class SparkILoopInterpreter(settings: Settings, out: JPrintWriter, initializeSpark: () => Unit) +extends IMain(settings, out) { self => --- End diff -- It's definitely two spaces after a period. I've been wanting to make that joke, but held off.
[GitHub] spark pull request #21505: [SPARK-24457][SQL] Improving performance of strin...
Github user ssonker commented on a diff in the pull request: https://github.com/apache/spark/pull/21505#discussion_r194294182 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala --- @@ -125,7 +125,6 @@ object DateTimeUtils { .getOrElseUpdate(timeZone, { Calendar.getInstance(timeZone) }) -c.clear() --- End diff -- @viirya @kiszk Do you agree with this commit?
[GitHub] spark pull request #21501: [SPARK-15064][ML] Locale support in StopWordsRemo...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21501#discussion_r194293510 --- Diff: python/pyspark/ml/feature.py --- @@ -2610,6 +2610,9 @@ def setParams(self, inputCol=None, outputCol=None, stopWords=None, caseSensitive Sets params for this StopWordRemover. """ kwargs = self._input_kwargs +if locale is None: +sc = SparkContext._active_spark_context +kwargs['locale'] = sc._gateway.jvm.org.spark.ml.util.LocaleUtils.getDefaultLocale() --- End diff -- We can keep this default locale and reuse it many times, instead of calling into the JVM from Python every time.
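One way to keep the default locale is to memoize the lookup lazily, so the first call crosses into the JVM and later calls reuse the result. A minimal sketch, assuming `fetch_default_locale_from_jvm` is a hypothetical stand-in for the Py4J call (it just counts invocations here) and `default_locale` is a hypothetical helper:

```python
import functools


def fetch_default_locale_from_jvm():
    """Hypothetical stand-in for the Py4J call; counts how often it runs."""
    fetch_default_locale_from_jvm.calls += 1
    return "en_US"


fetch_default_locale_from_jvm.calls = 0


@functools.lru_cache(maxsize=None)
def default_locale():
    """Ask the JVM once, then return the cached value on later calls."""
    return fetch_default_locale_from_jvm()
```

Caching lazily rather than at import time matters because the real lookup needs an active SparkContext. The trade-off is that a locale change in the JVM after the first call would not be seen.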
[GitHub] spark pull request #21506: [SPARK-24485][SS] Measure and log elapsed time fo...
Github user HeartSaVioR commented on a diff in the pull request: https://github.com/apache/spark/pull/21506#discussion_r194293481 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala --- @@ -280,38 +278,49 @@ private[state] class HDFSBackedStateStoreProvider extends StateStoreProvider wit if (loadedCurrentVersionMap.isDefined) { return loadedCurrentVersionMap.get } -val snapshotCurrentVersionMap = readSnapshotFile(version) -if (snapshotCurrentVersionMap.isDefined) { - synchronized { loadedMaps.put(version, snapshotCurrentVersionMap.get) } - return snapshotCurrentVersionMap.get -} -// Find the most recent map before this version that we can. -// [SPARK-22305] This must be done iteratively to avoid stack overflow. -var lastAvailableVersion = version -var lastAvailableMap: Option[MapType] = None -while (lastAvailableMap.isEmpty) { - lastAvailableVersion -= 1 +logWarning(s"The state for version $version doesn't exist in loadedMaps. " + + "Reading snapshot file and delta files if needed..." + + "Note that this is normal for the first batch of starting query.") - if (lastAvailableVersion <= 0) { -// Use an empty map for versions 0 or less. -lastAvailableMap = Some(new MapType) - } else { -lastAvailableMap = - synchronized { loadedMaps.get(lastAvailableVersion) } -.orElse(readSnapshotFile(lastAvailableVersion)) +val (result, elapsedMs) = Utils.timeTakenMs { + val snapshotCurrentVersionMap = readSnapshotFile(version) + if (snapshotCurrentVersionMap.isDefined) { +synchronized { loadedMaps.put(version, snapshotCurrentVersionMap.get) } +return snapshotCurrentVersionMap.get + } + + // Find the most recent map before this version that we can. + // [SPARK-22305] This must be done iteratively to avoid stack overflow. 
+ var lastAvailableVersion = version + var lastAvailableMap: Option[MapType] = None + while (lastAvailableMap.isEmpty) { +lastAvailableVersion -= 1 + +if (lastAvailableVersion <= 0) { + // Use an empty map for versions 0 or less. + lastAvailableMap = Some(new MapType) +} else { + lastAvailableMap = +synchronized { loadedMaps.get(lastAvailableVersion) } + .orElse(readSnapshotFile(lastAvailableVersion)) +} + } + + // Load all the deltas from the version after the last available one up to the target version. + // The last available version is the one with a full snapshot, so it doesn't need deltas. + val resultMap = new MapType(lastAvailableMap.get) + for (deltaVersion <- lastAvailableVersion + 1 to version) { +updateFromDeltaFile(deltaVersion, resultMap) } -} -// Load all the deltas from the version after the last available one up to the target version. -// The last available version is the one with a full snapshot, so it doesn't need deltas. -val resultMap = new MapType(lastAvailableMap.get) -for (deltaVersion <- lastAvailableVersion + 1 to version) { - updateFromDeltaFile(deltaVersion, resultMap) + synchronized { loadedMaps.put(version, resultMap) } + resultMap } -synchronized { loadedMaps.put(version, resultMap) } -resultMap +logWarning(s"Loading state for $version takes $elapsedMs ms.") --- End diff -- I just wanted to pair this with the warning message above, but since we are guiding end users to turn on DEBUG level to see information about additional latencies, changing this to DEBUG would also be OK.
[GitHub] spark pull request #21506: [SPARK-24485][SS] Measure and log elapsed time fo...
Github user HeartSaVioR commented on a diff in the pull request: https://github.com/apache/spark/pull/21506#discussion_r194293251 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala --- @@ -280,38 +278,49 @@ private[state] class HDFSBackedStateStoreProvider extends StateStoreProvider wit if (loadedCurrentVersionMap.isDefined) { return loadedCurrentVersionMap.get } -val snapshotCurrentVersionMap = readSnapshotFile(version) -if (snapshotCurrentVersionMap.isDefined) { - synchronized { loadedMaps.put(version, snapshotCurrentVersionMap.get) } - return snapshotCurrentVersionMap.get -} -// Find the most recent map before this version that we can. -// [SPARK-22305] This must be done iteratively to avoid stack overflow. -var lastAvailableVersion = version -var lastAvailableMap: Option[MapType] = None -while (lastAvailableMap.isEmpty) { - lastAvailableVersion -= 1 +logWarning(s"The state for version $version doesn't exist in loadedMaps. " + + "Reading snapshot file and delta files if needed..." + + "Note that this is normal for the first batch of starting query.") - if (lastAvailableVersion <= 0) { -// Use an empty map for versions 0 or less. -lastAvailableMap = Some(new MapType) - } else { -lastAvailableMap = - synchronized { loadedMaps.get(lastAvailableVersion) } -.orElse(readSnapshotFile(lastAvailableVersion)) +val (result, elapsedMs) = Utils.timeTakenMs { --- End diff -- Yup, right. Most of the code change just wraps the existing code in `timeTakenMs`.
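The wrapping pattern discussed in this thread, "run a block and get back its result plus the elapsed milliseconds", can be sketched in Python. `time_taken_ms` is a hypothetical helper modeled on how `Utils.timeTakenMs` is used in the diff (returning a `(result, elapsedMs)` pair); it is not Spark's implementation. The log call uses DEBUG, matching the level chosen above.

```python
import logging
import time

logger = logging.getLogger("state_store_sketch")


def time_taken_ms(body):
    """Run body() and return (result, elapsed milliseconds)."""
    start = time.monotonic()  # monotonic clock: unaffected by wall-clock changes
    result = body()
    elapsed_ms = int((time.monotonic() - start) * 1000)
    return result, elapsed_ms


result, elapsed_ms = time_taken_ms(lambda: sum(range(1000)))
logger.debug("Loading state took %d ms.", elapsed_ms)
```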
[GitHub] spark pull request #21505: [SPARK-24457][SQL] Improving performance of strin...
Github user ssonker commented on a diff in the pull request: https://github.com/apache/spark/pull/21505#discussion_r194292883 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala --- @@ -111,6 +113,23 @@ object DateTimeUtils { computedTimeZones.computeIfAbsent(timeZoneId, computeTimeZone) } + private val threadLocalComputedCalendarsMap = +new ThreadLocal[mutable.Map[TimeZone, Calendar]] { --- End diff -- @kiszk I think @viirya meant having just one thread-local calendar instance. That should also work, shouldn't it?
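The diff keeps a per-thread map from time zone to calendar, so each thread reuses its own non-thread-safe calendar without locking. A minimal Python sketch of that structure, assuming `FakeCalendar` as a hypothetical stand-in for `java.util.Calendar` and `get_calendar` as a hypothetical accessor. The alternative raised here, a single thread-local instance, would drop the map but reset the time zone on every call.

```python
import threading

_local = threading.local()


class FakeCalendar:
    """Hypothetical stand-in for java.util.Calendar: reusable, not thread-safe."""

    def __init__(self, tz):
        self.tz = tz


def get_calendar(tz):
    """Return this thread's cached calendar for tz, creating it on first use."""
    cache = getattr(_local, "calendars", None)
    if cache is None:
        cache = _local.calendars = {}  # each thread gets its own dict
    if tz not in cache:
        cache[tz] = FakeCalendar(tz)
    return cache[tz]
```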
[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20838#discussion_r194292645 --- Diff: dev/merge_spark_pr.py --- @@ -39,6 +39,9 @@ except ImportError: JIRA_IMPORTED = False +if sys.version_info[0] >= 3: +raw_input = input --- End diff -- Does this script run with Python 3+ now?
[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20838#discussion_r194292662 --- Diff: dev/create-release/releaseutils.py --- @@ -49,6 +49,9 @@ print("Install using 'sudo pip install unidecode'") sys.exit(-1) +if sys.version_info[0] >= 3: +raw_input = input --- End diff -- Does this script work in Python 3+ now?
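Both diffs add the same alias: Python 3 renamed `raw_input()` to `input()`, so aliasing it keeps Python 2 style call sites working. A minimal sketch of the alias, plus a hypothetical `ask` helper that takes the reader as a parameter so the prompt logic can be exercised without a terminal. The sketch shows only that the name resolves; whether the rest of each script runs on Python 3 is the open question above.

```python
import sys

if sys.version_info[0] >= 3:
    # Python 3 renamed raw_input() to input(); alias it for py2-style call sites.
    raw_input = input


def ask(prompt, reader=None):
    """Prompt for an answer; `reader` is injectable for tests (hypothetical helper)."""
    reader = reader or raw_input
    return reader(prompt).strip()
```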
[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20838#discussion_r194292586 --- Diff: python/pyspark/sql/conf.py --- @@ -59,7 +62,7 @@ def unset(self, key): def _checkType(self, obj, identifier): """Assert that an object is of type str.""" -if not isinstance(obj, str) and not isinstance(obj, unicode): +if not isinstance(obj, basestring): --- End diff -- This is fine since we rely on short-circuiting.
[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20838#discussion_r194292548 --- Diff: python/pyspark/streaming/dstream.py --- @@ -23,6 +23,8 @@ if sys.version < "3": from itertools import imap as map, ifilter as filter +else: +long = int --- End diff -- Can you add a test for it? It seems to be used only once, so adding a test shouldn't be difficult.
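A test for the alias can be small. A minimal sketch, assuming a hypothetical `to_millis` function stands in for the single call site that uses `long()` (the real call site in `dstream.py` is not shown in the diff), with a `unittest` case that fails on Python 3 if the alias were missing:

```python
import sys
import unittest

if sys.version >= "3":
    long = int  # Python 3 merged long into int; alias keeps old call sites working


def to_millis(seconds):
    """Hypothetical stand-in for the one call site that uses long()."""
    return long(seconds * 1000)


class LongAliasTest(unittest.TestCase):
    def test_to_millis_returns_integer(self):
        # Without the alias, Python 3 would raise NameError here.
        self.assertEqual(to_millis(1.5), 1500)
        self.assertIsInstance(to_millis(2), int)
```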
[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r194292067 --- Diff: python/pyspark/sql/dataframe.py --- @@ -351,8 +354,68 @@ def show(self, n=20, truncate=True, vertical=False): else: print(self._jdf.showString(n, int(truncate), vertical)) +@property +def _eager_eval(self): +"""Returns true if the eager evaluation enabled. +""" +return self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.enabled", "false").lower() == "true" + +@property +def _max_num_rows(self): +"""Returns the max row number for eager evaluation. +""" +return int(self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.maxNumRows", "20")) + +@property +def _truncate(self): +"""Returns the truncate length for eager evaluation. +""" +return int(self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.truncate", "20")) + def __repr__(self): --- End diff -- This PR also changed `__repr__`. Thus, we need to update the PR title and description.
[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21501 **[Test build #91648 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91648/testReport)** for PR 21501 at commit [`b4249c3`](https://github.com/apache/spark/commit/b4249c342a92dc840a1f1d5290c24a5fe165417d).
[GitHub] spark pull request #21494: [WIP][SPARK-24375][Prototype] Support barrier sch...
Github user galv commented on a diff in the pull request: https://github.com/apache/spark/pull/21494#discussion_r193953345 --- Diff: core/src/main/scala/org/apache/spark/util/RpcUtils.scala --- @@ -44,7 +44,7 @@ private[spark] object RpcUtils { /** Returns the default Spark timeout to use for RPC ask operations. */ def askRpcTimeout(conf: SparkConf): RpcTimeout = { -RpcTimeout(conf, Seq("spark.rpc.askTimeout", "spark.network.timeout"), "120s") +RpcTimeout(conf, Seq("spark.rpc.askTimeout", "spark.network.timeout"), "900s") --- End diff -- Why hard-code this change? Couldn't you have set this at runtime if you needed it increased? I'm concerned about it breaking backwards compatibility with jobs that for whatever reason depend on the 120 second timeout.
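The diff shows the lookup order: `spark.rpc.askTimeout`, then `spark.network.timeout`, then the hard-coded default. That order is why a job can raise the timeout at runtime (for example with `--conf spark.rpc.askTimeout=900s` on `spark-submit`) without touching the default. A minimal Python sketch of that fallback chain, assuming hypothetical helpers `parse_duration_seconds` (which handles only a small subset of Spark's time-string formats) and `ask_timeout_seconds`:

```python
import re

_UNITS = {"ms": 0.001, "s": 1, "min": 60, "h": 3600}


def parse_duration_seconds(text):
    """Parse strings such as '120s' or '15min' (a small subset of Spark's format)."""
    match = re.fullmatch(r"(\d+)\s*(ms|s|min|h)", text.strip())
    if not match:
        raise ValueError("unsupported duration: %r" % text)
    return int(match.group(1)) * _UNITS[match.group(2)]


def ask_timeout_seconds(conf,
                        keys=("spark.rpc.askTimeout", "spark.network.timeout"),
                        default="120s"):
    """Use the first key that is set, in order; otherwise fall back to the default."""
    for key in keys:
        if key in conf:
            return parse_duration_seconds(conf[key])
    return parse_duration_seconds(default)
```

Keeping the 120s default and letting the one job that needs more set it explicitly avoids changing behavior for every other job.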
[GitHub] spark pull request #21494: [WIP][SPARK-24375][Prototype] Support barrier sch...
Github user galv commented on a diff in the pull request: https://github.com/apache/spark/pull/21494#discussion_r193953432 --- Diff: core/src/main/scala/org/apache/spark/barrier/BarrierRDD.scala --- @@ -0,0 +1,43 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.barrier + +import scala.reflect.ClassTag + +import org.apache.spark.{Partition, TaskContext} +import org.apache.spark.rdd.RDD + + +/** + * An RDD that supports running MPI programme. --- End diff -- `programme` -> `program`
[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r194287915 --- Diff: python/pyspark/sql/dataframe.py --- @@ -351,8 +354,68 @@ def show(self, n=20, truncate=True, vertical=False): else: print(self._jdf.showString(n, int(truncate), vertical)) +@property +def _eager_eval(self): +"""Returns true if the eager evaluation enabled. +""" +return self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.enabled", "false").lower() == "true" --- End diff -- For the ongoing release, a nice-to-have refactoring is to move all the Core confs into a single file, just as we did for the Spark SQL conf, with default values, boundary checks, types and descriptions in one place. So in PySpark, it would be better to start doing this now.
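What "a single file with default values, boundary checking, types and descriptions" could look like in Python: each key is declared once, together with its default, converter, validity check, and doc string. This is a minimal sketch under assumptions: `ConfEntry` and the entry names are hypothetical, the three keys and defaults come from the diff, and the doc strings and bounds are illustrative guesses rather than Spark's.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass(frozen=True)
class ConfEntry:
    """Hypothetical single-source definition of one config key."""
    key: str
    default: str
    convert: Callable[[str], Any]
    doc: str
    check: Optional[Callable[[Any], bool]] = None

    def read(self, conf: Dict[str, str]) -> Any:
        value = self.convert(conf.get(self.key, self.default))
        if self.check is not None and not self.check(value):
            raise ValueError("invalid value for %s: %r" % (self.key, value))
        return value


def _to_bool(text: str) -> bool:
    return text.strip().lower() == "true"


# Keys and defaults from the diff; docs and bounds are illustrative assumptions.
EAGER_EVAL_ENABLED = ConfEntry(
    "spark.sql.repl.eagerEval.enabled", "false", _to_bool,
    "Whether DataFrames are evaluated eagerly for display.")
EAGER_EVAL_MAX_NUM_ROWS = ConfEntry(
    "spark.sql.repl.eagerEval.maxNumRows", "20", int,
    "Maximum number of rows shown by eager evaluation.",
    check=lambda n: n > 0)
EAGER_EVAL_TRUNCATE = ConfEntry(
    "spark.sql.repl.eagerEval.truncate", "20", int,
    "Truncation length for cells shown by eager evaluation.",
    check=lambda n: n >= 0)
```

The properties in the diff could then read through these entries instead of repeating key strings and defaults inline.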
[GitHub] spark pull request #21495: [SPARK-24418][Build] Upgrade Scala to 2.11.12 and...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/21495#discussion_r194287473 --- Diff: repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkILoopInterpreter.scala --- @@ -21,8 +21,22 @@ import scala.collection.mutable import scala.tools.nsc.Settings import scala.tools.nsc.interpreter._ -class SparkILoopInterpreter(settings: Settings, out: JPrintWriter) extends IMain(settings, out) { - self => +class SparkILoopInterpreter(settings: Settings, out: JPrintWriter, initializeSpark: () => Unit) +extends IMain(settings, out) { self => --- End diff -- IIRC, four spaces is OK.
[GitHub] spark issue #20838: [SPARK-23698] Resolve undefined names in Python 3
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20838 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91644/ Test PASSed.
[GitHub] spark issue #20838: [SPARK-23698] Resolve undefined names in Python 3
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20838 Merged build finished. Test PASSed.
[GitHub] spark issue #20838: [SPARK-23698] Resolve undefined names in Python 3
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20838 **[Test build #91644 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91644/testReport)** for PR 20838 at commit [`fd4d922`](https://github.com/apache/spark/commit/fd4d9225a23bac79e895f5bd223001b8ccb6ba15). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21520: [SPARK-24505][SQL] Forbidding string interpolation in Co...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21520 > 1. We are seeing many inline prefix with a few typical patterns. > Can we introduce new APIs to avoid repetitions of adding inline, for example JavaCode.className(Class[_]): JavaCode for the first call. @kiszk I initially took a similar approach but soon found that I'd be creating too many APIs. I'm not sure we want to distinguish them at the API level, because they are all just simple pieces of inline string in code, so I switched to a single `inline` that treats them all the same. > 2. We are seeing many JavaCode.global() or JavaCode.variable() when we create a new variable. Would it be possible to make them simpler? Yes, I noticed that too. I was planning to change existing APIs such as `ctx.freshName`, but I left them as they are and set the first goal to pass all tests after forbidding string interpolation. Since the tests pass now, I think we can incrementally make the changes simpler and clearer. I've proposed to do this part in some smaller PRs (ref: https://github.com/apache/spark/pull/21520#issuecomment-396111725). WDYT?
[GitHub] spark issue #21520: [SPARK-24505][SQL] Forbidding string interpolation in Co...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21520 @kiszk @mgaido91 Thanks for your comments! > What do you think about starting doing the needed changes in smaller PRs which focus only on specific part and forbidding the string interpolation after those have made the needed changes smaller? Once string interpolation is disallowed in code blocks, any string passed into a code block fails to compile. That also guarantees we don't miss any strings, which is why this change is quite big rather than split into many smaller pieces. Most importantly, I need to have all the changes together to see whether all the tests pass once we completely forbid string interpolation. But that doesn't mean we need to review and merge this big change as is. It is still possible to break it into smaller PRs. It could work like this: 1. Split part of the change into a smaller PR, review it and finally merge it. 2. Incorporate the merged change back into this PR and make the tests pass. Go back to step 1. WDYT?
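The idea behind forbidding string interpolation is that a code block accepts only typed fragments, so a stray raw string is rejected instead of silently spliced in. The Scala change enforces this at compile time; Python can only check at runtime. A minimal runtime analogue, assuming hypothetical `Inline`, `Variable` and `code` names that loosely echo `inline` and `JavaCode.variable()` from the thread but are not Spark's API:

```python
class Inline:
    """Hypothetical wrapper marking a string as deliberately inlined code."""

    def __init__(self, text):
        self.text = text


class Variable:
    """Hypothetical typed reference to a generated variable."""

    def __init__(self, name):
        self.name = name


def code(template, *args):
    """Fill each '{}' in template, rejecting raw str arguments.

    Only Inline and Variable are accepted, so an untyped fragment cannot
    slip into the generated code unnoticed.
    """
    parts = []
    for arg in args:
        if isinstance(arg, Inline):
            parts.append(arg.text)
        elif isinstance(arg, Variable):
            parts.append(arg.name)
        else:
            raise TypeError(
                "raw %s not allowed in code(); wrap it in Inline" % type(arg).__name__)
    return template.format(*parts)
```

Every call site that passes a plain string fails at once, which is why the change in this PR has to touch so many places together.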
[GitHub] spark issue #21067: [SPARK-23980][K8S] Resilient Spark driver on Kubernetes
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/21067 any update?
[GitHub] spark issue #21202: [SPARK-24129] [K8S] Add option to pass --build-arg's to ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21202 **[Test build #91647 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91647/testReport)** for PR 21202 at commit [`3e410cd`](https://github.com/apache/spark/commit/3e410cdc9cf09996a3962727107125fc950d034e).
[GitHub] spark issue #21202: [SPARK-24129] [K8S] Add option to pass --build-arg's to ...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/21202 @devaraj-kavali could you rebase this?
[GitHub] spark issue #21202: [SPARK-24129] [K8S] Add option to pass --build-arg's to ...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/21202 ok to test
[GitHub] spark issue #21486: [SPARK-24387][Core] Heartbeat-timeout executor is added ...
Github user lirui-apache commented on the issue: https://github.com/apache/spark/pull/21486 cc @vanzin @andrewor14
[GitHub] spark pull request #19755: [SPARK-22524][SQL] Subquery shows reused on UI SQ...
Github user gczsjdy closed the pull request at: https://github.com/apache/spark/pull/19755
[GitHub] spark issue #21504: [SPARK-24479][SS] Added config for registering streaming...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21504 **[Test build #91645 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91645/testReport)** for PR 21504 at commit [`421e16b`](https://github.com/apache/spark/commit/421e16b20f63f8df7f279bf2dcea76a060a85ad3).
[GitHub] spark issue #21497: [SPARK-24466][SS] Fix TextSocketMicroBatchReader to be c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21497 **[Test build #91646 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91646/testReport)** for PR 21497 at commit [`d069dd0`](https://github.com/apache/spark/commit/d069dd009bac833ac5f1a61bd9f911d1e021e15c).
[GitHub] spark issue #21497: [SPARK-24466][SS] Fix TextSocketMicroBatchReader to be c...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21497 retest this please
[GitHub] spark issue #21504: [SPARK-24479][SS] Added config for registering streaming...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21504 retest this please
[GitHub] spark pull request #21467: [SPARK-23754][PYTHON][FOLLOWUP] Move UDF stop ite...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21467
[GitHub] spark issue #21467: [SPARK-23754][PYTHON][FOLLOWUP] Move UDF stop iteration ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21467 Merged to master. @e-dorigatti, it has some conflicts in branch-2.3 too. Would you mind opening a backport PR again to reduce the difference between master and branch-2.3?
[GitHub] spark issue #21495: [SPARK-24418][Build] Upgrade Scala to 2.11.12 and 2.12.6
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/21495 I ran into issues when testing with the latest patch:
```
Exception in thread "main" java.lang.NoSuchMethodError: jline.console.completer.CandidateListCompletionHandler.setPrintSpaceAfterFullCompletion(Z)V
	at scala.tools.nsc.interpreter.jline.JLineConsoleReader.initCompletion(JLineReader.scala:139)
	at scala.tools.nsc.interpreter.jline.InteractiveReader.postInit(JLineReader.scala:54)
	at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1$$anonfun$25.apply(ILoop.scala:899)
	at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1$$anonfun$25.apply(ILoop.scala:897)
	at scala.tools.nsc.interpreter.SplashReader.postInit(InteractiveReader.scala:130)
	at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1$$anonfun$scala$tools$nsc$interpreter$ILoop$$anonfun$$loopPostInit$1$1.apply$mcV$sp(ILoop.scala:926)
	at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1$$anonfun$scala$tools$nsc$interpreter$ILoop$$anonfun$$loopPostInit$1$1.apply(ILoop.scala:908)
	at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1$$anonfun$scala$tools$nsc$interpreter$ILoop$$anonfun$$loopPostInit$1$1.apply(ILoop.scala:908)
	at scala.tools.nsc.interpreter.ILoop$$anonfun$mumly$1.apply(ILoop.scala:189)
	at scala.tools.nsc.interpreter.IMain.beQuietDuring(IMain.scala:221)
	at scala.tools.nsc.interpreter.ILoop.mumly(ILoop.scala:186)
	at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1$$anonfun$startup$1$1.apply(ILoop.scala:979)
	at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply$mcZ$sp(ILoop.scala:990)
	at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:891)
	at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:891)
	at scala.reflect.internal.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:97)
	at scala.tools.nsc.interpreter.ILoop.process(ILoop.scala:891)
	at org.apache.spark.repl.Main$.doMain(Main.scala:76)
	at org.apache.spark.repl.Main$.main(Main.scala:56)
	at org.apache.spark.repl.Main.main(Main.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:837)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:194)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:912)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:923)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Exception in thread "Thread-15" java.lang.InterruptedException
	at java.util.concurrent.SynchronousQueue.put(SynchronousQueue.java:879)
	at scala.tools.nsc.interpreter.SplashLoop.run(InteractiveReader.scala:77)
	at java.lang.Thread.run(Thread.java:745)
```
[GitHub] spark issue #21481: [SPARK-24452][SQL][Core] Avoid possible overflow in int ...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/21481 Thank you for your comment. I will create another PR for integrating FindBugs/SpotBugs into Maven.
[GitHub] spark issue #21221: [SPARK-23429][CORE] Add executor memory metrics to heart...
Github user edwinalu commented on the issue: https://github.com/apache/spark/pull/21221 @squito, I'm modifying ExecutorMetrics to take in the metrics array. This will be easier for tests where we pass in set values, and seems fine for the actual code. It will check that the length of the passed-in array is the same as MetricGetter.values.length. Let me know if you have any concerns. @felixcheung, I'll finish the current changes, then rebase.
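A minimal Python sketch of the constructor-side length check described above. `METRIC_GETTERS` is a hypothetical stand-in for the Scala `MetricGetter.values` in the PR, and its metric names are illustrative only.

```python
from typing import Sequence

# Hypothetical stand-in for MetricGetter.values: one entry per tracked metric, in order.
METRIC_GETTERS = ("JVMHeapMemory", "JVMOffHeapMemory", "OnHeapExecutionMemory")


class ExecutorMetrics:
    """Holds one value per metric, in the same order as METRIC_GETTERS."""

    def __init__(self, metrics: Sequence[int]):
        # Reject arrays that do not line up with the known metric list.
        if len(metrics) != len(METRIC_GETTERS):
            raise ValueError("expected %d metric values, got %d"
                             % (len(METRIC_GETTERS), len(metrics)))
        self._metrics = list(metrics)

    def get(self, name: str) -> int:
        # Look up a value by the position of its getter.
        return self._metrics[METRIC_GETTERS.index(name)]
```

Passing the array in directly lets a test build `ExecutorMetrics([10, 20, 30])` with fixed values, while a mismatched length fails fast instead of silently misaligning metrics.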
[GitHub] spark issue #21427: [SPARK-24324][PYTHON] Pandas Grouped Map UDF should assi...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21427 It's at least not as trivial as the Scala side's. I am okay with it, but please make sure which cases we will allow with this configuration.
[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r194278100 --- Diff: python/pyspark/sql/dataframe.py --- @@ -351,8 +354,68 @@ def show(self, n=20, truncate=True, vertical=False): else: print(self._jdf.showString(n, int(truncate), vertical)) +@property +def _eager_eval(self): +"""Returns true if the eager evaluation enabled. +""" +return self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.enabled", "false").lower() == "true" --- End diff -- Probably, we should access the SQLConf object. 1. I agree with not hardcoding it in general, but 2. IMHO I want to avoid Py4J JVM accesses in the test because, to my knowledge, they are likely to make the test more flaky (unlike on the Scala or Java side). Maybe we should take another look at this hardcoding if we see more occurrences next time.
[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r194277542 --- Diff: docs/configuration.md --- @@ -456,6 +456,33 @@ Apart from these, the following properties are also available, and may be useful from JVM to Python worker for every task. + + spark.sql.repl.eagerEval.enabled + false + +Enable eager evaluation or not. If true and the REPL you are using supports eager evaluation, --- End diff -- Just a question. When the REPL does not support eager evaluation, could we do anything better instead of silently ignoring the user inputs?
[GitHub] spark issue #21370: [SPARK-24215][PySpark] Implement _repr_html_ for datafra...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21370 @xuanyuanking Thanks for your contributions! Test coverage is most critical when we refactor existing code and add new features. When you submit new PRs in the future, could you also improve this part?
[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r194277082 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -3209,6 +3222,19 @@ class Dataset[T] private[sql]( } } + private[sql] def getRowsToPython( --- End diff -- In DataFrameSuite, we have multiple test cases for `showString` instead of `getRows`, which is introduced in this PR. We also need unit test cases for `getRowsToPython`.
[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r194276795 --- Diff: docs/configuration.md --- @@ -456,6 +456,33 @@ Apart from these, the following properties are also available, and may be useful from JVM to Python worker for every task. + --- End diff -- These confs are not part of `spark.sql("SET -v").show(numRows = 200, truncate = false)`.
[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r194276735 --- Diff: python/pyspark/sql/dataframe.py --- @@ -351,8 +354,68 @@ def show(self, n=20, truncate=True, vertical=False): else: print(self._jdf.showString(n, int(truncate), vertical)) +@property +def _eager_eval(self): +"""Returns true if the eager evaluation enabled. +""" +return self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.enabled", "false").lower() == "true" --- End diff -- Is it possible to avoid hard-coding these conf key values? cc @ueshin @HyukjinKwon
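One way to avoid scattering key strings and defaults across several properties is to declare each conf once and parse it in a single helper. This is a minimal sketch, assuming a plain `get_conf(key, default)` callable in place of `self.sql_ctx.getConf`. The `ConfEntry` and `read_conf` names are hypothetical and are not part of PySpark.

```python
from typing import Callable, NamedTuple


class ConfEntry(NamedTuple):
    key: str
    default: str                    # string default, as getConf expects
    parse: Callable[[str], object]  # converts the raw string value


def _parse_bool(value: str) -> bool:
    return value.strip().lower() == "true"


# Each key and its default are declared exactly once.
EAGER_EVAL_ENABLED = ConfEntry("spark.sql.repl.eagerEval.enabled", "false", _parse_bool)
EAGER_EVAL_MAX_NUM_ROWS = ConfEntry("spark.sql.repl.eagerEval.maxNumRows", "20", int)
EAGER_EVAL_TRUNCATE = ConfEntry("spark.sql.repl.eagerEval.truncate", "20", int)


def read_conf(get_conf: Callable[[str, str], str], entry: ConfEntry):
    """Look up `entry` through `get_conf` and convert the string result."""
    return entry.parse(get_conf(entry.key, entry.default))
```

This still keeps a Python-side copy of each default. Reading the defaults from the JVM `SQLConf` would remove that duplication at the cost of extra Py4J calls, which is the trade-off raised in this thread.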
[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r194276557 --- Diff: docs/configuration.md --- @@ -456,6 +456,33 @@ Apart from these, the following properties are also available, and may be useful from JVM to Python worker for every task. + --- End diff -- All the SQL configurations should follow what we did in the section of `Spark SQL` https://spark.apache.org/docs/latest/configuration.html#spark-sql.
[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r194276329 --- Diff: python/pyspark/sql/dataframe.py --- @@ -351,8 +354,68 @@ def show(self, n=20, truncate=True, vertical=False): else: print(self._jdf.showString(n, int(truncate), vertical)) +@property +def _eager_eval(self): +"""Returns true if the eager evaluation enabled. +""" +return self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.enabled", "false").lower() == "true" + +@property +def _max_num_rows(self): +"""Returns the max row number for eager evaluation. +""" +return int(self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.maxNumRows", "20")) + +@property +def _truncate(self): +"""Returns the truncate length for eager evaluation. +""" +return int(self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.truncate", "20")) + def __repr__(self): -return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in self.dtypes)) +if not self._support_repr_html and self._eager_eval: +vertical = False +return self._jdf.showString( +self._max_num_rows, self._truncate, vertical) +else: +return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in self.dtypes)) + +def _repr_html_(self): +"""Returns a dataframe with html code when you enabled eager evaluation +by 'spark.sql.repl.eagerEval.enabled', this only called by REPL you are +using support eager evaluation with HTML. +""" +import cgi +if not self._support_repr_html: +self._support_repr_html = True +if self._eager_eval: +max_num_rows = max(self._max_num_rows, 0) +vertical = False --- End diff -- Any discussion about this?
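A self-contained sketch of the behavior under review, assuming a toy `EagerFrame` class that is hypothetical and not PySpark's `DataFrame`. When eager evaluation is off, `_repr_html_` returns `None`, so Jupyter falls back to the plain-text `__repr__`. Otherwise, it renders at most `max_num_rows` rows as an HTML table and truncates long cells in the way `show()` appears to. Cells are escaped with `html.escape` rather than `cgi.escape`, which was deprecated in Python 3.2 and removed in 3.8. The plain-text `showString` path for non-HTML REPLs is left out.

```python
import html
from typing import Optional, Sequence


class EagerFrame:
    """Toy stand-in for a DataFrame with eager-evaluation settings."""

    def __init__(self, columns: Sequence[str], rows: Sequence[Sequence[object]],
                 eager_eval: bool = False, max_num_rows: int = 20, truncate: int = 20):
        self.columns = list(columns)
        self.rows = [list(r) for r in rows]
        self.eager_eval = eager_eval
        self.max_num_rows = max(max_num_rows, 0)  # negative limits act like 0
        self.truncate = truncate

    def _cell(self, value: object) -> str:
        text = str(value)
        # Cut long cells, ending with "..." when there is room for it.
        if 0 < self.truncate < len(text):
            if self.truncate > 3:
                text = text[:self.truncate - 3] + "..."
            else:
                text = text[:self.truncate]
        return html.escape(text)  # escape <, >, & and quotes before embedding

    def __repr__(self) -> str:
        return "DataFrame[%s]" % ", ".join(self.columns)

    def _repr_html_(self) -> Optional[str]:
        if not self.eager_eval:
            return None  # Jupyter then falls back to __repr__
        header = "".join("<th>%s</th>" % html.escape(c) for c in self.columns)
        lines = ["<table border='1'>", "<tr>%s</tr>" % header]
        for row in self.rows[:self.max_num_rows]:
            cells = "".join("<td>%s</td>" % self._cell(v) for v in row)
            lines.append("<tr>%s</tr>" % cells)
        lines.append("</table>")
        if len(self.rows) > self.max_num_rows:
            noun = "row" if self.max_num_rows == 1 else "rows"
            lines.append("only showing top %d %s" % (self.max_num_rows, noun))
        return "\n".join(lines)
```

The same toy class also covers the negative case requested for `test_repr_html`: with `eager_eval=False`, `_repr_html_()` returns `None`.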
[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r194276298 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -3209,6 +3222,19 @@ class Dataset[T] private[sql]( } } + private[sql] def getRowsToPython( + _numRows: Int, + truncate: Int, + vertical: Boolean): Array[Any] = { +EvaluatePython.registerPicklers() +val numRows = _numRows.max(0).min(Int.MaxValue - 1) --- End diff -- This should also be part of the conf description.
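The clamp in `getRowsToPython` keeps the requested row count within [0, Int.MaxValue - 1], and the conf description would need to state this behavior. Below is a small Python illustration of the same arithmetic, with `INT_MAX` modelling Scala's `Int.MaxValue`. The diff does not give a reason for the `- 1`. Presumably it leaves room for a later `numRows + 1` (fetching one extra row to detect truncation) without 32-bit overflow, but that is an inference.

```python
INT_MAX = 2**31 - 1  # models Scala's Int.MaxValue


def clamp_num_rows(requested: int) -> int:
    """Python equivalent of `_numRows.max(0).min(Int.MaxValue - 1)`."""
    return min(max(requested, 0), INT_MAX - 1)
```

A negative `maxNumRows` therefore shows no rows instead of failing, and a huge value is silently capped.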
[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r194276179 --- Diff: python/pyspark/sql/tests.py --- @@ -3074,6 +3074,36 @@ def test_checking_csv_header(self): finally: shutil.rmtree(path) +def test_repr_html(self): --- End diff -- This function only covers the most basic positive case. We also need to add more test cases, for example, the results when `spark.sql.repl.eagerEval.enabled` is set to `false`.
[GitHub] spark issue #21370: [SPARK-24215][PySpark] Implement _repr_html_ for datafra...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21370 @xuanyuanking @HyukjinKwon Sorry for the delay. I was super busy during the week of Spark Summit. I will carefully review this PR today or tomorrow.
[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r194275282 --- Diff: docs/configuration.md --- @@ -456,6 +456,33 @@ Apart from these, the following properties are also available, and may be useful from JVM to Python worker for every task. + + spark.sql.repl.eagerEval.enabled + false + +Enable eager evaluation or not. If true and the REPL you are using supports eager evaluation, +Dataset will be ran automatically. The HTML table which generated by _repl_html_ +called by notebooks like Jupyter will feedback the queries user have defined. For plain Python +REPL, the output will be shown like dataframe.show() +(see https://issues.apache.org/jira/browse/SPARK-24215;>SPARK-24215 for more details). + + + + spark.sql.repl.eagerEval.maxNumRows + 20 + +Default number of rows in eager evaluation output HTML table generated by _repr_html_ or plain text, +this only take effect when spark.sql.repl.eagerEval.enabled is set to true. --- End diff -- take -> takes
[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r194275288 --- Diff: docs/configuration.md --- @@ -456,6 +456,33 @@ Apart from these, the following properties are also available, and may be useful from JVM to Python worker for every task. + + spark.sql.repl.eagerEval.enabled + false + +Enable eager evaluation or not. If true and the REPL you are using supports eager evaluation, +Dataset will be ran automatically. The HTML table which generated by _repl_html_ +called by notebooks like Jupyter will feedback the queries user have defined. For plain Python +REPL, the output will be shown like dataframe.show() +(see https://issues.apache.org/jira/browse/SPARK-24215;>SPARK-24215 for more details). + + + + spark.sql.repl.eagerEval.maxNumRows + 20 + +Default number of rows in eager evaluation output HTML table generated by _repr_html_ or plain text, +this only take effect when spark.sql.repl.eagerEval.enabled is set to true. + + + + spark.sql.repl.eagerEval.truncate + 20 + +Default number of truncate in eager evaluation output HTML table generated by _repr_html_ or +plain text, this only take effect when spark.sql.repl.eagerEval.enabled set to true. --- End diff -- take -> takes
[GitHub] spark pull request #21508: [SPARK-24488] [SQL] Fix issue when generator is a...
Github user bkrieger commented on a diff in the pull request: https://github.com/apache/spark/pull/21508#discussion_r194274619 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1568,11 +1568,32 @@ class Analyzer( expr.find(_.isInstanceOf[Generator]).isDefined } -private def hasNestedGenerator(expr: NamedExpression): Boolean = expr match { - case UnresolvedAlias(_: Generator, _) => false - case Alias(_: Generator, _) => false - case MultiAlias(_: Generator, _) => false - case other => hasGenerator(other) +private def hasNestedGenerator(expr: NamedExpression): Boolean = { + trimNonTopLevelAliases(expr) match { +case UnresolvedAlias(_: Generator, _) => false +case Alias(_: Generator, _) => false +case MultiAlias(_: Generator, _) => false +case other => hasGenerator(other) + } +} + +def trimNonTopLevelAliases(e: Expression): Expression = e match { + case a: UnresolvedAlias => --- End diff -- In my use case, no. But I wasn't sure if another use case would care.
[GitHub] spark pull request #21508: [SPARK-24488] [SQL] Fix issue when generator is a...
Github user bkrieger commented on a diff in the pull request: https://github.com/apache/spark/pull/21508#discussion_r194274604 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1568,11 +1568,32 @@ class Analyzer( expr.find(_.isInstanceOf[Generator]).isDefined } -private def hasNestedGenerator(expr: NamedExpression): Boolean = expr match { - case UnresolvedAlias(_: Generator, _) => false - case Alias(_: Generator, _) => false - case MultiAlias(_: Generator, _) => false - case other => hasGenerator(other) +private def hasNestedGenerator(expr: NamedExpression): Boolean = { + trimNonTopLevelAliases(expr) match { +case UnresolvedAlias(_: Generator, _) => false +case Alias(_: Generator, _) => false +case MultiAlias(_: Generator, _) => false +case other => hasGenerator(other) + } +} + +def trimNonTopLevelAliases(e: Expression): Expression = e match { --- End diff -- Sure. I didn't want to break any existing functionality, but I can do that instead.
[GitHub] spark issue #21452: [MINOR][CORE] Log committer class used by HadoopMapRedCo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21452 Merged build finished. Test PASSed.
[GitHub] spark issue #21452: [MINOR][CORE] Log committer class used by HadoopMapRedCo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21452 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91640/ Test PASSed.
[GitHub] spark issue #21452: [MINOR][CORE] Log committer class used by HadoopMapRedCo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21452 **[Test build #91640 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91640/testReport)** for PR 21452 at commit [`9881d9c`](https://github.com/apache/spark/commit/9881d9c6a2b1d56e69bb06ee27fd8706f6e0fe43). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `logInfo(s\"Using output committer class $`
[GitHub] spark issue #21524: [SPARK-24212][ML][doc] Add the example and user guide fo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21524 Can one of the admins verify this patch?
[GitHub] spark pull request #21524: [SPARK-24212][ML][doc] Add the example and user g...
GitHub user tengpeng opened a pull request: https://github.com/apache/spark/pull/21524 [SPARK-24212][ML][doc] Add the example and user guide for ML PrefixSpan ## What changes were proposed in this pull request? There is no example or user guide for ML PrefixSpan (as opposed to MLlib PrefixSpan). This PR adds an example and a user guide. ## How was this patch tested? Generated the local web page. See the screenshot: https://user-images.githubusercontent.com/2724786/41207516-3d5c137e-6cdd-11e8-8e8f-f713231cc4fd.png You can merge this pull request into a Git repository by running: $ git pull https://github.com/tengpeng/spark Spark-24212 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21524.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21524 commit 333f97e48ffab3354bf2627959c40ac7a394a979 Author: Teng Peng Date: 2018-06-10T23:31:56Z Add the example and user guide for ML PrefixSpan
[GitHub] spark issue #21438: [SPARK-24398] [SQL] Improve SQLAppStatusListener.aggrega...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21438 Merged build finished. Test PASSed.
[GitHub] spark issue #21438: [SPARK-24398] [SQL] Improve SQLAppStatusListener.aggrega...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21438 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91641/ Test PASSed.
[GitHub] spark issue #21438: [SPARK-24398] [SQL] Improve SQLAppStatusListener.aggrega...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21438 **[Test build #91641 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91641/testReport)** for PR 21438 at commit [`eb87d2d`](https://github.com/apache/spark/commit/eb87d2d595374f3325a91ac53f0c11bff2b978e7). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #21508: [SPARK-24488] [SQL] Fix issue when generator is a...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21508#discussion_r194273874 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1568,11 +1568,32 @@ class Analyzer( expr.find(_.isInstanceOf[Generator]).isDefined } -private def hasNestedGenerator(expr: NamedExpression): Boolean = expr match { - case UnresolvedAlias(_: Generator, _) => false - case Alias(_: Generator, _) => false - case MultiAlias(_: Generator, _) => false - case other => hasGenerator(other) +private def hasNestedGenerator(expr: NamedExpression): Boolean = { + trimNonTopLevelAliases(expr) match { +case UnresolvedAlias(_: Generator, _) => false +case Alias(_: Generator, _) => false +case MultiAlias(_: Generator, _) => false +case other => hasGenerator(other) + } +} + +def trimNonTopLevelAliases(e: Expression): Expression = e match { + case a: UnresolvedAlias => --- End diff -- Do we need to handle `UnresolvedAlias`?
[GitHub] spark pull request #21508: [SPARK-24488] [SQL] Fix issue when generator is a...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21508#discussion_r194273780 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1568,11 +1568,32 @@ class Analyzer( expr.find(_.isInstanceOf[Generator]).isDefined } -private def hasNestedGenerator(expr: NamedExpression): Boolean = expr match { - case UnresolvedAlias(_: Generator, _) => false - case Alias(_: Generator, _) => false - case MultiAlias(_: Generator, _) => false - case other => hasGenerator(other) +private def hasNestedGenerator(expr: NamedExpression): Boolean = { + trimNonTopLevelAliases(expr) match { +case UnresolvedAlias(_: Generator, _) => false +case Alias(_: Generator, _) => false +case MultiAlias(_: Generator, _) => false +case other => hasGenerator(other) + } +} + +def trimNonTopLevelAliases(e: Expression): Expression = e match { --- End diff -- Instead of duplicating the function here, could we just fix `CleanupAliases.trimNonTopLevelAliases`?
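A toy Python model of the check under discussion. Aliases below the top level are stripped before matching, so a doubly aliased generator such as `Alias(Alias(Generator))` is treated like a top-level aliased generator instead of a nested one. The classes are hypothetical stand-ins for Catalyst's `Alias`, `Generator`, and function-call expressions, not the real Scala API. `UnresolvedAlias` and `MultiAlias` are omitted.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Generator:
    func: str  # e.g. "explode"; modelled as a leaf for simplicity


@dataclass(frozen=True)
class Column:
    name: str


@dataclass(frozen=True)
class Alias:
    child: object
    name: str


@dataclass(frozen=True)
class Call:
    func: str
    args: Tuple[object, ...]


def trim_aliases(e):
    """Remove every Alias node in the tree."""
    if isinstance(e, Alias):
        return trim_aliases(e.child)
    if isinstance(e, Call):
        return Call(e.func, tuple(trim_aliases(a) for a in e.args))
    return e


def trim_non_top_level_aliases(e):
    """Keep a top-level Alias but strip every alias beneath it."""
    if isinstance(e, Alias):
        return Alias(trim_aliases(e.child), e.name)
    return trim_aliases(e)


def has_generator(e) -> bool:
    if isinstance(e, Generator):
        return True
    if isinstance(e, Alias):
        return has_generator(e.child)
    if isinstance(e, Call):
        return any(has_generator(a) for a in e.args)
    return False


def has_nested_generator(e) -> bool:
    trimmed = trim_non_top_level_aliases(e)
    # A generator directly under the top-level alias is allowed.
    if isinstance(trimmed, Alias) and isinstance(trimmed.child, Generator):
        return False
    return has_generator(trimmed)
```

With this model, `has_nested_generator(Alias(Alias(Generator("explode"), "inner"), "outer"))` is `False`, while a generator wrapped in another call, such as `Alias(Call("upper", (Generator("explode"),)), "y")`, is still reported as nested.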
[GitHub] spark issue #21045: [SPARK-23931][SQL] Adds zip function to sparksql
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21045 **[Test build #91643 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91643/testReport)** for PR 21045 at commit [`d8f3dea`](https://github.com/apache/spark/commit/d8f3dea8b227a4ee44dedb6b8199c8a17f6bfdd4). * This patch **fails to build**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class ArraysZip(children: Seq[Expression]) extends Expression with ExpectsInputTypes `
[GitHub] spark issue #21045: [SPARK-23931][SQL] Adds zip function to sparksql
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21045 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91643/ Test FAILed.
[GitHub] spark issue #21045: [SPARK-23931][SQL] Adds zip function to sparksql
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21045 Merged build finished. Test FAILed.
[GitHub] spark issue #20838: [SPARK-23698] Resolve undefined names in Python 3
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20838 **[Test build #91644 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91644/testReport)** for PR 20838 at commit [`fd4d922`](https://github.com/apache/spark/commit/fd4d9225a23bac79e895f5bd223001b8ccb6ba15).
[GitHub] spark issue #21045: [SPARK-23931][SQL] Adds zip function to sparksql
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21045 **[Test build #91643 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91643/testReport)** for PR 21045 at commit [`d8f3dea`](https://github.com/apache/spark/commit/d8f3dea8b227a4ee44dedb6b8199c8a17f6bfdd4).
[GitHub] spark issue #16006: [SPARK-18580] [DStreams] [external/kafka-0-10] Use spark...
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/16006 #19431 was merged, thanks for your work. This PR should probably be closed.
[GitHub] spark issue #21481: [SPARK-24452][SQL][Core] Avoid possible overflow in int ...
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/21481 Let's merge this as-is and do the build improvements in a separate PR. That's important because we may want to backport the overflow fix to maintenance branches and may want to do so independent of the build changes.
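The PR title above is truncated, so the following is only a generic illustration of the kind of int overflow it refers to, not the PR's actual change. On the JVM, multiplying two `Int` values is done in 32-bit arithmetic and silently wraps around before the result is widened to `Long`; converting one operand to `Long` first avoids that. `OverflowDemo`, `bytesNaive` and `bytesSafe` are hypothetical names.

```scala
object OverflowDemo {
  // Both operands are Int, so the product is computed in 32-bit arithmetic
  // and may wrap around before it is widened to the Long return type.
  def bytesNaive(numElements: Int, elementSize: Int): Long =
    numElements * elementSize

  // Widening one operand first makes the multiplication happen in 64-bit arithmetic.
  def bytesSafe(numElements: Int, elementSize: Int): Long =
    numElements.toLong * elementSize
}
```

With `numElements = 1 << 20` and `elementSize = 1 << 12` the true product is 2^32, so `bytesNaive` returns `0L` while `bytesSafe` returns `4294967296L`.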
[GitHub] spark pull request #21505: [SPARK-24457][SQL] Improving performance of strin...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/21505#discussion_r194268734 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala --- @@ -111,6 +113,23 @@ object DateTimeUtils { computedTimeZones.computeIfAbsent(timeZoneId, computeTimeZone) } + private val threadLocalComputedCalendarsMap = +new ThreadLocal[mutable.Map[TimeZone, Calendar]] { --- End diff -- Usually, only the default time zone is used. However, a date-related `Cast` that is executed with an explicit time zone may use a different one. For correctness, I think it is necessary to support multiple time zones. Caching a calendar only for the default time zone and creating a new instance for any other time zone would also work correctly.
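The last alternative kiszk mentions (cache a calendar only for the default time zone, create a fresh one for any other zone) could be sketched roughly as follows. This is not the code in `DateTimeUtils`; `DefaultZoneCalendarCache` and `calendarFor` are hypothetical names, and the sketch assumes the JVM default zone does not change after initialization and that a returned `Calendar` is never shared across threads.

```scala
import java.util.{Calendar, TimeZone}

// Hypothetical helper illustrating "cache for the default time zone only".
object DefaultZoneCalendarCache {
  // Captured once; a later TimeZone.setDefault call is not picked up.
  private val defaultZoneId: String = TimeZone.getDefault.getID

  // One Calendar per thread, because Calendar is mutable and not thread-safe.
  private val cached: ThreadLocal[Calendar] = new ThreadLocal[Calendar] {
    override def initialValue(): Calendar =
      Calendar.getInstance(TimeZone.getTimeZone(defaultZoneId))
  }

  def calendarFor(tz: TimeZone): Calendar =
    if (tz.getID == defaultZoneId) {
      val c = cached.get()
      c.clear() // reset fields left over from the previous use on this thread
      c
    } else {
      // Other zones get a fresh instance, so correctness never depends on the cache.
      Calendar.getInstance(tz)
    }
}
```

The trade-off versus a per-thread `Map[TimeZone, Calendar]` is that non-default zones pay an allocation on every call, but the cache can never grow unboundedly.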
[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...
Github user IgorBerman commented on the issue: https://github.com/apache/spark/pull/20640 @felixcheung sorry, did I miss something?
[GitHub] spark issue #21221: [SPARK-23429][CORE] Add executor memory metrics to heart...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21221 Build finished. Test FAILed.
[GitHub] spark issue #21221: [SPARK-23429][CORE] Add executor memory metrics to heart...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21221 **[Test build #91642 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91642/testReport)** for PR 21221 at commit [`7879e66`](https://github.com/apache/spark/commit/7879e66eed22cfd4dff2367c0ee3138369243711). * This patch **fails to build**. * This patch **does not merge cleanly**. * This patch adds the following public classes _(experimental)_: * `sealed trait MetricGetter ` * `abstract class MemoryManagerMetricGetter(f: MemoryManager => Long) extends MetricGetter ` * `abstract class MBeanMetricGetter(mBeanName: String) extends MetricGetter `
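The build report lists three new public classes but not their bodies. As a minimal sketch of how such a hierarchy might fit together, assuming each getter produces one `Long` metric and an MBean-based getter reads a JMX attribute: `StubMemoryManager` (a stand-in for Spark's internal `MemoryManager`, whose real API differs), `getMetricValue`, `name`, the case objects, and the choice of the `MemoryUsed` attribute are all hypothetical, not taken from the PR.

```scala
import java.lang.management.ManagementFactory
import javax.management.ObjectName

// Hypothetical stand-in for Spark's MemoryManager; the real class has a different API.
final case class StubMemoryManager(executionMemoryUsed: Long, storageMemoryUsed: Long)

// Each getter knows how to read a single executor memory metric.
sealed trait MetricGetter {
  def name: String
  def getMetricValue(memoryManager: StubMemoryManager): Long
}

// Reads a value directly from the memory manager through the supplied function.
abstract class MemoryManagerMetricGetter(f: StubMemoryManager => Long) extends MetricGetter {
  override def getMetricValue(memoryManager: StubMemoryManager): Long = f(memoryManager)
}

// Reads the "MemoryUsed" attribute of a platform MBean; the memory manager is ignored.
abstract class MBeanMetricGetter(mBeanName: String) extends MetricGetter {
  private val objectName = new ObjectName(mBeanName)
  override def getMetricValue(memoryManager: StubMemoryManager): Long =
    ManagementFactory.getPlatformMBeanServer
      .getAttribute(objectName, "MemoryUsed")
      .asInstanceOf[java.lang.Long]
      .longValue
}

case object ExecutionMemory extends MemoryManagerMetricGetter(_.executionMemoryUsed) {
  override val name = "ExecutionMemory"
}

case object StorageMemory extends MemoryManagerMetricGetter(_.storageMemoryUsed) {
  override val name = "StorageMemory"
}

// The JVM registers its direct buffer pool MXBean under this ObjectName.
case object DirectPoolMemory extends MBeanMetricGetter("java.nio:type=BufferPool,name=direct") {
  override val name = "DirectPoolMemory"
}

object MetricGetter {
  val values: Seq[MetricGetter] = Seq(ExecutionMemory, StorageMemory, DirectPoolMemory)
}
```

Making the trait `sealed` lets the compiler check exhaustive matches over the direct subclasses, while the two abstract classes separate "read from the memory manager" from "read from JMX".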
[GitHub] spark issue #21221: [SPARK-23429][CORE] Add executor memory metrics to heart...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21221 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91642/ Test FAILed.
[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20640 Merged build finished. Test PASSed.
[GitHub] spark issue #21221: [SPARK-23429][CORE] Add executor memory metrics to heart...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21221 **[Test build #91642 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91642/testReport)** for PR 21221 at commit [`7879e66`](https://github.com/apache/spark/commit/7879e66eed22cfd4dff2367c0ee3138369243711).
[GitHub] spark issue #21438: [SPARK-24398] [SQL] Improve SQLAppStatusListener.aggrega...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21438 **[Test build #91641 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91641/testReport)** for PR 21438 at commit [`eb87d2d`](https://github.com/apache/spark/commit/eb87d2d595374f3325a91ac53f0c11bff2b978e7).
[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20640 **[Test build #91639 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91639/testReport)** for PR 20640 at commit [`a7ff8cc`](https://github.com/apache/spark/commit/a7ff8cccd1b7e5564880c40c503c169c6bed46b9). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21452: [MINOR][CORE] Log committer class used by HadoopMapRedCo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21452 **[Test build #91640 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91640/testReport)** for PR 21452 at commit [`9881d9c`](https://github.com/apache/spark/commit/9881d9c6a2b1d56e69bb06ee27fd8706f6e0fe43).
[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20640 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91639/ Test PASSed.
[GitHub] spark issue #21221: [SPARK-23429][CORE] Add executor memory metrics to heart...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/21221 ok to test
[GitHub] spark issue #21221: [SPARK-23429][CORE] Add executor memory metrics to heart...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/21221 probably needs to be rebased
[GitHub] spark issue #21452: [MINOR][CORE] Log committer class used by HadoopMapRedCo...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/21452 ok to test
[GitHub] spark issue #21438: [SPARK-24398] [SQL] Improve SQLAppStatusListener.aggrega...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/21438 I think filtering by `metricIds` still makes sense, right? @gatorsmile
[GitHub] spark issue #21438: [SPARK-24398] [SQL] Improve SQLAppStatusListener.aggrega...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/21438 ok to test
[GitHub] spark issue #20838: [SPARK-23698] Resolve undefined names in Python 3
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/20838 any update?
[GitHub] spark issue #20272: [SPARK-23078] [CORE] [K8s] allow Spark Thrift Server to ...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/20272 Is this aligned with the "in cluster client"? @foxish @mccheah
[GitHub] spark issue #21481: [SPARK-24452][SQL][Core] Avoid possible overflow in int ...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/21481 Since I found a plug-in for Maven, I will also include a patch in this PR to add FindBugs/SpotBugs to the Maven build.
[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20640 **[Test build #91639 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91639/testReport)** for PR 20640 at commit [`a7ff8cc`](https://github.com/apache/spark/commit/a7ff8cccd1b7e5564880c40c503c169c6bed46b9).