[GitHub] spark issue #19691: [SPARK-14922][SPARK-17732][SQL]ALTER TABLE DROP PARTITIO...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19691 Can one of the admins verify this patch?
[GitHub] spark pull request #19691: [SPARK-14922][SPARK-17732][SQL]ALTER TABLE DROP P...
GitHub user DazhuangSu opened a pull request: https://github.com/apache/spark/pull/19691 [SPARK-14922][SPARK-17732][SQL] ALTER TABLE DROP PARTITION should support comparators

## What changes were proposed in this pull request?

This PR is inspired by @dongjoon-hyun. Quoted from https://github.com/apache/spark/pull/15704:

> **What changes were proposed in this pull request?** This PR aims to support comparators, e.g. '<', '<=', '>', '>=', again in Apache Spark 2.0 for backward compatibility.

**Spark 1.6**

`scala> sql("CREATE TABLE sales(id INT) PARTITIONED BY (country STRING, quarter STRING)")`
`res0: org.apache.spark.sql.DataFrame = [result: string]`
`scala> sql("ALTER TABLE sales DROP PARTITION (country < 'KR')")`
`res1: org.apache.spark.sql.DataFrame = [result: string]`

**Spark 2.0**

`scala> sql("CREATE TABLE sales(id INT) PARTITIONED BY (country STRING, quarter STRING)")`
`res0: org.apache.spark.sql.DataFrame = []`
`scala> sql("ALTER TABLE sales DROP PARTITION (country < 'KR')")`
`org.apache.spark.sql.catalyst.parser.ParseException:`
`mismatched input '<' expecting {')', ','}(line 1, pos 42)`

After this PR, it's supported.

**How was this patch tested?** Pass the Jenkins test with a newly added test case.

https://github.com/apache/spark/pull/16036 points out that using an int literal in DROP PARTITION fails after patching https://github.com/apache/spark/pull/15704. The reason this fails in https://github.com/apache/spark/pull/15704 is that AlterTableDropPartitionCommand distinguishes BinaryComparison from EqualTo with the following code:

`private def isRangeComparison(expr: Expression): Boolean = {`
`  expr.find(e => e.isInstanceOf[BinaryComparison] && !e.isInstanceOf[EqualTo]).isDefined`
`}`

This PR resolves the problem by identifying the drop condition while parsing the SQL.

## How was this patch tested?

New test case introduced from https://github.com/apache/spark/pull/15704

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/DazhuangSu/spark SPARK-17732

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19691.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19691

commit 20f658ad8e14a94dd23bff6a8d795124d1db24e9
Author: Dylan Su
Date: 2017-11-08T03:44:28Z
[SPARK-14922][SPARK-17732][SQL] ALTER TABLE DROP PARTITION should support comparators
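A minimal sketch of the distinction the quoted check draws, assuming the Catalyst expression API (the helper is reproduced from the quote above; the attribute and literals are made-up examples):

```scala
import org.apache.spark.sql.catalyst.expressions._
import org.apache.spark.sql.types.StringType

// EqualTo is itself a BinaryComparison, so it must be excluded
// to detect a genuine range condition such as country < 'KR'.
def isRangeComparison(expr: Expression): Boolean = {
  expr.find(e => e.isInstanceOf[BinaryComparison] && !e.isInstanceOf[EqualTo]).isDefined
}

val country = AttributeReference("country", StringType)()
isRangeComparison(EqualTo(country, Literal("KR")))   // false: plain partition spec
isRangeComparison(LessThan(country, Literal("KR")))  // true: range drop condition
```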
[GitHub] spark issue #19690: [SPARK-22467]Added a switch to support whether `stdout_s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19690 **[Test build #83588 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83588/testReport)** for PR 19690 at commit [`7b67148`](https://github.com/apache/spark/commit/7b671485e46a7e7c4fbce57b7f9e8fa66adcd82a).
[GitHub] spark pull request #19690: [SPARK-22467]Added a switch to support whether `s...
GitHub user 10110346 opened a pull request: https://github.com/apache/spark/pull/19690 [SPARK-22467] Added a switch to support whether `stdout_stream` and `stderr_stream` output to disk

## What changes were proposed in this pull request?

We should add a switch to control whether `stdout_stream` and `stderr_stream` output to disk. In my environment, due to disk I/O blocking, the `stdout_stream` output is very slow, so it cannot be cleaned in a timely manner, and this blocks the executor process.

## How was this patch tested?

Added a unit test.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/10110346/spark stdout_err

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19690.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19690

commit 7b671485e46a7e7c4fbce57b7f9e8fa66adcd82a
Author: liuxian
Date: 2017-11-07T09:16:48Z
fix
[GitHub] spark issue #13206: [SPARK-15420] [SQL] Add repartition and sort to prepare ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13206 Build finished. Test FAILed.
[GitHub] spark issue #13206: [SPARK-15420] [SQL] Add repartition and sort to prepare ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13206 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83583/ Test FAILed.
[GitHub] spark issue #13206: [SPARK-15420] [SQL] Add repartition and sort to prepare ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13206 **[Test build #83583 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83583/consoleFull)** for PR 13206 at commit [`a64be8a`](https://github.com/apache/spark/commit/a64be8a91ddadcd7acbbd08956f214b3c40f0dca). * This patch **fails PySpark unit tests**. * This patch **does not merge cleanly**. * This patch adds the following public classes _(experimental)_: * `case class DistributeAndSortOutputData(conf: CatalystConf) extends Rule[LogicalPlan] `
[GitHub] spark issue #19662: [SPARK-22446][SQL][ML] Declare StringIndexerModel indexe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19662 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83580/ Test PASSed.
[GitHub] spark issue #19662: [SPARK-22446][SQL][ML] Declare StringIndexerModel indexe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19662 Merged build finished. Test PASSed.
[GitHub] spark issue #19662: [SPARK-22446][SQL][ML] Declare StringIndexerModel indexe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19662 **[Test build #83580 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83580/testReport)** for PR 19662 at commit [`d2ac83e`](https://github.com/apache/spark/commit/d2ac83e5b1c74abd422e436752f1cf91127e388a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19683: [SPARK-21657][SQL] optimize explode quadratic memory con...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19683 **[Test build #83587 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83587/testReport)** for PR 19683 at commit [`b8b5960`](https://github.com/apache/spark/commit/b8b5960f230b015896918a5465c919550af980ac).
[GitHub] spark issue #19607: [WIP][SPARK-22395][SQL][PYTHON] Fix the behavior of time...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19607 **[Test build #83586 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83586/testReport)** for PR 19607 at commit [`1e0f217`](https://github.com/apache/spark/commit/1e0f21715f5ad053b5a5677a129677d498946ea3).
[GitHub] spark pull request #19607: [WIP][SPARK-22395][SQL][PYTHON] Fix the behavior ...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/19607#discussion_r149582142

--- Diff: python/pyspark/sql/types.py ---
@@ -1629,37 +1629,82 @@ def to_arrow_type(dt):
     return arrow_type

-def _check_dataframe_localize_timestamps(pdf):
+def _check_dataframe_localize_timestamps(pdf, timezone):
     """
-    Convert timezone aware timestamps to timezone-naive in local time
+    Convert timezone aware timestamps to timezone-naive in the specified timezone or local timezone

     :param pdf: pandas.DataFrame
-    :return pandas.DataFrame where any timezone aware columns have be converted to tz-naive
+    :param timezone: the timezone to convert. if None then use local timezone
+    :return pandas.DataFrame where any timezone aware columns have been converted to tz-naive
     """
     from pandas.api.types import is_datetime64tz_dtype
+    tz = timezone or 'tzlocal()'
     for column, series in pdf.iteritems():
         # TODO: handle nested timestamps, such as ArrayType(TimestampType())?
         if is_datetime64tz_dtype(series.dtype):
-            pdf[column] = series.dt.tz_convert('tzlocal()').dt.tz_localize(None)
+            pdf[column] = series.dt.tz_convert(tz).dt.tz_localize(None)
     return pdf

-def _check_series_convert_timestamps_internal(s):
+def _check_series_convert_timestamps_internal(s, timezone):
     """
-    Convert a tz-naive timestamp in local tz to UTC normalized for Spark internal storage
+    Convert a tz-naive timestamp in the specified timezone or local timezone to UTC normalized for
+    Spark internal storage
+
     :param s: a pandas.Series
+    :param timezone: the timezone to convert. if None then use local timezone
     :return pandas.Series where if it is a timestamp, has been UTC normalized without a time zone
     """
     from pandas.api.types import is_datetime64_dtype, is_datetime64tz_dtype
     # TODO: handle nested timestamps, such as ArrayType(TimestampType())?
     if is_datetime64_dtype(s.dtype):
-        return s.dt.tz_localize('tzlocal()').dt.tz_convert('UTC')
+        tz = timezone or 'tzlocal()'
+        return s.dt.tz_localize(tz).dt.tz_convert('UTC')
     elif is_datetime64tz_dtype(s.dtype):
         return s.dt.tz_convert('UTC')
     else:
         return s

+def _check_series_convert_timestamps_localize(s, timezone):
+    """
+    Convert timestamp to timezone-naive in the specified timezone or local timezone
+
+    :param s: a pandas.Series
+    :param timezone: the timezone to convert. if None then use local timezone
+    :return pandas.Series where if it is a timestamp, has been converted to tz-naive
+    """
+    import pandas as pd
+    try:
+        from pandas.api.types import is_datetime64tz_dtype, is_datetime64_dtype
+        tz = timezone or 'tzlocal()'
+        # TODO: handle nested timestamps, such as ArrayType(TimestampType())?
+        if is_datetime64tz_dtype(s.dtype):
+            return s.dt.tz_convert(tz).dt.tz_localize(None)
+        elif is_datetime64_dtype(s.dtype) and timezone is not None:
+            # `s.dt.tz_localize('tzlocal()')` doesn't work properly when including NaT.
+            return s.apply(lambda ts: ts.tz_localize('tzlocal()').tz_convert(tz).tz_localize(None)
+                           if ts is not pd.NaT else pd.NaT)
+        else:
+            return s
+    except ImportError:
--- End diff --

We will be able to remove this block if we decide to support only Pandas >= 0.19.2.
[GitHub] spark issue #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark DataFra...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19459 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83579/ Test PASSed.
[GitHub] spark issue #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark DataFra...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19459 Merged build finished. Test PASSed.
[GitHub] spark issue #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark DataFra...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19459 **[Test build #83579 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83579/testReport)** for PR 19459 at commit [`99ce1e4`](https://github.com/apache/spark/commit/99ce1e44f57c411af95b1c9d9c95f35f2c1652e1). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19657: [SPARK-22344][SPARKR] clean up install dir if running te...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19657 Merged build finished. Test PASSed.
[GitHub] spark issue #19657: [SPARK-22344][SPARKR] clean up install dir if running te...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19657 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83582/ Test PASSed.
[GitHub] spark issue #19607: [WIP][SPARK-22395][SQL][PYTHON] Fix the behavior of time...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19607 Merged build finished. Test PASSed.
[GitHub] spark issue #19607: [WIP][SPARK-22395][SQL][PYTHON] Fix the behavior of time...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19607 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83578/ Test PASSed.
[GitHub] spark issue #19657: [SPARK-22344][SPARKR] clean up install dir if running te...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19657 **[Test build #83582 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83582/testReport)** for PR 19657 at commit [`18e238a`](https://github.com/apache/spark/commit/18e238a62d53de5a73283a741c1a9bb8230f4484). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19607: [WIP][SPARK-22395][SQL][PYTHON] Fix the behavior of time...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19607 **[Test build #83578 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83578/testReport)** for PR 19607 at commit [`4adb073`](https://github.com/apache/spark/commit/4adb073f8d1454fbea0742a16b6d7662e063b37a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19681: [SPARK-20652][sql] Store SQL UI data in the new app stat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19681 **[Test build #83585 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83585/testReport)** for PR 19681 at commit [`ecf293b`](https://github.com/apache/spark/commit/ecf293b31fa1b5250f484d6b2f09373e7057bbc3).
[GitHub] spark issue #19662: [SPARK-22446][SQL][ML] Declare StringIndexerModel indexe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19662 Merged build finished. Test PASSed.
[GitHub] spark issue #19662: [SPARK-22446][SQL][ML] Declare StringIndexerModel indexe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19662 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83577/ Test PASSed.
[GitHub] spark issue #19662: [SPARK-22446][SQL][ML] Declare StringIndexerModel indexe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19662 **[Test build #83577 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83577/testReport)** for PR 19662 at commit [`dd672ac`](https://github.com/apache/spark/commit/dd672ac815038f8dfd89fecb1f5b3d4668158752). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #19681: [SPARK-20652][sql] Store SQL UI data in the new a...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/19681#discussion_r149579972

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLAppStatusListener.scala ---
@@ -0,0 +1,353 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.execution.ui
+
+import java.util.Date
+import java.util.concurrent.ConcurrentHashMap
+
+import scala.collection.JavaConverters._
+import scala.collection.mutable.HashMap
+
+import org.apache.spark.{JobExecutionStatus, SparkConf}
+import org.apache.spark.internal.Logging
+import org.apache.spark.scheduler._
+import org.apache.spark.sql.execution.SQLExecution
+import org.apache.spark.sql.execution.metric._
+import org.apache.spark.sql.internal.StaticSQLConf._
+import org.apache.spark.status.LiveEntity
+import org.apache.spark.status.config._
+import org.apache.spark.ui.SparkUI
+import org.apache.spark.util.kvstore.KVStore
+
+private[sql] class SQLAppStatusListener(
+    conf: SparkConf,
+    kvstore: KVStore,
+    live: Boolean,
+    ui: Option[SparkUI] = None)
+  extends SparkListener with Logging {
+
+  // How often to flush intermediate stage of a live execution to the store. When replaying logs,
+  // never flush (only do the very last write).
+  private val liveUpdatePeriodNs = if (live) conf.get(LIVE_ENTITY_UPDATE_PERIOD) else -1L
+
+  private val liveExecutions = new HashMap[Long, LiveExecutionData]()
+  private val stageMetrics = new HashMap[Int, LiveStageMetrics]()
+
+  private var uiInitialized = false
+
+  override def onJobStart(event: SparkListenerJobStart): Unit = {
+    val executionIdString = event.properties.getProperty(SQLExecution.EXECUTION_ID_KEY)
+    if (executionIdString == null) {
+      // This is not a job created by SQL
+      return
+    }
+
+    val executionId = executionIdString.toLong
+    val jobId = event.jobId
+    val exec = getOrCreateExecution(executionId)
+
+    // Record the accumulator IDs for the stages of this job, so that the code that keeps
+    // track of the metrics knows which accumulators to look at.
+    val accumIds = exec.metrics.map(_.accumulatorId).sorted.toList
+    event.stageIds.foreach { id =>
+      stageMetrics.put(id, new LiveStageMetrics(id, 0, accumIds.toArray, new ConcurrentHashMap()))
+    }
+
+    exec.jobs = exec.jobs + (jobId -> JobExecutionStatus.RUNNING)
+    exec.stages = event.stageIds
+    update(exec)
+  }
+
+  override def onStageSubmitted(event: SparkListenerStageSubmitted): Unit = {
+    if (!isSQLStage(event.stageInfo.stageId)) {
+      return
+    }
+
+    // Reset the metrics tracking object for the new attempt.
+    stageMetrics.get(event.stageInfo.stageId).foreach { metrics =>
+      metrics.taskMetrics.clear()
+      metrics.attemptId = event.stageInfo.attemptId
+    }
+  }
+
+  override def onJobEnd(event: SparkListenerJobEnd): Unit = {
+    liveExecutions.values.foreach { exec =>
+      if (exec.jobs.contains(event.jobId)) {
+        val result = event.jobResult match {
+          case JobSucceeded => JobExecutionStatus.SUCCEEDED
+          case _ => JobExecutionStatus.FAILED
+        }
+        exec.jobs = exec.jobs + (event.jobId -> result)
+        exec.endEvents += 1
+        update(exec)
+      }
+    }
+  }
+
+  override def onExecutorMetricsUpdate(event: SparkListenerExecutorMetricsUpdate): Unit = {
+    event.accumUpdates.foreach { case (taskId, stageId, attemptId, accumUpdates) =>
+      updateStageMetrics(stageId, attemptId, taskId, accumUpdates, false)
+    }
+  }
+
+  override def onTaskEnd(event: SparkListenerTaskEnd): Unit = {
+    if (!isSQLStage(event.stageId)) {
+      return
+    }
+
+    val info = event.taskInfo
+    // SPARK-20342. If processing events from a live ap
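For context on the listener above, a minimal sketch of how a SparkListener distinguishes SQL-initiated jobs, assuming the public listener-bus API (the property key is the value of `SQLExecution.EXECUTION_ID_KEY`; names are made up for illustration):

```scala
import org.apache.spark.scheduler.{SparkListener, SparkListenerJobStart}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local").appName("listener-demo").getOrCreate()

// Jobs triggered by a SQL execution carry the execution id in their local
// properties; jobs without the property are skipped, as in onJobStart above.
spark.sparkContext.addSparkListener(new SparkListener {
  override def onJobStart(event: SparkListenerJobStart): Unit = {
    Option(event.properties.getProperty("spark.sql.execution.id")).foreach { id =>
      println(s"SQL execution $id started job ${event.jobId}")
    }
  }
})

spark.range(10).count()  // runs inside a SQL execution, so the listener fires
```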
[GitHub] spark issue #19689: [SPARK-22462][SQL] Make rdd-based actions in Dataset tra...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19689 **[Test build #83584 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83584/testReport)** for PR 19689 at commit [`f0c6399`](https://github.com/apache/spark/commit/f0c639909d7b1638cdf2de5c3684d7de1c375436).
[GitHub] spark pull request #19681: [SPARK-20652][sql] Store SQL UI data in the new a...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/19681#discussion_r149578181

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLAppStatusListener.scala ---
@@ -40,7 +40,7 @@ private[sql] class SQLAppStatusListener(
     ui: Option[SparkUI] = None)
   extends SparkListener with Logging {

-  // How often to flush intermediate statge of a live execution to the store. When replaying logs,
+  // How often to flush intermediate stage of a live execution to the store. When replaying logs,
--- End diff --

err, was this supposed to be "state"?
[GitHub] spark pull request #19681: [SPARK-20652][sql] Store SQL UI data in the new a...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/19681#discussion_r149578074

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLAppStatusListener.scala ---
@@ -0,0 +1,353 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.execution.ui
+
+import java.util.Date
+import java.util.concurrent.ConcurrentHashMap
+
+import scala.collection.JavaConverters._
+import scala.collection.mutable.HashMap
+
+import org.apache.spark.{JobExecutionStatus, SparkConf}
+import org.apache.spark.internal.Logging
+import org.apache.spark.scheduler._
+import org.apache.spark.sql.execution.SQLExecution
+import org.apache.spark.sql.execution.metric._
+import org.apache.spark.sql.internal.StaticSQLConf._
+import org.apache.spark.status.LiveEntity
+import org.apache.spark.status.config._
+import org.apache.spark.ui.SparkUI
+import org.apache.spark.util.kvstore.KVStore
+
+private[sql] class SQLAppStatusListener(
+    conf: SparkConf,
+    kvstore: KVStore,
+    live: Boolean,
+    ui: Option[SparkUI] = None)
+  extends SparkListener with Logging {
+
+  // How often to flush intermediate stage of a live execution to the store. When replaying logs,
+  // never flush (only do the very last write).
+  private val liveUpdatePeriodNs = if (live) conf.get(LIVE_ENTITY_UPDATE_PERIOD) else -1L
+
+  private val liveExecutions = new HashMap[Long, LiveExecutionData]()
+  private val stageMetrics = new HashMap[Int, LiveStageMetrics]()
+
+  private var uiInitialized = false
+
+  override def onJobStart(event: SparkListenerJobStart): Unit = {
+    val executionIdString = event.properties.getProperty(SQLExecution.EXECUTION_ID_KEY)
+    if (executionIdString == null) {
+      // This is not a job created by SQL
+      return
+    }
+
+    val executionId = executionIdString.toLong
+    val jobId = event.jobId
+    val exec = getOrCreateExecution(executionId)
+
+    // Record the accumulator IDs for the stages of this job, so that the code that keeps
+    // track of the metrics knows which accumulators to look at.
+    val accumIds = exec.metrics.map(_.accumulatorId).sorted.toList
+    event.stageIds.foreach { id =>
+      stageMetrics.put(id, new LiveStageMetrics(id, 0, accumIds.toArray, new ConcurrentHashMap()))
+    }
+
+    exec.jobs = exec.jobs + (jobId -> JobExecutionStatus.RUNNING)
+    exec.stages = event.stageIds
+    update(exec)
+  }
+
+  override def onStageSubmitted(event: SparkListenerStageSubmitted): Unit = {
+    if (!isSQLStage(event.stageInfo.stageId)) {
+      return
+    }
+
+    // Reset the metrics tracking object for the new attempt.
+    stageMetrics.get(event.stageInfo.stageId).foreach { metrics =>
+      metrics.taskMetrics.clear()
+      metrics.attemptId = event.stageInfo.attemptId
+    }
+  }
+
+  override def onJobEnd(event: SparkListenerJobEnd): Unit = {
+    liveExecutions.values.foreach { exec =>
+      if (exec.jobs.contains(event.jobId)) {
+        val result = event.jobResult match {
+          case JobSucceeded => JobExecutionStatus.SUCCEEDED
+          case _ => JobExecutionStatus.FAILED
+        }
+        exec.jobs = exec.jobs + (event.jobId -> result)
+        exec.endEvents += 1
+        update(exec)
+      }
+    }
+  }
+
+  override def onExecutorMetricsUpdate(event: SparkListenerExecutorMetricsUpdate): Unit = {
+    event.accumUpdates.foreach { case (taskId, stageId, attemptId, accumUpdates) =>
+      updateStageMetrics(stageId, attemptId, taskId, accumUpdates, false)
+    }
+  }
+
+  override def onTaskEnd(event: SparkListenerTaskEnd): Unit = {
+    if (!isSQLStage(event.stageId)) {
+      return
+    }
+
+    val info = event.taskInfo
+    // SPARK-20342. If processing events from a live ap
[GitHub] spark issue #13206: [SPARK-15420] [SQL] Add repartition and sort to prepare ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13206 **[Test build #83583 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83583/consoleFull)** for PR 13206 at commit [`a64be8a`](https://github.com/apache/spark/commit/a64be8a91ddadcd7acbbd08956f214b3c40f0dca).
[GitHub] spark pull request #19678: [SPARK-20646][core] Port executors page to new UI...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19678
[GitHub] spark issue #19678: [SPARK-20646][core] Port executors page to new UI backen...
Github user squito commented on the issue: https://github.com/apache/spark/pull/19678 merged to master
[GitHub] spark pull request #19557: [SPARK-22281][SPARKR] Handle R method breaking si...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19557
[GitHub] spark issue #19557: [SPARK-22281][SPARKR] Handle R method breaking signature...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/19557 merged to master/2.2
[GitHub] spark issue #19285: [SPARK-22068][CORE]Reduce the duplicate code between put...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19285 Merged build finished. Test PASSed.
[GitHub] spark issue #19619: [SPARK-22327][SPARKR][TEST][BACKPORT-2.2] check for vers...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/19619 merged
[GitHub] spark pull request #19619: [SPARK-22327][SPARKR][TEST][BACKPORT-2.2] check f...
Github user felixcheung closed the pull request at: https://github.com/apache/spark/pull/19619
[GitHub] spark issue #19285: [SPARK-22068][CORE]Reduce the duplicate code between put...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19285 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83575/ Test PASSed.
[GitHub] spark pull request #19620: [SPARK-22327][SPARKR][TEST][BACKPORT-2.1] check f...
Github user felixcheung closed the pull request at: https://github.com/apache/spark/pull/19620
[GitHub] spark issue #19657: [SPARK-22344][SPARKR] clean up install dir if running te...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19657 **[Test build #83582 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83582/testReport)** for PR 19657 at commit [`18e238a`](https://github.com/apache/spark/commit/18e238a62d53de5a73283a741c1a9bb8230f4484).
[GitHub] spark issue #19620: [SPARK-22327][SPARKR][TEST][BACKPORT-2.1] check for vers...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/19620 merged
[GitHub] spark issue #19657: [SPARK-22344][SPARKR] clean up install dir if running te...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19657 Yup, I just checked it too and was writing a comment .. The current change should pass :).
[GitHub] spark issue #19285: [SPARK-22068][CORE]Reduce the duplicate code between put...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19285 **[Test build #83575 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83575/testReport)** for PR 19285 at commit [`bc3ad4e`](https://github.com/apache/spark/commit/bc3ad4ea11e49b19ef4199642dbc4488f202d928). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19689: [SPARK-22462][SQL] Make rdd-based actions in Dataset tra...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19689 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83581/ Test FAILed.
[GitHub] spark issue #19689: [SPARK-22462][SQL] Make rdd-based actions in Dataset tra...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19689 Merged build finished. Test FAILed.
[GitHub] spark issue #19689: [SPARK-22462][SQL] Make rdd-based actions in Dataset tra...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19689 **[Test build #83581 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83581/testReport)** for PR 19689 at commit [`ac539cd`](https://github.com/apache/spark/commit/ac539cd0e761193d9a665d8ccb19a8fba5dd504b). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17436: [SPARK-20101][SQL] Use OffHeapColumnVector when "spark.m...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17436 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83573/ Test PASSed.
[GitHub] spark issue #17436: [SPARK-20101][SQL] Use OffHeapColumnVector when "spark.m...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17436 Merged build finished. Test PASSed.
[GitHub] spark issue #17436: [SPARK-20101][SQL] Use OffHeapColumnVector when "spark.m...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17436 **[Test build #83573 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83573/testReport)** for PR 17436 at commit [`9ce6fc0`](https://github.com/apache/spark/commit/9ce6fc0b0ad2c4c97236f0519db07b5a3600bb81). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19678: [SPARK-20646][core] Port executors page to new UI backen...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19678 Merged build finished. Test PASSed.
[GitHub] spark issue #19678: [SPARK-20646][core] Port executors page to new UI backen...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19678 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83572/ Test PASSed.
[GitHub] spark issue #19678: [SPARK-20646][core] Port executors page to new UI backen...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19678 **[Test build #83572 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83572/testReport)** for PR 19678 at commit [`c7123d9`](https://github.com/apache/spark/commit/c7123d9c8d3934c482cd89ea820b2958f4dbbe0a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #19648: [SPARK-14516][ML][FOLLOW-UP] Move ClusteringEvalu...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19648
[GitHub] spark issue #19648: [SPARK-14516][ML][FOLLOW-UP] Move ClusteringEvaluatorSui...
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/19648 Merged into master, thanks.
[GitHub] spark issue #19689: [SPARK-22462][SQL] Make rdd-based actions in Dataset tra...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19689 **[Test build #83581 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83581/testReport)** for PR 19689 at commit [`ac539cd`](https://github.com/apache/spark/commit/ac539cd0e761193d9a665d8ccb19a8fba5dd504b).
[GitHub] spark issue #19689: [SPARK-22462][SQL] Make rdd-based actions in Dataset tra...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19689 The screenshot for running `sql("select * from range(10)").foreach(a => Unit)` on spark-shell: https://user-images.githubusercontent.com/68855/32531135-1e60d544-c47d-11e7-88d6-627ef77d0b80.png
[GitHub] spark pull request #19689: [SPARK-22462][SQL] Make rdd-based actions in Data...
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/19689 [SPARK-22462][SQL] Make rdd-based actions in Dataset trackable in SQL UI

## What changes were proposed in this pull request?

For the few Dataset actions such as `foreach`, currently no SQL metrics are visible in the SQL tab of SparkUI. This is because they bind wrongly to the Dataset's `QueryExecution`. Since these actions evaluate directly on the RDD, which has its own `QueryExecution`, we should bind to the RDD's `QueryExecution` to show correct SQL metrics on the UI.

## How was this patch tested?

Manually tested. A screenshot is attached in the PR.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/viirya/spark-1 SPARK-22462

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19689.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19689

commit ac539cd0e761193d9a665d8ccb19a8fba5dd504b
Author: Liang-Chi Hsieh
Date: 2017-11-07T10:54:14Z
Make rdd-based actions trackable in UI.
[GitHub] spark issue #19687: [SPARK-19644][SQL]Clean up Scala reflection garbage afte...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19687 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83571/ Test PASSed.
[GitHub] spark issue #19687: [SPARK-19644][SQL]Clean up Scala reflection garbage afte...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19687 Merged build finished. Test PASSed.
[GitHub] spark issue #19687: [SPARK-19644][SQL]Clean up Scala reflection garbage afte...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19687 **[Test build #83571 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83571/testReport)** for PR 19687 at commit [`c03811f`](https://github.com/apache/spark/commit/c03811ff006058987fa8d5fb9f7d097b9acc9ac5). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19662: [SPARK-22446][SQL][ML] Declare StringIndexerModel indexe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19662 **[Test build #83580 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83580/testReport)** for PR 19662 at commit [`d2ac83e`](https://github.com/apache/spark/commit/d2ac83e5b1c74abd422e436752f1cf91127e388a).
[GitHub] spark pull request #19662: [SPARK-22446][SQL][ML] Declare StringIndexerModel...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19662#discussion_r149568133

--- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/VectorAssemblerSuite.scala ---
@@ -126,4 +126,25 @@ class VectorAssemblerSuite
       .setOutputCol("myOutputCol")
     testDefaultReadWrite(t)
   }
+
+  test("VectorAssembler's UDF should not apply on filtered data") {
--- End diff --

Ok.
[GitHub] spark pull request #19662: [SPARK-22446][SQL][ML] Declare StringIndexerModel...
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19662#discussion_r149567769

--- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/VectorAssemblerSuite.scala ---
@@ -126,4 +126,25 @@ class VectorAssemblerSuite
       .setOutputCol("myOutputCol")
     testDefaultReadWrite(t)
   }
+
+  test("VectorAssembler's UDF should not apply on filtered data") {
--- End diff --

Mark [SPARK-22446] in the test name.
[GitHub] spark pull request #19666: [SPARK-22451][ML] Reduce decision tree aggregate ...
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19666#discussion_r149567340

--- Diff: mllib/src/test/scala/org/apache/spark/ml/tree/impl/RandomForestSuite.scala ---
@@ -631,6 +614,42 @@ class RandomForestSuite extends SparkFunSuite with MLlibTestSparkContext {
     val expected = Map(0 -> 1.0 / 3.0, 2 -> 2.0 / 3.0)
     assert(mapToVec(map.toMap) ~== mapToVec(expected) relTol 0.01)
   }
+
+  test("traverseUnorderedSplits") {
+
--- End diff --

So how do we test all possible splits to make sure the generated splits are all correct? Once the tree is generated, only the best split remains.
[GitHub] spark issue #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark DataFra...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19459 **[Test build #83579 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83579/testReport)** for PR 19459 at commit [`99ce1e4`](https://github.com/apache/spark/commit/99ce1e44f57c411af95b1c9d9c95f35f2c1652e1).
[GitHub] spark issue #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark DataFra...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/19459 Jenkins, retest this please.
[GitHub] spark pull request #19664: [SPARK-22442][SQL] ScalaReflection should produce...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19664#discussion_r149565144

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
@@ -214,11 +215,13 @@ case class Invoke(
   override def eval(input: InternalRow): Any =
     throw new UnsupportedOperationException("Only code-generated evaluation is supported.")

+  private lazy val encodedFunctionName = TermName(functionName).encodedName.toString
--- End diff --

Since we use `Invoke` to access field(s) in an object, this can be an issue. I didn't find `StaticInvoke` used similarly, so it should be fine.
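For reference, a small sketch of what Scala's name encoding (used by `encodedName` above) does to identifiers that are not valid Java names; the inputs are made-up examples of standard scala-reflect behavior, not PR-specific code:

```scala
import scala.reflect.runtime.universe.TermName

// Operator and whitespace characters are encoded so the resulting name
// can appear as a method or field name in generated Java code.
TermName("+").encodedName.toString    // "$plus"
TermName("a b").encodedName.toString  // "a$u0020b"
```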
[GitHub] spark issue #19663: [SPARK-22463][YARN][SQL][Hive]add hadoop/hive/hbase/etc ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19663 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83576/ Test PASSed.
[GitHub] spark issue #19663: [SPARK-22463][YARN][SQL][Hive]add hadoop/hive/hbase/etc ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19663 Merged build finished. Test PASSed.
[GitHub] spark issue #19663: [SPARK-22463][YARN][SQL][Hive]add hadoop/hive/hbase/etc ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19663 **[Test build #83576 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83576/testReport)** for PR 19663 at commit [`f8c1f63`](https://github.com/apache/spark/commit/f8c1f63944c602a00802356f94788464320ffa3f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #19664: [SPARK-22442][SQL] ScalaReflection should produce...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19664#discussion_r149564523

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/ScalaReflectionSuite.scala ---
@@ -335,4 +338,17 @@ class ScalaReflectionSuite extends SparkFunSuite {
     assert(linkedHashMapDeserializer.dataType == ObjectType(classOf[LHMap[_, _]]))
   }
+
+  test("SPARK-22442: Generate correct field names for special characters") {
+    val serializer = serializerFor[SpecialCharAsFieldData](BoundReference(
+      0, ObjectType(classOf[SpecialCharAsFieldData]), nullable = false))
+    val deserializer = deserializerFor[SpecialCharAsFieldData]
+    assert(serializer.dataType(0).name == "field.1")
+    assert(serializer.dataType(1).name == "field 2")
+
+    val argumentsFields = deserializer.asInstanceOf[NewInstance].arguments.flatMap { _.collect {
+      case UpCast(u: UnresolvedAttribute, _, _) => u.name
+    }}
+    assert(argumentsFields(0) == "`field.1`")
--- End diff --

We need to deliberately wrap backticks around a field name such as `field.1` because of the dot character. Otherwise `UnresolvedAttribute` will parse it as two name parts, `Seq("field", "1")`, and then fail to resolve later.
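A hypothetical illustration of why the backticks matter (the DataFrame and column names are made up; this shows standard attribute-name resolution, not the PR's code path):

```scala
import spark.implicits._  // assumes an active SparkSession named `spark`

val df = Seq((1, 2)).toDF("field.1", "field 2")
df.select("`field.1`").show()  // backticks keep the dotted name as a single name part
df.select("field.1").show()    // AnalysisException: parsed as two parts, Seq("field", "1")
```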
[GitHub] spark pull request #19664: [SPARK-22442][SQL] ScalaReflection should produce...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19664#discussion_r149564330

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
@@ -214,11 +215,13 @@ case class Invoke(
   override def eval(input: InternalRow): Any =
     throw new UnsupportedOperationException("Only code-generated evaluation is supported.")

+  private lazy val encodedFunctionName = TermName(functionName).encodedName.toString
--- End diff --

Maybe, although I don't have a concrete case causing the issue for now.
[GitHub] spark pull request #19272: [Spark-21842][Mesos] Support Kerberos ticket rene...
Github user ArtRand commented on a diff in the pull request: https://github.com/apache/spark/pull/19272#discussion_r149564294

--- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala ---
@@ -213,6 +216,14 @@ private[spark] class MesosCoarseGrainedSchedulerBackend(
       sc.conf.getOption("spark.mesos.driver.frameworkId").map(_ + suffix)
     )

+    // check that the credentials are defined, even though it's likely that auth would have failed
+    // already if you've made it this far, then start the token renewer
+    if (hadoopDelegationTokens.isDefined) {
--- End diff --

I may have spoken too soon, there might be a way..
[GitHub] spark issue #19662: [SPARK-22446][SQL][ML] Declare StringIndexerModel indexe...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19662 @WeichenXu123 I did a scan. Currently I only found that `VectorAssembler`'s UDF may have a similar issue. Fixed and added a test for it too.
[GitHub] spark issue #19662: [SPARK-22446][SQL][ML] Declare StringIndexerModel indexe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19662 **[Test build #83577 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83577/testReport)** for PR 19662 at commit [`dd672ac`](https://github.com/apache/spark/commit/dd672ac815038f8dfd89fecb1f5b3d4668158752).
[GitHub] spark issue #19607: [WIP][SPARK-22395][SQL][PYTHON] Fix the behavior of time...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19607 **[Test build #83578 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83578/testReport)** for PR 19607 at commit [`4adb073`](https://github.com/apache/spark/commit/4adb073f8d1454fbea0742a16b6d7662e063b37a).
[GitHub] spark issue #19156: [SPARK-19634][SQL][ML][FOLLOW-UP] Improve interface of d...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19156 Merged build finished. Test PASSed.
[GitHub] spark issue #19156: [SPARK-19634][SQL][ML][FOLLOW-UP] Improve interface of d...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19156 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83574/ Test PASSed.
[GitHub] spark issue #19156: [SPARK-19634][SQL][ML][FOLLOW-UP] Improve interface of d...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19156 **[Test build #83574 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83574/testReport)** for PR 19156 at commit [`480e80d`](https://github.com/apache/spark/commit/480e80dbb0392bebe96dc1620195a39b54f75740). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19663: [SPARK-22463][YARN][SQL][Hive]add hadoop/hive/hbase/etc ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19663 **[Test build #83576 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83576/testReport)** for PR 19663 at commit [`f8c1f63`](https://github.com/apache/spark/commit/f8c1f63944c602a00802356f94788464320ffa3f).
[GitHub] spark issue #19688: [SPARK-22466][Spark Submit]export SPARK_CONF_DIR while c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19688 Can one of the admins verify this patch?
[GitHub] spark pull request #19663: [SPARK-22463][YARN][SQL][Hive]add hadoop/hive/hba...
Github user yaooqinn commented on a diff in the pull request: https://github.com/apache/spark/pull/19663#discussion_r149561925

--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
@@ -687,6 +687,20 @@ private[spark] class Client(
   private def createConfArchive(): File = {
     val hadoopConfFiles = new HashMap[String, File]()

+    // SPARK_CONF_DIR shows up in the classpath before HADOOP_CONF_DIR/YARN_CONF_DIR
+    val localConfDir = System.getProperty("SPARK_CONF_DIR",
+      System.getProperty("SPARK_HOME") + File.separator + "conf")
+    val dir = new File(localConfDir)
+    if (dir.isDirectory) {
+      val files = dir.listFiles(new FileFilter {
+        override def accept(pathname: File): Boolean = {
+          pathname.isFile && pathname.getName.endsWith("xml")
+        }
+      })
+      files.foreach { f => hadoopConfFiles(f.getName) = f }
+    }
+
+    // Ensure HADOOP_CONF_DIR/YARN_CONF_DIR not overriding existing files
--- End diff --

ok, i'd remove it
[GitHub] spark issue #19565: [SPARK-22111][MLLIB] OnlineLDAOptimizer should filter ou...
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19565 OK, I agree with this change. @jkbradley Can you take a look?
[GitHub] spark pull request #19663: [SPARK-22463][YARN][SQL][Hive]add hadoop/hive/hba...
Github user yaooqinn commented on a diff in the pull request: https://github.com/apache/spark/pull/19663#discussion_r149561877

--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
```
@@ -687,6 +687,20 @@ private[spark] class Client(
   private def createConfArchive(): File = {
     val hadoopConfFiles = new HashMap[String, File]()

+    // SPARK_CONF_DIR shows up in the classpath before HADOOP_CONF_DIR/YARN_CONF_DIR
+    val localConfDir = System.getProperty("SPARK_CONF_DIR",
```
--- End diff --

Not exactly, as of now; please check https://github.com/apache/spark/pull/19688
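For context on the point being made here: `SPARK_CONF_DIR` is an environment variable, whereas `System.getProperty` reads JVM system properties, so the lookup in the diff only sees the value if something has already copied it into the properties — which is what https://github.com/apache/spark/pull/19688 sets out to guarantee on the environment side. A minimal, self-contained sketch of an env-first lookup, offered as an illustration rather than the patch's actual code:

```scala
import java.io.File

// Hedged sketch: prefer the SPARK_CONF_DIR environment variable, then fall
// back to $SPARK_HOME/conf. Note that sys.env reads environment variables,
// while System.getProperty (used in the diff above) reads JVM properties.
val confDir: Option[String] =
  sys.env.get("SPARK_CONF_DIR")
    .orElse(sys.env.get("SPARK_HOME").map(h => h + File.separator + "conf"))
```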
[GitHub] spark pull request #19663: [SPARK-22463][YARN][SQL][Hive]add hadoop/hive/hba...
Github user yaooqinn commented on a diff in the pull request: https://github.com/apache/spark/pull/19663#discussion_r149561888

--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
```
@@ -687,6 +687,20 @@ private[spark] class Client(
   private def createConfArchive(): File = {
     val hadoopConfFiles = new HashMap[String, File]()

+    // SPARK_CONF_DIR shows up in the classpath before HADOOP_CONF_DIR/YARN_CONF_DIR
+    val localConfDir = System.getProperty("SPARK_CONF_DIR",
+      System.getProperty("SPARK_HOME") + File.separator + "conf")
+    val dir = new File(localConfDir)
+    if (dir.isDirectory) {
+      val files = dir.listFiles(new FileFilter {
+        override def accept(pathname: File): Boolean = {
+          pathname.isFile && pathname.getName.endsWith("xml")
```
--- End diff --

ok
[GitHub] spark pull request #19688: [SPARK-22466][Spark Submit]export SPARK_CONF_DIR ...
GitHub user yaooqinn opened a pull request: https://github.com/apache/spark/pull/19688 [SPARK-22466][Spark Submit]export SPARK_CONF_DIR while conf is default

## What changes were proposed in this pull request?

### Before
```
Kent@KentsMacBookPro ~/Documents/spark-packages/spark-2.3.0-SNAPSHOT-bin-master bin/spark-shell --master local
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
17/11/08 10:28:44 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/11/08 10:28:45 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
Spark context Web UI available at http://169.254.168.63:4041
Spark context available as 'sc' (master = local, app id = local-1510108125770).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.0-SNAPSHOT
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_65)
Type in expressions to have them evaluated.
Type :help for more information.

scala> sys.env.get("SPARK_CONF_DIR")
res0: Option[String] = None
```

### After
```
scala> sys.env.get("SPARK_CONF_DIR")
res0: Option[String] = Some(/Users/Kent/Documents/spark/conf)
```

## How was this patch tested?

@vanzin

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yaooqinn/spark SPARK-22466

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19688.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19688

commit 19ac61cd6d8b4cca295a1f0d2f2988ee3ac20d8c
Author: Kent Yao
Date: 2017-11-08T02:30:01Z

    export SPARK_CONF_DIR while conf is default
[GitHub] spark pull request #19666: [SPARK-22451][ML] Reduce decision tree aggregate ...
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19666#discussion_r149561550

--- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala ---
```
@@ -741,17 +678,43 @@ private[spark] object RandomForest extends Logging {
           (splits(featureIndex)(bestFeatureSplitIndex), bestFeatureGainStats)
         } else if (binAggregates.metadata.isUnordered(featureIndex)) {
           // Unordered categorical feature
-          val leftChildOffset = binAggregates.getFeatureOffset(featureIndexIdx)
-          val (bestFeatureSplitIndex, bestFeatureGainStats) =
-            Range(0, numSplits).map { splitIndex =>
-              val leftChildStats = binAggregates.getImpurityCalculator(leftChildOffset, splitIndex)
-              val rightChildStats = binAggregates.getParentImpurityCalculator()
-                .subtract(leftChildStats)
+          val numBins = binAggregates.metadata.numBins(featureIndex)
+          val featureOffset = binAggregates.getFeatureOffset(featureIndexIdx)
+
+          val binStatsArray = Array.tabulate(numBins) { binIndex =>
+            binAggregates.getImpurityCalculator(featureOffset, binIndex)
+          }
+          val parentStats = binAggregates.getParentImpurityCalculator()
+
+          var bestGain = Double.NegativeInfinity
+          var bestSet: BitSet = null
+          var bestLeftChildStats: ImpurityCalculator = null
+          var bestRightChildStats: ImpurityCalculator = null
+
+          traverseUnorderedSplits[ImpurityCalculator](numBins, null,
+            (stats, binIndex) => {
+              val binStats = binStatsArray(binIndex)
+              if (stats == null) {
+                binStats
+              } else {
+                stats.copy.add(binStats)
+              }
+            },
+            (set, leftChildStats) => {
+              val rightChildStats = parentStats.copy.subtract(leftChildStats)
               gainAndImpurityStats = calculateImpurityStats(gainAndImpurityStats,
                 leftChildStats, rightChildStats, binAggregates.metadata)
-              (splitIndex, gainAndImpurityStats)
-            }.maxBy(_._2.gain)
-          (splits(featureIndex)(bestFeatureSplitIndex), bestFeatureGainStats)
+              if (gainAndImpurityStats.gain > bestGain) {
+                bestGain = gainAndImpurityStats.gain
+                bestSet = set | new BitSet(numBins) // copy set
```
--- End diff --

The class does not support `copy`.
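The copy-by-union trick in the diff (`set | new BitSet(numBins)`) can be demonstrated with the standard library's `scala.collection.mutable.BitSet`; per the comment above, the `BitSet` class used in the patch has no copy method, so union with an empty set stands in for one. An illustrative sketch only, not Spark's internal `BitSet` API:

```scala
import scala.collection.mutable.BitSet

// OR-ing a bit set with an empty one yields a new, independent bit set,
// which is the same idea as `set | new BitSet(numBins)` in the diff above.
val original = BitSet(1, 3, 5)
val copied = original | BitSet.empty
copied += 7
assert(original == BitSet(1, 3, 5)) // the original is unaffected
```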
[GitHub] spark issue #19666: [SPARK-22451][ML] Reduce decision tree aggregate size fo...
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19666 Also cc @smurching Thanks!
[GitHub] spark pull request #19638: [SPARK-22422][ML] Add Adjusted R2 to RegressionMe...
Github user tengpeng commented on a diff in the pull request: https://github.com/apache/spark/pull/19638#discussion_r149560345

--- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala ---
```
@@ -764,13 +764,17 @@ class LinearRegressionSuite
        (Intercept) 6.3022157 0.00186003388 <2e-16 ***
        V2          4.6982442 0.00118053980 <2e-16 ***
        V3          7.1994344 0.00090447961 <2e-16 ***
+
+       # R code for r2adj
```
--- End diff --

Thanks for the clarification. Do you think changing `x1` to `V1` would help?
[GitHub] spark issue #19666: [SPARK-22451][ML] Reduce decision tree aggregate size fo...
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19666 @facaiy Thanks for your review! I added more explanation of the design purpose of `traverseUnorderedSplits`. But if you have a better solution, don't hesitate to tell me!
[GitHub] spark pull request #19638: [SPARK-22422][ML] Add Adjusted R2 to RegressionMe...
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/19638#discussion_r149559666

--- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala ---
```
@@ -764,13 +764,17 @@ class LinearRegressionSuite
        (Intercept) 6.3022157 0.00186003388 <2e-16 ***
        V2          4.6982442 0.00118053980 <2e-16 ***
        V3          7.1994344 0.00090447961 <2e-16 ***
+
+       # R code for r2adj
```
--- End diff --

There may be some confusion. If you type that code, as-is, into an R shell, it will not work: it references a variable called `X1`, which is never defined. When we provide R code in comments like this, we intend for it to be copied and pasted into a shell and just work. So, as written, it does not function.
[GitHub] spark pull request #19638: [SPARK-22422][ML] Add Adjusted R2 to RegressionMe...
Github user tengpeng commented on a diff in the pull request: https://github.com/apache/spark/pull/19638#discussion_r149558607

--- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala ---
```
@@ -764,13 +764,17 @@ class LinearRegressionSuite
        (Intercept) 6.3022157 0.00186003388 <2e-16 ***
        V2          4.6982442 0.00118053980 <2e-16 ***
        V3          7.1994344 0.00090447961 <2e-16 ***
+
+       # R code for r2adj
```
--- End diff --

@srowen It's fine in terms of functioning.
[GitHub] spark issue #19285: [SPARK-22068][CORE]Reduce the duplicate code between put...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19285 **[Test build #83575 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83575/testReport)** for PR 19285 at commit [`bc3ad4e`](https://github.com/apache/spark/commit/bc3ad4ea11e49b19ef4199642dbc4488f202d928).
[GitHub] spark issue #19156: [SPARK-19634][SQL][ML][FOLLOW-UP] Improve interface of d...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19156 **[Test build #83574 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83574/testReport)** for PR 19156 at commit [`480e80d`](https://github.com/apache/spark/commit/480e80dbb0392bebe96dc1620195a39b54f75740).
[GitHub] spark issue #19685: [SPARK-19759][ML] not using blas in ALSModel.predict for...
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19685 Have you run any tests to check the performance difference for this?
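If a quick comparison were wanted, a rough harness like the following could give a first impression; this is a hedged sketch, not from the thread, and for rigorous numbers a JMH benchmark would be preferable because of JIT warm-up effects:

```scala
// Crude micro-timing helper: runs a block and prints elapsed milliseconds.
// Discard the first few runs (JIT warm-up) before trusting any numbers.
def time[A](label: String)(body: => A): A = {
  val t0 = System.nanoTime()
  val result = body
  println(f"$label: ${(System.nanoTime() - t0) / 1e6}%.3f ms")
  result
}
```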
[GitHub] spark pull request #19685: [SPARK-19759][ML] not using blas in ALSModel.pred...
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19685#discussion_r149554146

--- Diff: mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala ---
```
@@ -289,9 +289,11 @@ class ALSModel private[ml] (
   private val predict = udf { (featuresA: Seq[Float], featuresB: Seq[Float]) =>
     if (featuresA != null && featuresB != null) {
-      // TODO(SPARK-19759): try dot-producting on Seqs or another non-converted type for
-      // potential optimization.
-      blas.sdot(rank, featuresA.toArray, 1, featuresB.toArray, 1)
+      var dotProduct = 0.0f
+      for(i <- 0 until rank) {
```
--- End diff --

You should use `while` instead of `for`.
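A minimal sketch of the suggested change (the helper name is illustrative, not from the patch): in Scala 2.x a `for` over a `Range` compiles to `Range.foreach` with a closure, whereas a `while` loop becomes a plain counter loop, which matters on a per-row UDF hot path.

```scala
// While-loop form of the float dot product the diff computes with `for`.
def dot(a: Seq[Float], b: Seq[Float], rank: Int): Float = {
  var sum = 0.0f
  var i = 0
  while (i < rank) {
    sum += a(i) * b(i)
    i += 1
  }
  sum
}
```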
[GitHub] spark pull request #19661: [SPARK-22450][Core][Mllib]safely register class f...
Github user ConeyLiu commented on a diff in the pull request: https://github.com/apache/spark/pull/19661#discussion_r149553694

--- Diff: core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala ---
```
@@ -178,10 +178,40 @@ class KryoSerializer(conf: SparkConf)
     kryo.register(Utils.classForName("scala.collection.immutable.Map$EmptyMap$"))
     kryo.register(classOf[ArrayBuffer[Any]])

+    // We can't load those class directly in order to avoid unnecessary jar dependencies.
+    // We load them safely, ignore it if the class not found.
+    Seq("org.apache.spark.mllib.linalg.Vector",
+      "org.apache.spark.mllib.linalg.DenseVector",
+      "org.apache.spark.mllib.linalg.SparseVector",
+      "org.apache.spark.mllib.linalg.Matrix",
+      "org.apache.spark.mllib.linalg.DenseMatrix",
+      "org.apache.spark.mllib.linalg.SparseMatrix",
+      "org.apache.spark.ml.linalg.Vector",
+      "org.apache.spark.ml.linalg.DenseVector",
+      "org.apache.spark.ml.linalg.SparseVector",
+      "org.apache.spark.ml.linalg.Matrix",
+      "org.apache.spark.ml.linalg.DenseMatrix",
+      "org.apache.spark.ml.linalg.SparseMatrix",
+      "org.apache.spark.ml.feature.Instance",
+      "org.apache.spark.ml.feature.OffsetInstance"
+    ).flatMap(safeClassLoader(_)).foreach(kryo.register(_))
```
--- End diff --

Hi @cloud-fan, I tried the following code:

```scala
flatMap(cn => Try { Utils.classForName(cn) }.toOption).foreach(kryo.register(_))
```

and

```scala
flatMap { cn =>
  try {
    val clazz = Utils.classForName(cn)
    Some(clazz)
  } catch {
    case _: ClassNotFoundException => None
  }
}.foreach(kryo.register(_))
```

Both reported the same error:

```
Error:(198, 18) type mismatch;
 found   : String => Iterable[Class[_$2]]( forSome { type _$2 })
 required: String => scala.collection.GenTraversableOnce[B]
    ).flatMap{cn => Option(Utils.classForName(cn))}.foreach(kryo.register(_))
```
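This kind of mismatch usually comes from the existential type in `Class[_]`: the compiler infers a fresh existential per element and cannot unify it with `flatMap`'s type parameter `B`. One workaround that typically compiles is pinning the element type with an explicit ascription. A self-contained sketch with no Spark dependencies (class names illustrative):

```scala
import scala.util.Try

// Load classes by name, silently skipping any that are absent. The explicit
// Seq[Class[_]] / Option[Class[_]] ascriptions pin the existential so that
// flatMap's type parameter can be inferred.
val candidates = Seq("java.lang.String", "com.example.DoesNotExist")
val loaded: Seq[Class[_]] = candidates.flatMap { cn =>
  Try(Class.forName(cn)).toOption: Option[Class[_]]
}
loaded.foreach(c => println(c.getName))
```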
[GitHub] spark issue #17436: [SPARK-20101][SQL] Use OffHeapColumnVector when "spark.m...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17436 **[Test build #83573 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83573/testReport)** for PR 17436 at commit [`9ce6fc0`](https://github.com/apache/spark/commit/9ce6fc0b0ad2c4c97236f0519db07b5a3600bb81).
[GitHub] spark issue #19433: [SPARK-3162] [MLlib] Add local tree training for decisio...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19433 **[Test build #3983 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3983/testReport)** for PR 19433 at commit [`b7e6e40`](https://github.com/apache/spark/commit/b7e6e40976612546b81d9775c194b274c146dc85).
 * This patch **fails to generate documentation**.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.