[GitHub] spark issue #19691: [SPARK-14922][SPARK-17732][SQL]ALTER TABLE DROP PARTITIO...

2017-11-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19691
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19691: [SPARK-14922][SPARK-17732][SQL]ALTER TABLE DROP P...

2017-11-07 Thread DazhuangSu
GitHub user DazhuangSu opened a pull request:

https://github.com/apache/spark/pull/19691

[SPARK-14922][SPARK-17732][SQL]ALTER TABLE DROP PARTITION should support 
comparators

## What changes were proposed in this pull request?

This PR is inspired by @dongjoon-hyun.

Quote from https://github.com/apache/spark/pull/15704:

> **What changes were proposed in this pull request?**
> This PR aims to support comparators, e.g. '<', '<=', '>', '>=', again in Apache Spark 2.0 for backward compatibility.
> **Spark 1.6**
> `scala> sql("CREATE TABLE sales(id INT) PARTITIONED BY (country STRING, quarter STRING)")`
> `res0: org.apache.spark.sql.DataFrame = [result: string]`
> `scala> sql("ALTER TABLE sales DROP PARTITION (country < 'KR')")`
> `res1: org.apache.spark.sql.DataFrame = [result: string]`
> **Spark 2.0**
> `scala> sql("CREATE TABLE sales(id INT) PARTITIONED BY (country STRING, quarter STRING)")`
> `res0: org.apache.spark.sql.DataFrame = []`
> `scala> sql("ALTER TABLE sales DROP PARTITION (country < 'KR')")`
> `org.apache.spark.sql.catalyst.parser.ParseException:`
> `mismatched input '<' expecting {')', ','}(line 1, pos 42)`
> After this PR, it's supported.
> **How was this patch tested?**
> Pass the Jenkins test with a newly added testcase.


https://github.com/apache/spark/pull/16036 points out that using an int 
literal in DROP PARTITION fails after applying 
https://github.com/apache/spark/pull/15704.
The reason for this failure in https://github.com/apache/spark/pull/15704 is 
that AlterTableDropPartitionCommand distinguishes BinaryComparison from EqualTo 
with the following code:

`private def isRangeComparison(expr: Expression): Boolean = {`
`  expr.find(e => e.isInstanceOf[BinaryComparison] && !e.isInstanceOf[EqualTo]).isDefined`
`}`

This PR resolves the problem by identifying the drop condition while parsing the SQL statement.
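
To illustrate the idea of classifying a drop condition at parse time, here is a minimal Python sketch. It is not Spark's ANTLR-based parser and not the code in this PR; `PartitionPredicate`, `parse_partition_spec` and `is_range_comparison` are hypothetical names, and the tiny regular-expression grammar is an assumption made for the example. The point it shows is that the operator token is recorded while parsing, so a range comparison can be told apart from an equality without inspecting the expression tree later.

```python
import re
from typing import List, NamedTuple


class PartitionPredicate(NamedTuple):
    """One condition of a partition spec (hypothetical illustration type)."""
    column: str
    operator: str
    value: str  # raw literal text, e.g. "'KR'" or "1"


# Longer operators come first so that '<=' is not read as '<' followed by '='.
_PREDICATE = re.compile(r"\s*(\w+)\s*(<=|>=|<|>|=)\s*('[^']*'|\w+)\s*$")


def parse_partition_spec(spec: str) -> List[PartitionPredicate]:
    """Parse a spec such as "(country < 'KR', quarter = 1)".

    Simplification: conditions are split on ',', so string literals that
    contain a comma are not supported by this sketch.
    """
    inner = spec.strip()
    if not (inner.startswith("(") and inner.endswith(")")):
        raise ValueError("partition spec must be parenthesized: %r" % spec)
    predicates = []
    for part in inner[1:-1].split(","):
        match = _PREDICATE.match(part)
        if match is None:
            raise ValueError("cannot parse partition predicate: %r" % part)
        predicates.append(PartitionPredicate(*match.groups()))
    return predicates


def is_range_comparison(pred: PartitionPredicate) -> bool:
    # Decided from the operator token captured by the parser, so the type
    # of the literal (string or int) plays no role in the decision.
    return pred.operator != "="
```

For example, `parse_partition_spec("(country < 'KR', quarter = 1)")` yields one range predicate on `country` and one equality predicate on `quarter`, regardless of whether the literal is quoted.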

## How was this patch tested?
New testcase introduced from https://github.com/apache/spark/pull/15704


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/DazhuangSu/spark SPARK-17732

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19691.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19691


commit 20f658ad8e14a94dd23bff6a8d795124d1db24e9
Author: Dylan Su 
Date:   2017-11-08T03:44:28Z

[SPARK-14922][SPARK-17732][SQL]ALTER TABLE DROP PARTITION should support 
comparators







[GitHub] spark issue #19690: [SPARK-22467]Added a switch to support whether `stdout_s...

2017-11-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19690
  
**[Test build #83588 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83588/testReport)**
 for PR 19690 at commit 
[`7b67148`](https://github.com/apache/spark/commit/7b671485e46a7e7c4fbce57b7f9e8fa66adcd82a).





[GitHub] spark pull request #19690: [SPARK-22467]Added a switch to support whether `s...

2017-11-07 Thread 10110346
GitHub user 10110346 opened a pull request:

https://github.com/apache/spark/pull/19690

[SPARK-22467]Added a switch to support whether `stdout_stream` and 
`stderr_stream` output to disk

## What changes were proposed in this pull request?

We should add a switch to control whether the `stdout_stream` and `stderr_stream` 
output is written to disk.
In my environment, disk I/O blocking makes the `stdout_stream` output very 
slow, so the stream cannot be drained in time, and this causes the executor 
process to block.

## How was this patch tested?
Added a unit test


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/10110346/spark stdout_err

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19690.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19690


commit 7b671485e46a7e7c4fbce57b7f9e8fa66adcd82a
Author: liuxian 
Date:   2017-11-07T09:16:48Z

fix







[GitHub] spark issue #13206: [SPARK-15420] [SQL] Add repartition and sort to prepare ...

2017-11-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13206
  
Build finished. Test FAILed.





[GitHub] spark issue #13206: [SPARK-15420] [SQL] Add repartition and sort to prepare ...

2017-11-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13206
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83583/
Test FAILed.





[GitHub] spark issue #13206: [SPARK-15420] [SQL] Add repartition and sort to prepare ...

2017-11-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13206
  
**[Test build #83583 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83583/consoleFull)**
 for PR 13206 at commit 
[`a64be8a`](https://github.com/apache/spark/commit/a64be8a91ddadcd7acbbd08956f214b3c40f0dca).
 * This patch **fails PySpark unit tests**.
 * This patch **does not merge cleanly**.
 * This patch adds the following public classes _(experimental)_:
  * `case class DistributeAndSortOutputData(conf: CatalystConf) extends 
Rule[LogicalPlan] `





[GitHub] spark issue #19662: [SPARK-22446][SQL][ML] Declare StringIndexerModel indexe...

2017-11-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19662
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83580/
Test PASSed.





[GitHub] spark issue #19662: [SPARK-22446][SQL][ML] Declare StringIndexerModel indexe...

2017-11-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19662
  
Merged build finished. Test PASSed.





[GitHub] spark issue #19662: [SPARK-22446][SQL][ML] Declare StringIndexerModel indexe...

2017-11-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19662
  
**[Test build #83580 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83580/testReport)**
 for PR 19662 at commit 
[`d2ac83e`](https://github.com/apache/spark/commit/d2ac83e5b1c74abd422e436752f1cf91127e388a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #19683: [SPARK-21657][SQL] optimize explode quadratic memory con...

2017-11-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19683
  
**[Test build #83587 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83587/testReport)**
 for PR 19683 at commit 
[`b8b5960`](https://github.com/apache/spark/commit/b8b5960f230b015896918a5465c919550af980ac).





[GitHub] spark issue #19607: [WIP][SPARK-22395][SQL][PYTHON] Fix the behavior of time...

2017-11-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19607
  
**[Test build #83586 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83586/testReport)**
 for PR 19607 at commit 
[`1e0f217`](https://github.com/apache/spark/commit/1e0f21715f5ad053b5a5677a129677d498946ea3).





[GitHub] spark pull request #19607: [WIP][SPARK-22395][SQL][PYTHON] Fix the behavior ...

2017-11-07 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19607#discussion_r149582142
  
--- Diff: python/pyspark/sql/types.py ---
@@ -1629,37 +1629,82 @@ def to_arrow_type(dt):
     return arrow_type
 
 
-def _check_dataframe_localize_timestamps(pdf):
+def _check_dataframe_localize_timestamps(pdf, timezone):
     """
-    Convert timezone aware timestamps to timezone-naive in local time
+    Convert timezone aware timestamps to timezone-naive in the specified timezone or local timezone
 
     :param pdf: pandas.DataFrame
-    :return pandas.DataFrame where any timezone aware columns have be converted to tz-naive
+    :param timezone: the timezone to convert. if None then use local timezone
+    :return pandas.DataFrame where any timezone aware columns have been converted to tz-naive
     """
     from pandas.api.types import is_datetime64tz_dtype
+    tz = timezone or 'tzlocal()'
     for column, series in pdf.iteritems():
         # TODO: handle nested timestamps, such as ArrayType(TimestampType())?
         if is_datetime64tz_dtype(series.dtype):
-            pdf[column] = series.dt.tz_convert('tzlocal()').dt.tz_localize(None)
+            pdf[column] = series.dt.tz_convert(tz).dt.tz_localize(None)
     return pdf
 
 
-def _check_series_convert_timestamps_internal(s):
+def _check_series_convert_timestamps_internal(s, timezone):
     """
-    Convert a tz-naive timestamp in local tz to UTC normalized for Spark internal storage
+    Convert a tz-naive timestamp in the specified timezone or local timezone to UTC normalized for
+    Spark internal storage
+
     :param s: a pandas.Series
+    :param timezone: the timezone to convert. if None then use local timezone
     :return pandas.Series where if it is a timestamp, has been UTC normalized without a time zone
     """
     from pandas.api.types import is_datetime64_dtype, is_datetime64tz_dtype
     # TODO: handle nested timestamps, such as ArrayType(TimestampType())?
     if is_datetime64_dtype(s.dtype):
-        return s.dt.tz_localize('tzlocal()').dt.tz_convert('UTC')
+        tz = timezone or 'tzlocal()'
+        return s.dt.tz_localize(tz).dt.tz_convert('UTC')
     elif is_datetime64tz_dtype(s.dtype):
         return s.dt.tz_convert('UTC')
     else:
         return s
 
 
+def _check_series_convert_timestamps_localize(s, timezone):
+    """
+    Convert timestamp to timezone-naive in the specified timezone or local timezone
+
+    :param s: a pandas.Series
+    :param timezone: the timezone to convert. if None then use local timezone
+    :return pandas.Series where if it is a timestamp, has been converted to tz-naive
+    """
+    import pandas as pd
+    try:
+        from pandas.api.types import is_datetime64tz_dtype, is_datetime64_dtype
+        tz = timezone or 'tzlocal()'
+        # TODO: handle nested timestamps, such as ArrayType(TimestampType())?
+        if is_datetime64tz_dtype(s.dtype):
+            return s.dt.tz_convert(tz).dt.tz_localize(None)
+        elif is_datetime64_dtype(s.dtype) and timezone is not None:
+            # `s.dt.tz_localize('tzlocal()')` doesn't work properly when including NaT.
+            return s.apply(lambda ts: ts.tz_localize('tzlocal()').tz_convert(tz).tz_localize(None)
+                           if ts is not pd.NaT else pd.NaT)
+        else:
+            return s
+    except ImportError:
--- End diff --

We will be able to remove this block if we decide to support only Pandas >=0.19.2.
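
The two conversions in the diff above (tz-aware to tz-naive in a chosen timezone, and tz-naive in a chosen timezone to UTC) can be illustrated without pandas. The sketch below uses only the standard library `datetime` module on single timestamps rather than on a `pandas.Series`; `to_naive_in_timezone` and `naive_to_utc` are hypothetical helper names, not functions from PySpark, and unlike the pandas code the second helper returns a naive UTC value.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def to_naive_in_timezone(ts: datetime, tz: Optional[timezone] = None) -> datetime:
    """Convert a tz-aware timestamp to a tz-naive one expressed in `tz`.

    If `tz` is None, the system local timezone is used, which mirrors the
    "specified timezone or local timezone" fallback described in the diff.
    """
    if ts.tzinfo is None:
        raise ValueError("expected a timezone-aware timestamp")
    # astimezone(None) converts to the system local timezone.
    return ts.astimezone(tz).replace(tzinfo=None)


def naive_to_utc(ts: datetime, tz: timezone) -> datetime:
    """Interpret a tz-naive timestamp as wall-clock time in `tz` and return
    the same instant as a tz-naive UTC timestamp."""
    if ts.tzinfo is not None:
        raise ValueError("expected a timezone-naive timestamp")
    # replace() is safe here because datetime.timezone is a fixed offset.
    return ts.replace(tzinfo=tz).astimezone(timezone.utc).replace(tzinfo=None)


# Fixed UTC+9 offset, used so the example does not depend on a tz database.
UTC_PLUS_9 = timezone(timedelta(hours=9))
aware = datetime(2017, 11, 7, 12, 0, tzinfo=timezone.utc)
local_naive = to_naive_in_timezone(aware, UTC_PLUS_9)  # 2017-11-07 21:00
round_trip = naive_to_utc(local_naive, UTC_PLUS_9)     # 2017-11-07 12:00
```

The round trip returns the original instant, which is the property the paired helpers in the diff rely on when data moves between pandas and Spark's UTC-normalized internal representation.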





[GitHub] spark issue #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark DataFra...

2017-11-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19459
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83579/
Test PASSed.





[GitHub] spark issue #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark DataFra...

2017-11-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19459
  
Merged build finished. Test PASSed.





[GitHub] spark issue #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark DataFra...

2017-11-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19459
  
**[Test build #83579 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83579/testReport)**
 for PR 19459 at commit 
[`99ce1e4`](https://github.com/apache/spark/commit/99ce1e44f57c411af95b1c9d9c95f35f2c1652e1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #19657: [SPARK-22344][SPARKR] clean up install dir if running te...

2017-11-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19657
  
Merged build finished. Test PASSed.





[GitHub] spark issue #19657: [SPARK-22344][SPARKR] clean up install dir if running te...

2017-11-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19657
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83582/
Test PASSed.





[GitHub] spark issue #19607: [WIP][SPARK-22395][SQL][PYTHON] Fix the behavior of time...

2017-11-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19607
  
Merged build finished. Test PASSed.





[GitHub] spark issue #19607: [WIP][SPARK-22395][SQL][PYTHON] Fix the behavior of time...

2017-11-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19607
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83578/
Test PASSed.





[GitHub] spark issue #19657: [SPARK-22344][SPARKR] clean up install dir if running te...

2017-11-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19657
  
**[Test build #83582 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83582/testReport)**
 for PR 19657 at commit 
[`18e238a`](https://github.com/apache/spark/commit/18e238a62d53de5a73283a741c1a9bb8230f4484).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #19607: [WIP][SPARK-22395][SQL][PYTHON] Fix the behavior of time...

2017-11-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19607
  
**[Test build #83578 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83578/testReport)**
 for PR 19607 at commit 
[`4adb073`](https://github.com/apache/spark/commit/4adb073f8d1454fbea0742a16b6d7662e063b37a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #19681: [SPARK-20652][sql] Store SQL UI data in the new app stat...

2017-11-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19681
  
**[Test build #83585 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83585/testReport)**
 for PR 19681 at commit 
[`ecf293b`](https://github.com/apache/spark/commit/ecf293b31fa1b5250f484d6b2f09373e7057bbc3).





[GitHub] spark issue #19662: [SPARK-22446][SQL][ML] Declare StringIndexerModel indexe...

2017-11-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19662
  
Merged build finished. Test PASSed.





[GitHub] spark issue #19662: [SPARK-22446][SQL][ML] Declare StringIndexerModel indexe...

2017-11-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19662
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83577/
Test PASSed.





[GitHub] spark issue #19662: [SPARK-22446][SQL][ML] Declare StringIndexerModel indexe...

2017-11-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19662
  
**[Test build #83577 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83577/testReport)**
 for PR 19662 at commit 
[`dd672ac`](https://github.com/apache/spark/commit/dd672ac815038f8dfd89fecb1f5b3d4668158752).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #19681: [SPARK-20652][sql] Store SQL UI data in the new a...

2017-11-07 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19681#discussion_r149579972
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLAppStatusListener.scala
 ---
@@ -0,0 +1,353 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.execution.ui
+
+import java.util.Date
+import java.util.concurrent.ConcurrentHashMap
+
+import scala.collection.JavaConverters._
+import scala.collection.mutable.HashMap
+
+import org.apache.spark.{JobExecutionStatus, SparkConf}
+import org.apache.spark.internal.Logging
+import org.apache.spark.scheduler._
+import org.apache.spark.sql.execution.SQLExecution
+import org.apache.spark.sql.execution.metric._
+import org.apache.spark.sql.internal.StaticSQLConf._
+import org.apache.spark.status.LiveEntity
+import org.apache.spark.status.config._
+import org.apache.spark.ui.SparkUI
+import org.apache.spark.util.kvstore.KVStore
+
+private[sql] class SQLAppStatusListener(
+    conf: SparkConf,
+    kvstore: KVStore,
+    live: Boolean,
+    ui: Option[SparkUI] = None)
+  extends SparkListener with Logging {
+
+  // How often to flush intermediate stage of a live execution to the store. When replaying logs,
+  // never flush (only do the very last write).
+  private val liveUpdatePeriodNs = if (live) conf.get(LIVE_ENTITY_UPDATE_PERIOD) else -1L
+
+  private val liveExecutions = new HashMap[Long, LiveExecutionData]()
+  private val stageMetrics = new HashMap[Int, LiveStageMetrics]()
+
+  private var uiInitialized = false
+
+  override def onJobStart(event: SparkListenerJobStart): Unit = {
+    val executionIdString = event.properties.getProperty(SQLExecution.EXECUTION_ID_KEY)
+    if (executionIdString == null) {
+      // This is not a job created by SQL
+      return
+    }
+
+    val executionId = executionIdString.toLong
+    val jobId = event.jobId
+    val exec = getOrCreateExecution(executionId)
+
+    // Record the accumulator IDs for the stages of this job, so that the code that keeps
+    // track of the metrics knows which accumulators to look at.
+    val accumIds = exec.metrics.map(_.accumulatorId).sorted.toList
+    event.stageIds.foreach { id =>
+      stageMetrics.put(id, new LiveStageMetrics(id, 0, accumIds.toArray, new ConcurrentHashMap()))
+    }
+
+    exec.jobs = exec.jobs + (jobId -> JobExecutionStatus.RUNNING)
+    exec.stages = event.stageIds
+    update(exec)
+  }
+
+  override def onStageSubmitted(event: SparkListenerStageSubmitted): Unit = {
+    if (!isSQLStage(event.stageInfo.stageId)) {
+      return
+    }
+
+    // Reset the metrics tracking object for the new attempt.
+    stageMetrics.get(event.stageInfo.stageId).foreach { metrics =>
+      metrics.taskMetrics.clear()
+      metrics.attemptId = event.stageInfo.attemptId
+    }
+  }
+
+  override def onJobEnd(event: SparkListenerJobEnd): Unit = {
+    liveExecutions.values.foreach { exec =>
+      if (exec.jobs.contains(event.jobId)) {
+        val result = event.jobResult match {
+          case JobSucceeded => JobExecutionStatus.SUCCEEDED
+          case _ => JobExecutionStatus.FAILED
+        }
+        exec.jobs = exec.jobs + (event.jobId -> result)
+        exec.endEvents += 1
+        update(exec)
+      }
+    }
+  }
+
+  override def onExecutorMetricsUpdate(event: SparkListenerExecutorMetricsUpdate): Unit = {
+    event.accumUpdates.foreach { case (taskId, stageId, attemptId, accumUpdates) =>
+      updateStageMetrics(stageId, attemptId, taskId, accumUpdates, false)
+    }
+  }
+
+  override def onTaskEnd(event: SparkListenerTaskEnd): Unit = {
+    if (!isSQLStage(event.stageId)) {
+      return
+    }
+
+    val info = event.taskInfo
+    // SPARK-20342. If processing events from a live ap

[GitHub] spark issue #19689: [SPARK-22462][SQL] Make rdd-based actions in Dataset tra...

2017-11-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19689
  
**[Test build #83584 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83584/testReport)**
 for PR 19689 at commit 
[`f0c6399`](https://github.com/apache/spark/commit/f0c639909d7b1638cdf2de5c3684d7de1c375436).





[GitHub] spark pull request #19681: [SPARK-20652][sql] Store SQL UI data in the new a...

2017-11-07 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/19681#discussion_r149578181
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLAppStatusListener.scala
 ---
@@ -40,7 +40,7 @@ private[sql] class SQLAppStatusListener(
 ui: Option[SparkUI] = None)
   extends SparkListener with Logging {
 
-  // How often to flush intermediate statge of a live execution to the 
store. When replaying logs,
+  // How often to flush intermediate stage of a live execution to the 
store. When replaying logs,
--- End diff --

err, was this supposed to be "state"?
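
The flush policy described by the code comment in this diff (write live data to the store at most once per update period, and never flush early when replaying logs) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the Scala implementation in the PR: `ThrottledWriter`, `maybe_update` and `update` are hypothetical names, and treating a negative period as "never flush" is inferred from the `-1L` default in the diff.

```python
import time
from typing import Any, Callable, Optional


class ThrottledWriter:
    """Writes entities to a store no more often than once per period.

    Assumption for this sketch: a negative period disables intermediate
    flushes, mirroring `if (live) ... else -1L` in the diff.
    """

    def __init__(self, write: Callable[[Any], None], live: bool,
                 period_ns: int,
                 clock: Callable[[], int] = time.monotonic_ns) -> None:
        self._write = write
        self._period_ns = period_ns if live else -1
        self._clock = clock
        self._last_write_ns: Optional[int] = None

    def maybe_update(self, entity: Any) -> bool:
        """Flush only if the period has elapsed; return True if written."""
        if self._period_ns < 0:
            return False  # replay mode: skip all intermediate flushes
        now = self._clock()
        if (self._last_write_ns is None
                or now - self._last_write_ns > self._period_ns):
            self.update(entity, now)
            return True
        return False

    def update(self, entity: Any, now: Optional[int] = None) -> None:
        """Unconditional write, e.g. the very last write of an execution."""
        self._write(entity)
        self._last_write_ns = self._clock() if now is None else now
```

Injecting the clock keeps the throttling logic testable without sleeping; a live listener would call `maybe_update` on frequent events and `update` once when an execution finishes.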





[GitHub] spark pull request #19681: [SPARK-20652][sql] Store SQL UI data in the new a...

2017-11-07 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/19681#discussion_r149578074
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLAppStatusListener.scala
 ---
@@ -0,0 +1,353 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.execution.ui
+
+import java.util.Date
+import java.util.concurrent.ConcurrentHashMap
+
+import scala.collection.JavaConverters._
+import scala.collection.mutable.HashMap
+
+import org.apache.spark.{JobExecutionStatus, SparkConf}
+import org.apache.spark.internal.Logging
+import org.apache.spark.scheduler._
+import org.apache.spark.sql.execution.SQLExecution
+import org.apache.spark.sql.execution.metric._
+import org.apache.spark.sql.internal.StaticSQLConf._
+import org.apache.spark.status.LiveEntity
+import org.apache.spark.status.config._
+import org.apache.spark.ui.SparkUI
+import org.apache.spark.util.kvstore.KVStore
+
+private[sql] class SQLAppStatusListener(
+    conf: SparkConf,
+    kvstore: KVStore,
+    live: Boolean,
+    ui: Option[SparkUI] = None)
+  extends SparkListener with Logging {
+
+  // How often to flush intermediate stage of a live execution to the store. When replaying logs,
+  // never flush (only do the very last write).
+  private val liveUpdatePeriodNs = if (live) conf.get(LIVE_ENTITY_UPDATE_PERIOD) else -1L
+
+  private val liveExecutions = new HashMap[Long, LiveExecutionData]()
+  private val stageMetrics = new HashMap[Int, LiveStageMetrics]()
+
+  private var uiInitialized = false
+
+  override def onJobStart(event: SparkListenerJobStart): Unit = {
+    val executionIdString = event.properties.getProperty(SQLExecution.EXECUTION_ID_KEY)
+    if (executionIdString == null) {
+      // This is not a job created by SQL
+      return
+    }
+
+    val executionId = executionIdString.toLong
+    val jobId = event.jobId
+    val exec = getOrCreateExecution(executionId)
+
+    // Record the accumulator IDs for the stages of this job, so that the code that keeps
+    // track of the metrics knows which accumulators to look at.
+    val accumIds = exec.metrics.map(_.accumulatorId).sorted.toList
+    event.stageIds.foreach { id =>
+      stageMetrics.put(id, new LiveStageMetrics(id, 0, accumIds.toArray, new ConcurrentHashMap()))
+    }
+
+    exec.jobs = exec.jobs + (jobId -> JobExecutionStatus.RUNNING)
+    exec.stages = event.stageIds
+    update(exec)
+  }
+
+  override def onStageSubmitted(event: SparkListenerStageSubmitted): Unit = {
+    if (!isSQLStage(event.stageInfo.stageId)) {
+      return
+    }
+
+    // Reset the metrics tracking object for the new attempt.
+    stageMetrics.get(event.stageInfo.stageId).foreach { metrics =>
+      metrics.taskMetrics.clear()
+      metrics.attemptId = event.stageInfo.attemptId
+    }
+  }
+
+  override def onJobEnd(event: SparkListenerJobEnd): Unit = {
+    liveExecutions.values.foreach { exec =>
+      if (exec.jobs.contains(event.jobId)) {
+        val result = event.jobResult match {
+          case JobSucceeded => JobExecutionStatus.SUCCEEDED
+          case _ => JobExecutionStatus.FAILED
+        }
+        exec.jobs = exec.jobs + (event.jobId -> result)
+        exec.endEvents += 1
+        update(exec)
+      }
+    }
+  }
+
+  override def onExecutorMetricsUpdate(event: SparkListenerExecutorMetricsUpdate): Unit = {
+    event.accumUpdates.foreach { case (taskId, stageId, attemptId, accumUpdates) =>
+      updateStageMetrics(stageId, attemptId, taskId, accumUpdates, false)
+    }
+  }
+
+  override def onTaskEnd(event: SparkListenerTaskEnd): Unit = {
+    if (!isSQLStage(event.stageId)) {
+      return
+    }
+
+    val info = event.taskInfo
+    // SPARK-20342. If processing events from a live ap

[GitHub] spark issue #13206: [SPARK-15420] [SQL] Add repartition and sort to prepare ...

2017-11-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13206
  
**[Test build #83583 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83583/consoleFull)**
 for PR 13206 at commit 
[`a64be8a`](https://github.com/apache/spark/commit/a64be8a91ddadcd7acbbd08956f214b3c40f0dca).





[GitHub] spark pull request #19678: [SPARK-20646][core] Port executors page to new UI...

2017-11-07 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19678





[GitHub] spark issue #19678: [SPARK-20646][core] Port executors page to new UI backen...

2017-11-07 Thread squito
Github user squito commented on the issue:

https://github.com/apache/spark/pull/19678
  
merged to master





[GitHub] spark pull request #19557: [SPARK-22281][SPARKR] Handle R method breaking si...

2017-11-07 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19557





[GitHub] spark issue #19557: [SPARK-22281][SPARKR] Handle R method breaking signature...

2017-11-07 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/19557
  
merged to master/2.2


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19285: [SPARK-22068][CORE]Reduce the duplicate code between put...

2017-11-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19285
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19619: [SPARK-22327][SPARKR][TEST][BACKPORT-2.2] check for vers...

2017-11-07 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/19619
  
merged


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19619: [SPARK-22327][SPARKR][TEST][BACKPORT-2.2] check f...

2017-11-07 Thread felixcheung
Github user felixcheung closed the pull request at:

https://github.com/apache/spark/pull/19619


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19285: [SPARK-22068][CORE]Reduce the duplicate code between put...

2017-11-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19285
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83575/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19620: [SPARK-22327][SPARKR][TEST][BACKPORT-2.1] check f...

2017-11-07 Thread felixcheung
Github user felixcheung closed the pull request at:

https://github.com/apache/spark/pull/19620


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19657: [SPARK-22344][SPARKR] clean up install dir if running te...

2017-11-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19657
  
**[Test build #83582 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83582/testReport)**
 for PR 19657 at commit 
[`18e238a`](https://github.com/apache/spark/commit/18e238a62d53de5a73283a741c1a9bb8230f4484).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19620: [SPARK-22327][SPARKR][TEST][BACKPORT-2.1] check for vers...

2017-11-07 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/19620
  
merged


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19657: [SPARK-22344][SPARKR] clean up install dir if running te...

2017-11-07 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19657
  
Yup, I just checked it too and was writing a comment. The current change 
should pass :).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19285: [SPARK-22068][CORE]Reduce the duplicate code between put...

2017-11-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19285
  
**[Test build #83575 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83575/testReport)**
 for PR 19285 at commit 
[`bc3ad4e`](https://github.com/apache/spark/commit/bc3ad4ea11e49b19ef4199642dbc4488f202d928).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19689: [SPARK-22462][SQL] Make rdd-based actions in Dataset tra...

2017-11-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19689
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83581/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19689: [SPARK-22462][SQL] Make rdd-based actions in Dataset tra...

2017-11-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19689
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19689: [SPARK-22462][SQL] Make rdd-based actions in Dataset tra...

2017-11-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19689
  
**[Test build #83581 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83581/testReport)**
 for PR 19689 at commit 
[`ac539cd`](https://github.com/apache/spark/commit/ac539cd0e761193d9a665d8ccb19a8fba5dd504b).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17436: [SPARK-20101][SQL] Use OffHeapColumnVector when "spark.m...

2017-11-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17436
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83573/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17436: [SPARK-20101][SQL] Use OffHeapColumnVector when "spark.m...

2017-11-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17436
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17436: [SPARK-20101][SQL] Use OffHeapColumnVector when "spark.m...

2017-11-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17436
  
**[Test build #83573 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83573/testReport)**
 for PR 17436 at commit 
[`9ce6fc0`](https://github.com/apache/spark/commit/9ce6fc0b0ad2c4c97236f0519db07b5a3600bb81).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19678: [SPARK-20646][core] Port executors page to new UI backen...

2017-11-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19678
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19678: [SPARK-20646][core] Port executors page to new UI backen...

2017-11-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19678
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83572/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19678: [SPARK-20646][core] Port executors page to new UI backen...

2017-11-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19678
  
**[Test build #83572 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83572/testReport)**
 for PR 19678 at commit 
[`c7123d9`](https://github.com/apache/spark/commit/c7123d9c8d3934c482cd89ea820b2958f4dbbe0a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19648: [SPARK-14516][ML][FOLLOW-UP] Move ClusteringEvalu...

2017-11-07 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19648


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19648: [SPARK-14516][ML][FOLLOW-UP] Move ClusteringEvaluatorSui...

2017-11-07 Thread yanboliang
Github user yanboliang commented on the issue:

https://github.com/apache/spark/pull/19648
  
Merged into master, thanks.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19689: [SPARK-22462][SQL] Make rdd-based actions in Dataset tra...

2017-11-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19689
  
**[Test build #83581 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83581/testReport)**
 for PR 19689 at commit 
[`ac539cd`](https://github.com/apache/spark/commit/ac539cd0e761193d9a665d8ccb19a8fba5dd504b).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19689: [SPARK-22462][SQL] Make rdd-based actions in Dataset tra...

2017-11-07 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/19689
  

The screenshot for running `sql("select * from range(10)").foreach(a => 
Unit)` on spark-shell:

https://user-images.githubusercontent.com/68855/32531135-1e60d544-c47d-11e7-88d6-627ef77d0b80.png



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19689: [SPARK-22462][SQL] Make rdd-based actions in Data...

2017-11-07 Thread viirya
GitHub user viirya opened a pull request:

https://github.com/apache/spark/pull/19689

[SPARK-22462][SQL] Make rdd-based actions in Dataset trackable in SQL UI

## What changes were proposed in this pull request?

For a few Dataset actions such as `foreach`, no SQL metrics are currently 
visible in the SQL tab of the Spark UI. This is because the action is wrongly 
bound to the Dataset's `QueryExecution`. Since these actions are evaluated 
directly on the RDD, which has its own `QueryExecution`, we should bind to the 
RDD's `QueryExecution` to show the correct SQL metrics in the UI.

## How was this patch tested?

Tested manually. A screenshot is attached to the PR.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/viirya/spark-1 SPARK-22462

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19689.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19689


commit ac539cd0e761193d9a665d8ccb19a8fba5dd504b
Author: Liang-Chi Hsieh 
Date:   2017-11-07T10:54:14Z

Make rdd-based actions trackable in UI.




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19687: [SPARK-19644][SQL]Clean up Scala reflection garbage afte...

2017-11-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19687
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83571/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19687: [SPARK-19644][SQL]Clean up Scala reflection garbage afte...

2017-11-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19687
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19687: [SPARK-19644][SQL]Clean up Scala reflection garbage afte...

2017-11-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19687
  
**[Test build #83571 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83571/testReport)**
 for PR 19687 at commit 
[`c03811f`](https://github.com/apache/spark/commit/c03811ff006058987fa8d5fb9f7d097b9acc9ac5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19662: [SPARK-22446][SQL][ML] Declare StringIndexerModel indexe...

2017-11-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19662
  
**[Test build #83580 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83580/testReport)**
 for PR 19662 at commit 
[`d2ac83e`](https://github.com/apache/spark/commit/d2ac83e5b1c74abd422e436752f1cf91127e388a).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19662: [SPARK-22446][SQL][ML] Declare StringIndexerModel...

2017-11-07 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19662#discussion_r149568133
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/feature/VectorAssemblerSuite.scala ---
@@ -126,4 +126,25 @@ class VectorAssemblerSuite
   .setOutputCol("myOutputCol")
 testDefaultReadWrite(t)
   }
+
+  test("VectorAssembler's UDF should not apply on filtered data") {
--- End diff --

Ok.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19662: [SPARK-22446][SQL][ML] Declare StringIndexerModel...

2017-11-07 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19662#discussion_r149567769
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/feature/VectorAssemblerSuite.scala ---
@@ -126,4 +126,25 @@ class VectorAssemblerSuite
   .setOutputCol("myOutputCol")
 testDefaultReadWrite(t)
   }
+
+  test("VectorAssembler's UDF should not apply on filtered data") {
--- End diff --

Please mark the test name with [SPARK-22446].


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19666: [SPARK-22451][ML] Reduce decision tree aggregate ...

2017-11-07 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19666#discussion_r149567340
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/tree/impl/RandomForestSuite.scala ---
@@ -631,6 +614,42 @@ class RandomForestSuite extends SparkFunSuite with 
MLlibTestSparkContext {
 val expected = Map(0 -> 1.0 / 3.0, 2 -> 2.0 / 3.0)
 assert(mapToVec(map.toMap) ~== mapToVec(expected) relTol 0.01)
   }
+
+  test("traverseUnorderedSplits") {
+
--- End diff --

So how can we test all possible splits to make sure the generated splits are 
all correct? Once the tree is generated, only the best split remains.
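
One possible way to check this, sketched under assumptions: enumerate every 
unordered binary split of a small categorical feature by brute force and 
compare that set with what the traversal visits. The names below 
(`UnorderedSplitsSketch`, `allSplits`) are hypothetical and are not part of 
this PR or of `RandomForest`; categories are assumed to be numbered 
`0 until arity`.

```scala
object UnorderedSplitsSketch {
  /**
   * Hypothetical helper: returns the left-hand category set of every way to
   * divide categories `0 until arity` into two non-empty groups, counting a
   * split and its mirror image only once.
   */
  def allSplits(arity: Int): Seq[Set[Int]] = {
    require(arity >= 2 && arity <= 30, "arity must be between 2 and 30")
    // Masks 1 until 2^(arity - 1) never include the last category, so the
    // right-hand side is never empty and mirror duplicates cannot occur.
    (1 until (1 << (arity - 1))).map { mask =>
      (0 until arity).filter(i => (mask & (1 << i)) != 0).toSet
    }
  }
}
```

A feature with `arity` categories has `2^(arity - 1) - 1` such splits, so this 
kind of exhaustive comparison is only practical for small arities in a test.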


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark DataFra...

2017-11-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19459
  
**[Test build #83579 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83579/testReport)**
 for PR 19459 at commit 
[`99ce1e4`](https://github.com/apache/spark/commit/99ce1e44f57c411af95b1c9d9c95f35f2c1652e1).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark DataFra...

2017-11-07 Thread ueshin
Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/19459
  
Jenkins, retest this please.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19664: [SPARK-22442][SQL] ScalaReflection should produce...

2017-11-07 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19664#discussion_r149565144
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala
 ---
@@ -214,11 +215,13 @@ case class Invoke(
   override def eval(input: InternalRow): Any =
 throw new UnsupportedOperationException("Only code-generated 
evaluation is supported.")
 
+  private lazy val encodedFunctionName = 
TermName(functionName).encodedName.toString
--- End diff --

Since we use `Invoke` to access field(s) in an object, this can be an issue. I 
didn't find `StaticInvoke` used in a similar way, so it should be fine.
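
For illustration, a minimal sketch of why the encoded name matters when a 
field name contains characters that are not legal in a Java identifier. It 
uses `scala.reflect.NameTransformer` from the standard library, on the 
assumption that it applies the same encoding as `TermName(...).encodedName`; 
`EncodedNameSketch` and `isJavaSafe` are hypothetical names, not code from 
this PR.

```scala
import scala.reflect.NameTransformer

object EncodedNameSketch {
  /** True if every character of the name may appear in a Java identifier. */
  def isJavaSafe(name: String): Boolean =
    name.nonEmpty && name.forall(c => Character.isJavaIdentifierPart(c))

  def main(args: Array[String]): Unit = {
    // Field names like the ones used in the SPARK-22442 test, plus an operator.
    for (raw <- Seq("field.1", "field 2", "a+b")) {
      val encoded = NameTransformer.encode(raw)
      println(s"$raw -> $encoded (Java-safe: ${isJavaSafe(encoded)})")
    }
  }
}
```

Generated Java code has to call the accessor by its encoded name, since the 
raw name would not compile as a method name.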


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19663: [SPARK-22463][YARN][SQL][Hive]add hadoop/hive/hbase/etc ...

2017-11-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19663
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83576/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19663: [SPARK-22463][YARN][SQL][Hive]add hadoop/hive/hbase/etc ...

2017-11-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19663
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19663: [SPARK-22463][YARN][SQL][Hive]add hadoop/hive/hbase/etc ...

2017-11-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19663
  
**[Test build #83576 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83576/testReport)**
 for PR 19663 at commit 
[`f8c1f63`](https://github.com/apache/spark/commit/f8c1f63944c602a00802356f94788464320ffa3f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19664: [SPARK-22442][SQL] ScalaReflection should produce...

2017-11-07 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19664#discussion_r149564523
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/ScalaReflectionSuite.scala
 ---
@@ -335,4 +338,17 @@ class ScalaReflectionSuite extends SparkFunSuite {
 assert(linkedHashMapDeserializer.dataType == 
ObjectType(classOf[LHMap[_, _]]))
   }
 
+  test("SPARK-22442: Generate correct field names for special characters") 
{
+val serializer = serializerFor[SpecialCharAsFieldData](BoundReference(
+  0, ObjectType(classOf[SpecialCharAsFieldData]), nullable = false))
+val deserializer = deserializerFor[SpecialCharAsFieldData]
+assert(serializer.dataType(0).name == "field.1")
+assert(serializer.dataType(1).name == "field 2")
+
+val argumentsFields = 
deserializer.asInstanceOf[NewInstance].arguments.flatMap { _.collect {
+  case UpCast(u: UnresolvedAttribute, _, _) => u.name
+}}
+assert(argumentsFields(0) == "`field.1`")
--- End diff --

We need to deliberately wrap backticks around a field name such as 
`field.1` because of the dot character. Otherwise `UnresolvedAttribute` will 
parse it as two name parts `Seq("field", "1")` and then fail to resolve it later.
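
The effect of the backticks can be shown with a simplified sketch. This is 
not Spark's actual attribute-name parser; `NamePartsSketch.parse` is a 
hypothetical helper that only assumes dots separate name parts unless they 
appear inside a backtick-quoted segment.

```scala
import scala.collection.mutable.ArrayBuffer

object NamePartsSketch {
  /** Splits a name on dots, keeping backtick-quoted segments as one part. */
  def parse(name: String): Seq[String] = {
    val parts = ArrayBuffer.empty[String]
    val current = new StringBuilder
    var inBackticks = false
    for (c <- name) c match {
      case '`' => inBackticks = !inBackticks // toggle quoting, drop the backtick
      case '.' if !inBackticks =>            // unquoted dot ends the current part
        parts += current.toString
        current.clear()
      case other => current.append(other)
    }
    parts += current.toString
    parts.toSeq
  }
}
```

With this rule, the unquoted name `field.1` splits into two parts, while the 
backtick-quoted form stays a single part that can match the field.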


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19664: [SPARK-22442][SQL] ScalaReflection should produce...

2017-11-07 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19664#discussion_r149564330
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala
 ---
@@ -214,11 +215,13 @@ case class Invoke(
   override def eval(input: InternalRow): Any =
 throw new UnsupportedOperationException("Only code-generated 
evaluation is supported.")
 
+  private lazy val encodedFunctionName = 
TermName(functionName).encodedName.toString
--- End diff --

Maybe, although I don't have a concrete case that causes the issue for now.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19272: [Spark-21842][Mesos] Support Kerberos ticket rene...

2017-11-07 Thread ArtRand
Github user ArtRand commented on a diff in the pull request:

https://github.com/apache/spark/pull/19272#discussion_r149564294
  
--- Diff: 
resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala
 ---
@@ -213,6 +216,14 @@ private[spark] class 
MesosCoarseGrainedSchedulerBackend(
   sc.conf.getOption("spark.mesos.driver.frameworkId").map(_ + suffix)
 )
 
+// check that the credentials are defined, even though it's likely 
that auth would have failed
+// already if you've made it this far, then start the token renewer
+if (hadoopDelegationTokens.isDefined) {
--- End diff --

I may have spoken too soon; there might be a way.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19662: [SPARK-22446][SQL][ML] Declare StringIndexerModel indexe...

2017-11-07 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/19662
  
@WeichenXu123 I did a scan. Currently I only found that `VectorAssembler`'s UDF 
may have a similar issue. I fixed it and added a test for it too.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19662: [SPARK-22446][SQL][ML] Declare StringIndexerModel indexe...

2017-11-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19662
  
**[Test build #83577 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83577/testReport)**
 for PR 19662 at commit 
[`dd672ac`](https://github.com/apache/spark/commit/dd672ac815038f8dfd89fecb1f5b3d4668158752).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19607: [WIP][SPARK-22395][SQL][PYTHON] Fix the behavior of time...

2017-11-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19607
  
**[Test build #83578 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83578/testReport)**
 for PR 19607 at commit 
[`4adb073`](https://github.com/apache/spark/commit/4adb073f8d1454fbea0742a16b6d7662e063b37a).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19156: [SPARK-19634][SQL][ML][FOLLOW-UP] Improve interface of d...

2017-11-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19156
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19156: [SPARK-19634][SQL][ML][FOLLOW-UP] Improve interface of d...

2017-11-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19156
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83574/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19156: [SPARK-19634][SQL][ML][FOLLOW-UP] Improve interface of d...

2017-11-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19156
  
**[Test build #83574 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83574/testReport)**
 for PR 19156 at commit 
[`480e80d`](https://github.com/apache/spark/commit/480e80dbb0392bebe96dc1620195a39b54f75740).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19663: [SPARK-22463][YARN][SQL][Hive]add hadoop/hive/hbase/etc ...

2017-11-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19663
  
**[Test build #83576 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83576/testReport)**
 for PR 19663 at commit 
[`f8c1f63`](https://github.com/apache/spark/commit/f8c1f63944c602a00802356f94788464320ffa3f).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19688: [SPARK-22466][Spark Submit]export SPARK_CONF_DIR while c...

2017-11-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19688
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19663: [SPARK-22463][YARN][SQL][Hive]add hadoop/hive/hba...

2017-11-07 Thread yaooqinn
Github user yaooqinn commented on a diff in the pull request:

https://github.com/apache/spark/pull/19663#discussion_r149561925
  
--- Diff: 
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala 
---
@@ -687,6 +687,20 @@ private[spark] class Client(
   private def createConfArchive(): File = {
 val hadoopConfFiles = new HashMap[String, File]()
 
+// SPARK_CONF_DIR shows up in the classpath before 
HADOOP_CONF_DIR/YARN_CONF_DIR
+val localConfDir = System.getProperty("SPARK_CONF_DIR",
+  System.getProperty("SPARK_HOME") + File.separator + "conf")
+val dir = new File(localConfDir)
+if (dir.isDirectory) {
+  val files = dir.listFiles(new FileFilter {
+override def accept(pathname: File): Boolean = {
+  pathname.isFile && pathname.getName.endsWith("xml")
+}
+  })
+  files.foreach { f => hadoopConfFiles(f.getName) = f }
+}
+
+// Ensure HADOOP_CONF_DIR/YARN_CONF_DIR not overriding existing files
--- End diff --

OK, I'll remove it.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19565: [SPARK-22111][MLLIB] OnlineLDAOptimizer should filter ou...

2017-11-07 Thread WeichenXu123
Github user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/19565
  
OK, I agree with this change. @jkbradley Can you take a look?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19663: [SPARK-22463][YARN][SQL][Hive]add hadoop/hive/hba...

2017-11-07 Thread yaooqinn
Github user yaooqinn commented on a diff in the pull request:

https://github.com/apache/spark/pull/19663#discussion_r149561877
  
--- Diff: 
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala 
---
@@ -687,6 +687,20 @@ private[spark] class Client(
   private def createConfArchive(): File = {
 val hadoopConfFiles = new HashMap[String, File]()
 
+// SPARK_CONF_DIR shows up in the classpath before 
HADOOP_CONF_DIR/YARN_CONF_DIR
+val localConfDir = System.getProperty("SPARK_CONF_DIR",
--- End diff --

Not exactly, as of now. Please check https://github.com/apache/spark/pull/19688


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19663: [SPARK-22463][YARN][SQL][Hive]add hadoop/hive/hba...

2017-11-07 Thread yaooqinn
Github user yaooqinn commented on a diff in the pull request:

https://github.com/apache/spark/pull/19663#discussion_r149561888
  
--- Diff: 
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala 
---
@@ -687,6 +687,20 @@ private[spark] class Client(
   private def createConfArchive(): File = {
 val hadoopConfFiles = new HashMap[String, File]()
 
+// SPARK_CONF_DIR shows up in the classpath before 
HADOOP_CONF_DIR/YARN_CONF_DIR
+val localConfDir = System.getProperty("SPARK_CONF_DIR",
+  System.getProperty("SPARK_HOME") + File.separator + "conf")
+val dir = new File(localConfDir)
+if (dir.isDirectory) {
+  val files = dir.listFiles(new FileFilter {
+override def accept(pathname: File): Boolean = {
+  pathname.isFile && pathname.getName.endsWith("xml")
--- End diff --

ok


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19688: [SPARK-22466][Spark Submit]export SPARK_CONF_DIR ...

2017-11-07 Thread yaooqinn
GitHub user yaooqinn opened a pull request:

https://github.com/apache/spark/pull/19688

[SPARK-22466][Spark Submit]export SPARK_CONF_DIR while conf is default

## What changes were proposed in this pull request?
### Before

```
Kent@KentsMacBookPro  
~/Documents/spark-packages/spark-2.3.0-SNAPSHOT-bin-master  bin/spark-shell 
--master local
Using Spark's default log4j profile: 
org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
setLogLevel(newLevel).
17/11/08 10:28:44 WARN NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
17/11/08 10:28:45 WARN Utils: Service 'SparkUI' could not bind on port 
4040. Attempting port 4041.
Spark context Web UI available at http://169.254.168.63:4041
Spark context available as 'sc' (master = local, app id = 
local-1510108125770).
Spark session available as 'spark'.
Welcome to
    __
 / __/__  ___ _/ /__
_\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.0-SNAPSHOT
  /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 
1.8.0_65)
Type in expressions to have them evaluated.
Type :help for more information.

scala> sys.env.get("SPARK_CONF_DIR")
res0: Option[String] = None
```

### After 

```
scala> sys.env.get("SPARK_CONF_DIR")
res0: Option[String] = Some(/Users/Kent/Documents/spark/conf)
```
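
For illustration only, the fallback implied by the "After" output (use `SPARK_CONF_DIR` if it is set, otherwise `SPARK_HOME/conf`) could be sketched as below. This is not the change made in this PR; `ConfDirSketch` and `resolveConfDir` are hypothetical names, and the environment is passed in as a plain `Map` so the logic can be exercised without touching the real process environment.

```scala
import java.io.File

object ConfDirSketch {
  // Hypothetical helper: prefer an explicit SPARK_CONF_DIR, otherwise
  // fall back to the "conf" directory under SPARK_HOME (if that is set).
  def resolveConfDir(env: Map[String, String]): Option[String] =
    env.get("SPARK_CONF_DIR")
      .orElse(env.get("SPARK_HOME").map(home => home + File.separator + "conf"))
}
```

Calling `ConfDirSketch.resolveConfDir(sys.env)` would apply the same rule to the current process environment.
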
## How was this patch tested?

@vanzin 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/yaooqinn/spark SPARK-22466

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19688.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19688


commit 19ac61cd6d8b4cca295a1f0d2f2988ee3ac20d8c
Author: Kent Yao 
Date:   2017-11-08T02:30:01Z

export SPARK_CONF_DIR while conf is default




---




[GitHub] spark pull request #19666: [SPARK-22451][ML] Reduce decision tree aggregate ...

2017-11-07 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19666#discussion_r149561550
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala ---
@@ -741,17 +678,43 @@ private[spark] object RandomForest extends Logging {
   (splits(featureIndex)(bestFeatureSplitIndex), 
bestFeatureGainStats)
 } else if (binAggregates.metadata.isUnordered(featureIndex)) {
   // Unordered categorical feature
-  val leftChildOffset = 
binAggregates.getFeatureOffset(featureIndexIdx)
-  val (bestFeatureSplitIndex, bestFeatureGainStats) =
-Range(0, numSplits).map { splitIndex =>
-  val leftChildStats = 
binAggregates.getImpurityCalculator(leftChildOffset, splitIndex)
-  val rightChildStats = 
binAggregates.getParentImpurityCalculator()
-.subtract(leftChildStats)
+  val numBins = binAggregates.metadata.numBins(featureIndex)
+  val featureOffset = 
binAggregates.getFeatureOffset(featureIndexIdx)
+
+  val binStatsArray = Array.tabulate(numBins) { binIndex =>
+binAggregates.getImpurityCalculator(featureOffset, binIndex)
+  }
+  val parentStats = binAggregates.getParentImpurityCalculator()
+
+  var bestGain = Double.NegativeInfinity
+  var bestSet: BitSet = null
+  var bestLeftChildStats: ImpurityCalculator = null
+  var bestRightChildStats: ImpurityCalculator = null
+
+  traverseUnorderedSplits[ImpurityCalculator](numBins, null,
+(stats, binIndex) => {
+  val binStats = binStatsArray(binIndex)
+  if (stats == null) {
+binStats
+  } else {
+stats.copy.add(binStats)
+  }
+},
+(set, leftChildStats) => {
+  val rightChildStats = 
parentStats.copy.subtract(leftChildStats)
   gainAndImpurityStats = 
calculateImpurityStats(gainAndImpurityStats,
 leftChildStats, rightChildStats, binAggregates.metadata)
-  (splitIndex, gainAndImpurityStats)
-}.maxBy(_._2.gain)
-  (splits(featureIndex)(bestFeatureSplitIndex), 
bestFeatureGainStats)
+  if (gainAndImpurityStats.gain > bestGain) {
+bestGain = gainAndImpurityStats.gain
+bestSet = set | new BitSet(numBins) // copy set
--- End diff --

The class does not support `copy`.
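
To make the workaround in the diff concrete: when a bit set type offers a union operator but no `copy`, OR-ing it with an empty bit set of the same size yields a fresh, independent instance. The sketch below uses a hypothetical `SimpleBitSet` as a stand-in; it is not Spark's `BitSet`, and only the idea of "union with empty equals copy" is taken from the diff.

```scala
// Hypothetical minimal bit set with a union operator but no copy method.
final class SimpleBitSet(val numBits: Int) {
  private val words = new Array[Long]((numBits + 63) / 64)

  // Set the bit at the given index.
  def set(index: Int): Unit = words(index >> 6) |= (1L << (index & 63))

  // Return true if the bit at the given index is set.
  def get(index: Int): Boolean = (words(index >> 6) & (1L << (index & 63))) != 0L

  // Union: always allocates a new instance, so neither operand is shared.
  def |(other: SimpleBitSet): SimpleBitSet = {
    val result = new SimpleBitSet(math.max(numBits, other.numBits))
    var w = 0
    while (w < words.length) { result.words(w) |= words(w); w += 1 }
    w = 0
    while (w < other.words.length) { result.words(w) |= other.words(w); w += 1 }
    result
  }
}
```

With this stand-in, `val snapshot = set | new SimpleBitSet(set.numBits)` keeps its contents even if `set` is mutated afterwards, which is what the `// copy set` comment in the diff relies on.
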


---




[GitHub] spark issue #19666: [SPARK-22451][ML] Reduce decision tree aggregate size fo...

2017-11-07 Thread WeichenXu123
Github user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/19666
  
Also cc @smurching Thanks!


---




[GitHub] spark pull request #19638: [SPARK-22422][ML] Add Adjusted R2 to RegressionMe...

2017-11-07 Thread tengpeng
Github user tengpeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/19638#discussion_r149560345
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala 
---
@@ -764,13 +764,17 @@ class LinearRegressionSuite
   (Intercept) 6.3022157  0.00186003388   <2e-16 ***
   V2  4.6982442  0.00118053980   <2e-16 ***
   V3  7.1994344  0.00090447961   <2e-16 ***
+
+  # R code for r2adj
--- End diff --

Thanks for the clarification. Do you think changing `x1` to `V1` would help?


---




[GitHub] spark issue #19666: [SPARK-22451][ML] Reduce decision tree aggregate size fo...

2017-11-07 Thread WeichenXu123
Github user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/19666
  
@facaiy Thanks for your review! I added more explanation of the design 
purpose of `traverseUnorderedSplits`. If you have a better solution, don't 
hesitate to tell me!


---




[GitHub] spark pull request #19638: [SPARK-22422][ML] Add Adjusted R2 to RegressionMe...

2017-11-07 Thread sethah
Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/19638#discussion_r149559666
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala 
---
@@ -764,13 +764,17 @@ class LinearRegressionSuite
   (Intercept) 6.3022157  0.00186003388   <2e-16 ***
   V2  4.6982442  0.00118053980   <2e-16 ***
   V3  7.1994344  0.00090447961   <2e-16 ***
+
+  # R code for r2adj
--- End diff --

There may be some confusion. If you type that code, "as-is", into an R 
shell, it will not work. It references a variable called `X1`, which is never 
defined. When we provide R code in comments like this, we intend for it to be 
copied and pasted into a shell and just work. So, as written, it does not function.
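
For readers following the thread, the statistic being tested is conventionally defined as adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - p - 1), where n is the number of observations and p the number of predictors. A minimal sketch of that formula follows, in Scala rather than R. It assumes a model fitted with an intercept; the object and parameter names are hypothetical and are not taken from the PR.

```scala
object AdjustedR2Sketch {
  // r2: ordinary coefficient of determination
  // numObservations: n, numPredictors: p (intercept not counted)
  // Assumes a model with an intercept and n > p + 1.
  def adjustedR2(r2: Double, numObservations: Long, numPredictors: Int): Double = {
    require(numObservations > numPredictors + 1, "need n > p + 1")
    1.0 - (1.0 - r2) * (numObservations - 1).toDouble /
      (numObservations - numPredictors - 1).toDouble
  }
}
```

For example, with R2 = 0.9, n = 11 and p = 2 the penalty factor is 10 / 8, so the adjusted value is slightly lower than the raw R2.
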


---




[GitHub] spark pull request #19638: [SPARK-22422][ML] Add Adjusted R2 to RegressionMe...

2017-11-07 Thread tengpeng
Github user tengpeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/19638#discussion_r149558607
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala 
---
@@ -764,13 +764,17 @@ class LinearRegressionSuite
   (Intercept) 6.3022157  0.00186003388   <2e-16 ***
   V2  4.6982442  0.00118053980   <2e-16 ***
   V3  7.1994344  0.00090447961   <2e-16 ***
+
+  # R code for r2adj
--- End diff --

@srowen it's fine in terms of functioning. 


---




[GitHub] spark issue #19285: [SPARK-22068][CORE]Reduce the duplicate code between put...

2017-11-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19285
  
**[Test build #83575 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83575/testReport)**
 for PR 19285 at commit 
[`bc3ad4e`](https://github.com/apache/spark/commit/bc3ad4ea11e49b19ef4199642dbc4488f202d928).


---




[GitHub] spark issue #19156: [SPARK-19634][SQL][ML][FOLLOW-UP] Improve interface of d...

2017-11-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19156
  
**[Test build #83574 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83574/testReport)**
 for PR 19156 at commit 
[`480e80d`](https://github.com/apache/spark/commit/480e80dbb0392bebe96dc1620195a39b54f75740).


---




[GitHub] spark issue #19685: [SPARK-19759][ML] not using blas in ALSModel.predict for...

2017-11-07 Thread WeichenXu123
Github user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/19685
  
Have you run any tests to check the performance difference for this change?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19685: [SPARK-19759][ML] not using blas in ALSModel.pred...

2017-11-07 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19685#discussion_r149554146
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala 
---
@@ -289,9 +289,11 @@ class ALSModel private[ml] (
 
   private val predict = udf { (featuresA: Seq[Float], featuresB: 
Seq[Float]) =>
 if (featuresA != null && featuresB != null) {
-  // TODO(SPARK-19759): try dot-producting on Seqs or another 
non-converted type for
-  // potential optimization.
-  blas.sdot(rank, featuresA.toArray, 1, featuresB.toArray, 1)
+  var dotProduct = 0.0f
+  for(i <- 0 until rank) {
--- End diff --

You should use `while` instead of `for`.
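
The reasoning, as far as I understand it: in Scala 2, `for (i <- 0 until rank)` desugars to `Range.foreach` with a closure, which can cost more than a plain `while` loop in a hot path such as a per-row UDF. A minimal sketch of the `while` form is below; `DotProductSketch` and `dot` are hypothetical names, not the code in the PR.

```scala
object DotProductSketch {
  // Dot product of the first `rank` elements of two feature vectors,
  // written with a while loop to avoid the closure a for-comprehension creates.
  def dot(featuresA: Seq[Float], featuresB: Seq[Float], rank: Int): Float = {
    var dotProduct = 0.0f
    var i = 0
    while (i < rank) {
      dotProduct += featuresA(i) * featuresB(i)
      i += 1
    }
    dotProduct
  }
}
```

Note that indexed access is only cheap when the `Seq` is backed by an indexed structure; for a linked `List` each `featuresA(i)` is linear, so whether this helps in practice still depends on the runtime type and should be measured.
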


---




[GitHub] spark pull request #19661: [SPARK-22450][Core][Mllib]safely register class f...

2017-11-07 Thread ConeyLiu
Github user ConeyLiu commented on a diff in the pull request:

https://github.com/apache/spark/pull/19661#discussion_r149553694
  
--- Diff: 
core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala ---
@@ -178,10 +178,40 @@ class KryoSerializer(conf: SparkConf)
 
kryo.register(Utils.classForName("scala.collection.immutable.Map$EmptyMap$"))
 kryo.register(classOf[ArrayBuffer[Any]])
 
+// We can't load those class directly in order to avoid unnecessary 
jar dependencies.
+// We load them safely, ignore it if the class not found.
+Seq("org.apache.spark.mllib.linalg.Vector",
+  "org.apache.spark.mllib.linalg.DenseVector",
+  "org.apache.spark.mllib.linalg.SparseVector",
+  "org.apache.spark.mllib.linalg.Matrix",
+  "org.apache.spark.mllib.linalg.DenseMatrix",
+  "org.apache.spark.mllib.linalg.SparseMatrix",
+  "org.apache.spark.ml.linalg.Vector",
+  "org.apache.spark.ml.linalg.DenseVector",
+  "org.apache.spark.ml.linalg.SparseVector",
+  "org.apache.spark.ml.linalg.Matrix",
+  "org.apache.spark.ml.linalg.DenseMatrix",
+  "org.apache.spark.ml.linalg.SparseMatrix",
+  "org.apache.spark.ml.feature.Instance",
+  "org.apache.spark.ml.feature.OffsetInstance"
+).flatMap(safeClassLoader(_)).foreach(kryo.register(_))
--- End diff --

Hi @cloud-fan, I tried the following code:
```scala
flatMap(cn => Try { Utils.classForName(cn) }.toOption).foreach(kryo.register(_))
```
and 
```scala
flatMap { cn =>
  try {
    val clazz = Utils.classForName(cn)
    Some(clazz)
  } catch {
    case _: ClassNotFoundException => None
  }
}.foreach(kryo.register(_))
```

Both reported the same error:
```
Error:(198, 18) type mismatch;
 found   : String => Iterable[Class[_$2]]( forSome { type _$2 })
 required: String => scala.collection.GenTraversableOnce[B]
).flatMap{cn => 
Option(Utils.classForName(cn))}.foreach(kryo.register(_))
```
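
One possible way around this kind of inference failure is to keep the existential type out of the lambda by giving a named helper an explicit `Option[Class[_]]` result type. The sketch below is an assumption, not something verified against the Spark build or against Scala 2.11 specifically: it uses plain `Class.forName` in place of `Utils.classForName`, and `SafeClassLoading`, `loadIfPresent` and `loadAll` are hypothetical names.

```scala
object SafeClassLoading {
  // Hypothetical helper: the explicit result type pins the element type to
  // Class[_], so callers do not have to infer it from a try/catch expression.
  def loadIfPresent(className: String): Option[Class[_]] =
    try {
      Some(Class.forName(className))
    } catch {
      case _: ClassNotFoundException => None
    }

  // Keep only the classes that are actually on the classpath.
  def loadAll(classNames: Seq[String]): Seq[Class[_]] =
    classNames.flatMap(name => loadIfPresent(name).toList)
}
```

The result of `loadAll` could then be passed to something like `foreach(kryo.register(_))`; missing classes are silently skipped, which matches the intent stated in the diff comment.
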


---




[GitHub] spark issue #17436: [SPARK-20101][SQL] Use OffHeapColumnVector when "spark.m...

2017-11-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17436
  
**[Test build #83573 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83573/testReport)**
 for PR 17436 at commit 
[`9ce6fc0`](https://github.com/apache/spark/commit/9ce6fc0b0ad2c4c97236f0519db07b5a3600bb81).


---




[GitHub] spark issue #19433: [SPARK-3162] [MLlib] Add local tree training for decisio...

2017-11-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19433
  
**[Test build #3983 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3983/testReport)**
 for PR 19433 at commit 
[`b7e6e40`](https://github.com/apache/spark/commit/b7e6e40976612546b81d9775c194b274c146dc85).
 * This patch **fails to generate documentation**.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.


---



