[GitHub] spark pull request #21501: [SPARK-15064][ML] Locale support in StopWordsRemo...

2018-06-10 Thread dongjinleekr
Github user dongjinleekr commented on a diff in the pull request:

https://github.com/apache/spark/pull/21501#discussion_r194296251
  
--- Diff: python/pyspark/ml/feature.py ---
@@ -2610,6 +2610,9 @@ def setParams(self, inputCol=None, outputCol=None, stopWords=None, caseSensitive
         Sets params for this StopWordRemover.
         """
         kwargs = self._input_kwargs
+        if locale is None:
+            sc = SparkContext._active_spark_context
+            kwargs['locale'] = sc._gateway.jvm.org.spark.ml.util.LocaleUtils.getDefaultLocale()
--- End diff --

@viirya You mean... `locale=SparkContext._active_spark_context.(...)` over 
`locale=None` with the ugly if statement, right?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21514: [SPARK-22860] [Core] - hide key password from lin...

2018-06-10 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21514#discussion_r194296152
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/StandaloneSchedulerBackend.scala ---
@@ -100,7 +100,7 @@ private[spark] class StandaloneSchedulerBackend(
 val sparkJavaOpts = Utils.sparkJavaOpts(conf, SparkConf.isExecutorStartupConf)
 val javaOpts = sparkJavaOpts ++ extraJavaOpts
 val command = Command("org.apache.spark.executor.CoarseGrainedExecutorBackend",
-  args, sc.executorEnvs, classPathEntries ++ testingClassPath, libraryPathEntries, javaOpts)
+  args, sc.executorEnvs, classPathEntries ++ testingClassPath, libraryPathEntries, javaOpts.filterNot(_.startsWith("-Dspark.ssl.keyStorePassword")).filterNot(_.startsWith("-Dspark.ssl.keyPassword")))
--- End diff --

If you really have to do this, I'd have:
```
javaOpts.filterNot { opt =>
  opt.startsWith("-Dspark.ssl.keyStorePassword") ||
    opt.startsWith("-Dspark.ssl.keyPassword")
}
```
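
For readers more at home in PySpark, the same single-pass filter can be sketched in Python. This is only an illustration of the idea: `strip_sensitive` and the sample `java_opts` values are hypothetical, and only the two prefixes come from the diff above.

```python
# The two prefixes are taken from the diff; everything else is made up.
SENSITIVE_PREFIXES = (
    "-Dspark.ssl.keyStorePassword",
    "-Dspark.ssl.keyPassword",
)


def strip_sensitive(java_opts):
    """Drop every option that starts with one of the sensitive prefixes."""
    # str.startswith accepts a tuple, so one pass covers both prefixes.
    return [opt for opt in java_opts if not opt.startswith(SENSITIVE_PREFIXES)]


# Hypothetical option list, for illustration only.
java_opts = [
    "-Dspark.ssl.enabled=true",
    "-Dspark.ssl.keyStorePassword=example",
    "-Dspark.ssl.keyPassword=example",
    "-Xmx1g",
]
filtered = strip_sensitive(java_opts)
```

As in the Scala suggestion, both prefixes are tested in one predicate instead of chaining two filters.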


---




[GitHub] spark issue #21514: [SPARK-22860] [Core] - hide key password from linux ps l...

2018-06-10 Thread jiangxb1987
Github user jiangxb1987 commented on the issue:

https://github.com/apache/spark/pull/21514
  
Have you tried the config "spark.redaction.regex" ?
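
As a rough illustration of key-based redaction in general, here is a minimal Python sketch. The pattern, the replacement text, and the `redact` helper are all assumptions made for the example, not Spark's implementation, and whether such redaction reaches the `ps` listing discussed in this PR is exactly the open question.

```python
import re

# Illustrative pattern (an assumption); in a real deployment the value would
# be whatever "spark.redaction.regex" is set to.
REDACTION_PATTERN = re.compile(r"(?i)secret|password")
REPLACEMENT = "*********(redacted)"


def redact(pairs):
    """Mask the value of every (key, value) pair whose key matches the pattern."""
    return [
        (key, REPLACEMENT if REDACTION_PATTERN.search(key) else value)
        for key, value in pairs
    ]


# Hypothetical configuration entries.
conf = [
    ("spark.ssl.keyPassword", "example"),
    ("spark.executor.memory", "1g"),
]
redacted = redact(conf)
```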


---




[GitHub] spark issue #21506: [SPARK-24485][SS] Measure and log elapsed time for files...

2018-06-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21506
  
**[Test build #91649 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91649/testReport)** for PR 21506 at commit [`3d0e23f`](https://github.com/apache/spark/commit/3d0e23f7460976a33d6f86178d04f04e488bfaa8).


---




[GitHub] spark pull request #21506: [SPARK-24485][SS] Measure and log elapsed time fo...

2018-06-10 Thread HeartSaVioR
Github user HeartSaVioR commented on a diff in the pull request:

https://github.com/apache/spark/pull/21506#discussion_r194295068
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala ---
@@ -280,38 +278,49 @@ private[state] class HDFSBackedStateStoreProvider extends StateStoreProvider wit
 if (loadedCurrentVersionMap.isDefined) {
   return loadedCurrentVersionMap.get
 }
-val snapshotCurrentVersionMap = readSnapshotFile(version)
-if (snapshotCurrentVersionMap.isDefined) {
-  synchronized { loadedMaps.put(version, snapshotCurrentVersionMap.get) }
-  return snapshotCurrentVersionMap.get
-}
 
-// Find the most recent map before this version that we can.
-// [SPARK-22305] This must be done iteratively to avoid stack overflow.
-var lastAvailableVersion = version
-var lastAvailableMap: Option[MapType] = None
-while (lastAvailableMap.isEmpty) {
-  lastAvailableVersion -= 1
+logWarning(s"The state for version $version doesn't exist in loadedMaps. " +
+  "Reading snapshot file and delta files if needed..." +
+  "Note that this is normal for the first batch of starting query.")
 
-  if (lastAvailableVersion <= 0) {
-// Use an empty map for versions 0 or less.
-lastAvailableMap = Some(new MapType)
-  } else {
-lastAvailableMap =
-  synchronized { loadedMaps.get(lastAvailableVersion) }
-.orElse(readSnapshotFile(lastAvailableVersion))
+val (result, elapsedMs) = Utils.timeTakenMs {
+  val snapshotCurrentVersionMap = readSnapshotFile(version)
+  if (snapshotCurrentVersionMap.isDefined) {
+synchronized { loadedMaps.put(version, snapshotCurrentVersionMap.get) }
+return snapshotCurrentVersionMap.get
+  }
+
+  // Find the most recent map before this version that we can.
+  // [SPARK-22305] This must be done iteratively to avoid stack overflow.
+  var lastAvailableVersion = version
+  var lastAvailableMap: Option[MapType] = None
+  while (lastAvailableMap.isEmpty) {
+lastAvailableVersion -= 1
+
+if (lastAvailableVersion <= 0) {
+  // Use an empty map for versions 0 or less.
+  lastAvailableMap = Some(new MapType)
+} else {
+  lastAvailableMap =
+synchronized { loadedMaps.get(lastAvailableVersion) }
+  .orElse(readSnapshotFile(lastAvailableVersion))
+}
+  }
+
+  // Load all the deltas from the version after the last available one up to the target version.
+  // The last available version is the one with a full snapshot, so it doesn't need deltas.
+  val resultMap = new MapType(lastAvailableMap.get)
+  for (deltaVersion <- lastAvailableVersion + 1 to version) {
+updateFromDeltaFile(deltaVersion, resultMap)
   }
-}
 
-// Load all the deltas from the version after the last available one up to the target version.
-// The last available version is the one with a full snapshot, so it doesn't need deltas.
-val resultMap = new MapType(lastAvailableMap.get)
-for (deltaVersion <- lastAvailableVersion + 1 to version) {
-  updateFromDeltaFile(deltaVersion, resultMap)
+  synchronized { loadedMaps.put(version, resultMap) }
+  resultMap
 }
 
-synchronized { loadedMaps.put(version, resultMap) }
-resultMap
+logWarning(s"Loading state for $version takes $elapsedMs ms.")
--- End diff --

Changed log level to DEBUG.


---




[GitHub] spark pull request #21505: [SPARK-24457][SQL] Improving performance of strin...

2018-06-10 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21505#discussion_r194294961
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala ---
@@ -125,7 +125,6 @@ object DateTimeUtils {
   .getOrElseUpdate(timeZone, {
 Calendar.getInstance(timeZone)
   })
-c.clear()
--- End diff --

It seems `setTimeInMillis` ends up setting all the calendar fields. If so, 
`clear` is redundant.


---




[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

2018-06-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21501
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

2018-06-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21501
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91648/
Test FAILed.


---




[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

2018-06-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21501
  
**[Test build #91648 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91648/testReport)** for PR 21501 at commit [`b4249c3`](https://github.com/apache/spark/commit/b4249c342a92dc840a1f1d5290c24a5fe165417d).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #21495: [SPARK-24418][Build] Upgrade Scala to 2.11.12 and...

2018-06-10 Thread som-snytt
Github user som-snytt commented on a diff in the pull request:

https://github.com/apache/spark/pull/21495#discussion_r194294485
  
--- Diff: repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkILoopInterpreter.scala ---
@@ -21,8 +21,22 @@ import scala.collection.mutable
 import scala.tools.nsc.Settings
 import scala.tools.nsc.interpreter._
 
-class SparkILoopInterpreter(settings: Settings, out: JPrintWriter) extends IMain(settings, out) {
-  self =>
+class SparkILoopInterpreter(settings: Settings, out: JPrintWriter, initializeSpark: () => Unit)
+    extends IMain(settings, out) { self =>
--- End diff --

It's definitely two spaces after a period. I've been wanting to make that 
joke, but held off.


---




[GitHub] spark pull request #21505: [SPARK-24457][SQL] Improving performance of strin...

2018-06-10 Thread ssonker
Github user ssonker commented on a diff in the pull request:

https://github.com/apache/spark/pull/21505#discussion_r194294182
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala ---
@@ -125,7 +125,6 @@ object DateTimeUtils {
   .getOrElseUpdate(timeZone, {
 Calendar.getInstance(timeZone)
   })
-c.clear()
--- End diff --

@viirya @kiszk Do you agree with this commit?


---




[GitHub] spark pull request #21501: [SPARK-15064][ML] Locale support in StopWordsRemo...

2018-06-10 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21501#discussion_r194293510
  
--- Diff: python/pyspark/ml/feature.py ---
@@ -2610,6 +2610,9 @@ def setParams(self, inputCol=None, outputCol=None, stopWords=None, caseSensitive
         Sets params for this StopWordRemover.
         """
         kwargs = self._input_kwargs
+        if locale is None:
+            sc = SparkContext._active_spark_context
+            kwargs['locale'] = sc._gateway.jvm.org.spark.ml.util.LocaleUtils.getDefaultLocale()
--- End diff --

We can keep this default locale and reuse it, instead of calling into the 
JVM from Python every time.
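
A minimal sketch of that caching idea, assuming a hypothetical `fetch_default_locale_from_jvm` that stands in for the Py4J call in the diff (the counter and the returned value are illustrative only):

```python
from functools import lru_cache

jvm_calls = 0  # counts simulated JVM round trips, for illustration only


def fetch_default_locale_from_jvm():
    """Hypothetical stand-in for the Py4J call shown in the diff."""
    global jvm_calls
    jvm_calls += 1
    return "en_US"


@lru_cache(maxsize=None)
def default_locale():
    """Fetch the default locale once and reuse the cached value afterwards."""
    return fetch_default_locale_from_jvm()


locales = [default_locale() for _ in range(3)]
```

After the first call, later calls are served from the cache, so the simulated JVM round trip happens only once.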


---




[GitHub] spark pull request #21506: [SPARK-24485][SS] Measure and log elapsed time fo...

2018-06-10 Thread HeartSaVioR
Github user HeartSaVioR commented on a diff in the pull request:

https://github.com/apache/spark/pull/21506#discussion_r194293481
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala ---
@@ -280,38 +278,49 @@ private[state] class HDFSBackedStateStoreProvider extends StateStoreProvider wit
 if (loadedCurrentVersionMap.isDefined) {
   return loadedCurrentVersionMap.get
 }
-val snapshotCurrentVersionMap = readSnapshotFile(version)
-if (snapshotCurrentVersionMap.isDefined) {
-  synchronized { loadedMaps.put(version, snapshotCurrentVersionMap.get) }
-  return snapshotCurrentVersionMap.get
-}
 
-// Find the most recent map before this version that we can.
-// [SPARK-22305] This must be done iteratively to avoid stack overflow.
-var lastAvailableVersion = version
-var lastAvailableMap: Option[MapType] = None
-while (lastAvailableMap.isEmpty) {
-  lastAvailableVersion -= 1
+logWarning(s"The state for version $version doesn't exist in loadedMaps. " +
+  "Reading snapshot file and delta files if needed..." +
+  "Note that this is normal for the first batch of starting query.")
 
-  if (lastAvailableVersion <= 0) {
-// Use an empty map for versions 0 or less.
-lastAvailableMap = Some(new MapType)
-  } else {
-lastAvailableMap =
-  synchronized { loadedMaps.get(lastAvailableVersion) }
-.orElse(readSnapshotFile(lastAvailableVersion))
+val (result, elapsedMs) = Utils.timeTakenMs {
+  val snapshotCurrentVersionMap = readSnapshotFile(version)
+  if (snapshotCurrentVersionMap.isDefined) {
+synchronized { loadedMaps.put(version, snapshotCurrentVersionMap.get) }
+return snapshotCurrentVersionMap.get
+  }
+
+  // Find the most recent map before this version that we can.
+  // [SPARK-22305] This must be done iteratively to avoid stack overflow.
+  var lastAvailableVersion = version
+  var lastAvailableMap: Option[MapType] = None
+  while (lastAvailableMap.isEmpty) {
+lastAvailableVersion -= 1
+
+if (lastAvailableVersion <= 0) {
+  // Use an empty map for versions 0 or less.
+  lastAvailableMap = Some(new MapType)
+} else {
+  lastAvailableMap =
+synchronized { loadedMaps.get(lastAvailableVersion) }
+  .orElse(readSnapshotFile(lastAvailableVersion))
+}
+  }
+
+  // Load all the deltas from the version after the last available one up to the target version.
+  // The last available version is the one with a full snapshot, so it doesn't need deltas.
+  val resultMap = new MapType(lastAvailableMap.get)
+  for (deltaVersion <- lastAvailableVersion + 1 to version) {
+updateFromDeltaFile(deltaVersion, resultMap)
   }
-}
 
-// Load all the deltas from the version after the last available one up to the target version.
-// The last available version is the one with a full snapshot, so it doesn't need deltas.
-val resultMap = new MapType(lastAvailableMap.get)
-for (deltaVersion <- lastAvailableVersion + 1 to version) {
-  updateFromDeltaFile(deltaVersion, resultMap)
+  synchronized { loadedMaps.put(version, resultMap) }
+  resultMap
 }
 
-synchronized { loadedMaps.put(version, resultMap) }
-resultMap
+logWarning(s"Loading state for $version takes $elapsedMs ms.")
--- End diff --

I just thought about pairing the warning message above with this one, but 
since we are guiding end users to turn on the DEBUG level to see information 
about additional latencies, turning this into DEBUG would also be OK.


---




[GitHub] spark pull request #21506: [SPARK-24485][SS] Measure and log elapsed time fo...

2018-06-10 Thread HeartSaVioR
Github user HeartSaVioR commented on a diff in the pull request:

https://github.com/apache/spark/pull/21506#discussion_r194293251
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala ---
@@ -280,38 +278,49 @@ private[state] class HDFSBackedStateStoreProvider extends StateStoreProvider wit
 if (loadedCurrentVersionMap.isDefined) {
   return loadedCurrentVersionMap.get
 }
-val snapshotCurrentVersionMap = readSnapshotFile(version)
-if (snapshotCurrentVersionMap.isDefined) {
-  synchronized { loadedMaps.put(version, snapshotCurrentVersionMap.get) }
-  return snapshotCurrentVersionMap.get
-}
 
-// Find the most recent map before this version that we can.
-// [SPARK-22305] This must be done iteratively to avoid stack overflow.
-var lastAvailableVersion = version
-var lastAvailableMap: Option[MapType] = None
-while (lastAvailableMap.isEmpty) {
-  lastAvailableVersion -= 1
+logWarning(s"The state for version $version doesn't exist in loadedMaps. " +
+  "Reading snapshot file and delta files if needed..." +
+  "Note that this is normal for the first batch of starting query.")
 
-  if (lastAvailableVersion <= 0) {
-// Use an empty map for versions 0 or less.
-lastAvailableMap = Some(new MapType)
-  } else {
-lastAvailableMap =
-  synchronized { loadedMaps.get(lastAvailableVersion) }
-.orElse(readSnapshotFile(lastAvailableVersion))
+val (result, elapsedMs) = Utils.timeTakenMs {
--- End diff --

Yup, right. Most of the code change is just wrapping the existing code in timeTakenMs.
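
For readers following along in Python, the wrapping pattern looks roughly like the sketch below. `time_taken_ms` and `load_state` are hypothetical names; the signature of Spark's `Utils.timeTakenMs` is not shown in this thread, so its shape here is an assumption based on how the diff destructures the result.

```python
import time


def time_taken_ms(body):
    """Run body() and return (result, elapsed time in whole milliseconds)."""
    start = time.perf_counter()
    result = body()
    elapsed_ms = int((time.perf_counter() - start) * 1000)
    return result, elapsed_ms


def load_state():
    # Placeholder for the snapshot and delta loading logic in the diff.
    return {"key": "value"}


# The existing logic is passed in unchanged; only the timing wrapper is new.
result, elapsed_ms = time_taken_ms(load_state)
```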


---




[GitHub] spark pull request #21505: [SPARK-24457][SQL] Improving performance of strin...

2018-06-10 Thread ssonker
Github user ssonker commented on a diff in the pull request:

https://github.com/apache/spark/pull/21505#discussion_r194292883
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala ---
@@ -111,6 +113,23 @@ object DateTimeUtils {
 computedTimeZones.computeIfAbsent(timeZoneId, computeTimeZone)
   }
 
+  private val threadLocalComputedCalendarsMap =
+new ThreadLocal[mutable.Map[TimeZone, Calendar]] {
--- End diff --

@kiszk I think @viirya meant having just one thread-local calendar 
instance. That should also work, shouldn't it?
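
A rough Python sketch of the single-instance alternative, using `threading.local`. `FakeCalendar` and `thread_local_calendar` are hypothetical stand-ins for `java.util.Calendar` and the Scala helper; the sketch only shows the reuse pattern, not calendar semantics.

```python
import threading

_local = threading.local()


class FakeCalendar:
    """Hypothetical stand-in for java.util.Calendar."""

    def __init__(self):
        self.time_zone = None


def thread_local_calendar(time_zone):
    """Return this thread's single calendar, re-pointed at time_zone."""
    cal = getattr(_local, "calendar", None)
    if cal is None:
        # First use on this thread: create the one instance it will keep.
        cal = FakeCalendar()
        _local.calendar = cal
    cal.time_zone = time_zone
    return cal


a = thread_local_calendar("UTC")
b = thread_local_calendar("Asia/Seoul")
```

The trade-off against the per-time-zone map in the diff is that the time zone has to be reset on every call.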


---




[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3

2018-06-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20838#discussion_r194292645
  
--- Diff: dev/merge_spark_pr.py ---
@@ -39,6 +39,9 @@
 except ImportError:
 JIRA_IMPORTED = False
 
+if sys.version_info[0] >= 3:
+raw_input = input
--- End diff --

Does this script run with Python 3+ now?


---




[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3

2018-06-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20838#discussion_r194292662
  
--- Diff: dev/create-release/releaseutils.py ---
@@ -49,6 +49,9 @@
 print("Install using 'sudo pip install unidecode'")
 sys.exit(-1)
 
+if sys.version_info[0] >= 3:
+raw_input = input
--- End diff --

Does this script work in Python 3+ now?


---




[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3

2018-06-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20838#discussion_r194292586
  
--- Diff: python/pyspark/sql/conf.py ---
@@ -59,7 +62,7 @@ def unset(self, key):
 
 def _checkType(self, obj, identifier):
 """Assert that an object is of type str."""
-if not isinstance(obj, str) and not isinstance(obj, unicode):
+if not isinstance(obj, basestring):
--- End diff --

This is fine since we rely on short-circuiting.
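
The short-circuiting mentioned here can be seen in a small sketch of the pre-change condition, written to run on Python 3. `check_type` and its error message are illustrative, not the exact PySpark code.

```python
def check_type(obj, identifier):
    """Pre-change form of the condition from the diff above."""
    # On Python 3 the name `unicode` does not exist, but it is only looked up
    # when the first test is true, that is, when obj is not a str.
    if not isinstance(obj, str) and not isinstance(obj, unicode):  # noqa: F821
        raise TypeError("expected %s to be a string" % identifier)


# A str argument: the `and` short-circuits, so `unicode` is never evaluated.
check_type("spark.sql.shuffle.partitions", "key")

try:
    # A non-str argument: the second test runs.
    check_type(42, "key")
    outcome = "no error"
except NameError:
    outcome = "NameError"  # Python 3: `unicode` is undefined
except TypeError:
    outcome = "TypeError"  # Python 2: `unicode` exists, the check fails
```

So the old form works for valid string input on both versions; on Python 3 only the invalid-input path touches the undefined name.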


---




[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3

2018-06-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20838#discussion_r194292548
  
--- Diff: python/pyspark/streaming/dstream.py ---
@@ -23,6 +23,8 @@
 
 if sys.version < "3":
 from itertools import imap as map, ifilter as filter
+else:
+long = int
--- End diff --

Can you add a test for it? Seems only used once and shouldn't be difficult 
to add a test.


---




[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-06-10 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r194292067
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -351,8 +354,68 @@ def show(self, n=20, truncate=True, vertical=False):
 else:
 print(self._jdf.showString(n, int(truncate), vertical))
 
+@property
+def _eager_eval(self):
+"""Returns true if the eager evaluation enabled.
+"""
+return self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.enabled", "false").lower() == "true"
+
+@property
+def _max_num_rows(self):
+"""Returns the max row number for eager evaluation.
+"""
+return int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.maxNumRows", "20"))
+
+@property
+def _truncate(self):
+"""Returns the truncate length for eager evaluation.
+"""
+return int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.truncate", "20"))
+
 def __repr__(self):
--- End diff --

This PR also changed `__repr__`. Thus, we need to update the PR title and 
description. 


---




[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

2018-06-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21501
  
**[Test build #91648 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91648/testReport)** for PR 21501 at commit [`b4249c3`](https://github.com/apache/spark/commit/b4249c342a92dc840a1f1d5290c24a5fe165417d).


---




[GitHub] spark pull request #21494: [WIP][SPARK-24375][Prototype] Support barrier sch...

2018-06-10 Thread galv
Github user galv commented on a diff in the pull request:

https://github.com/apache/spark/pull/21494#discussion_r193953345
  
--- Diff: core/src/main/scala/org/apache/spark/util/RpcUtils.scala ---
@@ -44,7 +44,7 @@ private[spark] object RpcUtils {
 
   /** Returns the default Spark timeout to use for RPC ask operations. */
   def askRpcTimeout(conf: SparkConf): RpcTimeout = {
-RpcTimeout(conf, Seq("spark.rpc.askTimeout", "spark.network.timeout"), "120s")
+RpcTimeout(conf, Seq("spark.rpc.askTimeout", "spark.network.timeout"), "900s")
--- End diff --

Why hard-code this change? Couldn't you have set this at runtime if you 
needed it increased? I'm concerned about it breaking backwards compatibility 
with jobs that for whatever reason depend on the 120 second timeout.
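
To make the "set it at runtime" point concrete, here is a minimal Python sketch of a key list with a fallback default. It assumes the lookup takes the first key that is set and only then falls back to the hard-coded default; `resolve_timeout` is a hypothetical helper, not Spark's `RpcTimeout`.

```python
def resolve_timeout(conf, keys, default):
    """Return the value of the first key present in conf, else the default."""
    for key in keys:
        if key in conf:
            return conf[key]
    return default


# Keys and the "120s" default are taken from the diff above.
KEYS = ["spark.rpc.askTimeout", "spark.network.timeout"]

unset = resolve_timeout({}, KEYS, "120s")
overridden = resolve_timeout({"spark.rpc.askTimeout": "900s"}, KEYS, "120s")
fallback = resolve_timeout({"spark.network.timeout": "300s"}, KEYS, "120s")
```

Under that assumption, a job that needs a longer timeout can set `spark.rpc.askTimeout` itself, and jobs that set nothing keep the existing 120 second behavior.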


---




[GitHub] spark pull request #21494: [WIP][SPARK-24375][Prototype] Support barrier sch...

2018-06-10 Thread galv
Github user galv commented on a diff in the pull request:

https://github.com/apache/spark/pull/21494#discussion_r193953432
  
--- Diff: core/src/main/scala/org/apache/spark/barrier/BarrierRDD.scala ---
@@ -0,0 +1,43 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.barrier
+
+import scala.reflect.ClassTag
+
+import org.apache.spark.{Partition, TaskContext}
+import org.apache.spark.rdd.RDD
+
+
+/**
+ * An RDD that supports running MPI programme.
--- End diff --

`programme` -> `program`


---




[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-06-10 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r194287915
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -351,8 +354,68 @@ def show(self, n=20, truncate=True, vertical=False):
 else:
 print(self._jdf.showString(n, int(truncate), vertical))
 
+@property
+def _eager_eval(self):
+"""Returns true if the eager evaluation enabled.
+"""
+return self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.enabled", "false").lower() == "true"
--- End diff --

In the ongoing release, a nice-to-have refactoring is to move all the Core 
confs into a single file, just like what we did for the Spark SQL confs: 
default values, boundary checking, types, and descriptions. Thus, in PySpark, 
it would be better to start doing the same from now on.
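
One way such a centralized definition might look on the Python side is sketched below. `ConfEntry`, `to_bool`, the doc strings, and the bounds check are hypothetical and not an existing PySpark API; only the two keys and their defaults come from the diff.

```python
class ConfEntry:
    """Hypothetical entry: key, default, type conversion, bounds, and doc together."""

    def __init__(self, key, default, convert, doc, check=None):
        self.key = key
        self.default = default
        self.convert = convert
        self.doc = doc
        self.check = check

    def read(self, conf):
        """Look the key up in a plain dict, convert the value, and validate it."""
        value = self.convert(conf.get(self.key, self.default))
        if self.check is not None and not self.check(value):
            raise ValueError("invalid value for %s: %r" % (self.key, value))
        return value


def to_bool(text):
    return text.lower() == "true"


EAGER_EVAL_ENABLED = ConfEntry(
    "spark.sql.repl.eagerEval.enabled", "false", to_bool,
    "Whether eager evaluation is enabled.")
EAGER_EVAL_MAX_NUM_ROWS = ConfEntry(
    "spark.sql.repl.eagerEval.maxNumRows", "20", int,
    "Maximum number of rows shown by eager evaluation.",
    check=lambda n: n >= 0)

enabled = EAGER_EVAL_ENABLED.read({})
max_rows = EAGER_EVAL_MAX_NUM_ROWS.read({})
```

With entries like these, the three properties in the diff would read their values through one shared definition instead of repeating key strings and defaults inline.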


---




[GitHub] spark pull request #21495: [SPARK-24418][Build] Upgrade Scala to 2.11.12 and...

2018-06-10 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/21495#discussion_r194287473
  
--- Diff: repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkILoopInterpreter.scala ---
@@ -21,8 +21,22 @@ import scala.collection.mutable
 import scala.tools.nsc.Settings
 import scala.tools.nsc.interpreter._
 
-class SparkILoopInterpreter(settings: Settings, out: JPrintWriter) extends IMain(settings, out) {
-  self =>
+class SparkILoopInterpreter(settings: Settings, out: JPrintWriter, initializeSpark: () => Unit)
+    extends IMain(settings, out) { self =>
--- End diff --

IIRC, four spaces is OK.


---




[GitHub] spark issue #20838: [SPARK-23698] Resolve undefined names in Python 3

2018-06-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20838
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91644/
Test PASSed.


---




[GitHub] spark issue #20838: [SPARK-23698] Resolve undefined names in Python 3

2018-06-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20838
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20838: [SPARK-23698] Resolve undefined names in Python 3

2018-06-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20838
  
**[Test build #91644 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91644/testReport)** for PR 20838 at commit [`fd4d922`](https://github.com/apache/spark/commit/fd4d9225a23bac79e895f5bd223001b8ccb6ba15).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21520: [SPARK-24505][SQL] Forbidding string interpolation in Co...

2018-06-10 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/21520
  
> 1. We are seeing many inline prefix with a few typical patterns.
> Can we introduce new APIs to avoid repetitions of adding inline, for 
> example JavaCode.className(Class[_]): JavaCode for the first call.

@kiszk I initially took a similar approach but soon found that I'd create 
too many APIs. I'm not sure we want to distinguish them at the API level, 
because they are all just simple pieces of inline string in code, so I turned 
to a single `inline` to treat them all the same.

> 2. We are seeing many JavaCode.global() or JavaCode.variable() when we 
> create a new variable.
> Would it be possible to make them simpler?

Yes, I noticed that too. I was planning to change existing APIs such as 
`ctx.freshName`, but I left them as they are and set the first goal to passing 
all tests after forbidding string interpolation. Since the tests pass now, I 
think we can incrementally make the changes simpler and clearer. I've proposed 
to do this part in some smaller PRs (ref: 
https://github.com/apache/spark/pull/21520#issuecomment-396111725). WDYT?







---




[GitHub] spark issue #21520: [SPARK-24505][SQL] Forbidding string interpolation in Co...

2018-06-10 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/21520
  
@kiszk @mgaido91 Thanks for your comments!

> What do you think about starting doing the needed changes in smaller PRs 
> which focus only on specific part and forbidding the string interpolation after 
> those have made the needed changes smaller?

Once string interpolation is disallowed in code blocks, any string passed 
into a code block fails compilation, which also gives a stronger guarantee 
that we don't miss any strings. That is why this change is quite big rather 
than split into many smaller pieces. Most importantly, I need to have all the 
changes together to see whether we can pass all the tests once we completely 
forbid string interpolation.

But that doesn't mean we need to review and merge this big change as a whole. 
It is still possible to break it into smaller PRs. It may work like this:

1. Split a part of the change into a smaller PR, review it, and finally merge it.
2. Incorporate the merged change back into this PR. Make the tests pass. Go 
back to step 1.

WDYT?




---




[GitHub] spark issue #21067: [SPARK-23980][K8S] Resilient Spark driver on Kubernetes

2018-06-10 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/21067
  
any update?


---




[GitHub] spark issue #21202: [SPARK-24129] [K8S] Add option to pass --build-arg's to ...

2018-06-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21202
  
**[Test build #91647 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91647/testReport)** for PR 21202 at commit [`3e410cd`](https://github.com/apache/spark/commit/3e410cdc9cf09996a3962727107125fc950d034e).


---




[GitHub] spark issue #21202: [SPARK-24129] [K8S] Add option to pass --build-arg's to ...

2018-06-10 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/21202
  
@devaraj-kavali could you rebase this?


---




[GitHub] spark issue #21202: [SPARK-24129] [K8S] Add option to pass --build-arg's to ...

2018-06-10 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/21202
  
ok to test


---




[GitHub] spark issue #21486: [SPARK-24387][Core] Heartbeat-timeout executor is added ...

2018-06-10 Thread lirui-apache
Github user lirui-apache commented on the issue:

https://github.com/apache/spark/pull/21486
  
cc @vanzin @andrewor14 


---




[GitHub] spark pull request #19755: [SPARK-22524][SQL] Subquery shows reused on UI SQ...

2018-06-10 Thread gczsjdy
Github user gczsjdy closed the pull request at:

https://github.com/apache/spark/pull/19755


---




[GitHub] spark issue #21504: [SPARK-24479][SS] Added config for registering streaming...

2018-06-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21504
  
**[Test build #91645 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91645/testReport)**
 for PR 21504 at commit 
[`421e16b`](https://github.com/apache/spark/commit/421e16b20f63f8df7f279bf2dcea76a060a85ad3).


---




[GitHub] spark issue #21497: [SPARK-24466][SS] Fix TextSocketMicroBatchReader to be c...

2018-06-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21497
  
**[Test build #91646 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91646/testReport)**
 for PR 21497 at commit 
[`d069dd0`](https://github.com/apache/spark/commit/d069dd009bac833ac5f1a61bd9f911d1e021e15c).


---




[GitHub] spark issue #21497: [SPARK-24466][SS] Fix TextSocketMicroBatchReader to be c...

2018-06-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21497
  
retest this please


---




[GitHub] spark issue #21504: [SPARK-24479][SS] Added config for registering streaming...

2018-06-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21504
  
retest this please


---




[GitHub] spark pull request #21467: [SPARK-23754][PYTHON][FOLLOWUP] Move UDF stop ite...

2018-06-10 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21467


---




[GitHub] spark issue #21467: [SPARK-23754][PYTHON][FOLLOWUP] Move UDF stop iteration ...

2018-06-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21467
  
Merged to master.

@e-dorigatti, it has some conflicts in branch-2.3 too. Mind if I ask to 
open a backporting PR again to reduce the difference between master and 
branch-2.3?


---




[GitHub] spark issue #21495: [SPARK-24418][Build] Upgrade Scala to 2.11.12 and 2.12.6

2018-06-10 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/21495
  
I ran into an issue when testing with the latest patch:

```
Exception in thread "main" java.lang.NoSuchMethodError: jline.console.completer.CandidateListCompletionHandler.setPrintSpaceAfterFullCompletion(Z)V
    at scala.tools.nsc.interpreter.jline.JLineConsoleReader.initCompletion(JLineReader.scala:139)
    at scala.tools.nsc.interpreter.jline.InteractiveReader.postInit(JLineReader.scala:54)
    at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1$$anonfun$25.apply(ILoop.scala:899)
    at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1$$anonfun$25.apply(ILoop.scala:897)
    at scala.tools.nsc.interpreter.SplashReader.postInit(InteractiveReader.scala:130)
    at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1$$anonfun$scala$tools$nsc$interpreter$ILoop$$anonfun$$loopPostInit$1$1.apply$mcV$sp(ILoop.scala:926)
    at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1$$anonfun$scala$tools$nsc$interpreter$ILoop$$anonfun$$loopPostInit$1$1.apply(ILoop.scala:908)
    at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1$$anonfun$scala$tools$nsc$interpreter$ILoop$$anonfun$$loopPostInit$1$1.apply(ILoop.scala:908)
    at scala.tools.nsc.interpreter.ILoop$$anonfun$mumly$1.apply(ILoop.scala:189)
    at scala.tools.nsc.interpreter.IMain.beQuietDuring(IMain.scala:221)
    at scala.tools.nsc.interpreter.ILoop.mumly(ILoop.scala:186)
    at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1$$anonfun$startup$1$1.apply(ILoop.scala:979)
    at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply$mcZ$sp(ILoop.scala:990)
    at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:891)
    at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:891)
    at scala.reflect.internal.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:97)
    at scala.tools.nsc.interpreter.ILoop.process(ILoop.scala:891)
    at org.apache.spark.repl.Main$.doMain(Main.scala:76)
    at org.apache.spark.repl.Main$.main(Main.scala:56)
    at org.apache.spark.repl.Main.main(Main.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:837)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:194)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:912)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:923)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Exception in thread "Thread-15" java.lang.InterruptedException
    at java.util.concurrent.SynchronousQueue.put(SynchronousQueue.java:879)
    at scala.tools.nsc.interpreter.SplashLoop.run(InteractiveReader.scala:77)
    at java.lang.Thread.run(Thread.java:745)
```


---




[GitHub] spark issue #21481: [SPARK-24452][SQL][Core] Avoid possible overflow in int ...

2018-06-10 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/21481
  
Thank you for your comment. I will create another PR for integrating 
findBugs/SpotBugs into maven.


---




[GitHub] spark issue #21221: [SPARK-23429][CORE] Add executor memory metrics to heart...

2018-06-10 Thread edwinalu
Github user edwinalu commented on the issue:

https://github.com/apache/spark/pull/21221
  
@squito, I'm modifying ExecutorMetrics to take in the metrics array. This 
will be easier for tests, where we pass in set values, and seems fine for the 
actual code. It will check that the length of the passed-in array equals 
MetricGetter.values.length. Let me know if you have any concerns.

@felixcheung, I'll finish the current changes, then rebase.
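
For illustration, a minimal Python sketch of the length check described above. The real classes are Scala; `METRIC_NAMES` is a hypothetical stand-in for `MetricGetter.values`, and the metric names are made up.

```python
# Hypothetical stand-in for MetricGetter.values: the ordered metric names.
METRIC_NAMES = ("heap_used", "off_heap_used", "execution_memory")


class ExecutorMetrics:
    """Holds one value per known metric, in METRIC_NAMES order."""

    def __init__(self, metrics):
        metrics = list(metrics)
        # Reject arrays that do not line up with the known metrics.
        if len(metrics) != len(METRIC_NAMES):
            raise ValueError(
                "expected %d metrics, got %d" % (len(METRIC_NAMES), len(metrics)))
        self.metrics = metrics

    def get(self, name):
        """Look a metric up by name via its position in METRIC_NAMES."""
        return self.metrics[METRIC_NAMES.index(name)]
```

A test can then build `ExecutorMetrics([10, 20, 30])` directly with set values, and a wrongly sized array fails fast at construction time.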


---




[GitHub] spark issue #21427: [SPARK-24324][PYTHON] Pandas Grouped Map UDF should assi...

2018-06-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21427
  
It's at least not as trivial as on the Scala side. I am okay with it, but 
please make clear which cases we will allow with this configuration.


---




[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-06-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r194278100
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -351,8 +354,68 @@ def show(self, n=20, truncate=True, vertical=False):
         else:
             print(self._jdf.showString(n, int(truncate), vertical))
 
+    @property
+    def _eager_eval(self):
+        """Returns true if the eager evaluation enabled.
+        """
+        return self.sql_ctx.getConf(
+            "spark.sql.repl.eagerEval.enabled", "false").lower() == "true"
--- End diff --

We should probably access the SQLConf object. 1. I agree with not hardcoding 
it in general, but 2. IMHO I want to avoid Py4J JVM accesses in the test 
because, to my knowledge, they are likely to make the test flakier (unlike on 
the Scala or Java side).

Maybe we should take a look at this hardcoding if we see more occurrences 
next time.
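
One possible middle ground, sketched here as an assumption and not as what the PR does: keep each key and its default in a single Python constant, so call sites stop repeating the strings while still avoiding a JVM round trip. The constant and helper names are hypothetical; `get_conf` stands for any callable shaped like `sql_ctx.getConf(key, default)`.

```python
# Hypothetical constants: (key, default) pairs kept in one place so that
# call sites never spell the key strings out again.
EAGER_EVAL_ENABLED = ("spark.sql.repl.eagerEval.enabled", "false")
EAGER_EVAL_MAX_NUM_ROWS = ("spark.sql.repl.eagerEval.maxNumRows", "20")
EAGER_EVAL_TRUNCATE = ("spark.sql.repl.eagerEval.truncate", "20")


def get_bool_conf(get_conf, entry):
    """Read a boolean conf through get_conf(key, default)."""
    key, default = entry
    return get_conf(key, default).lower() == "true"


def get_int_conf(get_conf, entry):
    """Read an integer conf through get_conf(key, default)."""
    key, default = entry
    return int(get_conf(key, default))
```

The keys are still duplicated between Python and the JVM side, so this only narrows the hardcoding to one spot; reading the SQLConf object would remove it entirely at the cost of the Py4J access mentioned above.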


---




[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-06-10 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r194277542
  
--- Diff: docs/configuration.md ---
@@ -456,6 +456,33 @@ Apart from these, the following properties are also available, and may be useful
 from JVM to Python worker for every task.
   
 
+
+  spark.sql.repl.eagerEval.enabled
+  false
+  
+Enable eager evaluation or not. If true and the REPL you are using supports eager evaluation,
--- End diff --

Just a question. When the REPL does not support eager evaluation, could we 
do anything better instead of silently ignoring the user inputs? 


---




[GitHub] spark issue #21370: [SPARK-24215][PySpark] Implement _repr_html_ for datafra...

2018-06-10 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21370
  
@xuanyuanking Thanks for your contributions! Test coverage is most critical 
when we refactor existing code and add new features. When you submit new PRs 
in the future, could you also improve this part? 


---




[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-06-10 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r194277082
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -3209,6 +3222,19 @@ class Dataset[T] private[sql](
     }
   }
 
+  private[sql] def getRowsToPython(
--- End diff --

In DataFrameSuite, we have multiple test cases for `showString` but none for 
`getRows`, which is introduced in this PR. 

We also need unit test cases for `getRowsToPython`. 


---




[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-06-10 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r194276795
  
--- Diff: docs/configuration.md ---
@@ -456,6 +456,33 @@ Apart from these, the following properties are also available, and may be useful
 from JVM to Python worker for every task.
   
 
+
--- End diff --

These confs are not part of `spark.sql("SET -v").show(numRows = 200, 
truncate = false)`. 


---




[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-06-10 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r194276735
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -351,8 +354,68 @@ def show(self, n=20, truncate=True, vertical=False):
         else:
             print(self._jdf.showString(n, int(truncate), vertical))
 
+    @property
+    def _eager_eval(self):
+        """Returns true if the eager evaluation enabled.
+        """
+        return self.sql_ctx.getConf(
+            "spark.sql.repl.eagerEval.enabled", "false").lower() == "true"
--- End diff --

Is it possible to avoid hard-coding these conf key values? cc @ueshin 
@HyukjinKwon 


---




[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-06-10 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r194276557
  
--- Diff: docs/configuration.md ---
@@ -456,6 +456,33 @@ Apart from these, the following properties are also available, and may be useful
 from JVM to Python worker for every task.
   
 
+
--- End diff --

All the SQL configurations should follow what we did in the section of 
`Spark SQL` https://spark.apache.org/docs/latest/configuration.html#spark-sql. 




---




[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-06-10 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r194276329
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -351,8 +354,68 @@ def show(self, n=20, truncate=True, vertical=False):
         else:
             print(self._jdf.showString(n, int(truncate), vertical))
 
+    @property
+    def _eager_eval(self):
+        """Returns true if the eager evaluation enabled.
+        """
+        return self.sql_ctx.getConf(
+            "spark.sql.repl.eagerEval.enabled", "false").lower() == "true"
+
+    @property
+    def _max_num_rows(self):
+        """Returns the max row number for eager evaluation.
+        """
+        return int(self.sql_ctx.getConf(
+            "spark.sql.repl.eagerEval.maxNumRows", "20"))
+
+    @property
+    def _truncate(self):
+        """Returns the truncate length for eager evaluation.
+        """
+        return int(self.sql_ctx.getConf(
+            "spark.sql.repl.eagerEval.truncate", "20"))
+
     def __repr__(self):
-        return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in self.dtypes))
+        if not self._support_repr_html and self._eager_eval:
+            vertical = False
+            return self._jdf.showString(
+                self._max_num_rows, self._truncate, vertical)
+        else:
+            return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in self.dtypes))
+
+    def _repr_html_(self):
+        """Returns a dataframe with html code when you enabled eager evaluation
+        by 'spark.sql.repl.eagerEval.enabled', this only called by REPL you are
+        using support eager evaluation with HTML.
+        """
+        import cgi
+        if not self._support_repr_html:
+            self._support_repr_html = True
+        if self._eager_eval:
+            max_num_rows = max(self._max_num_rows, 0)
+            vertical = False
--- End diff --

Any discussion about this?
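
For readers following the diff, a minimal sketch of the `__repr__` / `_repr_html_` interplay, using a stand-in class and no Spark at all. `FakeFrame` and its fields are hypothetical, and the behaviour of notebook front ends (calling `_repr_html_` and falling back to `__repr__` when it returns `None`) is my understanding of the IPython display protocol, not something stated in this thread. The sketch uses `html.escape` because `cgi.escape`, imported in the diff, was removed in Python 3.8.

```python
import html


class FakeFrame:
    """Hypothetical stand-in for a DataFrame; not the PySpark class."""

    def __init__(self, columns, rows, eager_eval=False, max_num_rows=20):
        self.columns = columns
        self.rows = rows
        self.eager_eval = eager_eval
        self.max_num_rows = max_num_rows
        # Flipped to True the first time a front end asks for HTML.
        self._support_repr_html = False

    def _show_string(self):
        # Plain-text rendering, loosely playing the role of showString.
        limit = max(self.max_num_rows, 0)
        lines = [" | ".join(self.columns)]
        lines += [" | ".join(str(v) for v in row) for row in self.rows[:limit]]
        return "\n".join(lines)

    def __repr__(self):
        # Eager text output only if no HTML-capable front end has been seen.
        if not self._support_repr_html and self.eager_eval:
            return self._show_string()
        return "FakeFrame[%s]" % ", ".join(self.columns)

    def _repr_html_(self):
        if not self._support_repr_html:
            self._support_repr_html = True
        if not self.eager_eval:
            return None
        limit = max(self.max_num_rows, 0)
        head = "".join("<th>%s</th>" % html.escape(c) for c in self.columns)
        body = "".join(
            "<tr>%s</tr>" % "".join(
                "<td>%s</td>" % html.escape(str(v)) for v in row)
            for row in self.rows[:limit])
        return "<table><tr>%s</tr>%s</table>" % (head, body)
```

Note the side effect: once `_repr_html_` has been called, `__repr__` stops printing rows and returns the short schema-style string, which is the part of the design that seems worth discussing.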


---




[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-06-10 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r194276298
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -3209,6 +3222,19 @@ class Dataset[T] private[sql](
     }
   }
 
+  private[sql] def getRowsToPython(
+      _numRows: Int,
+      truncate: Int,
+      vertical: Boolean): Array[Any] = {
+    EvaluatePython.registerPicklers()
+    val numRows = _numRows.max(0).min(Int.MaxValue - 1)
--- End diff --

This should also be part of the conf description. 
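
As a worked example of what the description would need to say, here is the same clamping written in Python. `clamp_num_rows` is a hypothetical name. The reason for stopping at `Int.MaxValue - 1` is not stated here; leaving room for a later `numRows + 1` without overflow is my assumption.

```python
INT_MAX = 2**31 - 1  # value of Scala's Int.MaxValue


def clamp_num_rows(num_rows):
    """Mirror _numRows.max(0).min(Int.MaxValue - 1) from the diff."""
    return min(max(num_rows, 0), INT_MAX - 1)
```

So a negative setting behaves like 0, ordinary values pass through unchanged, and anything at or above `Int.MaxValue` is capped at 2147483646.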


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-06-10 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r194276179
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -3074,6 +3074,36 @@ def test_checking_csv_header(self):
 finally:
 shutil.rmtree(path)
 
+def test_repr_html(self):
--- End diff --

This function only covers the most basic positive case. We also need to add 
more test cases, for example the results when 
`spark.sql.repl.eagerEval.enabled` is set to `false`. 
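
A rough sketch of the kind of cases meant here, written against a stub so it runs without Spark. `StubFrame` is hypothetical and only models the conf lookup from the diff; real tests would go through the SQL context in `python/pyspark/sql/tests.py`.

```python
import unittest


class StubFrame:
    """Stand-in for a DataFrame; only the conf lookup and HTML repr are modeled."""

    def __init__(self, conf):
        self._conf = conf

    @property
    def _eager_eval(self):
        value = self._conf.get("spark.sql.repl.eagerEval.enabled", "false")
        return value.lower() == "true"

    def _repr_html_(self):
        return "<table></table>" if self._eager_eval else None


class ReprHtmlTest(unittest.TestCase):
    def test_enabled(self):
        frame = StubFrame({"spark.sql.repl.eagerEval.enabled": "true"})
        self.assertEqual(frame._repr_html_(), "<table></table>")

    def test_disabled_explicitly(self):
        frame = StubFrame({"spark.sql.repl.eagerEval.enabled": "false"})
        self.assertIsNone(frame._repr_html_())

    def test_disabled_by_default(self):
        self.assertIsNone(StubFrame({})._repr_html_())

    def test_value_is_case_insensitive(self):
        frame = StubFrame({"spark.sql.repl.eagerEval.enabled": "TRUE"})
        self.assertEqual(frame._repr_html_(), "<table></table>")
```

The negative and default cases are the ones missing today; the case-insensitivity check follows from the `.lower()` call in the diff.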


---




[GitHub] spark issue #21370: [SPARK-24215][PySpark] Implement _repr_html_ for datafra...

2018-06-10 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21370
  
@xuanyuanking @HyukjinKwon Sorry for the delay. Super busy in the week of 
Spark summit. Will carefully review this PR today or tomorrow. 


---




[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-06-10 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r194275282
  
--- Diff: docs/configuration.md ---
@@ -456,6 +456,33 @@ Apart from these, the following properties are also available, and may be useful
 from JVM to Python worker for every task.
   
 
+
+  spark.sql.repl.eagerEval.enabled
+  false
+  
+Enable eager evaluation or not. If true and the REPL you are using supports eager evaluation,
+Dataset will be ran automatically. The HTML table which generated by _repl_html_
+called by notebooks like Jupyter will feedback the queries user have defined. For plain Python
+REPL, the output will be shown like dataframe.show()
+(see <a href="https://issues.apache.org/jira/browse/SPARK-24215">SPARK-24215</a> for more details).
+  
+
+
+  spark.sql.repl.eagerEval.maxNumRows
+  20
+  
+Default number of rows in eager evaluation output HTML table generated by _repr_html_ or plain text,
+this only take effect when spark.sql.repl.eagerEval.enabled is set to true.
--- End diff --

take -> takes


---




[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-06-10 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r194275288
  
--- Diff: docs/configuration.md ---
@@ -456,6 +456,33 @@ Apart from these, the following properties are also available, and may be useful
 from JVM to Python worker for every task.
   
 
+
+  spark.sql.repl.eagerEval.enabled
+  false
+  
+Enable eager evaluation or not. If true and the REPL you are using supports eager evaluation,
+Dataset will be ran automatically. The HTML table which generated by _repl_html_
+called by notebooks like Jupyter will feedback the queries user have defined. For plain Python
+REPL, the output will be shown like dataframe.show()
+(see <a href="https://issues.apache.org/jira/browse/SPARK-24215">SPARK-24215</a> for more details).
+  
+
+
+  spark.sql.repl.eagerEval.maxNumRows
+  20
+  
+Default number of rows in eager evaluation output HTML table generated by _repr_html_ or plain text,
+this only take effect when spark.sql.repl.eagerEval.enabled is set to true.
+  
+
+
+  spark.sql.repl.eagerEval.truncate
+  20
+  
+Default number of truncate in eager evaluation output HTML table generated by _repr_html_ or
+plain text, this only take effect when spark.sql.repl.eagerEval.enabled set to true.
--- End diff --

take -> takes


---




[GitHub] spark pull request #21508: [SPARK-24488] [SQL] Fix issue when generator is a...

2018-06-10 Thread bkrieger
Github user bkrieger commented on a diff in the pull request:

https://github.com/apache/spark/pull/21508#discussion_r194274619
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1568,11 +1568,32 @@ class Analyzer(
       expr.find(_.isInstanceOf[Generator]).isDefined
     }
 
-    private def hasNestedGenerator(expr: NamedExpression): Boolean = expr match {
-      case UnresolvedAlias(_: Generator, _) => false
-      case Alias(_: Generator, _) => false
-      case MultiAlias(_: Generator, _) => false
-      case other => hasGenerator(other)
+    private def hasNestedGenerator(expr: NamedExpression): Boolean = {
+      trimNonTopLevelAliases(expr) match {
+        case UnresolvedAlias(_: Generator, _) => false
+        case Alias(_: Generator, _) => false
+        case MultiAlias(_: Generator, _) => false
+        case other => hasGenerator(other)
+      }
+    }
+
+    def trimNonTopLevelAliases(e: Expression): Expression = e match {
+      case a: UnresolvedAlias =>
--- End diff --

In my use case, no. But I wasn't sure if another use case would care. 


---




[GitHub] spark pull request #21508: [SPARK-24488] [SQL] Fix issue when generator is a...

2018-06-10 Thread bkrieger
Github user bkrieger commented on a diff in the pull request:

https://github.com/apache/spark/pull/21508#discussion_r194274604
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1568,11 +1568,32 @@ class Analyzer(
       expr.find(_.isInstanceOf[Generator]).isDefined
     }
 
-    private def hasNestedGenerator(expr: NamedExpression): Boolean = expr match {
-      case UnresolvedAlias(_: Generator, _) => false
-      case Alias(_: Generator, _) => false
-      case MultiAlias(_: Generator, _) => false
-      case other => hasGenerator(other)
+    private def hasNestedGenerator(expr: NamedExpression): Boolean = {
+      trimNonTopLevelAliases(expr) match {
+        case UnresolvedAlias(_: Generator, _) => false
+        case Alias(_: Generator, _) => false
+        case MultiAlias(_: Generator, _) => false
+        case other => hasGenerator(other)
+      }
+    }
+
+    def trimNonTopLevelAliases(e: Expression): Expression = e match {
--- End diff --

Sure- I didn't want to break any existing functionality, but I can do that 
instead. 


---




[GitHub] spark issue #21452: [MINOR][CORE] Log committer class used by HadoopMapRedCo...

2018-06-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21452
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21452: [MINOR][CORE] Log committer class used by HadoopMapRedCo...

2018-06-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21452
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91640/
Test PASSed.


---




[GitHub] spark issue #21452: [MINOR][CORE] Log committer class used by HadoopMapRedCo...

2018-06-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21452
  
**[Test build #91640 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91640/testReport)**
 for PR 21452 at commit 
[`9881d9c`](https://github.com/apache/spark/commit/9881d9c6a2b1d56e69bb06ee27fd8706f6e0fe43).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `logInfo(s\"Using output committer class $`


---




[GitHub] spark issue #21524: [SPARK-24212][ML][doc] Add the example and user guide fo...

2018-06-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21524
  
Can one of the admins verify this patch?


---




[GitHub] spark pull request #21524: [SPARK-24212][ML][doc] Add the example and user g...

2018-06-10 Thread tengpeng
GitHub user tengpeng opened a pull request:

https://github.com/apache/spark/pull/21524

[SPARK-24212][ML][doc] Add the example and user guide for ML PrefixSpan

## What changes were proposed in this pull request?

There is no example or user guide for ML PrefixSpan (as opposed to MLlib 
PrefixSpan).

This PR adds an example and a user guide.

## How was this patch tested?

Generated the local web page. See the screenshot.

https://user-images.githubusercontent.com/2724786/41207516-3d5c137e-6cdd-11e8-8e8f-f713231cc4fd.png


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tengpeng/spark Spark-24212

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21524.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21524


commit 333f97e48ffab3354bf2627959c40ac7a394a979
Author: Teng Peng 
Date:   2018-06-10T23:31:56Z

Add the example and user guide for ML PrefixSpan




---




[GitHub] spark issue #21438: [SPARK-24398] [SQL] Improve SQLAppStatusListener.aggrega...

2018-06-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21438
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21438: [SPARK-24398] [SQL] Improve SQLAppStatusListener.aggrega...

2018-06-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21438
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91641/
Test PASSed.


---




[GitHub] spark issue #21438: [SPARK-24398] [SQL] Improve SQLAppStatusListener.aggrega...

2018-06-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21438
  
**[Test build #91641 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91641/testReport)**
 for PR 21438 at commit 
[`eb87d2d`](https://github.com/apache/spark/commit/eb87d2d595374f3325a91ac53f0c11bff2b978e7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #21508: [SPARK-24488] [SQL] Fix issue when generator is a...

2018-06-10 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21508#discussion_r194273874
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1568,11 +1568,32 @@ class Analyzer(
       expr.find(_.isInstanceOf[Generator]).isDefined
     }
 
-    private def hasNestedGenerator(expr: NamedExpression): Boolean = expr match {
-      case UnresolvedAlias(_: Generator, _) => false
-      case Alias(_: Generator, _) => false
-      case MultiAlias(_: Generator, _) => false
-      case other => hasGenerator(other)
+    private def hasNestedGenerator(expr: NamedExpression): Boolean = {
+      trimNonTopLevelAliases(expr) match {
+        case UnresolvedAlias(_: Generator, _) => false
+        case Alias(_: Generator, _) => false
+        case MultiAlias(_: Generator, _) => false
+        case other => hasGenerator(other)
+      }
+    }
+
+    def trimNonTopLevelAliases(e: Expression): Expression = e match {
+      case a: UnresolvedAlias =>
--- End diff --

Do we need to handle `UnresolvedAlias`? 


---




[GitHub] spark pull request #21508: [SPARK-24488] [SQL] Fix issue when generator is a...

2018-06-10 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21508#discussion_r194273780
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1568,11 +1568,32 @@ class Analyzer(
       expr.find(_.isInstanceOf[Generator]).isDefined
     }
 
-    private def hasNestedGenerator(expr: NamedExpression): Boolean = expr match {
-      case UnresolvedAlias(_: Generator, _) => false
-      case Alias(_: Generator, _) => false
-      case MultiAlias(_: Generator, _) => false
-      case other => hasGenerator(other)
+    private def hasNestedGenerator(expr: NamedExpression): Boolean = {
+      trimNonTopLevelAliases(expr) match {
+        case UnresolvedAlias(_: Generator, _) => false
+        case Alias(_: Generator, _) => false
+        case MultiAlias(_: Generator, _) => false
+        case other => hasGenerator(other)
+      }
+    }
+
+    def trimNonTopLevelAliases(e: Expression): Expression = e match {
--- End diff --

Instead of duplicating the function here, could we just fix 
`CleanupAliases.trimNonTopLevelAliases`?
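
To make the intent concrete, here is a minimal sketch on a toy expression tree. It does not use Catalyst's classes: `Expr`, `Leaf`, `Wrap`, `Explode`, and `stripAliases` are hypothetical stand-ins, and the doubly-aliased generator is an assumed example of the kind of input the check has to tolerate. The idea is that one shared helper keeps the outermost alias and strips the aliases beneath it, so `hasNestedGenerator` only has to match on the trimmed result.

```scala
// Toy model only: these are NOT Spark's Catalyst classes. The names
// mirror the diff for readability, and Explode stands in for a Generator.
object AliasTrimSketch {
  sealed trait Expr { def children: Seq[Expr] }
  case class Leaf(name: String) extends Expr {
    def children: Seq[Expr] = Nil
  }
  case class Wrap(child: Expr) extends Expr { // any non-generator parent
    def children: Seq[Expr] = Seq(child)
  }
  case class Explode(child: Expr) extends Expr { // hypothetical generator
    def children: Seq[Expr] = Seq(child)
  }
  case class Alias(child: Expr, name: String) extends Expr {
    def children: Seq[Expr] = Seq(child)
  }

  // Remove every Alias found anywhere in the tree.
  def stripAliases(e: Expr): Expr = e match {
    case Alias(c, _) => stripAliases(c)
    case Wrap(c)     => Wrap(stripAliases(c))
    case Explode(c)  => Explode(stripAliases(c))
    case l: Leaf     => l
  }

  // Keep the outermost Alias (if any) and strip the ones beneath it.
  def trimNonTopLevelAliases(e: Expr): Expr = e match {
    case Alias(c, n) => Alias(stripAliases(c), n)
    case other       => stripAliases(other)
  }

  def hasGenerator(e: Expr): Boolean =
    e.isInstanceOf[Explode] || e.children.exists(hasGenerator)

  // A generator directly under the top-level alias is not "nested".
  def hasNestedGenerator(e: Expr): Boolean =
    trimNonTopLevelAliases(e) match {
      case Alias(_: Explode, _) => false
      case other                => hasGenerator(other)
    }
}
```

In this toy model, `Alias(Alias(Explode(Leaf("x")), "a"), "b")` is not reported as nested, while `Alias(Wrap(Explode(Leaf("x"))), "a")` still is. Whether the real helper can be changed this way without affecting its other callers is something the PR would need to check.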


---




[GitHub] spark issue #21045: [SPARK-23931][SQL] Adds zip function to sparksql

2018-06-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21045
  
**[Test build #91643 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91643/testReport)**
 for PR 21045 at commit 
[`d8f3dea`](https://github.com/apache/spark/commit/d8f3dea8b227a4ee44dedb6b8199c8a17f6bfdd4).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class ArraysZip(children: Seq[Expression]) extends Expression with ExpectsInputTypes`


---




[GitHub] spark issue #21045: [SPARK-23931][SQL] Adds zip function to sparksql

2018-06-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21045
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91643/
Test FAILed.


---




[GitHub] spark issue #21045: [SPARK-23931][SQL] Adds zip function to sparksql

2018-06-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21045
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20838: [SPARK-23698] Resolve undefined names in Python 3

2018-06-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20838
  
**[Test build #91644 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91644/testReport)**
 for PR 20838 at commit 
[`fd4d922`](https://github.com/apache/spark/commit/fd4d9225a23bac79e895f5bd223001b8ccb6ba15).


---




[GitHub] spark issue #21045: [SPARK-23931][SQL] Adds zip function to sparksql

2018-06-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21045
  
**[Test build #91643 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91643/testReport)**
 for PR 21045 at commit 
[`d8f3dea`](https://github.com/apache/spark/commit/d8f3dea8b227a4ee44dedb6b8199c8a17f6bfdd4).


---




[GitHub] spark issue #16006: [SPARK-18580] [DStreams] [external/kafka-0-10] Use spark...

2018-06-10 Thread koeninger
Github user koeninger commented on the issue:

https://github.com/apache/spark/pull/16006
  
#19431 was merged, thanks for your work.  This PR should probably be closed.


---




[GitHub] spark issue #21481: [SPARK-24452][SQL][Core] Avoid possible overflow in int ...

2018-06-10 Thread JoshRosen
Github user JoshRosen commented on the issue:

https://github.com/apache/spark/pull/21481
  
Let's merge this as-is and do the build improvements in a separate PR. 
That's important because we may want to backport the overflow fix to 
maintenance branches and may want to do so independent of the build changes.


---




[GitHub] spark pull request #21505: [SPARK-24457][SQL] Improving performance of strin...

2018-06-10 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21505#discussion_r194268734
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala ---
@@ -111,6 +113,23 @@ object DateTimeUtils {
 computedTimeZones.computeIfAbsent(timeZoneId, computeTimeZone)
   }
 
+  private val threadLocalComputedCalendarsMap =
+new ThreadLocal[mutable.Map[TimeZone, Calendar]] {
--- End diff --

Usually, only the default time zone is used. However, a `Cast` involving a 
date that is called with a time zone may use a different one. For correctness, 
I think it is necessary to support multiple time zones.

Alternatively, caching only the instance for the default time zone and creating 
a new instance for any other time zone would also work correctly.
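
As a rough illustration of the multi-time-zone variant, the sketch below keeps one map per thread, keyed by `TimeZone`, and creates a `Calendar` lazily on first use. The object name `CalendarCacheSketch` and the method `calendarFor` are hypothetical and not taken from the PR. Only the `ThreadLocal[mutable.Map[TimeZone, Calendar]]` shape comes from the diff.

```scala
import java.util.{Calendar, TimeZone}
import scala.collection.mutable

// Hypothetical sketch, not the PR's code: a per-thread cache of
// Calendar instances keyed by time zone.
object CalendarCacheSketch {
  // One map per thread, since java.util.Calendar is mutable and
  // not safe to share across threads.
  private val threadLocalCalendars =
    new ThreadLocal[mutable.Map[TimeZone, Calendar]] {
      override def initialValue(): mutable.Map[TimeZone, Calendar] =
        mutable.Map.empty[TimeZone, Calendar]
    }

  // Returns this thread's cached Calendar for tz, creating it on first
  // use. The caller is expected to set the fields it needs before
  // reading, because the returned instance is reused between calls.
  def calendarFor(tz: TimeZone): Calendar =
    threadLocalCalendars.get().getOrElseUpdate(tz, Calendar.getInstance(tz))
}
```

Repeated calls with the same time zone on one thread return the same instance, and a different time zone gets its own. The default-only alternative would replace the map with a single cached `Calendar` and fall back to `Calendar.getInstance(tz)` for everything else, trading a little allocation for a simpler cache.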


---




[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-06-10 Thread IgorBerman
Github user IgorBerman commented on the issue:

https://github.com/apache/spark/pull/20640
  
@felixcheung Sorry, did I miss something?


---




[GitHub] spark issue #21221: [SPARK-23429][CORE] Add executor memory metrics to heart...

2018-06-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21221
  
Build finished. Test FAILed.


---




[GitHub] spark issue #21221: [SPARK-23429][CORE] Add executor memory metrics to heart...

2018-06-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21221
  
**[Test build #91642 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91642/testReport)**
 for PR 21221 at commit 
[`7879e66`](https://github.com/apache/spark/commit/7879e66eed22cfd4dff2367c0ee3138369243711).
 * This patch **fails to build**.
 * This patch **does not merge cleanly**.
 * This patch adds the following public classes _(experimental)_:
  * `sealed trait MetricGetter `
  * `abstract class MemoryManagerMetricGetter(f: MemoryManager => Long) extends MetricGetter`
  * `abstract class MBeanMetricGetter(mBeanName: String) extends MetricGetter`


---




[GitHub] spark issue #21221: [SPARK-23429][CORE] Add executor memory metrics to heart...

2018-06-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21221
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91642/
Test FAILed.


---




[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-06-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20640
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21221: [SPARK-23429][CORE] Add executor memory metrics to heart...

2018-06-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21221
  
**[Test build #91642 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91642/testReport)**
 for PR 21221 at commit 
[`7879e66`](https://github.com/apache/spark/commit/7879e66eed22cfd4dff2367c0ee3138369243711).


---




[GitHub] spark issue #21438: [SPARK-24398] [SQL] Improve SQLAppStatusListener.aggrega...

2018-06-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21438
  
**[Test build #91641 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91641/testReport)**
 for PR 21438 at commit 
[`eb87d2d`](https://github.com/apache/spark/commit/eb87d2d595374f3325a91ac53f0c11bff2b978e7).


---




[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-06-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20640
  
**[Test build #91639 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91639/testReport)**
 for PR 20640 at commit 
[`a7ff8cc`](https://github.com/apache/spark/commit/a7ff8cccd1b7e5564880c40c503c169c6bed46b9).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21452: [MINOR][CORE] Log committer class used by HadoopMapRedCo...

2018-06-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21452
  
**[Test build #91640 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91640/testReport)**
 for PR 21452 at commit 
[`9881d9c`](https://github.com/apache/spark/commit/9881d9c6a2b1d56e69bb06ee27fd8706f6e0fe43).


---




[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-06-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20640
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91639/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21221: [SPARK-23429][CORE] Add executor memory metrics to heart...

2018-06-10 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/21221
  
ok to test


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21221: [SPARK-23429][CORE] Add executor memory metrics to heart...

2018-06-10 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/21221
  
This probably needs to be rebased.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21452: [MINOR][CORE] Log committer class used by HadoopMapRedCo...

2018-06-10 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/21452
  
ok to test


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21438: [SPARK-24398] [SQL] Improve SQLAppStatusListener.aggrega...

2018-06-10 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/21438
  
I think filtering on `metricIds` still makes sense, right? @gatorsmile 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21438: [SPARK-24398] [SQL] Improve SQLAppStatusListener.aggrega...

2018-06-10 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/21438
  
ok to test


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20838: [SPARK-23698] Resolve undefined names in Python 3

2018-06-10 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/20838
  
any update?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20272: [SPARK-23078] [CORE] [K8s] allow Spark Thrift Server to ...

2018-06-10 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/20272
  
Is this aligned with the "in cluster client"? @foxish @mccheah 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21481: [SPARK-24452][SQL][Core] Avoid possible overflow in int ...

2018-06-10 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/21481
  
Since I found a plug-in for Maven, I will also include a patch in this PR 
that adds FindBugs/SpotBugs to the Maven build.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-06-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20640
  
**[Test build #91639 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91639/testReport)**
 for PR 20640 at commit 
[`a7ff8cc`](https://github.com/apache/spark/commit/a7ff8cccd1b7e5564880c40c503c169c6bed46b9).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


