[GitHub] spark issue #19272: [SPARK-21842][Mesos] Support Kerberos ticket renewal and...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19272 **[Test build #82941 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82941/testReport)** for PR 19272 at commit [`c95f80b`](https://github.com/apache/spark/commit/c95f80b23d47ea4640cea2b4a185fa4bf9e9f33d).
[GitHub] spark pull request #18664: [SPARK-21375][PYSPARK][SQL] Add Date and Timestam...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18664#discussion_r146096087

--- Diff: python/pyspark/sql/types.py ---
@@ -1619,11 +1619,39 @@ def to_arrow_type(dt):
         arrow_type = pa.decimal(dt.precision, dt.scale)
     elif type(dt) == StringType:
         arrow_type = pa.string()
+    elif type(dt) == DateType:
+        arrow_type = pa.date32()
+    elif type(dt) == TimestampType:
+        # Timestamps should be in UTC, JVM Arrow timestamps require a timezone to be read
+        arrow_type = pa.timestamp('us', tz='UTC')
     else:
         raise TypeError("Unsupported type in conversion to Arrow: " + str(dt))
     return arrow_type


+def _check_dataframe_localize_timestamps(df):
+    """ Convert timezone aware timestamps to timezone-naive in local time
+    """
+    from pandas.api.types import is_datetime64tz_dtype
+    for column, series in df.iteritems():
+        # TODO: handle nested timestamps?
+        if is_datetime64tz_dtype(series.dtype):
+            df[column] = series.dt.tz_convert('tzlocal()').dt.tz_localize(None)
+    return df
+
+
+def _check_series_convert_timestamps_internal(s):
+    """ Convert a tz-naive timestamp in local tz to UTC normalized for Spark internal storage
+    """
+    from pandas.api.types import is_datetime64_dtype
+    # TODO: handle nested timestamps?
--- End diff --

If it is unsupported, could you also add a negative test case if one does not already exist?
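To make the requested negative case concrete, a minimal sketch (the test class and method names are hypothetical; it assumes `to_arrow_type` is importable from `pyspark.sql.types` as in the diff above, and that `ArrayType` is still unsupported at this point):

```python
import unittest

from pyspark.sql.types import ArrayType, TimestampType, to_arrow_type


class ArrowTypeConversionTest(unittest.TestCase):
    def test_unsupported_nested_type_raises(self):
        # Nested timestamps (e.g. inside an array) are not handled yet,
        # so the conversion should fail loudly rather than silently.
        with self.assertRaises(TypeError):
            to_arrow_type(ArrayType(TimestampType()))


if __name__ == "__main__":
    unittest.main()
```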
[GitHub] spark issue #19545: [SPARK-21929][SQL] Support `ALTER TABLE table_name ADD C...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19545 LGTM
[GitHub] spark pull request #19545: [SPARK-21929][SQL] Support `ALTER TABLE table_nam...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19545#discussion_r146096042

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala ---
@@ -2202,56 +2202,64 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils {
     }
   }

+  def testAddColumn(provider: String): Unit = {
--- End diff --

Nit: `protected`
[GitHub] spark pull request #19545: [SPARK-21929][SQL] Support `ALTER TABLE table_nam...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19545#discussion_r146096038

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala ---
@@ -2202,56 +2202,64 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils {
     }
   }

+  def testAddColumn(provider: String): Unit = {
+    withTable("t1") {
+      sql(s"CREATE TABLE t1 (c1 int) USING $provider")
+      sql("INSERT INTO t1 VALUES (1)")
+      sql("ALTER TABLE t1 ADD COLUMNS (c2 int)")
+      checkAnswer(
+        spark.table("t1"),
+        Seq(Row(1, null))
+      )
+      checkAnswer(
+        sql("SELECT * FROM t1 WHERE c2 is null"),
+        Seq(Row(1, null))
+      )
+
+      sql("INSERT INTO t1 VALUES (3, 2)")
+      checkAnswer(
+        sql("SELECT * FROM t1 WHERE c2 = 2"),
+        Seq(Row(3, 2))
+      )
+    }
+  }
+
+  def testAddColumnPartitioned(provider: String): Unit = {
--- End diff --

Nit: `protected`
[GitHub] spark pull request #19539: [SPARK-22326] [SQL] Remove unnecessary hashCode a...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19539
[GitHub] spark issue #19539: [SPARK-22326] [SQL] Remove unnecessary hashCode and equa...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19539 Thanks! Merged to master.
[GitHub] spark issue #19539: [SPARK-22326] [SQL] Remove unnecessary hashCode and equa...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19539 LGTM
[GitHub] spark pull request #19528: [SPARK-20393][WEB UI][1.6] Strengthen Spark to p...
Github user ambauma commented on a diff in the pull request: https://github.com/apache/spark/pull/19528#discussion_r146095022

--- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/deploy/mesos/ui/DriverPage.scala ---
@@ -0,0 +1,180 @@
+/*
--- End diff --

I'm not sure what I did to make this whole file look new, but I've copied the current 1.6 version and reapplied stripXSS locally. I'm waiting for my build to pass before committing again.
[GitHub] spark issue #19514: [SPARK-21551][Python] Increase timeout for PythonRDD.ser...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19514 Thanks, I have been following it @shivaram and @felixcheung. A separate JIRA sounds good to me and I am okay with merging it.
[GitHub] spark issue #19514: [SPARK-21551][Python] Increase timeout for PythonRDD.ser...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19514 LGTM too BTW.
[GitHub] spark pull request #19272: [SPARK-21842][Mesos] Support Kerberos ticket rene...
Github user ArtRand commented on a diff in the pull request: https://github.com/apache/spark/pull/19272#discussion_r146094019

--- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala ---
@@ -194,6 +198,27 @@ private[spark] class MesosCoarseGrainedSchedulerBackend(
       sc.conf.getOption("spark.mesos.driver.frameworkId").map(_ + suffix)
     )

+    // check that the credentials are defined, even though it's likely that auth would have failed
+    // already if you've made it this far
+    if (principal != null && hadoopDelegationCreds.isDefined) {
+      logDebug(s"Principal found ($principal) starting token renewer")
+      val credentialRenewerThread = new Thread {
+        setName("MesosCredentialRenewer")
+        override def run(): Unit = {
+          val rt = MesosCredentialRenewer.getTokenRenewalTime(hadoopDelegationCreds.get, conf)
+          val credentialRenewer =
+            new MesosCredentialRenewer(
+              conf,
+              hadoopDelegationTokenManager.get,
+              MesosCredentialRenewer.getNextRenewalTime(rt),
+              driverEndpoint)
+          credentialRenewer.scheduleTokenRenewal()
+        }
+      }
+
+      credentialRenewerThread.start()
+      credentialRenewerThread.join()
--- End diff --

Ok, you're probably right. It appears that the YARN code uses `setContextClassLoader(userClassLoader)`, whereas Mesos does not have a notion of a `userClassLoader`, so we don't need the separate thread in the Mesos code. Do I have that correct? Thanks for showing me this!
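For readers following along, the pattern at issue is "compute the next renewal time, renew, then re-arm." A minimal sketch of that scheduling pattern, in Python for illustration only (`renew_tokens` and `next_renewal_time` are hypothetical stand-ins, not the actual `MesosCredentialRenewer` API):

```python
import threading
import time

def schedule_token_renewal(renew_tokens, next_renewal_time):
    """Re-arming timer: run renew_tokens at each computed renewal time."""
    delay = max(0.0, next_renewal_time() - time.time())

    def renew_and_reschedule():
        renew_tokens()  # fetch fresh delegation tokens, push them out
        schedule_token_renewal(renew_tokens, next_renewal_time)  # re-arm

    timer = threading.Timer(delay, renew_and_reschedule)
    timer.daemon = True  # never block driver shutdown on the renewer
    timer.start()
    return timer
```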
[GitHub] spark issue #19539: [SPARK-22326] [SQL] Remove unnecessary hashCode and equa...
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/19539 @gatorsmile JIRA created.
[GitHub] spark issue #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - Basic Sc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19468 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82940/ Test PASSed.
[GitHub] spark issue #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - Basic Sc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19468 Merged build finished. Test PASSed.
[GitHub] spark issue #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - Basic Sc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19468 **[Test build #82940 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82940/testReport)** for PR 19468 at commit [`c565c9f`](https://github.com/apache/spark/commit/c565c9ffd7e5371ee4425d69ecaf49ce92199fc7).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - Basic Sc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19468 Merged build finished. Test PASSed.
[GitHub] spark issue #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - Basic Sc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19468 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82938/ Test PASSed.
[GitHub] spark issue #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - Basic Sc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19468 **[Test build #82938 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82938/testReport)** for PR 19468 at commit [`c052212`](https://github.com/apache/spark/commit/c052212888e01eac90a006bfb5d14c513e33d0a3).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19545: [SPARK-21929][SQL] Support `ALTER TABLE table_name ADD C...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19545 Hi, @gatorsmile. Could you review this PR?
[GitHub] spark pull request #19545: [SPARK-21929][SQL] Support `ALTER TABLE table_nam...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19545#discussion_r146090102

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -235,11 +235,10 @@ case class AlterTableAddColumnsCommand(
     DataSource.lookupDataSource(catalogTable.provider.get).newInstance() match {
       // For datasource table, this command can only support the following File format.
       // TextFileFormat only default to one column "value"
-      // OrcFileFormat can not handle difference between user-specified schema and
-      // inferred schema yet. TODO, once this issue is resolved , we can add Orc back.
       // Hive type is already considered as hive serde table, so the logic will not
       // come in here.
       case _: JsonFileFormat | _: CSVFileFormat | _: ParquetFileFormat =>
+      case s if s.getClass.getCanonicalName.endsWith("OrcFileFormat") =>
--- End diff --

After implementing OrcFileFormat based on Apache ORC, we can move `OrcFileFormat` from the `sql/hive` module into the `sql/core` module.
[GitHub] spark issue #19545: [SPARK-21929][SQL] Support `ALTER TABLE table_name ADD C...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19545 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82939/ Test PASSed.
[GitHub] spark issue #19545: [SPARK-21929][SQL] Support `ALTER TABLE table_name ADD C...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19545 Merged build finished. Test PASSed.
[GitHub] spark issue #19545: [SPARK-21929][SQL] Support `ALTER TABLE table_name ADD C...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19545 **[Test build #82939 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82939/testReport)** for PR 19545 at commit [`cc52547`](https://github.com/apache/spark/commit/cc525479951868ff7094097aea886819c29fb549).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19534: [SPARK-22312][CORE] Fix bug in Executor allocation manag...
Github user sitalkedia commented on the issue: https://github.com/apache/spark/pull/19534 I think the other PR fixes one more issue on top of runningTasks going negative, so we can proceed with that one. What do you think, @jerryshao?
[GitHub] spark pull request #19528: [SPARK-20393][WEB UI][1.6] Strengthen Spark to p...
Github user ambauma commented on a diff in the pull request: https://github.com/apache/spark/pull/19528#discussion_r146084730

--- Diff: python/pyspark/mllib/classification.py ---
@@ -173,7 +173,7 @@ def __init__(self, weights, intercept, numFeatures, numClasses):
             self._dataWithBiasSize = None
             self._weightsMatrix = None
         else:
-            self._dataWithBiasSize = self._coeff.size / (self._numClasses - 1)
+            self._dataWithBiasSize = self._coeff.size // (self._numClasses - 1)
--- End diff --

The NewSparkPullRequestBuilder failed on the Python tests. I was only able to duplicate the failure with Python 3.4 and numpy 1.12.1, which I'm guessing are the versions that NewSparkPullRequestBuilder is using. Older and newer versions of numpy build cleanly either way.
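As background on why `/` versus `//` matters here (a standalone illustration, not Spark code): NumPy sizes are integers, but Python 3's `/` is true division and yields a float, which NumPy 1.12+ rejects wherever an integer dimension or index is required; `//` keeps an integer on both Python 2 and 3.

```python
import numpy as np

coeff = np.zeros(8)
num_classes = 3

size_true = coeff.size / (num_classes - 1)    # 4.0 -> float under Python 3
size_floor = coeff.size // (num_classes - 1)  # 4   -> int everywhere

coeff.reshape(size_floor, 2)   # fine: integer dimensions
# coeff.reshape(size_true, 2)  # TypeError on numpy >= 1.12: a float is
#                              # not accepted where an integer is required
```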
[GitHub] spark pull request #19528: [SPARK-20393][WEB UI][1.6] Strengthen Spark to p...
Github user ambauma commented on a diff in the pull request: https://github.com/apache/spark/pull/19528#discussion_r146084021

--- Diff: core/src/main/scala/org/apache/spark/ui/jobs/JobsTab.scala ---
@@ -16,9 +16,9 @@
  */
 package org.apache.spark.ui.jobs
-
+import javax.servlet.http.HttpServletRequest
--- End diff --

Agreed, will remove.
[GitHub] spark pull request #19528: [SPARK-20393][WEB UI][1.6] Strengthen Spark to p...
Github user ambauma commented on a diff in the pull request: https://github.com/apache/spark/pull/19528#discussion_r146080377

--- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/deploy/mesos/ui/DriverPage.scala ---
@@ -0,0 +1,180 @@
+/*
--- End diff --

I'll look into this as well...
[GitHub] spark pull request #19528: [SPARK-20393][WEB UI][1.6] Strengthen Spark to p...
Github user ambauma commented on a diff in the pull request: https://github.com/apache/spark/pull/19528#discussion_r146080311

--- Diff: python/pyspark/mllib/classification.py ---
@@ -173,7 +173,7 @@ def __init__(self, weights, intercept, numFeatures, numClasses):
             self._dataWithBiasSize = None
             self._weightsMatrix = None
         else:
-            self._dataWithBiasSize = self._coeff.size / (self._numClasses - 1)
+            self._dataWithBiasSize = self._coeff.size // (self._numClasses - 1)
--- End diff --

This is already fixed in the 2.0 branch, by the way; it just was never applied to 1.6. [SPARK-20862]
[GitHub] spark pull request #19528: [SPARK-20393][WEB UI][1.6] Strengthen Spark to p...
Github user ambauma commented on a diff in the pull request: https://github.com/apache/spark/pull/19528#discussion_r146080177

--- Diff: core/src/main/scala/org/apache/spark/ui/jobs/JobsTab.scala ---
@@ -16,9 +16,9 @@
  */
 package org.apache.spark.ui.jobs
-
+import javax.servlet.http.HttpServletRequest
--- End diff --

Will look into this...
[GitHub] spark pull request #19528: [SPARK-20393][WEB UI][1.6] Strengthen Spark to p...
Github user ambauma commented on a diff in the pull request: https://github.com/apache/spark/pull/19528#discussion_r146080089

--- Diff: python/pyspark/mllib/classification.py ---
@@ -173,7 +173,7 @@ def __init__(self, weights, intercept, numFeatures, numClasses):
             self._dataWithBiasSize = None
             self._weightsMatrix = None
         else:
-            self._dataWithBiasSize = self._coeff.size / (self._numClasses - 1)
+            self._dataWithBiasSize = self._coeff.size // (self._numClasses - 1)
--- End diff --

I had to apply this to get past a Python unit test failure. My assumption is that the NewSparkPullRequestBuilder is on a different version of numpy than when the Spark 1.6 branch was last built. The current Python unit test failure looks like it has to do with a newer version of SciPy.
[GitHub] spark issue #19538: [SPARK-20393][WEB UI][BACKPORT-2.0] Strengthen Spark to...
Github user ambauma commented on the issue: https://github.com/apache/spark/pull/19538 I'm not looking for an official release. My goal is to get the fix into the official 1.6 branch, to reduce the number of forks necessary and so that if CVE-2018- comes along after I've moved on, my replacement doesn't have to apply this patch plus that one.
[GitHub] spark issue #19543: [SPARK-19606][MESOS] Support constraints in spark-dispat...
Github user pmackles commented on the issue: https://github.com/apache/spark/pull/19543 @felixcheung - fixed scala-style issues and also updated the docs to include the new property
[GitHub] spark issue #19514: [SPARK-21551][Python] Increase timeout for PythonRDD.ser...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/19514 I think we can safely merge this change as it clearly passes any tests whose functionality it would affect. We could defer further discussion about what to do about CRAN versions elsewhere, yes.
[GitHub] spark pull request #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - B...
Github user foxish commented on a diff in the pull request: https://github.com/apache/spark/pull/19468#discussion_r146074603

--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackend.scala ---
@@ -0,0 +1,456 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.scheduler.cluster.k8s
+
+import java.io.Closeable
+import java.net.InetAddress
+import java.util.concurrent.{ConcurrentHashMap, ExecutorService, ScheduledExecutorService, TimeUnit}
+import java.util.concurrent.atomic.{AtomicInteger, AtomicLong, AtomicReference}
+
+import scala.collection.{concurrent, mutable}
+import scala.collection.JavaConverters._
+import scala.concurrent.{ExecutionContext, Future}
+
+import io.fabric8.kubernetes.api.model._
+import io.fabric8.kubernetes.client.{KubernetesClient, KubernetesClientException, Watcher}
+import io.fabric8.kubernetes.client.Watcher.Action
+
+import org.apache.spark.SparkException
+import org.apache.spark.deploy.k8s.config._
+import org.apache.spark.deploy.k8s.constants._
+import org.apache.spark.rpc.{RpcAddress, RpcEndpointAddress, RpcEnv}
+import org.apache.spark.scheduler.{ExecutorExited, SlaveLost, TaskSchedulerImpl}
+import org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend
+import org.apache.spark.util.Utils
+
+private[spark] class KubernetesClusterSchedulerBackend(
+    scheduler: TaskSchedulerImpl,
+    rpcEnv: RpcEnv,
+    executorPodFactory: ExecutorPodFactory,
+    kubernetesClient: KubernetesClient,
+    allocatorExecutor: ScheduledExecutorService,
+    requestExecutorsService: ExecutorService)
+  extends CoarseGrainedSchedulerBackend(scheduler, rpcEnv) {
+
+  import KubernetesClusterSchedulerBackend._
+
+  private val EXECUTOR_ID_COUNTER = new AtomicLong(0L)
+  private val RUNNING_EXECUTOR_PODS_LOCK = new Object
+  // Indexed by executor IDs and guarded by RUNNING_EXECUTOR_PODS_LOCK.
+  private val runningExecutorsToPods = new mutable.HashMap[String, Pod]
+  // Indexed by executor pod names and guarded by RUNNING_EXECUTOR_PODS_LOCK.
+  private val runningPodsToExecutors = new mutable.HashMap[String, String]
+  private val executorPodsByIPs = new ConcurrentHashMap[String, Pod]()
+  private val podsWithKnownExitReasons = new ConcurrentHashMap[String, ExecutorExited]()
+  private val disconnectedPodsByExecutorIdPendingRemoval = new ConcurrentHashMap[String, Pod]()
+
+  private val kubernetesNamespace = conf.get(KUBERNETES_NAMESPACE)
+
+  private val kubernetesDriverPodName = conf
+    .get(KUBERNETES_DRIVER_POD_NAME)
+    .getOrElse(
+      throw new SparkException("Must specify the driver pod name"))
+  private implicit val requestExecutorContext = ExecutionContext.fromExecutorService(
+    requestExecutorsService)
+
+  private val driverPod = try {
+    kubernetesClient.pods()
+      .inNamespace(kubernetesNamespace)
+      .withName(kubernetesDriverPodName)
+      .get()
+  } catch {
+    case throwable: Throwable =>
+      logError(s"Executor cannot find driver pod.", throwable)
+      throw new SparkException(s"Executor cannot find driver pod", throwable)
+  }
+
+  override val minRegisteredRatio =
+    if (conf.getOption("spark.scheduler.minRegisteredResourcesRatio").isEmpty) {
+      0.8
+    } else {
+      super.minRegisteredRatio
+    }
+
+  private val executorWatchResource = new AtomicReference[Closeable]
+  protected var totalExpectedExecutors = new AtomicInteger(0)
+
+  private val driverUrl = RpcEndpointAddress(
+    conf.get("spark.driver.host"),
+    conf.getInt("spark.driver.port", DEFAULT_DRIVER_PORT),
+    CoarseGrainedSchedulerBackend.ENDPOINT_NAME).toString
+
+  private val initialExecutors = getInitialTargetExecutorNumber()
+
+  private val podAllocationInterval =
[GitHub] spark issue #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - Basic Sc...
Github user foxish commented on the issue: https://github.com/apache/spark/pull/19468 @vanzin, you were right, the YARN constants were leftovers and made no sense with respect to k8s. We discussed it in our weekly meeting - it was simply dead code. I've addressed most of the style comments and the major concern about the constants. It's ready for a more in-depth review.
[GitHub] spark issue #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - Basic Sc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19468 **[Test build #82940 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82940/testReport)** for PR 19468 at commit [`c565c9f`](https://github.com/apache/spark/commit/c565c9ffd7e5371ee4425d69ecaf49ce92199fc7).
[GitHub] spark pull request #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - B...
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/19468#discussion_r146072523

--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackend.scala --- (same excerpt as quoted above; the review comment itself was cut off in the archive)
[GitHub] spark pull request #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - B...
Github user foxish commented on a diff in the pull request: https://github.com/apache/spark/pull/19468#discussion_r146072553

--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackend.scala --- (same excerpt as quoted above, anchored at the lines below)
+  } catch {
+    case throwable: Throwable =>
--- End diff --

@ash211, PTAL
[GitHub] spark pull request #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - B...
Github user foxish commented on a diff in the pull request: https://github.com/apache/spark/pull/19468#discussion_r146072501

--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackend.scala --- (same excerpt as quoted above; the review comment itself was cut off in the archive)
[GitHub] spark pull request #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - B...
Github user foxish commented on a diff in the pull request: https://github.com/apache/spark/pull/19468#discussion_r146072126

--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackend.scala --- (same excerpt as quoted above; the review comment itself was cut off in the archive)
[GitHub] spark issue #19545: [SPARK-21929][SQL] Support `ALTER TABLE table_name ADD C...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19545 **[Test build #82939 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82939/testReport)** for PR 19545 at commit [`cc52547`](https://github.com/apache/spark/commit/cc525479951868ff7094097aea886819c29fb549).
[GitHub] spark issue #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - Basic Sc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19468 **[Test build #82938 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82938/testReport)** for PR 19468 at commit [`c052212`](https://github.com/apache/spark/commit/c052212888e01eac90a006bfb5d14c513e33d0a3).
[GitHub] spark pull request #19545: [SPARK-21929][SQL] Support `ALTER TABLE table_nam...
GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/19545

[SPARK-21929][SQL] Support `ALTER TABLE table_name ADD COLUMNS(..)` for ORC data source

## What changes were proposed in this pull request?

When SPARK-19261 implemented `ALTER TABLE ADD COLUMNS`, the ORC data source was omitted due to SPARK-14387, SPARK-16628, and SPARK-18355. Now those issues are fixed, and Spark 2.3 uses the Spark schema to read ORC tables instead of the ORC file schema. This PR enables `ALTER TABLE ADD COLUMNS` for the ORC data source.

## How was this patch tested?

Pass the updated and added test cases.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dongjoon-hyun/spark SPARK-21929

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19545.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19545
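For readers who want to see the behavior end to end, a short PySpark sketch of what this PR enables (it assumes a build that includes this change; shown in Python for brevity even though the patch itself is Scala):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

spark.sql("CREATE TABLE t1 (c1 INT) USING orc")
spark.sql("INSERT INTO t1 VALUES (1)")

# Previously rejected for the ORC data source; allowed with this PR.
spark.sql("ALTER TABLE t1 ADD COLUMNS (c2 INT)")

# Existing rows come back with NULL for the newly added column.
spark.sql("SELECT * FROM t1").show()  # -> (1, null)
```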
[GitHub] spark pull request #18664: [SPARK-21375][PYSPARK][SQL] Add Date and Timestam...
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/18664#discussion_r146066999

--- Diff: python/pyspark/sql/types.py --- (same excerpt as quoted above) --- End diff --

Sorry @wesm, I meant on the Spark Python side. If a PySpark ArrayType is used, a TypeError is raised indicating it is an unsupported type.
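To see what `_check_dataframe_localize_timestamps` does in isolation, a small self-contained pandas example (illustrative only; it assumes a pandas version with `pandas.api.types`, and `'tzlocal()'` is the dateutil spelling for the local zone, as used in the diff above):

```python
import pandas as pd
from pandas.api.types import is_datetime64tz_dtype

# A tz-aware UTC column, as it would come back from Arrow.
df = pd.DataFrame({"ts": pd.to_datetime(["2017-10-20 12:00:00"]).tz_localize("UTC")})
assert is_datetime64tz_dtype(df["ts"].dtype)

# Convert to local time, then drop the tz info: tz-naive local timestamps.
df["ts"] = df["ts"].dt.tz_convert("tzlocal()").dt.tz_localize(None)
assert not is_datetime64tz_dtype(df["ts"].dtype)
```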
[GitHub] spark pull request #18664: [SPARK-21375][PYSPARK][SQL] Add Date and Timestam...
Github user wesm commented on a diff in the pull request: https://github.com/apache/spark/pull/18664#discussion_r146062157

--- Diff: python/pyspark/sql/types.py --- (same excerpt as quoted above) --- End diff --

Arrays are supported in pyarrow (but perhaps not for timestamps? If that's true, could you open a JIRA?), or do you mean something else?
[GitHub] spark pull request #18664: [SPARK-21375][PYSPARK][SQL] Add Date and Timestam...
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/18664#discussion_r146058020

--- Diff: python/pyspark/sql/types.py --- (same excerpt as quoted above) --- End diff --

I don't believe arrays are supported yet on the Python side of things; I plan to look at that next. Right now it will raise a TypeError.
[GitHub] spark pull request #19272: [Spark-21842][Mesos] Support Kerberos ticket rene...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/19272#discussion_r146052502

--- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala --- (same excerpt as quoted above, anchored at `credentialRenewerThread.join()`) --- End diff --

I don't think you really understood why the YARN code needs a thread and why I'm telling you this code does not. Read the comment you added here again; what makes you think the current thread does not have access to those classes?
[GitHub] spark pull request #19517: [SPARK-20396][SQL][PySpark][FOLLOW-UP] groupby()....
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19517
[GitHub] spark issue #19517: [SPARK-20396][SQL][PySpark][FOLLOW-UP] groupby().apply()...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19517 Merged build finished. Test PASSed.
[GitHub] spark issue #19517: [SPARK-20396][SQL][PySpark][FOLLOW-UP] groupby().apply()...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19517 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82936/ Test PASSed.
[GitHub] spark issue #19517: [SPARK-20396][SQL][PySpark][FOLLOW-UP] groupby().apply()...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19517 **[Test build #82936 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82936/testReport)** for PR 19517 at commit [`59d61a4`](https://github.com/apache/spark/commit/59d61a46a15b00f8af9ec8e2c6930853b7097b1c).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #19541: ABCD
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19541
[GitHub] spark pull request #19542: Branch 1.1.
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19542
[GitHub] spark pull request #18664: [SPARK-21375][PYSPARK][SQL] Add Date and Timestam...
Github user wesm commented on a diff in the pull request: https://github.com/apache/spark/pull/18664#discussion_r146044463

--- Diff: python/pyspark/sql/types.py --- (same excerpt as quoted above) --- End diff --

We should definitely add a test to assert what an array returns.
[GitHub] spark issue #19514: [SPARK-21551][Python] Increase timeout for PythonRDD.ser...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/19514 Good point. I'm not sure it counteracts it completely. We should run it to see the behavior, I guess. I am not a big fan of mucking with Jenkins versions because it fundamentally looks like CRAN doesn't like us pushing newer versions from older branches? For example, if we release 2.2.1 then we can't submit 2.1.3 to CRAN. We should first discuss if we are okay with that -- we can move this to a JIRA?
[GitHub] spark issue #19538: [SPARK-20393][WEB UI][2.0] Strengthen Spark to prevent ...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/19538 link to 1.6 PR #19528
[GitHub] spark pull request #19535: [SPARK-22313][PYTHON] Mark/print deprecation warn...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/19535#discussion_r146024871
--- Diff: python/pyspark/streaming/kafka.py ---
@@ -58,6 +60,7 @@ def createStream(ssc, zkQuorum, groupId, topics, kafkaParams=None,
         .. note:: Deprecated in 2.3.0
         """
+        warnings.warn("Deprecated in 2.3.0.", DeprecationWarning)
--- End diff --
ditto here
[GitHub] spark pull request #19535: [SPARK-22313][PYTHON] Mark/print deprecation warn...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/19535#discussion_r146024844
--- Diff: python/pyspark/streaming/flume.py ---
@@ -56,6 +56,7 @@ def createStream(ssc, hostname, port,
         .. note:: Deprecated in 2.3.0
         """
+        warnings.warn("Deprecated in 2.3.0.", DeprecationWarning)
--- End diff --
For these, could you provide more information? A link to the doc on deprecating DStreams in Python?
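A minimal sketch (not text from the PR) of the more descriptive warning being requested here and in the kafka.py hunk above; the exact wording and the SPARK-22313 pointer are assumptions:
```python
import warnings

def createStream(ssc, hostname, port):
    """Create an input stream from a Flume agent.

    .. note:: Deprecated in 2.3.0.
    """
    # Point users at the deprecation discussion instead of a bare version number.
    warnings.warn(
        "createStream is deprecated as of Spark 2.3.0 and will be removed "
        "in a future release; see SPARK-22313 for details.",
        DeprecationWarning)
```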
[GitHub] spark issue #19538: [SPARK-20393][WEB UI][2.0] Strengthen Spark to prevent ...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/19538 Could you update the PR title to say `[BACKPORT-2.0]` instead of `[2.0]`? Also, please add the PR # for the earlier commit to link them here. You mention there is a discussion; could you link it here? Are you looking for an official release for 1.6.x?
[GitHub] spark issue #19543: [SPARK-19606][MESOS] Support constraints in spark-dispat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19543 Merged build finished. Test FAILed.
[GitHub] spark issue #19543: [SPARK-19606][MESOS] Support constraints in spark-dispat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19543 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82937/
[GitHub] spark issue #19543: [SPARK-19606][MESOS] Support constraints in spark-dispat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19543
**[Test build #82937 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82937/testReport)** for PR 19543 at commit [`11d5859`](https://github.com/apache/spark/commit/11d58593edd43b651bdbe5c269fc051a94269747).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19538: [SPARK-20393][WEB UI][2.0] Strengthen Spark to prevent ...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/19538 Ignore the SparkR test failure for now; we are looking into it.
[GitHub] spark issue #19543: [SPARK-19606][MESOS] Support constraints in spark-dispat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19543 **[Test build #82937 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82937/testReport)** for PR 19543 at commit [`11d5859`](https://github.com/apache/spark/commit/11d58593edd43b651bdbe5c269fc051a94269747).
[GitHub] spark issue #19543: [SPARK-19606][MESOS] Support constraints in spark-dispat...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/19543 test this please
[GitHub] spark issue #19267: [WIP][SPARK-20628][CORE] Blacklist nodes when they trans...
Github user juanrh commented on the issue: https://github.com/apache/spark/pull/19267 Hi @vanzin and @tgravescs, do you have any other comments on this proposal? Thanks, Juan
[GitHub] spark issue #19514: [SPARK-21551][Python] Increase timeout for PythonRDD.ser...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/19514 I haven't tried it, but it sounds like it might counteract the `--as-cran` check settings completely?
```
_R_CHECK_CRAN_INCOMING_
    Check whether package is suitable for publication on CRAN. Default: false, except for CRAN submission checks.
```
[GitHub] spark issue #19514: [SPARK-21551][Python] Increase timeout for PythonRDD.ser...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/19514 Whoa. `_R_CHECK_CRAN_INCOMING_=false` sounds like the right approach. I'm a bit concerned with blindly letting through one more warning, though; perhaps grep for the specific warning text and only let one more through if it matches?
[GitHub] spark issue #19485: [SPARK-20055] [Docs] Added documentation for loading csv...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19485 Thanks for the explanation. I guess there will be a big doc change soon? I will check those changes too.
[GitHub] spark issue #19514: [SPARK-21551][Python] Increase timeout for PythonRDD.ser...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/19514 We didn't foresee this, but it looks like `R CMD check --as-cran` throws this error if we try to build a package with a version number older than the one uploaded to CRAN. There are a couple of ways around this -- we can set the environment variable `_R_CHECK_CRAN_INCOMING_=false` (documented in [1]), or we can change our `check-cran.sh` to admit one more `WARNING`. This would of course only be done for `branch-2.0`. Any thoughts @felixcheung @HyukjinKwon? [1] https://cran.r-project.org/doc/manuals/r-release/R-ints.html
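Putting felixcheung's grep suggestion from above into concrete form: a rough sketch, in Python rather than the shell used by `check-cran.sh`, that tolerates only the one expected version warning instead of blindly raising the allowed WARNING count. The warning text and the check-log path are assumptions:
```python
import re
import sys

# Assumed wording of the CRAN "incoming" version warning; the actual text
# emitted by `R CMD check --as-cran` may differ.
EXPECTED = re.compile(r"[Ii]nsufficient package version")

def unexpected_warnings(log_path):
    # Collect WARNING lines, then drop the one warning we deliberately allow.
    with open(log_path) as f:
        warning_lines = [line for line in f if "WARNING" in line]
    return [line for line in warning_lines if not EXPECTED.search(line)]

if __name__ == "__main__":
    # "SparkR.Rcheck/00check.log" is an assumed location for the check log.
    bad = unexpected_warnings("SparkR.Rcheck/00check.log")
    if bad:
        sys.exit("Unexpected CRAN check warnings:\n" + "".join(bad))
```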
[GitHub] spark issue #19539: [MINOR] [SQL] Remove unnecessary hashCode and equals met...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19539 Could you open a JIRA?
[GitHub] spark pull request #18270: [SPARK-21055][SQL] replace grouping__id with grou...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18270
[GitHub] spark issue #18270: [SPARK-21055][SQL] replace grouping__id with grouping_id...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18270 Thanks! Merged to master.
[GitHub] spark issue #18270: [SPARK-21055][SQL] replace grouping__id with grouping_id...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18270 LGTM. Let us resolve the remaining issue in the follow-up PR.
[GitHub] spark issue #19517: [SPARK-20396][SQL][PySpark][FOLLOW-UP] groupby().apply()...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19517 **[Test build #82936 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82936/testReport)** for PR 19517 at commit [`59d61a4`](https://github.com/apache/spark/commit/59d61a46a15b00f8af9ec8e2c6930853b7097b1c).
[GitHub] spark issue #19517: [SPARK-20396][SQL][PySpark][FOLLOW-UP] groupby().apply()...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19517 retest this please
[GitHub] spark issue #19544: [SPARK-22323] Design doc for pandas_udf
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19544 @jiangxb1987 will reorganize the existing Spark SQL doc. We can think about how to fit this into the new version of the Spark SQL doc.
[GitHub] spark issue #19485: [SPARK-20055] [Docs] Added documentation for loading csv...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19485 The reference manual and the API docs are different. Below is a link for DB2 LUW: http://www-01.ibm.com/support/docview.wss?uid=swg27038855
[GitHub] spark issue #19437: [SPARK-22131][MESOS] Mesos driver secrets
Github user susanxhuynh commented on the issue: https://github.com/apache/spark/pull/19437 @vanzin Ping, would you mind reviewing this PR?
[GitHub] spark issue #19544: [SPARK-22323] Design doc for pandas_udf
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19544 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82935/
[GitHub] spark issue #19544: [SPARK-22323] Design doc for pandas_udf
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19544
**[Test build #82935 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82935/testReport)** for PR 19544 at commit [`3005312`](https://github.com/apache/spark/commit/3005312e0b5c0255ddd23736bfd24e2abf6cad95).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19544: [SPARK-22323] Design doc for pandas_udf
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19544 Merged build finished. Test PASSed.
[GitHub] spark issue #19515: [SPARK-22287][MESOS] SPARK_DAEMON_MEMORY not honored by ...
Github user pmackles commented on the issue: https://github.com/apache/spark/pull/19515 @ArtRand - WDYT? I was going to switch it to `SPARK_DISPATCHER_MEMORY`, but then I noticed that the other env vars for MesosClusterDispatcher are also prefixed with `SPARK_DAEMON_*`, so I thought it might be better to keep the names consistent.
[GitHub] spark issue #19544: [SPARK-22323] Design doc for pandas_udf
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19544 **[Test build #82935 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82935/testReport)** for PR 19544 at commit [`3005312`](https://github.com/apache/spark/commit/3005312e0b5c0255ddd23736bfd24e2abf6cad95).
[GitHub] spark issue #19544: [SPARK-22323] Design doc for pandas_udf
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/19544 cc @cloud-fan @ueshin @HyukjinKwon @gatorsmile @viirya To continue the discussion on #19505
[GitHub] spark pull request #19544: [SPARK-22323] Design doc for pandas_udf
GitHub user icexelloss opened a pull request: https://github.com/apache/spark/pull/19544
[SPARK-22323] Design doc for pandas_udf
I am opening this PR so we have a place to discuss the design. We don't necessarily need to merge an md file for the doc - this could be embedded Python documentation.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/icexelloss/spark pandas-udf-design-doc-SPARK-22323
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19544.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
This closes #19544
commit 3005312e0b5c0255ddd23736bfd24e2abf6cad95
Author: Li Jin
Date: 2017-10-20T15:09:08Z
Initial design doc for pandas_udf
[GitHub] spark issue #19543: [SPARK-19606][MESOS] Support constraints in spark-dispat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19543 Can one of the admins verify this patch?
[GitHub] spark pull request #19543: [SPARK-19606][MESOS] Support constraints in spark...
GitHub user pmackles opened a pull request: https://github.com/apache/spark/pull/19543
[SPARK-19606][MESOS] Support constraints in spark-dispatcher
## What changes were proposed in this pull request?
As discussed in SPARK-19606, the addition of a new config property named "spark.mesos.constraints.driver" for constraining drivers running on a Mesos cluster.
## How was this patch tested?
Corresponding unit test added; also tested locally on a Mesos cluster.
Please review http://spark.apache.org/contributing.html before opening a pull request.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/pmackles/spark SPARK-19606
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19543.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
This closes #19543
commit 11d58593edd43b651bdbe5c269fc051a94269747
Author: Paul Mackles
Date: 2017-10-20T15:08:33Z
[SPARK-19606] Support constraints in spark-dispatcher
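For readers following along, a hedged sketch of how the proposed property would be used; the property name comes from this PR, while the constraint value and submission details are illustrative only:
```python
# In practice the property would be passed to spark-submit in cluster mode,
# e.g. --conf spark.mesos.constraints.driver="rack:us-east-1a".
from pyspark import SparkConf

# "rack:us-east-1a" is a made-up Mesos attribute constraint.
conf = SparkConf().set("spark.mesos.constraints.driver", "rack:us-east-1a")
print(conf.get("spark.mesos.constraints.driver"))
```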
[GitHub] spark issue #19485: [SPARK-20055] [Docs] Added documentation for loading csv...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19485 @gatorsmile, sure, a detailed doc is great and I definitely support it. Just one thing I am worried about is duplication. If we add or change an option, we have to update both places and .. you know it. Wouldn't it be nicer if we simply leave a pointer and remove the duplication if possible? If I understood correctly, the options would also be described in more detail in the new chapter in the future, and I think simply redirecting might be feasible. I guess it shouldn't be too difficult to make a sub-chapter for options only, for example, like http://spark.apache.org/docs/latest/sql-programming-guide.html#other-configuration-options Otherwise, do you think there should be different contents for a different purpose, or do you want to leave the duplication just for now as something to be fixed soon? If so, I am okay.
[GitHub] spark issue #18974: [SPARK-21750][SQL] Use Arrow 0.6.0
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18974 Hi, all. Two more Arrow releases seem to be out. How about the Python side? Can we catch up some?
- 0.7.1 (1 October 2017)
- 0.7.0 (17 September 2017)
[GitHub] spark issue #19527: [SPARK-13030][ML] Create OneHotEncoderEstimator for OneH...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19527 Merged build finished. Test PASSed.
[GitHub] spark issue #19527: [SPARK-13030][ML] Create OneHotEncoderEstimator for OneH...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19527 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82934/
[GitHub] spark issue #19527: [SPARK-13030][ML] Create OneHotEncoderEstimator for OneH...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19527
**[Test build #82934 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82934/testReport)** for PR 19527 at commit [`e024120`](https://github.com/apache/spark/commit/e0241200c58a5ec201a0f1abdebc1660878ed49f).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19479: [SPARK-17074] [SQL] Generate equi-height histogram in co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19479 Merged build finished. Test PASSed.
[GitHub] spark issue #19479: [SPARK-17074] [SQL] Generate equi-height histogram in co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19479 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82931/
[GitHub] spark issue #19479: [SPARK-17074] [SQL] Generate equi-height histogram in co...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19479
**[Test build #82931 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82931/testReport)** for PR 19479 at commit [`6fe9985`](https://github.com/apache/spark/commit/6fe9985872c93b5dfa9972300ba3f59e97834d4c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19527: [SPARK-13030][ML] Create OneHotEncoderEstimator for OneH...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19527 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82933/
[GitHub] spark issue #19527: [SPARK-13030][ML] Create OneHotEncoderEstimator for OneH...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19527 Merged build finished. Test PASSed.
[GitHub] spark issue #19527: [SPARK-13030][ML] Create OneHotEncoderEstimator for OneH...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19527
**[Test build #82933 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82933/testReport)** for PR 19527 at commit [`fe80e98`](https://github.com/apache/spark/commit/fe80e98712f52a4b5795c96a20e8f92e65849cb4).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.