[GitHub] spark issue #19272: [SPARK-21842][Mesos] Support Kerberos ticket renewal and...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19272 **[Test build #82941 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82941/testReport)** for PR 19272 at commit [`c95f80b`](https://github.com/apache/spark/commit/c95f80b23d47ea4640cea2b4a185fa4bf9e9f33d).
[GitHub] spark pull request #18664: [SPARK-21375][PYSPARK][SQL] Add Date and Timestam...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18664#discussion_r146096087

--- Diff: python/pyspark/sql/types.py ---
@@ -1619,11 +1619,39 @@ def to_arrow_type(dt):
         arrow_type = pa.decimal(dt.precision, dt.scale)
     elif type(dt) == StringType:
         arrow_type = pa.string()
+    elif type(dt) == DateType:
+        arrow_type = pa.date32()
+    elif type(dt) == TimestampType:
+        # Timestamps should be in UTC, JVM Arrow timestamps require a timezone to be read
+        arrow_type = pa.timestamp('us', tz='UTC')
     else:
         raise TypeError("Unsupported type in conversion to Arrow: " + str(dt))
     return arrow_type


+def _check_dataframe_localize_timestamps(df):
+    """ Convert timezone aware timestamps to timezone-naive in local time
+    """
+    from pandas.api.types import is_datetime64tz_dtype
+    for column, series in df.iteritems():
+        # TODO: handle nested timestamps?
+        if is_datetime64tz_dtype(series.dtype):
+            df[column] = series.dt.tz_convert('tzlocal()').dt.tz_localize(None)
+    return df
+
+
+def _check_series_convert_timestamps_internal(s):
+    """ Convert a tz-naive timestamp in local tz to UTC normalized for Spark internal storage
+    """
+    from pandas.api.types import is_datetime64_dtype
+    # TODO: handle nested timestamps?
--- End diff --

If it is unsupported, could you also add a negative test case if one does not already exist?
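To make the requested negative case concrete, a minimal sketch (the test class and method names are hypothetical; it assumes `to_arrow_type` is importable from `pyspark.sql.types` as in the diff above, and that `ArrayType` is still unsupported at this point):

```python
import unittest

from pyspark.sql.types import ArrayType, TimestampType, to_arrow_type


class ArrowTypeConversionTest(unittest.TestCase):
    def test_unsupported_nested_type_raises(self):
        # Nested timestamps (e.g. inside an array) are not handled yet,
        # so the conversion should fail loudly rather than silently.
        with self.assertRaises(TypeError):
            to_arrow_type(ArrayType(TimestampType()))


if __name__ == "__main__":
    unittest.main()
```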
[GitHub] spark issue #19545: [SPARK-21929][SQL] Support `ALTER TABLE table_name ADD C...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19545 LGTM
[GitHub] spark pull request #19545: [SPARK-21929][SQL] Support `ALTER TABLE table_nam...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19545#discussion_r146096042

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala ---
@@ -2202,56 +2202,64 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils {
     }
   }

+  def testAddColumn(provider: String): Unit = {
--- End diff --

Nit: `protected`
[GitHub] spark pull request #19545: [SPARK-21929][SQL] Support `ALTER TABLE table_nam...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19545#discussion_r146096038

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala ---
@@ -2202,56 +2202,64 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils {
     }
   }

+  def testAddColumn(provider: String): Unit = {
+    withTable("t1") {
+      sql(s"CREATE TABLE t1 (c1 int) USING $provider")
+      sql("INSERT INTO t1 VALUES (1)")
+      sql("ALTER TABLE t1 ADD COLUMNS (c2 int)")
+      checkAnswer(
+        spark.table("t1"),
+        Seq(Row(1, null))
+      )
+      checkAnswer(
+        sql("SELECT * FROM t1 WHERE c2 is null"),
+        Seq(Row(1, null))
+      )
+
+      sql("INSERT INTO t1 VALUES (3, 2)")
+      checkAnswer(
+        sql("SELECT * FROM t1 WHERE c2 = 2"),
+        Seq(Row(3, 2))
+      )
+    }
+  }
+
+  def testAddColumnPartitioned(provider: String): Unit = {
--- End diff --

Nit: `protected`
[GitHub] spark pull request #19539: [SPARK-22326] [SQL] Remove unnecessary hashCode a...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19539
[GitHub] spark issue #19539: [SPARK-22326] [SQL] Remove unnecessary hashCode and equa...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19539 Thanks! Merged to master.
[GitHub] spark issue #19539: [SPARK-22326] [SQL] Remove unnecessary hashCode and equa...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19539 LGTM
[GitHub] spark pull request #19528: [SPARK-20393][WEB UI][1.6] Strengthen Spark to p...
Github user ambauma commented on a diff in the pull request: https://github.com/apache/spark/pull/19528#discussion_r146095022

--- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/deploy/mesos/ui/DriverPage.scala ---
@@ -0,0 +1,180 @@
+/*
--- End diff --

I'm not sure what I did to make this whole file look new, but I've copied the current 1.6 version and reapplied stripXSS locally. I'm waiting for my build to pass before committing again.
[GitHub] spark issue #19514: [SPARK-21551][Python] Increase timeout for PythonRDD.ser...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19514 Thanks, I have been following it @shivaram and @felixcheung. A separate JIRA sounds good to me and I am okay with merging it.
[GitHub] spark issue #19514: [SPARK-21551][Python] Increase timeout for PythonRDD.ser...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19514 LGTM too BTW.
[GitHub] spark pull request #19272: [SPARK-21842][Mesos] Support Kerberos ticket rene...
Github user ArtRand commented on a diff in the pull request: https://github.com/apache/spark/pull/19272#discussion_r146094019

--- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala ---
@@ -194,6 +198,27 @@ private[spark] class MesosCoarseGrainedSchedulerBackend(
       sc.conf.getOption("spark.mesos.driver.frameworkId").map(_ + suffix)
     )

+    // check that the credentials are defined, even though it's likely that auth would have failed
+    // already if you've made it this far
+    if (principal != null && hadoopDelegationCreds.isDefined) {
+      logDebug(s"Principal found ($principal) starting token renewer")
+      val credentialRenewerThread = new Thread {
+        setName("MesosCredentialRenewer")
+        override def run(): Unit = {
+          val rt = MesosCredentialRenewer.getTokenRenewalTime(hadoopDelegationCreds.get, conf)
+          val credentialRenewer =
+            new MesosCredentialRenewer(
+              conf,
+              hadoopDelegationTokenManager.get,
+              MesosCredentialRenewer.getNextRenewalTime(rt),
+              driverEndpoint)
+          credentialRenewer.scheduleTokenRenewal()
+        }
+      }
+
+      credentialRenewerThread.start()
+      credentialRenewerThread.join()
--- End diff --

Ok, you're probably right. It appears that the YARN code uses `setContextClassLoader(userClassLoader)`, whereas Mesos does not have a notion of a `userClassLoader`, so we don't need the separate thread in the Mesos code. Do I have that correct? Thanks for showing me this!
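For readers following along, the pattern at issue is "compute the next renewal time, renew, then re-arm." A minimal sketch of that scheduling pattern, in Python for illustration only (`renew_tokens` and `next_renewal_time` are hypothetical stand-ins, not the actual `MesosCredentialRenewer` API):

```python
import threading
import time

def schedule_token_renewal(renew_tokens, next_renewal_time):
    """Re-arming timer: run renew_tokens at each computed renewal time."""
    delay = max(0.0, next_renewal_time() - time.time())

    def renew_and_reschedule():
        renew_tokens()  # fetch fresh delegation tokens, push them out
        schedule_token_renewal(renew_tokens, next_renewal_time)  # re-arm

    timer = threading.Timer(delay, renew_and_reschedule)
    timer.daemon = True  # never block driver shutdown on the renewer
    timer.start()
    return timer
```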
[GitHub] spark issue #19539: [SPARK-22326] [SQL] Remove unnecessary hashCode and equa...
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/19539 @gatorsmile JIRA created.
[GitHub] spark issue #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - Basic Sc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19468 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82940/ Test PASSed.
[GitHub] spark issue #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - Basic Sc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19468 Merged build finished. Test PASSed.
[GitHub] spark issue #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - Basic Sc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19468 **[Test build #82940 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82940/testReport)** for PR 19468 at commit [`c565c9f`](https://github.com/apache/spark/commit/c565c9ffd7e5371ee4425d69ecaf49ce92199fc7).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - Basic Sc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19468 Merged build finished. Test PASSed.
[GitHub] spark issue #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - Basic Sc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19468 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82938/ Test PASSed.
[GitHub] spark issue #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - Basic Sc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19468 **[Test build #82938 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82938/testReport)** for PR 19468 at commit [`c052212`](https://github.com/apache/spark/commit/c052212888e01eac90a006bfb5d14c513e33d0a3).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19545: [SPARK-21929][SQL] Support `ALTER TABLE table_name ADD C...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19545 Hi, @gatorsmile. Could you review this PR?
[GitHub] spark pull request #19545: [SPARK-21929][SQL] Support `ALTER TABLE table_nam...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19545#discussion_r146090102

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -235,11 +235,10 @@ case class AlterTableAddColumnsCommand(
     DataSource.lookupDataSource(catalogTable.provider.get).newInstance() match {
       // For datasource table, this command can only support the following File format.
       // TextFileFormat only default to one column "value"
-      // OrcFileFormat can not handle difference between user-specified schema and
-      // inferred schema yet. TODO, once this issue is resolved , we can add Orc back.
       // Hive type is already considered as hive serde table, so the logic will not
       // come in here.
       case _: JsonFileFormat | _: CSVFileFormat | _: ParquetFileFormat =>
+      case s if s.getClass.getCanonicalName.endsWith("OrcFileFormat") =>
--- End diff --

After implementing OrcFileFormat based on Apache ORC, we can move `OrcFileFormat` from the `sql/hive` module into the `sql/core` module.
[GitHub] spark issue #19545: [SPARK-21929][SQL] Support `ALTER TABLE table_name ADD C...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19545 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82939/ Test PASSed.
[GitHub] spark issue #19545: [SPARK-21929][SQL] Support `ALTER TABLE table_name ADD C...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19545 Merged build finished. Test PASSed.
[GitHub] spark issue #19545: [SPARK-21929][SQL] Support `ALTER TABLE table_name ADD C...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19545 **[Test build #82939 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82939/testReport)** for PR 19545 at commit [`cc52547`](https://github.com/apache/spark/commit/cc525479951868ff7094097aea886819c29fb549).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19534: [SPARK-22312][CORE] Fix bug in Executor allocation manag...
Github user sitalkedia commented on the issue: https://github.com/apache/spark/pull/19534 I think the other PR fixes one more issue on top of runningTasks going negative, so we can proceed with that one. What do you think, @jerryshao?
[GitHub] spark pull request #19528: [SPARK-20393][WEB UI][1.6] Strengthen Spark to p...
Github user ambauma commented on a diff in the pull request: https://github.com/apache/spark/pull/19528#discussion_r146084730

--- Diff: python/pyspark/mllib/classification.py ---
@@ -173,7 +173,7 @@ def __init__(self, weights, intercept, numFeatures, numClasses):
             self._dataWithBiasSize = None
             self._weightsMatrix = None
         else:
-            self._dataWithBiasSize = self._coeff.size / (self._numClasses - 1)
+            self._dataWithBiasSize = self._coeff.size // (self._numClasses - 1)
--- End diff --

The NewSparkPullRequestBuilder failed on the Python tests. I was only able to duplicate the failure with Python 3.4 and numpy 1.12.1, which I'm guessing are the versions that NewSparkPullRequestBuilder is using. Older and newer versions of numpy build cleanly either way.
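As background on why `/` versus `//` matters here (a standalone illustration, not Spark code): NumPy sizes are integers, but Python 3's `/` is true division and yields a float, which NumPy 1.12+ rejects wherever an integer dimension or index is required; `//` keeps an integer on both Python 2 and 3.

```python
import numpy as np

coeff = np.zeros(8)
num_classes = 3

size_true = coeff.size / (num_classes - 1)    # 4.0 -> float under Python 3
size_floor = coeff.size // (num_classes - 1)  # 4   -> int everywhere

coeff.reshape(size_floor, 2)   # fine: integer dimensions
# coeff.reshape(size_true, 2)  # TypeError on numpy >= 1.12: a float is
#                              # not accepted where an integer is required
```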
[GitHub] spark pull request #19528: [SPARK-20393][WEB UI][1.6] Strengthen Spark to p...
Github user ambauma commented on a diff in the pull request: https://github.com/apache/spark/pull/19528#discussion_r146084021

--- Diff: core/src/main/scala/org/apache/spark/ui/jobs/JobsTab.scala ---
@@ -16,9 +16,9 @@
  */
 package org.apache.spark.ui.jobs
-
+import javax.servlet.http.HttpServletRequest
--- End diff --

Agreed, will remove.
[GitHub] spark pull request #19528: [SPARK-20393][WEB UI][1.6] Strengthen Spark to p...
Github user ambauma commented on a diff in the pull request: https://github.com/apache/spark/pull/19528#discussion_r146080377

--- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/deploy/mesos/ui/DriverPage.scala ---
@@ -0,0 +1,180 @@
+/*
--- End diff --

I'll look into this as well...
[GitHub] spark pull request #19528: [SPARK-20393][WEB UI][1.6] Strengthen Spark to p...
Github user ambauma commented on a diff in the pull request: https://github.com/apache/spark/pull/19528#discussion_r146080311

--- Diff: python/pyspark/mllib/classification.py ---
@@ -173,7 +173,7 @@ def __init__(self, weights, intercept, numFeatures, numClasses):
             self._dataWithBiasSize = None
             self._weightsMatrix = None
         else:
-            self._dataWithBiasSize = self._coeff.size / (self._numClasses - 1)
+            self._dataWithBiasSize = self._coeff.size // (self._numClasses - 1)
--- End diff --

This is already fixed in the 2.0 branch, by the way; it just was never applied to 1.6. [SPARK-20862]
[GitHub] spark pull request #19528: [SPARK-20393][WEB UI][1.6] Strengthen Spark to p...
Github user ambauma commented on a diff in the pull request: https://github.com/apache/spark/pull/19528#discussion_r146080177

--- Diff: core/src/main/scala/org/apache/spark/ui/jobs/JobsTab.scala ---
@@ -16,9 +16,9 @@
  */
 package org.apache.spark.ui.jobs
-
+import javax.servlet.http.HttpServletRequest
--- End diff --

Will look into this...
[GitHub] spark pull request #19528: [SPARK-20393][WEB UI][1.6] Strengthen Spark to p...
Github user ambauma commented on a diff in the pull request: https://github.com/apache/spark/pull/19528#discussion_r146080089

--- Diff: python/pyspark/mllib/classification.py ---
@@ -173,7 +173,7 @@ def __init__(self, weights, intercept, numFeatures, numClasses):
             self._dataWithBiasSize = None
             self._weightsMatrix = None
         else:
-            self._dataWithBiasSize = self._coeff.size / (self._numClasses - 1)
+            self._dataWithBiasSize = self._coeff.size // (self._numClasses - 1)
--- End diff --

I had to apply this to get past a Python unit test failure. My assumption is that the NewSparkPullRequestBuilder is on a different version of numpy than when the Spark 1.6 branch was last built. The current Python unit test failure looks like it has to do with a newer version of SciPy.
[GitHub] spark issue #19538: [SPARK-20393][WEB UI][BACKPORT-2.0] Strengthen Spark to...
Github user ambauma commented on the issue: https://github.com/apache/spark/pull/19538 I'm not looking for an official release. My goal is to get the fix into the official 1.6 branch, to reduce the number of forks necessary and so that if CVE-2018- comes along after I've moved on, my replacement doesn't have to apply this patch plus that one.
[GitHub] spark issue #19543: [SPARK-19606][MESOS] Support constraints in spark-dispat...
Github user pmackles commented on the issue: https://github.com/apache/spark/pull/19543 @felixcheung - fixed scala-style issues and also updated the docs to include the new property
[GitHub] spark issue #19514: [SPARK-21551][Python] Increase timeout for PythonRDD.ser...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/19514 I think we can safely merge this change as it clearly passes any tests whose functionality it would affect. We could defer further discussion about what to do about CRAN versions elsewhere, yes.
[GitHub] spark pull request #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - B...
Github user foxish commented on a diff in the pull request: https://github.com/apache/spark/pull/19468#discussion_r146074603

--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackend.scala ---
@@ -0,0 +1,456 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.scheduler.cluster.k8s
+
+import java.io.Closeable
+import java.net.InetAddress
+import java.util.concurrent.{ConcurrentHashMap, ExecutorService, ScheduledExecutorService, TimeUnit}
+import java.util.concurrent.atomic.{AtomicInteger, AtomicLong, AtomicReference}
+
+import scala.collection.{concurrent, mutable}
+import scala.collection.JavaConverters._
+import scala.concurrent.{ExecutionContext, Future}
+
+import io.fabric8.kubernetes.api.model._
+import io.fabric8.kubernetes.client.{KubernetesClient, KubernetesClientException, Watcher}
+import io.fabric8.kubernetes.client.Watcher.Action
+
+import org.apache.spark.SparkException
+import org.apache.spark.deploy.k8s.config._
+import org.apache.spark.deploy.k8s.constants._
+import org.apache.spark.rpc.{RpcAddress, RpcEndpointAddress, RpcEnv}
+import org.apache.spark.scheduler.{ExecutorExited, SlaveLost, TaskSchedulerImpl}
+import org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend
+import org.apache.spark.util.Utils
+
+private[spark] class KubernetesClusterSchedulerBackend(
+    scheduler: TaskSchedulerImpl,
+    rpcEnv: RpcEnv,
+    executorPodFactory: ExecutorPodFactory,
+    kubernetesClient: KubernetesClient,
+    allocatorExecutor: ScheduledExecutorService,
+    requestExecutorsService: ExecutorService)
+  extends CoarseGrainedSchedulerBackend(scheduler, rpcEnv) {
+
+  import KubernetesClusterSchedulerBackend._
+
+  private val EXECUTOR_ID_COUNTER = new AtomicLong(0L)
+  private val RUNNING_EXECUTOR_PODS_LOCK = new Object
+  // Indexed by executor IDs and guarded by RUNNING_EXECUTOR_PODS_LOCK.
+  private val runningExecutorsToPods = new mutable.HashMap[String, Pod]
+  // Indexed by executor pod names and guarded by RUNNING_EXECUTOR_PODS_LOCK.
+  private val runningPodsToExecutors = new mutable.HashMap[String, String]
+  private val executorPodsByIPs = new ConcurrentHashMap[String, Pod]()
+  private val podsWithKnownExitReasons = new ConcurrentHashMap[String, ExecutorExited]()
+  private val disconnectedPodsByExecutorIdPendingRemoval = new ConcurrentHashMap[String, Pod]()
+
+  private val kubernetesNamespace = conf.get(KUBERNETES_NAMESPACE)
+
+  private val kubernetesDriverPodName = conf
+    .get(KUBERNETES_DRIVER_POD_NAME)
+    .getOrElse(
+      throw new SparkException("Must specify the driver pod name"))
+  private implicit val requestExecutorContext = ExecutionContext.fromExecutorService(
+    requestExecutorsService)
+
+  private val driverPod = try {
+    kubernetesClient.pods()
+      .inNamespace(kubernetesNamespace)
+      .withName(kubernetesDriverPodName)
+      .get()
+  } catch {
+    case throwable: Throwable =>
+      logError(s"Executor cannot find driver pod.", throwable)
+      throw new SparkException(s"Executor cannot find driver pod", throwable)
+  }
+
+  override val minRegisteredRatio =
+    if (conf.getOption("spark.scheduler.minRegisteredResourcesRatio").isEmpty) {
+      0.8
+    } else {
+      super.minRegisteredRatio
+    }
+
+  private val executorWatchResource = new AtomicReference[Closeable]
+  protected var totalExpectedExecutors = new AtomicInteger(0)
+
+  private val driverUrl = RpcEndpointAddress(
+    conf.get("spark.driver.host"),
+    conf.getInt("spark.driver.port", DEFAULT_DRIVER_PORT),
+    CoarseGrainedSchedulerBackend.ENDPOINT_NAME).toString
+
+  private val initialExecutors = getInitialTargetExecutorNumber()
+
+  private val podAllocationInterval =
[GitHub] spark issue #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - Basic Sc...
Github user foxish commented on the issue: https://github.com/apache/spark/pull/19468 @vanzin, you were right, the YARN constants were leftovers and made no sense with respect to k8s. We discussed it in our weekly meeting - it was simply dead code. I've addressed most of the style comments and the major concern about the constants. It's ready for a more in-depth review.
[GitHub] spark issue #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - Basic Sc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19468 **[Test build #82940 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82940/testReport)** for PR 19468 at commit [`c565c9f`](https://github.com/apache/spark/commit/c565c9ffd7e5371ee4425d69ecaf49ce92199fc7).
[GitHub] spark pull request #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - B...
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/19468#discussion_r146072523

--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackend.scala --- (same excerpt as quoted above; the review comment itself was cut off in the archive)
[GitHub] spark pull request #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - B...
Github user foxish commented on a diff in the pull request: https://github.com/apache/spark/pull/19468#discussion_r146072553

--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackend.scala --- (same excerpt as quoted above, anchored at the lines below)
+  } catch {
+    case throwable: Throwable =>
--- End diff --

@ash211, PTAL
[GitHub] spark pull request #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - B...
Github user foxish commented on a diff in the pull request: https://github.com/apache/spark/pull/19468#discussion_r146072501

--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackend.scala --- (same excerpt as quoted above; the review comment itself was cut off in the archive)
[GitHub] spark pull request #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - B...
Github user foxish commented on a diff in the pull request: https://github.com/apache/spark/pull/19468#discussion_r146072126

--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackend.scala --- (same excerpt as quoted above; the review comment itself was cut off in the archive)
[GitHub] spark issue #19545: [SPARK-21929][SQL] Support `ALTER TABLE table_name ADD C...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19545 **[Test build #82939 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82939/testReport)** for PR 19545 at commit [`cc52547`](https://github.com/apache/spark/commit/cc525479951868ff7094097aea886819c29fb549).
[GitHub] spark issue #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - Basic Sc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19468 **[Test build #82938 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82938/testReport)** for PR 19468 at commit [`c052212`](https://github.com/apache/spark/commit/c052212888e01eac90a006bfb5d14c513e33d0a3).
[GitHub] spark pull request #19545: [SPARK-21929][SQL] Support `ALTER TABLE table_nam...
GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/19545

[SPARK-21929][SQL] Support `ALTER TABLE table_name ADD COLUMNS(..)` for ORC data source

## What changes were proposed in this pull request?

When SPARK-19261 implemented `ALTER TABLE ADD COLUMNS`, the ORC data source was omitted due to SPARK-14387, SPARK-16628, and SPARK-18355. Now those issues are fixed, and Spark 2.3 uses the Spark schema to read ORC tables instead of the ORC file schema. This PR enables `ALTER TABLE ADD COLUMNS` for the ORC data source.

## How was this patch tested?

Pass the updated and added test cases.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dongjoon-hyun/spark SPARK-21929

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19545.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19545
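For readers who want to see the behavior end to end, a short PySpark sketch of what this PR enables (it assumes a build that includes this change; shown in Python for brevity even though the patch itself is Scala):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

spark.sql("CREATE TABLE t1 (c1 INT) USING orc")
spark.sql("INSERT INTO t1 VALUES (1)")

# Previously rejected for the ORC data source; allowed with this PR.
spark.sql("ALTER TABLE t1 ADD COLUMNS (c2 INT)")

# Existing rows come back with NULL for the newly added column.
spark.sql("SELECT * FROM t1").show()  # -> (1, null)
```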
[GitHub] spark pull request #18664: [SPARK-21375][PYSPARK][SQL] Add Date and Timestam...
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/18664#discussion_r146066999

--- Diff: python/pyspark/sql/types.py --- (same excerpt as quoted above) --- End diff --

Sorry @wesm, I meant on the Spark Python side. If a PySpark ArrayType is used, a TypeError is raised indicating it is an unsupported type.
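To see what `_check_dataframe_localize_timestamps` does in isolation, a small self-contained pandas example (illustrative only; it assumes a pandas version with `pandas.api.types`, and `'tzlocal()'` is the dateutil spelling for the local zone, as used in the diff above):

```python
import pandas as pd
from pandas.api.types import is_datetime64tz_dtype

# A tz-aware UTC column, as it would come back from Arrow.
df = pd.DataFrame({"ts": pd.to_datetime(["2017-10-20 12:00:00"]).tz_localize("UTC")})
assert is_datetime64tz_dtype(df["ts"].dtype)

# Convert to local time, then drop the tz info: tz-naive local timestamps.
df["ts"] = df["ts"].dt.tz_convert("tzlocal()").dt.tz_localize(None)
assert not is_datetime64tz_dtype(df["ts"].dtype)
```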
[GitHub] spark pull request #18664: [SPARK-21375][PYSPARK][SQL] Add Date and Timestam...
Github user wesm commented on a diff in the pull request: https://github.com/apache/spark/pull/18664#discussion_r146062157

--- Diff: python/pyspark/sql/types.py --- (same excerpt as quoted above) --- End diff --

Arrays are supported in pyarrow (but perhaps not for timestamps? If that's true, could you open a JIRA?), or do you mean something else?
[GitHub] spark pull request #18664: [SPARK-21375][PYSPARK][SQL] Add Date and Timestam...
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/18664#discussion_r146058020

--- Diff: python/pyspark/sql/types.py --- (same excerpt as quoted above) --- End diff --

I don't believe arrays are supported yet on the Python side of things; I plan to look at that next. Right now it will raise a TypeError.
[GitHub] spark pull request #19272: [Spark-21842][Mesos] Support Kerberos ticket rene...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/19272#discussion_r146052502

--- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala --- (same excerpt as quoted above, anchored at `credentialRenewerThread.join()`) --- End diff --

I don't think you really understood why the YARN code needs a thread and why I'm telling you this code does not. Read the comment you added here again; what makes you think the current thread does not have access to those classes?
[GitHub] spark pull request #19517: [SPARK-20396][SQL][PySpark][FOLLOW-UP] groupby()....
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19517
[GitHub] spark issue #19517: [SPARK-20396][SQL][PySpark][FOLLOW-UP] groupby().apply()...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19517 Merged build finished. Test PASSed.
[GitHub] spark issue #19517: [SPARK-20396][SQL][PySpark][FOLLOW-UP] groupby().apply()...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19517 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82936/ Test PASSed.
[GitHub] spark issue #19517: [SPARK-20396][SQL][PySpark][FOLLOW-UP] groupby().apply()...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19517 **[Test build #82936 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82936/testReport)** for PR 19517 at commit [`59d61a4`](https://github.com/apache/spark/commit/59d61a46a15b00f8af9ec8e2c6930853b7097b1c).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #19541: ABCD
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19541
[GitHub] spark pull request #19542: Branch 1.1.
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19542
[GitHub] spark pull request #18664: [SPARK-21375][PYSPARK][SQL] Add Date and Timestam...
Github user wesm commented on a diff in the pull request: https://github.com/apache/spark/pull/18664#discussion_r146044463

--- Diff: python/pyspark/sql/types.py --- (same excerpt as quoted above) --- End diff --

We should definitely add a test to assert what an array returns.
[GitHub] spark issue #19514: [SPARK-21551][Python] Increase timeout for PythonRDD.ser...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/19514 Good point. I'm not sure it counteracts it completely. We should run it to see the behavior, I guess. I am not a big fan of mucking with Jenkins versions because it fundamentally looks like CRAN doesn't like us pushing newer versions from older branches? For example, if we release 2.2.1 then we can't submit 2.1.3 to CRAN. We should first discuss if we are okay with that -- we can move this to a JIRA?
[GitHub] spark issue #19538: [SPARK-20393][WEB UI][2.0] Strengthen Spark to prevent ...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/19538 link to 1.6 PR #19528
[GitHub] spark pull request #19535: [SPARK-22313][PYTHON] Mark/print deprecation warn...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/19535#discussion_r146024871
--- Diff: python/pyspark/streaming/kafka.py ---
@@ -58,6 +60,7 @@ def createStream(ssc, zkQuorum, groupId, topics, kafkaParams=None,
         .. note:: Deprecated in 2.3.0
         """
+        warnings.warn("Deprecated in 2.3.0.", DeprecationWarning)
--- End diff --
ditto here
[GitHub] spark pull request #19535: [SPARK-22313][PYTHON] Mark/print deprecation warn...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/19535#discussion_r146024844
--- Diff: python/pyspark/streaming/flume.py ---
@@ -56,6 +56,7 @@ def createStream(ssc, hostname, port,
         .. note:: Deprecated in 2.3.0
         """
+        warnings.warn("Deprecated in 2.3.0.", DeprecationWarning)
--- End diff --
For these, could you provide more information? A link to the doc on deprecating DStreams in Python?
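A minimal sketch (not text from the PR) of the more descriptive warning being requested here and in the kafka.py hunk above; the exact wording and the SPARK-22313 pointer are assumptions:
```python
import warnings

def createStream(ssc, hostname, port):
    """Create an input stream from a Flume agent.

    .. note:: Deprecated in 2.3.0.
    """
    # Point users at the deprecation discussion instead of a bare version number.
    warnings.warn(
        "createStream is deprecated as of Spark 2.3.0 and will be removed "
        "in a future release; see SPARK-22313 for details.",
        DeprecationWarning)
```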
[GitHub] spark issue #19538: [SPARK-20393][WEB UI][2.0] Strengthen Spark to prevent ...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/19538 Could you update the PR title to say `[BACKPORT-2.0]` instead of `[2.0]`? Also, please add the PR # for the earlier commit to link them here. You mention there is a discussion; could you link it here? Are you looking for an official release for 1.6.x?
[GitHub] spark issue #19543: [SPARK-19606][MESOS] Support constraints in spark-dispat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19543 Merged build finished. Test FAILed.
[GitHub] spark issue #19543: [SPARK-19606][MESOS] Support constraints in spark-dispat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19543 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82937/
[GitHub] spark issue #19543: [SPARK-19606][MESOS] Support constraints in spark-dispat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19543
**[Test build #82937 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82937/testReport)** for PR 19543 at commit [`11d5859`](https://github.com/apache/spark/commit/11d58593edd43b651bdbe5c269fc051a94269747).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19538: [SPARK-20393][WEB UI][2.0] Strengthen Spark to prevent ...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/19538 Ignore the SparkR test failure for now; we are looking into it.
[GitHub] spark issue #19543: [SPARK-19606][MESOS] Support constraints in spark-dispat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19543 **[Test build #82937 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82937/testReport)** for PR 19543 at commit [`11d5859`](https://github.com/apache/spark/commit/11d58593edd43b651bdbe5c269fc051a94269747).
[GitHub] spark issue #19543: [SPARK-19606][MESOS] Support constraints in spark-dispat...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/19543 test this please
[GitHub] spark issue #19267: [WIP][SPARK-20628][CORE] Blacklist nodes when they trans...
Github user juanrh commented on the issue: https://github.com/apache/spark/pull/19267 Hi @vanzin and @tgravescs, do you have any other comments on this proposal? Thanks, Juan
[GitHub] spark issue #19514: [SPARK-21551][Python] Increase timeout for PythonRDD.ser...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/19514 I haven't tried it, but it sounds like it might counteract the `--as-cran` check settings completely?
```
_R_CHECK_CRAN_INCOMING_
    Check whether package is suitable for publication on CRAN. Default: false, except for CRAN submission checks.
```
[GitHub] spark issue #19514: [SPARK-21551][Python] Increase timeout for PythonRDD.ser...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/19514 Whoa. `_R_CHECK_CRAN_INCOMING_=false` sounds like the right approach. I'm a bit concerned with blindly letting through one more warning, though; perhaps grep for the specific warning text and only let one more through if it matches?
[GitHub] spark issue #19485: [SPARK-20055] [Docs] Added documentation for loading csv...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19485 Thanks for the explanation. I guess there will be a big doc change soon? I will check those changes too.
[GitHub] spark issue #19514: [SPARK-21551][Python] Increase timeout for PythonRDD.ser...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/19514 We didn't foresee this, but it looks like `R CMD check --as-cran` throws this error if we try to build a package with a version number older than the one uploaded to CRAN. There are a couple of ways around this -- we can set the environment variable `_R_CHECK_CRAN_INCOMING_=false` (documented in [1]), or we can change our `check-cran.sh` to admit one more `WARNING`. This would of course only be done for `branch-2.0`. Any thoughts @felixcheung @HyukjinKwon? [1] https://cran.r-project.org/doc/manuals/r-release/R-ints.html
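Putting felixcheung's grep suggestion from above into concrete form: a rough sketch, in Python rather than the shell used by `check-cran.sh`, that tolerates only the one expected version warning instead of blindly raising the allowed WARNING count. The warning text and the check-log path are assumptions:
```python
import re
import sys

# Assumed wording of the CRAN "incoming" version warning; the actual text
# emitted by `R CMD check --as-cran` may differ.
EXPECTED = re.compile(r"[Ii]nsufficient package version")

def unexpected_warnings(log_path):
    # Collect WARNING lines, then drop the one warning we deliberately allow.
    with open(log_path) as f:
        warning_lines = [line for line in f if "WARNING" in line]
    return [line for line in warning_lines if not EXPECTED.search(line)]

if __name__ == "__main__":
    # "SparkR.Rcheck/00check.log" is an assumed location for the check log.
    bad = unexpected_warnings("SparkR.Rcheck/00check.log")
    if bad:
        sys.exit("Unexpected CRAN check warnings:\n" + "".join(bad))
```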
[GitHub] spark issue #19539: [MINOR] [SQL] Remove unnecessary hashCode and equals met...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19539 Could you open a JIRA?
[GitHub] spark pull request #18270: [SPARK-21055][SQL] replace grouping__id with grou...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18270
[GitHub] spark issue #18270: [SPARK-21055][SQL] replace grouping__id with grouping_id...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18270 Thanks! Merged to master.
[GitHub] spark issue #18270: [SPARK-21055][SQL] replace grouping__id with grouping_id...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18270 LGTM. Let us resolve the remaining issue in the follow-up PR.
[GitHub] spark issue #19517: [SPARK-20396][SQL][PySpark][FOLLOW-UP] groupby().apply()...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19517 **[Test build #82936 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82936/testReport)** for PR 19517 at commit [`59d61a4`](https://github.com/apache/spark/commit/59d61a46a15b00f8af9ec8e2c6930853b7097b1c).
[GitHub] spark issue #19517: [SPARK-20396][SQL][PySpark][FOLLOW-UP] groupby().apply()...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19517 retest this please
[GitHub] spark issue #19544: [SPARK-22323] Design doc for pandas_udf
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19544 @jiangxb1987 will reorganize the existing Spark SQL doc. We can think about how to fit this into the new version of the Spark SQL doc.
[GitHub] spark issue #19485: [SPARK-20055] [Docs] Added documentation for loading csv...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19485 The reference manual and the API docs are different. Below is a link for DB2 LUW: http://www-01.ibm.com/support/docview.wss?uid=swg27038855
[GitHub] spark issue #19437: [SPARK-22131][MESOS] Mesos driver secrets
Github user susanxhuynh commented on the issue: https://github.com/apache/spark/pull/19437 @vanzin Ping, would you mind reviewing this PR?
[GitHub] spark issue #19544: [SPARK-22323] Design doc for pandas_udf
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19544 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82935/
[GitHub] spark issue #19544: [SPARK-22323] Design doc for pandas_udf
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19544
**[Test build #82935 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82935/testReport)** for PR 19544 at commit [`3005312`](https://github.com/apache/spark/commit/3005312e0b5c0255ddd23736bfd24e2abf6cad95).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19544: [SPARK-22323] Design doc for pandas_udf
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19544 Merged build finished. Test PASSed.
[GitHub] spark issue #19515: [SPARK-22287][MESOS] SPARK_DAEMON_MEMORY not honored by ...
Github user pmackles commented on the issue: https://github.com/apache/spark/pull/19515 @ArtRand - WDYT? I was going to switch it to `SPARK_DISPATCHER_MEMORY`, but then I noticed that the other env vars for MesosClusterDispatcher are also prefixed with `SPARK_DAEMON_*`, so I thought it might be better to keep the names consistent.
[GitHub] spark issue #19544: [SPARK-22323] Design doc for pandas_udf
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19544 **[Test build #82935 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82935/testReport)** for PR 19544 at commit [`3005312`](https://github.com/apache/spark/commit/3005312e0b5c0255ddd23736bfd24e2abf6cad95).
[GitHub] spark issue #19544: [SPARK-22323] Design doc for pandas_udf
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/19544 cc @cloud-fan @ueshin @HyukjinKwon @gatorsmile @viirya To continue the discussion on #19505
[GitHub] spark pull request #19544: [SPARK-22323] Design doc for pandas_udf
GitHub user icexelloss opened a pull request: https://github.com/apache/spark/pull/19544
[SPARK-22323] Design doc for pandas_udf
I am opening this PR so we have a place to discuss the design. We don't necessarily need to merge an md file for the doc - this could be embedded Python documentation.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/icexelloss/spark pandas-udf-design-doc-SPARK-22323
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19544.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
This closes #19544
commit 3005312e0b5c0255ddd23736bfd24e2abf6cad95
Author: Li Jin
Date: 2017-10-20T15:09:08Z
Initial design doc for pandas_udf
[GitHub] spark issue #19543: [SPARK-19606][MESOS] Support constraints in spark-dispat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19543 Can one of the admins verify this patch?
[GitHub] spark pull request #19543: [SPARK-19606][MESOS] Support constraints in spark...
GitHub user pmackles opened a pull request: https://github.com/apache/spark/pull/19543
[SPARK-19606][MESOS] Support constraints in spark-dispatcher
## What changes were proposed in this pull request?
As discussed in SPARK-19606, the addition of a new config property named "spark.mesos.constraints.driver" for constraining drivers running on a Mesos cluster.
## How was this patch tested?
Corresponding unit test added; also tested locally on a Mesos cluster.
Please review http://spark.apache.org/contributing.html before opening a pull request.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/pmackles/spark SPARK-19606
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19543.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
This closes #19543
commit 11d58593edd43b651bdbe5c269fc051a94269747
Author: Paul Mackles
Date: 2017-10-20T15:08:33Z
[SPARK-19606] Support constraints in spark-dispatcher
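For readers following along, a hedged sketch of how the proposed property would be used; the property name comes from this PR, while the constraint value and submission details are illustrative only:
```python
# In practice the property would be passed to spark-submit in cluster mode,
# e.g. --conf spark.mesos.constraints.driver="rack:us-east-1a".
from pyspark import SparkConf

# "rack:us-east-1a" is a made-up Mesos attribute constraint.
conf = SparkConf().set("spark.mesos.constraints.driver", "rack:us-east-1a")
print(conf.get("spark.mesos.constraints.driver"))
```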
[GitHub] spark issue #19485: [SPARK-20055] [Docs] Added documentation for loading csv...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19485 @gatorsmile, sure, a detailed doc is great and I definitely support it. Just one thing I am worried about is duplication. If we add or change an option, we have to update both places and .. you know it. Wouldn't it be nicer if we simply leave a pointer and remove the duplication if possible? If I understood correctly, the options would also be described in more detail in the new chapter in the future, and I think simply redirecting might be feasible. I guess it shouldn't be too difficult to make a sub-chapter for options only, for example, like http://spark.apache.org/docs/latest/sql-programming-guide.html#other-configuration-options Otherwise, do you think there should be different contents for a different purpose, or do you want to leave the duplication just for now as something to be fixed soon? If so, I am okay.
[GitHub] spark issue #18974: [SPARK-21750][SQL] Use Arrow 0.6.0
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18974 Hi, all. Two more Arrow releases seem to be out. How about the Python side? Can we catch up some?
- 0.7.1 (1 October 2017)
- 0.7.0 (17 September 2017)
[GitHub] spark issue #19527: [SPARK-13030][ML] Create OneHotEncoderEstimator for OneH...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19527 Merged build finished. Test PASSed.
[GitHub] spark issue #19527: [SPARK-13030][ML] Create OneHotEncoderEstimator for OneH...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19527 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82934/
[GitHub] spark issue #19527: [SPARK-13030][ML] Create OneHotEncoderEstimator for OneH...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19527
**[Test build #82934 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82934/testReport)** for PR 19527 at commit [`e024120`](https://github.com/apache/spark/commit/e0241200c58a5ec201a0f1abdebc1660878ed49f).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19479: [SPARK-17074] [SQL] Generate equi-height histogram in co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19479 Merged build finished. Test PASSed.
[GitHub] spark issue #19479: [SPARK-17074] [SQL] Generate equi-height histogram in co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19479 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82931/
[GitHub] spark issue #19479: [SPARK-17074] [SQL] Generate equi-height histogram in co...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19479
**[Test build #82931 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82931/testReport)** for PR 19479 at commit [`6fe9985`](https://github.com/apache/spark/commit/6fe9985872c93b5dfa9972300ba3f59e97834d4c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19527: [SPARK-13030][ML] Create OneHotEncoderEstimator for OneH...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19527 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82933/
[GitHub] spark issue #19527: [SPARK-13030][ML] Create OneHotEncoderEstimator for OneH...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19527 Merged build finished. Test PASSed.
[GitHub] spark issue #19527: [SPARK-13030][ML] Create OneHotEncoderEstimator for OneH...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19527
**[Test build #82933 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82933/testReport)** for PR 19527 at commit [`fe80e98`](https://github.com/apache/spark/commit/fe80e98712f52a4b5795c96a20e8f92e65849cb4).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.