[GitHub] spark pull request: [SPARK-2713] Executors of same application in ...
Github user li-zhihui commented on the pull request: https://github.com/apache/spark/pull/1616#issuecomment-55852899 @andrewor14 @JoshRosen I am not sure if the test failure is related to the patch. Can you have a look at the failure? Or just retest it? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3453] [WIP] Refactor Netty module to us...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2330#issuecomment-55854376 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20452/consoleFull) for PR 2330 at commit [`088ed8a`](https://github.com/apache/spark/commit/088ed8ac46bc59bf3d13d4a0be1d4c616a22d698). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3491] [WIP] [MLlib] [PySpark] use pickl...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2378#issuecomment-55855022 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/121/consoleFull) for PR 2378 at commit [`1fccf1a`](https://github.com/apache/spark/commit/1fccf1adc91e78a6c9e65f4ae14ba770a7eecd2c). * This patch **passes** unit tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3491] [WIP] [MLlib] [PySpark] use pickl...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2378#issuecomment-55855160 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20449/consoleFull) for PR 2378 at commit [`19d0967`](https://github.com/apache/spark/commit/19d096783b60e741173f48f2944d91f650616140). * This patch **passes** unit tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3477] Clean up code in Yarn Client / Cl...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2350#issuecomment-55855267 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20450/consoleFull) for PR 2350 at commit [`a3b9693`](https://github.com/apache/spark/commit/a3b9693884b99622b626890fba08b2111d661e93). * This patch **passes** unit tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3491] [MLlib] [PySpark] use pickle to s...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2378#issuecomment-55855348 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/122/consoleFull) for PR 2378 at commit [`e431377`](https://github.com/apache/spark/commit/e431377170172571974aadcae7ff42d3a79e2cd9). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3491] [MLlib] [PySpark] use pickle to s...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2378#issuecomment-55855377 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20453/consoleFull) for PR 2378 at commit [`e431377`](https://github.com/apache/spark/commit/e431377170172571974aadcae7ff42d3a79e2cd9). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-2713] Executors of same application in ...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/1616#issuecomment-55856032 This looks like a failure due to a known flaky test:
```
[info] SparkSinkSuite:
[info] - Success with ack *** FAILED ***
[info]   4000 did not equal 5000 (SparkSinkSuite.scala:195)
```
I'm going to let Jenkins re-test this overnight.
[GitHub] spark pull request: [SPARK-2713] Executors of same application in ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1616#issuecomment-55856209 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/123/consoleFull) for PR 1616 at commit [`f9330d4`](https://github.com/apache/spark/commit/f9330d447dd749d530de96739f5e3598f973bd60). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3453] [WIP] Refactor Netty module to us...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2330#issuecomment-55856201 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20451/consoleFull) for PR 2330 at commit [`a79a259`](https://github.com/apache/spark/commit/a79a25918a96171c4b20c5c9153e5815bc23698e). * This patch **passes** unit tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-2713] Executors of same application in ...
Github user li-zhihui commented on the pull request: https://github.com/apache/spark/pull/1616#issuecomment-55856239 @JoshRosen thanks
[GitHub] spark pull request: [Docs] Correct spark.files.fetchTimeout defaul...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2406#issuecomment-55856872 LGTM; thanks for updating the title! I'm going to merge this now.
[GitHub] spark pull request: [Docs] Correct spark.files.fetchTimeout defaul...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/2406
[GitHub] spark pull request: [SPARK-2594][SQL] Support CACHE TABLE name A...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/2397#discussion_r17648914
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/commands.scala ---
@@ -166,3 +166,20 @@ case class DescribeCommand(child: SparkPlan, output: Seq[Attribute])( child.output.map(field => Row(field.name, field.dataType.toString, null)) } }
+
+/**
+ * :: DeveloperApi ::
+ */
+@DeveloperApi
+case class CacheTableAsSelectCommand(tableName: String, plan: LogicalPlan)
+  extends LeafNode with Command {
+
+  override protected[sql] lazy val sideEffectResult = {
+    sqlContext.catalog.registerTable(None, tableName, sqlContext.executePlan(plan).analyzed)
--- End diff --
(Probably my final comment on this PR :) ) As described in PR #2382, we shouldn't store analyzed logical plans when registering tables any more (see [here](https://github.com/apache/spark/pull/2382/files?diff=split#diff-5)). To prevent duplicated code, I'd suggest importing `SQLContext._` so that we can leverage [the implicit conversion](https://github.com/apache/spark/blob/008a5ed4808d1467b47c1d6fa4d950cc6c4976b7/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala#L78-L85) from `LogicalPlan` to `SchemaRDD`, and then simply do this:
```scala
sqlContext.executePlan(plan).logical.registerTempTable(tableName)
```
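The implicit-conversion pattern liancheng points to can be sketched with hypothetical stand-in classes (these are not Spark's actual `LogicalPlan`/`SchemaRDD` implementations): once an implicit conversion is in scope, a `SchemaRDD` method can be invoked directly on a `LogicalPlan`.

```scala
import scala.language.implicitConversions

// Hypothetical stand-ins, only to illustrate the pattern.
class LogicalPlan(val name: String)

class SchemaRDD(val plan: LogicalPlan) {
  def registerTempTable(tableName: String): String =
    s"registered ${plan.name} as $tableName"
}

// With this implicit in scope (in Spark, via `import SQLContext._`),
// LogicalPlan picks up SchemaRDD's methods.
implicit def planToSchemaRDD(plan: LogicalPlan): SchemaRDD =
  new SchemaRDD(plan)

val msg = new LogicalPlan("query").registerTempTable("t1")
```

This is why the one-liner in the comment above works even though `registerTempTable` is not defined on the plan itself.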
[GitHub] spark pull request: [SPARK-2594][SQL] Support CACHE TABLE name A...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/2397#issuecomment-55857224 LGTM except for the analyzed logical plan issue as mentioned in my last comment. Thanks for working on this!
[GitHub] spark pull request: [SPARK-3319] [SPARK-3338] Resolve Spark submit...
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/2232#issuecomment-55858105 Could this change behavior in cases where spark.yarn.dist.files is configured with no scheme? Without this change, it would interpret no scheme to mean the file is on HDFS, and with it, it would interpret no scheme to mean the file is on the local FS?
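sryza's concern hinges on how a scheme-less path is qualified. A small sketch with a hypothetical helper (this is not Spark's resolution code) shows the two interpretations:

```scala
import java.net.URI

// Hypothetical helper: a path with an explicit scheme is kept as-is;
// a scheme-less path is qualified against the given default filesystem.
def qualify(path: String, defaultScheme: String): URI = {
  val uri = new URI(path)
  if (uri.getScheme != null) uri // explicit scheme wins
  else new URI(defaultScheme, uri.getSchemeSpecificPart, uri.getFragment)
}

// Before the change: no scheme => assumed to be on HDFS.
val before = qualify("/data/part-0000", "hdfs")
// After the change: no scheme => assumed to be on the local FS.
val after = qualify("/data/part-0000", "file")
```

The behavior change sryza asks about is exactly the difference between passing `"hdfs"` and `"file"` as the default here.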
[GitHub] spark pull request: [SPARK-3453] [WIP] Refactor Netty module to us...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2330#issuecomment-55858115 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20452/consoleFull) for PR 2330 at commit [`088ed8a`](https://github.com/apache/spark/commit/088ed8ac46bc59bf3d13d4a0be1d4c616a22d698). * This patch **fails** unit tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-2873] [SQL] using ExternalAppendOnlyMap...
Github user guowei2 commented on the pull request: https://github.com/apache/spark/pull/2029#issuecomment-55858585 I've run a micro benchmark on my local machine with 5 records, 1500 keys.

| Type | OnHeapAggregate | ExternalAggregate (10 spills happen) |
| --- | --- | --- |
| First run | 876ms | 16.9s |
| Stabilized runs | 150ms | 15.0s |
[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/2419#issuecomment-55858790 @derrickburns I think these notes can go in code comments? (They each generate their own email too.) This is also a big-bang change covering several issues, some of which seem like more focused bug fixes or improvements. I would think it would be easier to break this down further if possible, and get in clear easy changes first.
[GitHub] spark pull request: [SPARK-3550][MLLIB] fix a unresolved reference...
GitHub user OdinLin opened a pull request: https://github.com/apache/spark/pull/2423 [SPARK-3550][MLLIB] fix an unresolved reference to variable 'nb' The variable `nb` is an unresolved reference in the raise log message. You can merge this pull request into a Git repository by running: $ git pull https://github.com/OdinLin/spark master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2423.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2423 commit 2748cd6b8db8cb9c549b9a9dae1e6889c4995739 Author: linyicong linyic...@dkhs.com Date: 2014-09-17T07:33:15Z fix a unresolved reference bug
[GitHub] spark pull request: [MLLIB] fix a unresolved reference variable 'n...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2423#issuecomment-55860478 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-3491] [MLlib] [PySpark] use pickle to s...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2378#issuecomment-55860743 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/122/consoleFull) for PR 2378 at commit [`e431377`](https://github.com/apache/spark/commit/e431377170172571974aadcae7ff42d3a79e2cd9). * This patch **passes** unit tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3491] [MLlib] [PySpark] use pickle to s...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2378#issuecomment-55860805 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20453/consoleFull) for PR 2378 at commit [`e431377`](https://github.com/apache/spark/commit/e431377170172571974aadcae7ff42d3a79e2cd9). * This patch **passes** unit tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-2713] Executors of same application in ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1616#issuecomment-55861628 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/123/consoleFull) for PR 1616 at commit [`f9330d4`](https://github.com/apache/spark/commit/f9330d447dd749d530de96739f5e3598f973bd60). * This patch **passes** unit tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3564] Display App ID on HistoryPage
GitHub user sarutak opened a pull request: https://github.com/apache/spark/pull/2424 [SPARK-3564] Display App ID on HistoryPage You can merge this pull request into a Git repository by running: $ git pull https://github.com/sarutak/spark display-appid-on-webui Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2424.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2424 commit 417fe90ab77dfb3f8be5c2755b4738f86e74d5ba Author: Kousuke Saruta saru...@oss.nttdata.co.jp Date: 2014-09-17T08:04:35Z Added App ID column to HistoryPage
[GitHub] spark pull request: [SPARK-3453] [WIP] Refactor Netty module to us...
Github user colorant commented on a diff in the pull request: https://github.com/apache/spark/pull/2330#discussion_r17650877
--- Diff: core/src/main/scala/org/apache/spark/network/netty/BlockClientFactory.scala ---
@@ -0,0 +1,182 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.network.netty
+
+import java.io.Closeable
+import java.util.concurrent.{ConcurrentHashMap, TimeoutException}
+
+import io.netty.bootstrap.Bootstrap
+import io.netty.buffer.PooledByteBufAllocator
+import io.netty.channel._
+import io.netty.channel.epoll.{Epoll, EpollEventLoopGroup, EpollSocketChannel}
+import io.netty.channel.nio.NioEventLoopGroup
+import io.netty.channel.oio.OioEventLoopGroup
+import io.netty.channel.socket.SocketChannel
+import io.netty.channel.socket.nio.NioSocketChannel
+import io.netty.channel.socket.oio.OioSocketChannel
+import io.netty.util.internal.PlatformDependent
+
+import org.apache.spark.{Logging, SparkConf}
+import org.apache.spark.util.Utils
+
+
+/**
+ * Factory for creating [[BlockClient]] by using createClient.
+ *
+ * The factory maintains a connection pool to other hosts and should return the same [[BlockClient]]
+ * for the same remote host. It also shares a single worker thread pool for all [[BlockClient]]s.
+ */
+private[netty]
+class BlockClientFactory(val conf: NettyConfig) extends Logging with Closeable {
+
+  def this(sparkConf: SparkConf) = this(new NettyConfig(sparkConf))
+
+  /** A thread factory so the threads are named (for debugging). */
+  private[this] val threadFactory = Utils.namedThreadFactory("spark-netty-client")
+
+  /** Socket channel type, initialized by [[init]] depending ioMode. */
+  private[this] var socketChannelClass: Class[_ <: Channel] = _
+
+  /** Thread pool shared by all clients. */
+  private[this] var workerGroup: EventLoopGroup = _
+
+  private[this] val connectionPool = new ConcurrentHashMap[(String, Int), BlockClient]
+
+  // The encoders are stateless and can be shared among multiple clients.
+  private[this] val encoder = new ClientRequestEncoder
+  private[this] val decoder = new ServerResponseDecoder
+
+  init()
+
+  /** Initialize [[socketChannelClass]] and [[workerGroup]] based on ioMode. */
+  private def init(): Unit = {
+    def initOio(): Unit = {
+      socketChannelClass = classOf[OioSocketChannel]
+      workerGroup = new OioEventLoopGroup(0, threadFactory)
+    }
+    def initNio(): Unit = {
+      socketChannelClass = classOf[NioSocketChannel]
+      workerGroup = new NioEventLoopGroup(0, threadFactory)
+    }
+    def initEpoll(): Unit = {
+      socketChannelClass = classOf[EpollSocketChannel]
+      workerGroup = new EpollEventLoopGroup(0, threadFactory)
+    }
+
+    // For auto mode, first try epoll (only available on Linux), then nio.
+    conf.ioMode match {
+      case "nio" => initNio()
+      case "oio" => initOio()
+      case "epoll" => initEpoll()
+      case "auto" => if (Epoll.isAvailable) initEpoll() else initNio()
+    }
+  }
+
+  /**
+   * Create a new BlockFetchingClient connecting to the given remote host / port.
+   *
+   * This blocks until a connection is successfully established.
+   *
+   * Concurrency: This method is safe to call from multiple threads.
+   */
+  def createClient(remoteHost: String, remotePort: Int): BlockClient = {
+    // Get connection from the connection pool first.
+    // If it is not found or not active, create a new one.
+    val cachedClient = connectionPool.get((remoteHost, remotePort))
+    if (cachedClient != null && cachedClient.isActive) {
+      return cachedClient
+    }
+
+    logInfo(s"Creating new connection to $remoteHost:$remotePort")
+
+    // There is a chance two threads are creating two different clients connecting to the same host.
+    // But that's probably ok ...
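The race the trailing comment alludes to (two threads each creating a client for the same host) is commonly closed with `ConcurrentHashMap.putIfAbsent`: the loser discards its freshly built client and adopts the winner's. A simplified sketch with a stand-in `Client` class (not the PR's actual code):

```scala
import java.util.concurrent.ConcurrentHashMap

// Stand-in for BlockClient, just to make the pooling logic concrete.
class Client(val host: String, val port: Int) {
  @volatile var closed = false
  def isActive: Boolean = !closed
  def close(): Unit = { closed = true }
}

val pool = new ConcurrentHashMap[(String, Int), Client]

def getOrCreate(host: String, port: Int): Client = {
  val cached = pool.get((host, port))
  if (cached != null && cached.isActive) return cached

  val fresh = new Client(host, port)
  // putIfAbsent is atomic: if another thread won the race, reuse its client
  // and close ours so we never leak a second connection to the same host.
  val winner = pool.putIfAbsent((host, port), fresh)
  if (winner != null) { fresh.close(); winner } else fresh
}
```

The tradeoff versus the PR's "probably ok" approach is one extra `close()` on the losing side instead of two live connections to the same peer.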
[GitHub] spark pull request: [SPARK-3453] [WIP] Refactor Netty module to us...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/2330#discussion_r17650983
--- Diff: core/src/main/scala/org/apache/spark/network/netty/BlockClientFactory.scala --- (same `createClient` hunk of `BlockClientFactory.scala` as quoted in the previous comment)
[GitHub] spark pull request: [SPARK-3564] Display App ID on HistoryPage
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2424#issuecomment-55862451 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20454/consoleFull) for PR 2424 at commit [`417fe90`](https://github.com/apache/spark/commit/417fe90ab77dfb3f8be5c2755b4738f86e74d5ba). * This patch merges cleanly. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: add spark.driver.memory to config docs
Github user ash211 commented on the pull request: https://github.com/apache/spark/pull/2410#issuecomment-55862389 One note about this setting: I'm not sure it works in all cases -- if you pass it to the driver as a parameter, it's too late to take effect (the JVM has already started). ```./bin/spark-shell --driver-java-options -Dspark.driver.memory=1576m``` doesn't actually change driver memory. Maybe the docs should mention where it does and doesn't work?
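The reason the flag is "too late" is that a JVM's heap ceiling is fixed when the process launches; once the driver JVM is up, a `-D` option can only set a system property, and setting a property never resizes the heap. A minimal illustration (the property name is set purely for show -- nothing in the snippet reads it):

```java
// The JVM heap ceiling is fixed at process launch. Setting a system
// property afterwards -- which is all -Dspark.driver.memory passed via
// --driver-java-options can do once the driver JVM is already running --
// cannot move it.
class HeapIsFixed {
    static boolean heapUnchangedAfterProperty() {
        long before = Runtime.getRuntime().maxMemory();
        System.setProperty("spark.driver.memory", "1576m"); // too late: heap already sized
        long after = Runtime.getRuntime().maxMemory();
        return before == after;
    }
}
```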
[GitHub] spark pull request: [SPARK-3543] Write TaskContext in Java and exp...
GitHub user ScrapCodes opened a pull request: https://github.com/apache/spark/pull/2425 [SPARK-3543] Write TaskContext in Java and expose it through a static accessor. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ScrapCodes/spark-1 SPARK-3543/withTaskContext Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2425.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2425 commit 333c7d644973c721f0b0509399579723d2a43446 Author: Prashant Sharma prashan...@imaginea.com Date: 2014-09-17T07:40:47Z Translated Task context from scala to java. commit f716fd1b0d84bcb08889b62af2d5f7a6d14b1cab Author: Prashant Sharma prashan...@imaginea.com Date: 2014-09-17T08:15:39Z introduced thread local for getting the task context.
[GitHub] spark pull request: [SPARK-3543] Write TaskContext in Java and exp...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/2425#discussion_r17651529

--- Diff: core/src/main/java/org/apache/spark/TaskContext.java ---
@@ -0,0 +1,176 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark;
+
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+
+import scala.Function0;
+import scala.Function1;
+import scala.Unit;
+import scala.collection.JavaConversions;
+
+import org.apache.spark.annotation.DeveloperApi;
+import org.apache.spark.executor.TaskMetrics;
+import org.apache.spark.util.TaskCompletionListener;
+import org.apache.spark.util.TaskCompletionListenerException;
+
+@DeveloperApi
+public class TaskContext implements Serializable {
+
+  public Integer stageId;
+  public Integer partitionId;
+  public Long attemptId;
+  public Boolean runningLocally;
+  public TaskMetrics taskMetrics;
+
+  public TaskContext(Integer stageId, Integer partitionId, Long attemptId, Boolean runningLocally,
+      TaskMetrics taskMetrics) {
+    this.attemptId = attemptId;
+    this.partitionId = partitionId;
+    this.runningLocally = runningLocally;
+    this.stageId = stageId;
+    this.taskMetrics = taskMetrics;
+    taskContext.set(this);
+  }
+
+  public TaskContext(Integer stageId, Integer partitionId, Long attemptId,
+      Boolean runningLocally) {
+    this.attemptId = attemptId;
+    this.partitionId = partitionId;
+    this.runningLocally = runningLocally;
+    this.stageId = stageId;
+    this.taskMetrics = TaskMetrics.empty();
+    taskContext.set(this);
+  }
+
+  public TaskContext(Integer stageId, Integer partitionId, Long attemptId) {
+    this.attemptId = attemptId;
+    this.partitionId = partitionId;
+    this.runningLocally = false;
+    this.stageId = stageId;
+    this.taskMetrics = TaskMetrics.empty();
+    taskContext.set(this);
+  }
+
--- End diff --

extra line
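The pattern the quoted constructors rely on -- stashing `this` in a static `ThreadLocal` so that a static `get()` returns the context belonging to the current thread -- can be sketched in isolation. `Ctx` below is a hypothetical, stripped-down stand-in for `TaskContext`:

```java
// Minimal thread-local "current context" pattern: each thread that
// constructs a Ctx registers it for itself; Ctx.get() reads the
// registration for whichever thread calls it.
class Ctx {
    private static final ThreadLocal<Ctx> current = new ThreadLocal<>();

    private final int stageId;

    Ctx(int stageId) {
        this.stageId = stageId;
        current.set(this); // visible only to the constructing thread
    }

    static Ctx get() {
        return current.get();
    }

    int stageId() {
        return stageId;
    }
}
```

A thread that never constructed a `Ctx` sees `null` from `get()`, which is exactly why a thread-local works for per-task state: tasks on different executor threads cannot observe each other's context.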
[GitHub] spark pull request: [SPARK-3543] Write TaskContext in Java and exp...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/2425#discussion_r17651524

--- Diff: core/src/main/java/org/apache/spark/TaskContext.java ---
+  private static ThreadLocal<TaskContext> taskContext =
--- End diff --

u don't need to wrap this, do u?
[GitHub] spark pull request: [SPARK-3543] Write TaskContext in Java and exp...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/2425#discussion_r17651575

--- Diff: core/src/main/java/org/apache/spark/TaskContext.java ---
@@ -0,0 +1,176 @@
(license header, package, imports, public fields, and the three constructors as in the hunk quoted above, then:)
+
+
+  private static ThreadLocal<TaskContext> taskContext =
+      new ThreadLocal<TaskContext>();
+
+  public static TaskContext get() {
+    return taskContext.get();
+  }
+
+  // List of callback functions to execute when the task completes.
+  private transient List<TaskCompletionListener> onCompleteCallbacks =
+      new ArrayList<TaskCompletionListener>();
+
+  // Whether the corresponding task has been killed.
+  private volatile Boolean interrupted = false;
+
+  // Whether the task has completed.
+  private volatile Boolean completed = false;
+
+  /**
+   * Checks whether the task has completed.
+   */
+  public Boolean isCompleted() {
+    return completed;
+  }
+
+  /**
+   * Checks whether the task has been killed.
+   */
+  public Boolean isInterrupted() {
+    return interrupted;
+  }
+
+
+  /**
+   * Add a (Java friendly) listener to be executed on task completion.
+   * This will be called in all situations - success, failure, or cancellation.
+   *
+   * An example use is for HadoopRDD to register a callback to close the input stream.
+   */
+  public TaskContext addTaskCompletionListener(TaskCompletionListener listener){
+    onCompleteCallbacks.add(listener);
+    return this;
+  }
+
+
+  /**
+   * Add a listener in the form of a Scala closure to be executed on task completion.
+   * This will be called in all situations - success, failure, or cancellation.
+   *
+   * An example use is for HadoopRDD to register a callback to close the input stream.
+   */
+  public TaskContext addTaskCompletionListener(final
[GitHub] spark pull request: [SPARK-3543] Write TaskContext in Java and exp...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2425#issuecomment-55863345 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20455/consoleFull) for PR 2425 at commit [`f716fd1`](https://github.com/apache/spark/commit/f716fd1b0d84bcb08889b62af2d5f7a6d14b1cab). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3543] Write TaskContext in Java and exp...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/2425#discussion_r17651616

--- Diff: core/src/main/java/org/apache/spark/TaskContext.java ---
[GitHub] spark pull request: [SPARK-3543] Write TaskContext in Java and exp...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/2425#discussion_r17651635

--- Diff: core/src/main/java/org/apache/spark/TaskContext.java ---
+  public Integer stageId;
--- End diff --

do we do 2 space or 4 space indent?
[GitHub] spark pull request: [SPARK-3543] Write TaskContext in Java and exp...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/2425#discussion_r17651668

--- Diff: core/src/main/java/org/apache/spark/TaskContext.java ---
+  public Integer stageId;
--- End diff --

actually don't make them public - this breaks binary compatibility right now. you should make them private, and create a public accessor stageId()
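The suggestion above -- a private field behind a `stageId()` accessor -- keeps the public surface a method call rather than a field read, so the field's representation can change later without breaking already-compiled callers (a field access compiles to a direct `getfield`; a method call survives the field becoming computed or renamed). A sketch under that suggestion, with a hypothetical class name:

```java
// Private field + public accessor: the accessor is the binary contract,
// the field is a hidden implementation detail that may change freely.
class TaskInfo {
    private final int stageId; // not public: callers bind to the method, not the field

    public TaskInfo(int stageId) {
        this.stageId = stageId;
    }

    // Callers compiled against this method keep working even if the
    // field later becomes derived, lazily computed, or differently typed.
    public int stageId() {
        return stageId;
    }
}
```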
[GitHub] spark pull request: [SPARK-3543] Write TaskContext in Java and exp...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/2425#discussion_r17651760

--- Diff: core/src/main/java/org/apache/spark/TaskContext.java ---
+  public Boolean isInterrupted() {
+    return interrupted;
+  }
+
--- End diff --

can u go through the file and use only one line to separate functions
[GitHub] spark pull request: [SPARK-3543] Write TaskContext in Java and exp...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/2425#discussion_r17651763

--- Diff: core/src/main/java/org/apache/spark/TaskContext.java ---
+  public TaskContext addTaskCompletionListener(TaskCompletionListener listener){
--- End diff --

space before {
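The `addTaskCompletionListener` method under review appends a listener and returns `this`, so registrations chain. A simplified sketch of that register-then-fire pattern; the firing details (invocation order, error handling) are assumptions for illustration, not taken from the quoted diff:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical single-method listener, lambda-friendly.
interface CompletionListener {
    void onTaskCompletion();
}

class ListenerContext {
    private final List<CompletionListener> callbacks = new ArrayList<>();
    private boolean completed = false;

    // Returns this so registrations chain, as in the quoted API.
    ListenerContext addTaskCompletionListener(CompletionListener l) {
        callbacks.add(l);
        return this;
    }

    // Fire every registered listener once the task finishes -- the quoted
    // javadoc promises this in all situations: success, failure, or
    // cancellation.
    void markTaskCompleted() {
        completed = true;
        for (CompletionListener l : callbacks) {
            l.onTaskCompletion();
        }
    }

    boolean isCompleted() {
        return completed;
    }
}
```

This is the shape the quoted javadoc hints at with HadoopRDD: register a callback that closes the input stream, and it runs no matter how the task ends.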
[GitHub] spark pull request: [SPARK-3566] [BUILD] .gitignore and .rat-exclu...
GitHub user sarutak opened a pull request: https://github.com/apache/spark/pull/2426 [SPARK-3566] [BUILD] .gitignore and .rat-excludes should consider cmd file and Emacs' backup files You can merge this pull request into a Git repository by running: $ git pull https://github.com/sarutak/spark emacs-metafiles-ignore Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2426.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2426 commit 8cade06dce9ea0ed09c47c01e31efa9e92312fcb Author: Kousuke Saruta saru...@oss.nttdata.co.jp Date: 2014-09-16T00:40:29Z Modified .gitignore to ignore emacs lock file and backup file commit 897da6321001bc3f9344ee0b501628d5c579d28a Author: Kousuke Saruta saru...@oss.nttdata.co.jp Date: 2014-09-17T08:12:51Z Merge branch 'master' of git://git.apache.org/spark into emacs-metafiles-ignore commit 6a0a5eb6a352009dd8c6b902a5253b8846b1bffc Author: Kousuke Saruta saru...@oss.nttdata.co.jp Date: 2014-09-17T08:22:28Z Added cmd file entry to .rat-excludes and .gitignore
[GitHub] spark pull request: [SPARK-3543] Write TaskContext in Java and exp...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/2425#discussion_r17651774

--- Diff: core/src/main/java/org/apache/spark/TaskContext.java ---
@@ -0,0 +1,176 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark;
+
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+
+import scala.Function0;
+import scala.Function1;
+import scala.Unit;
+import scala.collection.JavaConversions;
+
+import org.apache.spark.annotation.DeveloperApi;
+import org.apache.spark.executor.TaskMetrics;
+import org.apache.spark.util.TaskCompletionListener;
+import org.apache.spark.util.TaskCompletionListenerException;
+
+@DeveloperApi
+public class TaskContext implements Serializable {
+
+  public Integer stageId;
+  public Integer partitionId;
+  public Long attemptId;
+  public Boolean runningLocally;
+  public TaskMetrics taskMetrics;
+
+  public TaskContext(Integer stageId, Integer partitionId, Long attemptId, Boolean runningLocally,
+      TaskMetrics taskMetrics) {
+    this.attemptId = attemptId;
+    this.partitionId = partitionId;
+    this.runningLocally = runningLocally;
+    this.stageId = stageId;
+    this.taskMetrics = taskMetrics;
+    taskContext.set(this);
+  }
+
+  public TaskContext(Integer stageId, Integer partitionId, Long attemptId,
+      Boolean runningLocally) {
+    this.attemptId = attemptId;
+    this.partitionId = partitionId;
+    this.runningLocally = runningLocally;
+    this.stageId = stageId;
+    this.taskMetrics = TaskMetrics.empty();
+    taskContext.set(this);
+  }
+
+  public TaskContext(Integer stageId, Integer partitionId, Long attemptId) {
+    this.attemptId = attemptId;
+    this.partitionId = partitionId;
+    this.runningLocally = false;
+    this.stageId = stageId;
+    this.taskMetrics = TaskMetrics.empty();
+    taskContext.set(this);
+  }
+
+  private static ThreadLocal<TaskContext> taskContext =
+    new ThreadLocal<TaskContext>();
+
+  public static TaskContext get() {
+    return taskContext.get();
+  }
+
+  // List of callback functions to execute when the task completes.
+  private transient List<TaskCompletionListener> onCompleteCallbacks =
+    new ArrayList<TaskCompletionListener>();
+
+  // Whether the corresponding task has been killed.
+  private volatile Boolean interrupted = false;
+
+  // Whether the task has completed.
+  private volatile Boolean completed = false;
+
+  /**
+   * Checks whether the task has completed.
+   */
+  public Boolean isCompleted() {
+    return completed;
+  }
+
+  /**
+   * Checks whether the task has been killed.
+   */
+  public Boolean isInterrupted() {
+    return interrupted;
+  }
+
+  /**
+   * Add a (Java friendly) listener to be executed on task completion.
+   * This will be called in all situation - success, failure, or cancellation.
+   *
+   * An example use is for HadoopRDD to register a callback to close the input stream.
+   */
+  public TaskContext addTaskCompletionListener(TaskCompletionListener listener){
+    onCompleteCallbacks.add(listener);
+    return this;
+  }
+
+  /**
+   * Add a listener in the form of a Scala closure to be executed on task completion.
+   * This will be called in all situation - success, failure, or cancellation.
+   *
+   * An example use is for HadoopRDD to register a callback to close the input stream.
+   */
+  public TaskContext addTaskCompletionListener(final
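The listener-registration API quoted in the diff above can be exercised outside Spark with a minimal stand-in. The following is a hedged sketch, not Spark's actual TaskContext: `MiniTaskContext`, `CompletionListener`, and `markTaskCompleted` are hypothetical names standing in for the PR's TaskContext, TaskCompletionListener, and whatever the task runner invokes on completion.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for org.apache.spark.util.TaskCompletionListener.
interface CompletionListener {
  void onTaskCompletion(MiniTaskContext context);
}

// Minimal sketch of the completion-listener pattern shown in the diff.
class MiniTaskContext {
  private final List<CompletionListener> onCompleteCallbacks = new ArrayList<>();
  private volatile boolean completed = false;

  // Returns this so registrations can be chained, mirroring the PR's API.
  MiniTaskContext addTaskCompletionListener(CompletionListener listener) {
    onCompleteCallbacks.add(listener);
    return this;
  }

  boolean isCompleted() {
    return completed;
  }

  // Called by the (hypothetical) task runner on success, failure, or cancellation.
  void markTaskCompleted() {
    completed = true;
    for (CompletionListener listener : onCompleteCallbacks) {
      listener.onTaskCompletion(this);
    }
  }
}
```

Because the callbacks fire regardless of how the task ends, a consumer such as HadoopRDD can rely on this hook to close its input stream in all three outcomes.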
[GitHub] spark pull request: [SPARK-3565]Fix configuration item not consist...
GitHub user WangTaoTheTonic opened a pull request: https://github.com/apache/spark/pull/2427

[SPARK-3565]Fix configuration item not consistent with document

https://issues.apache.org/jira/browse/SPARK-3565

spark.ports.maxRetries should be spark.port.maxRetries. Make the configuration keys in document and code consistent.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/WangTaoTheTonic/spark fixPortRetries

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2427.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2427

commit 3700dbaef56b573b67c8ca6824a0b5037a70608d
Author: WangTaoTheTonic barneystin...@aliyun.com
Date: 2014-09-17T08:23:45Z
Fix configuration item not consistent with document
[GitHub] spark pull request: [SPARK-3007][SQL]Add Dynamic Partition suppo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2226#issuecomment-55863834 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20456/consoleFull) for PR 2226 at commit [`5033928`](https://github.com/apache/spark/commit/5033928ee8709c3d1b758856efe2c6cb951ea574). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3565]Fix configuration item not consist...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2427#issuecomment-55864303 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20457/consoleFull) for PR 2427 at commit [`3700dba`](https://github.com/apache/spark/commit/3700dbaef56b573b67c8ca6824a0b5037a70608d). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3566] [BUILD] .gitignore and .rat-exclu...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2426#issuecomment-55864329 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20458/consoleFull) for PR 2426 at commit [`6a0a5eb`](https://github.com/apache/spark/commit/6a0a5eb6a352009dd8c6b902a5253b8846b1bffc). * This patch merges cleanly.
[GitHub] spark pull request: SPARK-2745 [STREAMING] Add Java friendly metho...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/2403#issuecomment-55864499

I'm not sure what to make of the MIMA errors:

* the type hierarchy of object org.apache.spark.streaming.Duration has changed in new version. Missing types {scala.runtime.AbstractFunction1}
* method apply(java.lang.Object)java.lang.Object in object org.apache.spark.streaming.Duration's type has changed; was (java.lang.Object)java.lang.Object, is now: (Long)org.apache.spark.streaming.Duration
* method toString()java.lang.String in object org.apache.spark.streaming.Duration does not have a correspondent in new version

I assume this is a side-effect of adding `object Duration`, but I don't see why this would have changed the `apply` method this way. `toString` still definitely exists. So I wonder if this is a false positive and should be suppressed?
[GitHub] spark pull request: [SPARK-3485][SQL] Use GenericUDFUtils.Conversi...
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/2407#issuecomment-55864941 I added a test case to this file; the current master fails it because `Short` was missing from the `primitiveTypes` here. With this new solution we can get rid of maintaining `primitiveTypes` and eliminate potential bugs here.
[GitHub] spark pull request: [SPARK-3485][SQL] Use GenericUDFUtils.Conversi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2407#issuecomment-55865221 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20459/consoleFull) for PR 2407 at commit [`15762d2`](https://github.com/apache/spark/commit/15762d2f3835644397d9dbf5ba87bedd9bb91e1c). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3453] [WIP] Refactor Netty module to us...
Github user colorant commented on a diff in the pull request: https://github.com/apache/spark/pull/2330#discussion_r17652889

--- Diff: core/src/main/scala/org/apache/spark/network/netty/BlockServer.scala ---
@@ -121,18 +91,18 @@ class BlockServer(conf: NettyConfig, dataProvider: BlockDataProvider) extends Lo
       bootstrap.option[java.lang.Integer](ChannelOption.SO_BACKLOG, backLog)
     }
     conf.receiveBuf.foreach { receiveBuf =>
-      bootstrap.option[java.lang.Integer](ChannelOption.SO_RCVBUF, receiveBuf)
+      bootstrap.childOption[java.lang.Integer](ChannelOption.SO_RCVBUF, receiveBuf)
     }
     conf.sendBuf.foreach { sendBuf =>
-      bootstrap.option[java.lang.Integer](ChannelOption.SO_SNDBUF, sendBuf)
+      bootstrap.childOption[java.lang.Integer](ChannelOption.SO_SNDBUF, sendBuf)
     }
     bootstrap.childHandler(new ChannelInitializer[SocketChannel] {
       override def initChannel(ch: SocketChannel): Unit = {
         ch.pipeline
-          .addLast("frameDecoder", new LineBasedFrameDecoder(1024))  // max block id length 1024
-          .addLast("stringDecoder", new StringDecoder(CharsetUtil.UTF_8))
-          .addLast("blockHeaderEncoder", new BlockHeaderEncoder)
+          .addLast("frameDecoder", ProtocolUtils.createFrameDecoder())
+          .addLast("clientRequestDecoder", new ClientRequestDecoder)
+          .addLast("serverResponseEncoder", new ServerResponseEncoder)
           .addLast("handler", new BlockServerHandler(dataProvider))
--- End diff --

Should this handler run on a separate EventLoopGroup, since GetBlockData might block?
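The concern raised above, that a blocking GetBlockData call would stall an I/O thread, is commonly addressed by handing blocking work to a dedicated executor so the event loop stays responsive (in Netty this is typically done by registering the handler with its own executor group via an `addLast` overload). The following library-free sketch illustrates the general offloading idea only; `BlockingWorkOffloader` and the pool size are assumptions for illustration, not part of the PR.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch: keep the event-loop thread responsive by handing potentially
// blocking work (such as reading block data from disk) to a dedicated pool.
class BlockingWorkOffloader {
  // Pool size of 4 is an arbitrary assumption for illustration.
  private final ExecutorService blockingPool = Executors.newFixedThreadPool(4);

  // Called from the event loop; submits the work and returns immediately.
  void handle(Runnable blockingTask) {
    blockingPool.submit(blockingTask);
  }

  // Drain outstanding work on shutdown.
  void shutdown() throws InterruptedException {
    blockingPool.shutdown();
    blockingPool.awaitTermination(5, TimeUnit.SECONDS);
  }
}
```

The trade-off is an extra thread hop per request, which is why servers usually offload only the handlers that can actually block.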
[GitHub] spark pull request: [SPARK-3453] [WIP] Refactor Netty module to us...
Github user colorant commented on a diff in the pull request: https://github.com/apache/spark/pull/2330#discussion_r17653172

--- Diff: core/src/main/scala/org/apache/spark/network/netty/BlockClientFactory.scala ---
@@ -0,0 +1,182 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.network.netty
+
+import java.io.Closeable
+import java.util.concurrent.{ConcurrentHashMap, TimeoutException}
+
+import io.netty.bootstrap.Bootstrap
+import io.netty.buffer.PooledByteBufAllocator
+import io.netty.channel._
+import io.netty.channel.epoll.{Epoll, EpollEventLoopGroup, EpollSocketChannel}
+import io.netty.channel.nio.NioEventLoopGroup
+import io.netty.channel.oio.OioEventLoopGroup
+import io.netty.channel.socket.SocketChannel
+import io.netty.channel.socket.nio.NioSocketChannel
+import io.netty.channel.socket.oio.OioSocketChannel
+import io.netty.util.internal.PlatformDependent
+
+import org.apache.spark.{Logging, SparkConf}
+import org.apache.spark.util.Utils
+
+
+/**
+ * Factory for creating [[BlockClient]] by using createClient.
+ *
+ * The factory maintains a connection pool to other hosts and should return the same [[BlockClient]]
+ * for the same remote host. It also shares a single worker thread pool for all [[BlockClient]]s.
+ */
+private[netty]
+class BlockClientFactory(val conf: NettyConfig) extends Logging with Closeable {
+
+  def this(sparkConf: SparkConf) = this(new NettyConfig(sparkConf))
+
+  /** A thread factory so the threads are named (for debugging). */
+  private[this] val threadFactory = Utils.namedThreadFactory("spark-netty-client")
+
+  /** Socket channel type, initialized by [[init]] depending ioMode. */
+  private[this] var socketChannelClass: Class[_ <: Channel] = _
+
+  /** Thread pool shared by all clients. */
+  private[this] var workerGroup: EventLoopGroup = _
+
+  private[this] val connectionPool = new ConcurrentHashMap[(String, Int), BlockClient]
+
+  // The encoders are stateless and can be shared among multiple clients.
+  private[this] val encoder = new ClientRequestEncoder
+  private[this] val decoder = new ServerResponseDecoder
+
+  init()
+
+  /** Initialize [[socketChannelClass]] and [[workerGroup]] based on ioMode. */
+  private def init(): Unit = {
+    def initOio(): Unit = {
+      socketChannelClass = classOf[OioSocketChannel]
+      workerGroup = new OioEventLoopGroup(0, threadFactory)
+    }
+    def initNio(): Unit = {
+      socketChannelClass = classOf[NioSocketChannel]
+      workerGroup = new NioEventLoopGroup(0, threadFactory)
+    }
+    def initEpoll(): Unit = {
+      socketChannelClass = classOf[EpollSocketChannel]
+      workerGroup = new EpollEventLoopGroup(0, threadFactory)
+    }
+
+    // For auto mode, first try epoll (only available on Linux), then nio.
+    conf.ioMode match {
+      case "nio" => initNio()
+      case "oio" => initOio()
+      case "epoll" => initEpoll()
+      case "auto" => if (Epoll.isAvailable) initEpoll() else initNio()
+    }
+  }
+
+  /**
+   * Create a new BlockFetchingClient connecting to the given remote host / port.
+   *
+   * This blocks until a connection is successfully established.
+   *
+   * Concurrency: This method is safe to call from multiple threads.
+   */
+  def createClient(remoteHost: String, remotePort: Int): BlockClient = {
+    // Get connection from the connection pool first.
+    // If it is not found or not active, create a new one.
+    val cachedClient = connectionPool.get((remoteHost, remotePort))
+    if (cachedClient != null && cachedClient.isActive) {
+      return cachedClient
+    }
+
+    logInfo(s"Creating new connection to $remoteHost:$remotePort")
+
+    // There is a chance two threads are creating two different clients connecting to the same host.
+    // But that's probably ok ...
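The race acknowledged in that last comment (two threads each building a client for the same host) can be made harmless with `ConcurrentHashMap.putIfAbsent`, which keeps exactly one winner. The following is a hypothetical standalone sketch of that check-then-create shape; `PooledClient` and `ClientPool` are illustration names, not Spark's BlockClient or BlockClientFactory.

```java
import java.util.concurrent.ConcurrentHashMap;

// Illustration stand-in for a remote-connection handle (not Spark's BlockClient).
class PooledClient {
  final String host;
  final int port;
  PooledClient(String host, int port) { this.host = host; this.port = port; }
  boolean isActive() { return true; }  // simplified: connections never go stale here
}

// Check-then-create pooling in the style of createClient above. putIfAbsent
// resolves the two-thread race: the loser discards its freshly built client.
class ClientPool {
  private final ConcurrentHashMap<String, PooledClient> pool = new ConcurrentHashMap<>();

  PooledClient getOrCreate(String host, int port) {
    String key = host + ":" + port;
    PooledClient cached = pool.get(key);
    if (cached != null && cached.isActive()) {
      return cached;
    }
    PooledClient fresh = new PooledClient(host, port);
    PooledClient existing = pool.putIfAbsent(key, fresh);
    return existing != null ? existing : fresh;
  }
}
```

In real code the losing thread would also close its discarded connection rather than leak it; that cleanup is omitted here for brevity.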
[GitHub] spark pull request: [SPARK-3543] Write TaskContext in Java and exp...
Github user ScrapCodes commented on a diff in the pull request: https://github.com/apache/spark/pull/2425#discussion_r17653450

--- Diff: core/src/main/java/org/apache/spark/TaskContext.java ---
@@ -0,0 +1,176 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark;
+
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+
+import scala.Function0;
+import scala.Function1;
+import scala.Unit;
+import scala.collection.JavaConversions;
+
+import org.apache.spark.annotation.DeveloperApi;
+import org.apache.spark.executor.TaskMetrics;
+import org.apache.spark.util.TaskCompletionListener;
+import org.apache.spark.util.TaskCompletionListenerException;
+
+@DeveloperApi
+public class TaskContext implements Serializable {
+
+  public Integer stageId;
+  public Integer partitionId;
+  public Long attemptId;
+  public Boolean runningLocally;
+  public TaskMetrics taskMetrics;
+
+  public TaskContext(Integer stageId, Integer partitionId, Long attemptId, Boolean runningLocally,
+      TaskMetrics taskMetrics) {
+    this.attemptId = attemptId;
+    this.partitionId = partitionId;
+    this.runningLocally = runningLocally;
+    this.stageId = stageId;
+    this.taskMetrics = taskMetrics;
+    taskContext.set(this);
+  }
+
+  public TaskContext(Integer stageId, Integer partitionId, Long attemptId,
+      Boolean runningLocally) {
+    this.attemptId = attemptId;
+    this.partitionId = partitionId;
+    this.runningLocally = runningLocally;
+    this.stageId = stageId;
+    this.taskMetrics = TaskMetrics.empty();
+    taskContext.set(this);
+  }
+
+  public TaskContext(Integer stageId, Integer partitionId, Long attemptId) {
+    this.attemptId = attemptId;
+    this.partitionId = partitionId;
+    this.runningLocally = false;
+    this.stageId = stageId;
+    this.taskMetrics = TaskMetrics.empty();
+    taskContext.set(this);
+  }
+
+  private static ThreadLocal<TaskContext> taskContext =
--- End diff --

You mean text wrap? Yes, in one line they are 114 chars.
[GitHub] spark pull request: [SPARK-3485][SQL] should check parameter type ...
Github user adrian-wang closed the pull request at: https://github.com/apache/spark/pull/2355
[GitHub] spark pull request: [SPARK-3485][SQL] should check parameter type ...
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/2355#issuecomment-55867500 Maybe #2407 is better; I'll close this.
[GitHub] spark pull request: [SPARK-3543] Write TaskContext in Java and exp...
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/2425#issuecomment-55867575 @rxin One side question: the Java8ApiSuite(s) don't compile; it looks like we have been overlooking them for a while. Maybe we could just remove them?
[GitHub] spark pull request: [SPARK-3543] Write TaskContext in Java and exp...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2425#issuecomment-55868820 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20461/consoleFull) for PR 2425 at commit [`edf945e`](https://github.com/apache/spark/commit/edf945e6765314f0b87092d460f71aa70feecdc5). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3007][SQL]Add Dynamic Partition suppo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2226#issuecomment-55868821 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20462/consoleFull) for PR 2226 at commit [`096bbbc`](https://github.com/apache/spark/commit/096bbbc261e6e5015d7ff4b90f299ea321ba7651). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3566] [BUILD] .gitignore and .rat-exclu...
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/2426#issuecomment-55868873 You can have a global gitignore: https://help.github.com/articles/ignoring-files. I am not sure how many editors and such we are going to support. Mind closing this PR?
[GitHub] spark pull request: [SPARK-3564] Display App ID on HistoryPage
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2424#issuecomment-55868956 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20454/consoleFull) for PR 2424 at commit [`417fe90`](https://github.com/apache/spark/commit/417fe90ab77dfb3f8be5c2755b4738f86e74d5ba). * This patch **passes** unit tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3565]Fix configuration item not consist...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2427#issuecomment-55869240 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20457/consoleFull) for PR 2427 at commit [`3700dba`](https://github.com/apache/spark/commit/3700dbaef56b573b67c8ca6824a0b5037a70608d). * This patch **fails** unit tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3566] [BUILD] .gitignore and .rat-exclu...
Github user sarutak commented on the pull request: https://github.com/apache/spark/pull/2426#issuecomment-55869359 @ScrapCodes Thanks for your comment. Vim/vi and Emacs are the most popular editors, and the current .gitignore already covers Vim's .swp files, so Emacs should be considered too. Also, this PR includes the modification for Windows batch files, so I cannot close this PR, sorry.
[GitHub] spark pull request: [SPARK-3566] [BUILD] .gitignore and .rat-exclu...
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/2426#issuecomment-55869795 There is nothing like spark-env.cmd in the code base? Also, Emacs is my editor of choice too, and I add those excludes to my global gitignore simply because I cannot go and update the .gitignore of every open source project I work on.
[GitHub] spark pull request: [SPARK-3543] Write TaskContext in Java and exp...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2425#issuecomment-55869923 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20455/consoleFull) for PR 2425 at commit [`f716fd1`](https://github.com/apache/spark/commit/f716fd1b0d84bcb08889b62af2d5f7a6d14b1cab). * This patch **passes** unit tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3407][SQL]Add Date type support
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2344#issuecomment-55870259 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20463/consoleFull) for PR 2344 at commit [`413f946`](https://github.com/apache/spark/commit/413f946ea2d9488b08186307bed550fa298a8a23). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3566] [BUILD] .gitignore and .rat-exclu...
Github user sarutak commented on the pull request: https://github.com/apache/spark/pull/2426#issuecomment-55870586 The current code base doesn't include spark-env.cmd, but users can create one, and it is used by spark-class2.cmd, run-examples2.cmd, compute-classpath.cmd and pyspark2.cmd. And by that logic, the entry for .swp files should be removed from .gitignore as well.
[GitHub] spark pull request: [SPARK-3007][SQL]Add Dynamic Partition suppo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2226#issuecomment-55872750 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20456/consoleFull) for PR 2226 at commit [`5033928`](https://github.com/apache/spark/commit/5033928ee8709c3d1b758856efe2c6cb951ea574). * This patch **passes** unit tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3543] Write TaskContext in Java and exp...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2425#issuecomment-55873286 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20464/consoleFull) for PR 2425 at commit [`a7d5e23`](https://github.com/apache/spark/commit/a7d5e23330a159e91581a35d46e1846770eb421d). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-2709] A tool to output all public API o...
Github user ScrapCodes closed the pull request at: https://github.com/apache/spark/pull/1688
[GitHub] spark pull request: [SPARK-3485][SQL] Use GenericUDFUtils.Conversi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2407#issuecomment-55874010 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20459/consoleFull) for PR 2407 at commit [`15762d2`](https://github.com/apache/spark/commit/15762d2f3835644397d9dbf5ba87bedd9bb91e1c). * This patch **passes** unit tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3567] appId field in SparkDeploySchedul...
GitHub user sarutak opened a pull request: https://github.com/apache/spark/pull/2428 [SPARK-3567] appId field in SparkDeploySchedulerBackend should be volatile You can merge this pull request into a Git repository by running: $ git pull https://github.com/sarutak/spark appid-volatile-modification Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2428.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2428 commit c7d890d1557254ff7ea01263a6512a3d846d4270 Author: Kousuke Saruta saru...@oss.nttdata.co.jp Date: 2014-09-17T10:19:30Z Added volatile modifier to appId field in SparkDeploySchedulerBackend
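The patch above is a one-word change; the concurrency rationale can be sketched outside Spark as follows (a minimal illustration, not Spark's actual code — class and method names here are invented). A field written by a cluster-registration callback thread and read by another thread needs `volatile` so the reader is guaranteed to see the latest write:

```java
// Minimal sketch: one thread publishes an application id, another reads it.
// Without volatile, the Java memory model permits the reader thread to see
// a stale null even after the writer has stored the id.
public class AppIdHolder {
    // volatile: writes by the callback thread become visible to all threads
    private volatile String appId = null;

    // invoked from a registration-callback thread
    public void connected(String id) { appId = id; }

    // invoked from a different (e.g. scheduler) thread
    public String applicationId() { return appId; }

    public static void main(String[] args) throws InterruptedException {
        AppIdHolder holder = new AppIdHolder();
        Thread callback = new Thread(() -> holder.connected("app-20140917-0001"));
        callback.start();
        callback.join();  // join also establishes happens-before for this demo
        System.out.println(holder.applicationId());  // prints app-20140917-0001
    }
}
```

The demo's `join()` would make the write visible even without `volatile`; the modifier matters in the real scenario where the reader does not synchronize with the writer.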
[GitHub] spark pull request: [SPARK-3543] Write TaskContext in Java and exp...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2425#issuecomment-55875586 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20461/consoleFull) for PR 2425 at commit [`edf945e`](https://github.com/apache/spark/commit/edf945e6765314f0b87092d460f71aa70feecdc5). * This patch **passes** unit tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3567] appId field in SparkDeploySchedul...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2428#issuecomment-55875701 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20465/consoleFull) for PR 2428 at commit [`c7d890d`](https://github.com/apache/spark/commit/c7d890d1557254ff7ea01263a6512a3d846d4270). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3566] [BUILD] .gitignore and .rat-exclu...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2426#issuecomment-55876167 **[Tests timed out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20458/consoleFull)** after a configured wait of `120m`.
[GitHub] spark pull request: [SPARK-3407][SQL]Add Date type support
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2344#issuecomment-55877253 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20463/consoleFull) for PR 2344 at commit [`413f946`](https://github.com/apache/spark/commit/413f946ea2d9488b08186307bed550fa298a8a23). * This patch **fails** unit tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class Cast(child: Expression, dataType: DataType) extends UnaryExpression with Logging ` * `public class DateType extends DataType `
[GitHub] spark pull request: [SPARK-3007][SQL]Add Dynamic Partition suppo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2226#issuecomment-55877476 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20462/consoleFull) for PR 2226 at commit [`096bbbc`](https://github.com/apache/spark/commit/096bbbc261e6e5015d7ff4b90f299ea321ba7651). * This patch **passes** unit tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3543] Write TaskContext in Java and exp...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2425#issuecomment-55879258 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20464/consoleFull) for PR 2425 at commit [`a7d5e23`](https://github.com/apache/spark/commit/a7d5e23330a159e91581a35d46e1846770eb421d). * This patch **passes** unit tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3567] appId field in SparkDeploySchedul...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2428#issuecomment-55881519 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20465/consoleFull) for PR 2428 at commit [`c7d890d`](https://github.com/apache/spark/commit/c7d890d1557254ff7ea01263a6512a3d846d4270). * This patch **passes** unit tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-2594][SQL] Support CACHE TABLE name A...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/spark/pull/2397#discussion_r17659871 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/commands.scala --- @@ -166,3 +166,20 @@ case class DescribeCommand(child: SparkPlan, output: Seq[Attribute])( child.output.map(field => Row(field.name, field.dataType.toString, null)) } } + +/** + * :: DeveloperApi :: + */ +@DeveloperApi +case class CacheTableAsSelectCommand(tableName: String, plan: LogicalPlan) + extends LeafNode with Command { + + override protected[sql] lazy val sideEffectResult = { +sqlContext.catalog.registerTable(None, tableName, sqlContext.executePlan(plan).analyzed) --- End diff -- Thank you for your comment. It is a good idea to import ```sqlContext._```. If we import it, we can simplify the code as below. Please comment on it. ``` import sqlContext._ plan.registerTempTable(tableName) cacheTable(tableName) ```
[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/2126#issuecomment-55887296 @jongyoul sorry I don't know what you mean by on Aug?
[GitHub] spark pull request: [SPARK-3407][SQL]Add Date type support
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/2344#issuecomment-55889454 Just rebased the code; the failure is in spark-core and not in this patch.
[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...
Github user jongyoul commented on the pull request: https://github.com/apache/spark/pull/2126#issuecomment-55889694 @tgravescs Not seriously; I'll deal with this issue if it's not fixed until Sep. On Wednesday, September 17, 2014, Tom Graves notificati...@github.com wrote: @jongyoul https://github.com/jongyoul sorry I don't know what you mean by on Aug? — Reply to this email directly or view it on GitHub https://github.com/apache/spark/pull/2126#issuecomment-55887296. -- 이종열, Jongyoul Lee, http://madeng.net
[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/2126#issuecomment-55892735 Ok, sounds good.
[GitHub] spark pull request: [SPARK-2778] [yarn] Add yarn integration tests...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/2257#issuecomment-55895625 @pwendell @andrewor14 Any concerns with the time on this? I know our unit tests already take a long time to run, but this doesn't seem too bad. I think going forward, before we add a bunch more tests, we might need to figure out how to run them separately.
[GitHub] spark pull request: [SPARK-3304] [YARN] ApplicationMaster's Finish...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/2198#issuecomment-55895905 @sarutak it is going to be another day or 2 before I can get time to look at this in detail.
[GitHub] spark pull request: add support for zipping a sequence of RDDs
GitHub user mohitjaggi opened a pull request: https://github.com/apache/spark/pull/2429 add support for zipping a sequence of RDDs a proposed fix for https://issues.apache.org/jira/browse/SPARK-3489 You can merge this pull request into a Git repository by running: $ git pull https://github.com/AyasdiOpenSource/spark master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2429.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2429 commit ff0ae81ccd1549db9aa58f5152871c34d23d7c22 Author: Mohit Jaggi mo...@ayasdi.com Date: 2014-09-17T13:55:07Z add support for zipping a sequence of RDDs
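The element-wise zip over an arbitrary number of inputs that this PR proposes can be sketched outside Spark (plain Java lists stand in for RDDs here; the method name `zipAll` is illustrative, not part of the patch or the Spark API):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of zipping a sequence of collections element-wise, the idea behind
// zipping a Seq of RDDs: the i-th output row collects the i-th element of
// every input. All inputs must have the same length, mirroring RDD.zip's
// requirement of identical partitioning and element counts.
public class ZipSeq {
    static <T> List<List<T>> zipAll(List<List<T>> inputs) {
        if (inputs.isEmpty()) return new ArrayList<>();
        int n = inputs.get(0).size();
        for (List<T> in : inputs) {
            if (in.size() != n)
                throw new IllegalArgumentException("all inputs must have the same length");
        }
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            List<T> row = new ArrayList<>();
            for (List<T> in : inputs) row.add(in.get(i));
            out.add(row);
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Integer>> zipped = zipAll(Arrays.asList(
                Arrays.asList(1, 2, 3),
                Arrays.asList(4, 5, 6),
                Arrays.asList(7, 8, 9)));
        System.out.println(zipped);  // [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
    }
}
```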
[GitHub] spark pull request: add support for zipping a sequence of RDDs
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2429#issuecomment-55896797 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...
Github user derrickburns commented on the pull request: https://github.com/apache/spark/pull/2419#issuecomment-55898846 I agree that some of my comments should go in the code. As for the Big Bang change, I understand your concern. The distance function touches practically everything. The change in the treatment of the number of clusters is also a broad change. So, while I would prefer to make small incremental changes, these two changes required touching lots of code. Sent from my iPhone On Sep 17, 2014, at 12:33 AM, Sean Owen notificati...@github.com wrote: @derrickburns I think these notes can go in code comments? (They each generate their own email too.) This is also a big-bang change covering several issues, some of which seem like more focused bug fixes or improvements. I would think it would be easier to break this down further if possible, and get in clear easy changes first. — Reply to this email directly or view it on GitHub.
[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1290#issuecomment-55900054 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20466/consoleFull) for PR 1290 at commit [`f1af6cf`](https://github.com/apache/spark/commit/f1af6cfc9c50470c94ce178d8acf80228d496071). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-1455] [SPARK-3534] [Build] When possibl...
Github user nchammas commented on the pull request: https://github.com/apache/spark/pull/2420#issuecomment-55900507 cc @marmbrus @pwendell Dunno if you guys got notified via JIRA of this pull request or not.
[GitHub] spark pull request: [SPARK-3535][Mesos] Add 15% task memory overhe...
Github user willb commented on the pull request: https://github.com/apache/spark/pull/2401#issuecomment-55902476 Here's a somewhat-related concern: it seems like JVM overhead is unlikely to scale linearly with requested heap size, so maybe a straight-up 15% isn't a great default? (If you have hard data on how heap requirements grow with job size, I'd be interested in seeing it.) Perhaps it would make more sense to reserve whichever is smaller of 15% or some fixed but reasonable amount.
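The "smaller of 15% or a fixed amount" idea from the comment above can be sketched as follows (a hypothetical illustration of the proposal, not code from the patch; both constants are invented for the example):

```java
// Sketch of capping the memory overhead: take 15% of the heap for small
// heaps, but stop growing the reservation past a fixed ceiling, reflecting
// the observation that JVM overhead doesn't scale linearly with heap size.
public class MemoryOverhead {
    static final double OVERHEAD_FRACTION = 0.15; // the PR's 15% figure
    static final int OVERHEAD_CAP_MB = 384;       // hypothetical fixed cap

    static int overheadMb(int heapMb) {
        return (int) Math.min(heapMb * OVERHEAD_FRACTION, OVERHEAD_CAP_MB);
    }

    public static void main(String[] args) {
        System.out.println(overheadMb(1024)); // 153: the 15% term wins for small heaps
        System.out.println(overheadMb(8192)); // 384: the fixed cap wins for large heaps
    }
}
```

Note that Spark later settled on the opposite combination, a *max* of a fraction and a floor, so that small heaps still get a minimum reservation; the min shown here is strictly what this comment proposes.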
[GitHub] spark pull request: [SPARK-3535][Mesos] Add 15% task memory overhe...
Github user brndnmtthws commented on the pull request: https://github.com/apache/spark/pull/2401#issuecomment-55903580 That implies that as you grow the heap, you're not adding threads (or other things that use off-heap memory). I'm not familiar with Spark's execution model, but I would assume that as you scale up, the workers will need more threads to serve cache requests and what have you.
[GitHub] spark pull request: [SPARK-3535][Mesos] Add 15% task memory overhe...
Github user brndnmtthws commented on the pull request: https://github.com/apache/spark/pull/2401#issuecomment-55903894 Oh, and one more thing you may want to think about is the OS filesystem buffers. Again, as you scale up the heap, you may want to proportionally reserve a slice of off-heap memory just for the OS.
[GitHub] spark pull request: [SPARK-3304] [YARN] ApplicationMaster's Finish...
Github user sarutak commented on the pull request: https://github.com/apache/spark/pull/2198#issuecomment-55904701 @tgravescs Thank you so much!
[GitHub] spark pull request: SPARK-3069 [DOCS] Build instructions in README...
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/2014#discussion_r17668529 --- Diff: CONTRIBUTING.md --- @@ -0,0 +1,12 @@ +## Contributing to Spark --- End diff -- Having this is pretty nice! I like the banner you get when opening a new PR.
[GitHub] spark pull request: [Docs] minor grammar fix
GitHub user nchammas opened a pull request: https://github.com/apache/spark/pull/2430 [Docs] minor grammar fix You can merge this pull request into a Git repository by running: $ git pull https://github.com/nchammas/spark patch-2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2430.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2430 commit d476bfb21b7cbe066aa313128644866e99204aaf Author: Nicholas Chammas nicholas.cham...@gmail.com Date: 2014-09-17T14:55:03Z [Docs] minor grammar fix
[GitHub] spark pull request: [Docs] minor grammar fix
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2430#issuecomment-55906606 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20467/consoleFull) for PR 2430 at commit [`d476bfb`](https://github.com/apache/spark/commit/d476bfb21b7cbe066aa313128644866e99204aaf). * This patch merges cleanly.
[GitHub] spark pull request: add spark.driver.memory to config docs
Github user nartz commented on the pull request: https://github.com/apache/spark/pull/2410#issuecomment-55909584 using `./bin/spark-submit --driver-memory 3g myscript.py` on the command line works for me
[GitHub] spark pull request: SPARK-3177 (on Master Branch)
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/2204#issuecomment-55910777 looks good. I tested it on hadoop 0.23 and it works there too.
[GitHub] spark pull request: SPARK-3177 (on Master Branch)
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/2204
[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1290#issuecomment-55911175 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20466/consoleFull) for PR 1290 at commit [`f1af6cf`](https://github.com/apache/spark/commit/f1af6cfc9c50470c94ce178d8acf80228d496071). * This patch **passes** unit tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-2418] Custom checkpointing with an exte...
Github user darabos commented on the pull request: https://github.com/apache/spark/pull/1345#issuecomment-55912866 @Forevian, can you please update it to merge cleanly? Then hunt down a reviewer! It would be great to have this in 1.2. It would make our code significantly more efficient. (Currently we save to S3 and load from S3 to checkpoint. With your change I think we could avoid the unnecessary loading.)